For the new project I’m working on, I did some very simple initial prototyping using MySQL (mainly because I could get from zero to somewhere very quickly with ActiveScaffold and a few simple migrations), and then started to look at alternate data stores. There are real reasons for this given the type of data being managed, but I have to admit that at least some of it was my desire to get a bit of hands-on experience with some of the new kids on the block, too. After exploring the alternatives, I settled on doing some prototyping with both MongoDB and Redis. There are obviously others that are equally interesting, particularly Cassandra, but there simply isn’t time for everything! I selected Redis because I’d already done some playing with it, understood its basic concepts, and felt that its support for sets would be valuable for what I’m working on. I chose MongoDB as another option after doing some reading on it and finding it to be an interesting combination of key-value storage with some relational-style support. I also thought Mongoid was a nice bit of work that would be pleasant to use.
I want to note that I purposely did not call this “MongoDB vs Redis”: they’re different tools with different uses, which is one of the things I hope will be clear from these posts. This isn’t a competition, just a summary of my experiments in looking at how I might approach my needs using the two.
The “problem” to be solved
I’m not at liberty to divulge the details of what I’m working on, so I have a sort of parallel-world simulation of the problem that replicates the types of issues I have to take care of. The idea, then, is to model a reference library, where we have Books and Authors. A Book can have multiple Authors, while an Author may have written multiple Books, so in a relational schema there would be a many-to-many relationship between them. In addition, a Book can contain references to other Books. We want to build a web app that will:
- Show all of the Books
- Show all of the Authors
- For a Book, show all of the Authors
- For a Book, show all of the Books that it references
- For a Book, show all of the Books that reference it
- For an Author, show all of the Books they’ve authored
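To make the six views above concrete, here is a minimal plain-Ruby sketch of the same domain with no database involved. The names (LibraryBook, LIBRARY, and the helper methods) and the sample data are made up for illustration and are not part of the actual app.

```ruby
# Hypothetical in-memory model of the library; all names are illustrative.
LibraryBook = Struct.new(:number, :title, :authors, :references)

LIBRARY = [
  LibraryBook.new("100", "Book A", ["Ann", "Bob"], ["200"]),
  LibraryBook.new("200", "Book B", ["Bob"], []),
  LibraryBook.new("300", "Book C", ["Cy"], ["100", "200"])
].freeze

def find_book(number)
  LIBRARY.find { |b| b.number == number }
end

# All distinct author names across the library, sorted.
def all_authors
  LIBRARY.flat_map(&:authors).uniq.sort
end

# Books that the given book references.
def books_referenced_by(number)
  find_book(number).references.map { |n| find_book(n) }
end

# Books that reference the given book.
def books_referencing(number)
  LIBRARY.select { |b| b.references.include?(number) }
end

# Books written by the given author.
def books_by(author)
  LIBRARY.select { |b| b.authors.include?(author) }
end
```

Note that only one direction of the reference relationship is stored here; the other direction is derived by scanning, which is the same trade-off that comes up later with the document store.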
MongoDB
As I mentioned above, I liked the look of the mongoid plugin to work with MongoDB, though I did do an initial pass using MongoMapper as well. I just felt that mongoid was a bit smoother, had more support for associations, and had somewhat more documentation, but they both did the job. Using Mongoid, my models looked something like this:
class Book
  include Mongoid::Document
  field :number
  field :title
  field :back_references, :type => Array
  field :forward_references, :type => Array
  index :number
  has_many :authors
end

class Author
  include Mongoid::Document
  field :name
  belongs_to :book, :inverse_of => :authors
end
As you can see, much like with ActiveRecord, you simply specify the fields you want persisted, and use a has_many/belongs_to pair to create an association. Do note that instead of extending a class as you would with AR, for mongoid you simply include Mongoid::Document. When I want to create a Book, it goes something like the following, assuming that I have the book number/title and an array of author names:
the_book = Book.new(
  :number => book_number,
  :title => book_title
)
authors.each do |a|
  the_book.authors << Author.new(:name => a)
end
the_book.save
But what about the references, then? In the Book model above, I have two arrays: back_references (a list of books that this one references) and forward_references (a list of books that reference this one). All it takes for these is to create arrays containing the book numbers, assign them to the instance, and save. That’s one of the nice things about MongoDB, as we’ll see: you can query for items in embedded arrays.
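To illustrate what querying for items in an embedded array means, here is a simplified plain-Ruby emulation of MongoDB's equality match on a single field: a document matches when the field equals the value or, if the field holds an array, when the array contains the value. The helper names and sample documents are hypothetical, and this is only a sketch of the matching rule, not how MongoDB or Mongoid implement it.

```ruby
# Simplified emulation of an equality match on one field.
# Scalar field: must equal the value. Array field: must contain the value.
def field_matches?(doc, field, value)
  stored = doc[field]
  stored.is_a?(Array) ? stored.include?(value) : stored == value
end

# Return every document that satisfies all of the field => value conditions.
def find_all(docs, conditions)
  docs.select do |doc|
    conditions.all? { |field, value| field_matches?(doc, field, value) }
  end
end

# Hypothetical sample documents, shaped like the Book model's fields.
SAMPLE_DOCS = [
  { "number" => "1", "back_references" => ["2", "3"] },
  { "number" => "2", "back_references" => [] },
  { "number" => "3", "back_references" => ["2"] }
].freeze

hits = find_all(SAMPLE_DOCS, "back_references" => "2")
puts hits.map { |d| d["number"] }.inspect  # => ["1", "3"]
```

The same condition syntax works whether the field is a plain string or an array of strings, which is why the model can store book numbers in arrays and still look them up with an ordinary query.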
A quick note here: I’ve glossed over the setup and configuration of MongoDB here, somewhat on purpose. Once you’ve installed it, if you’re using mongoid there are very clear instructions on setting up your Rails app to use the db so there’s not much need for me to repeat things here. Let’s just say we’re using a db called “books-development” which will then contain our collection, which is called “books”. Wait, shouldn’t we have another collection called “authors” since we have an Author model? Well, no, because the way we set up the has_many/belongs_to it means that Authors are embedded objects within Books. Let’s see what an entry looks like when we persist it. Running the mongo shell:
> db.books.find({number : "1234567890"});
{ "_id" : "4b58f90c69bef38f8f000720", "number" : "1234567890", "forward_references" : [
"6215628454",
"63107472345"
], "back_references" : [
"39848733434",
"51895763321",
"5216434662"
], "authors" : [
{
"_id" : "4b58f90569bef38f8f000091",
"name" : "Matsumoto,Yukihiro",
"_type" : "Author"
},
{
"_id" : "4b58f90569bef38f8f000092",
"name" : "Flanagan,David",
"_type" : "Author"
}
], "_type" : "Book", "title" : "The Ruby Programming Language" }
From this, you can see that Mongo has assigned “_id” values to each object, the references are both just arrays of book numbers, and the authors have become embedded objects with their own “_id” and “_type” (used by mongoid). As we’ll see in a bit, the fact that the authors are embedded objects is convenient for some purposes, but problematic for others due to the queries I needed to do. For now, though, let’s see what our queries look like for the various activities listed above.
# Inside books_controller.rb, index action to list the books
def index
  @entries = Book.count
  @pager = Paginator.new(@entries, 20) do |offset, per_page|
    Book.criteria.skip(offset).limit(per_page).order_by([[:title, :asc]])
  end
  @books = @pager.page(params[:page])
end

# show action to display a single book's details
def show
  @book = Book.find(:first, :conditions => { :number => params[:number] })
end
Pretty straightforward stuff, even when bringing Paginator into the picture. Being able to chain the criteria with mongoid is a nice bonus to using it. So when a single book is displayed, the page can show the list of author names by simply iterating the array:
<tr>
  <td class="label">Authors</td>
  <td class="show">
    <% if (@book.authors)
         @book.authors.each do |author| -%>
      <%= author.name %> |
    <%   end -%>
    <% else -%>
      None
    <% end -%>
  </td>
</tr>
The back references are displayed in exactly the same way. However, I discovered while writing the data entry scripts that the forward references (i.e. the books that reference the current book) were not available. No problem, I figured: instead of storing them, I’ll just query for them:
def referenced_by
  Book.find(:all, :conditions => { :back_references => number })
end
There’s some nice MongoDB magic. Very simply, that will return any Book entry that contains “number” in its “back_references” attribute, even though that attribute is an array! That ability to query for the contents of an array comes in very handy, needless to say. As an aside, I came across a reference (which I sadly can’t find now to link to) that showed me how to add a super simple search. To make the books searchable, I just took the title and the author names, did a split(), and created an array containing each word. I called that “search_words” and made it a new array-type attribute. The search is then a simple query:
def search_books(search_term)
  Book.find(:all, :conditions => { :search_words => search_term })
end
This is obviously a very simplistic search, but given that it takes about 2 minutes to implement, who’s complaining?
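For reference, building the search_words array might look something like the sketch below. The post only says the title and author were split into words; lowercasing, splitting on punctuation, and removing duplicates are assumptions added here so that a name like "Matsumoto,Yukihiro" yields two searchable words. The method names are hypothetical.

```ruby
# Build the list of searchable words from a title and a list of author names.
# Assumption: words are lowercased and split on any non-alphanumeric character,
# so non-ASCII letters would also act as separators in this sketch.
def build_search_words(title, author_names)
  text = ([title] + author_names).join(" ")
  text.downcase.split(/[^a-z0-9]+/).reject(&:empty?).uniq
end

# Normalize a user's search term the same way before querying with it.
def normalize_search_term(term)
  term.strip.downcase
end

words = build_search_words(
  "The Ruby Programming Language",
  ["Matsumoto,Yukihiro", "Flanagan,David"]
)
puts words.inspect
# => ["the", "ruby", "programming", "language", "matsumoto", "yukihiro", "flanagan", "david"]
```

Because the query matches whole array elements, a search term has to equal a stored word exactly, so whatever normalization is applied when building the array needs to be applied to the search term as well.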
The Author problem
So now we come to where I began to find problems with the approach. I wanted to display the list of all authors. Hmm, the authors are embedded documents within the books. Okay, it is possible:
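def get_author_list
  # Fetch only the embedded authors from every book
  results = Book.criteria.only(:authors)
  author_list = {}
  results.each do |book|
    book.authors.each do |a|
      # Key by name: each embedded Author has its own _id, so the same
      # person shows up as a different object in each book
      unless author_list.key?(a.name)
        # Dot notation matches on a field of the embedded documents
        author_list[a.name] = Book.where("authors.name" => a.name)
      end
    end
  end
  author_list
end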
Pretty ugly, ain’t it? It queries all of the books and gets just the authors attribute, then iterates each book, then iterates the authors. For each one, it does a query to get the list of books (so our page can show each author followed by their books), and creates a Hash with key=author, value=books array. This obviously doesn’t do any pagination, which would make it even messier, plus the results aren’t sorted yet. Nope, I didn’t like it.
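For comparison, once the books have been loaded, the same author-to-books grouping can be built in a single in-memory pass, which avoids running one query per author and makes sorting and paging straightforward. This is a plain-Ruby sketch rather than Mongoid code; the hash layout, the method names, and the sample data are all assumptions made for illustration.

```ruby
# Group book titles under each author name in one pass, sorted by author name.
def books_by_author(books)
  index = Hash.new { |hash, name| hash[name] = [] }
  books.each do |book|
    book[:authors].each { |name| index[name] << book[:title] }
  end
  index.sort.to_h
end

# Return one page of [author, titles] pairs; pages are numbered from 1.
def page_of(index, page, per_page)
  index.to_a.slice((page - 1) * per_page, per_page) || []
end

# Hypothetical sample data standing in for the loaded books.
SAMPLE_BOOKS = [
  { :title => "B1", :authors => ["Zed", "Amy"] },
  { :title => "B2", :authors => ["Amy"] }
].freeze

index = books_by_author(SAMPLE_BOOKS)
puts index.inspect
puts page_of(index, 1, 1).inspect  # => [["Amy", ["B1", "B2"]]]
```

The obvious cost is that every book has to be read to build the index, so this only makes sense while the collection is small enough to scan.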
The alternative seems to be to make authors a first-level document, and link explicitly with book numbers, which isn’t horrible but means, again, multiple queries to get our list of authors with their books. This was beginning to look like it might be too relational a problem for MongoDB to make sense.
Update: as noted in the comment below by module0000, using distinct(“author”) solves this particular problem in a much cleaner way — thanks for the comment! I’ll still stand by the thought that this is really a relational problem and a document database has shortcomings in that regard (and of course strengths in other ways).
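To show what a distinct query over embedded documents returns, here is a plain-Ruby emulation: it collects every value found at a given key inside an embedded array, across all documents, and removes duplicates. With the embedded layout shown earlier, the key passed to distinct would presumably be a dotted path such as "authors.name", but that exact key is an assumption here, as are the helper name and the sample data.

```ruby
# Emulates distinct over a key of embedded documents: gather the value of
# `key` from every item in `array_field`, across all docs, without duplicates.
def distinct_embedded(docs, array_field, key)
  docs.flat_map { |doc| (doc[array_field] || []).map { |item| item[key] } }.uniq
end

# Hypothetical sample documents shaped like the persisted books.
SAMPLE_LIBRARY = [
  { "title" => "Book A",
    "authors" => [{ "name" => "Matsumoto,Yukihiro" }, { "name" => "Flanagan,David" }] },
  { "title" => "Book B",
    "authors" => [{ "name" => "Flanagan,David" }] },
  { "title" => "Book C" }
].freeze

puts distinct_embedded(SAMPLE_LIBRARY, "authors", "name").inspect
# => ["Matsumoto,Yukihiro", "Flanagan,David"]
```

This produces the list of authors in one step, but listing each author's books still takes a further lookup per author.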
So, I set this aside, since as a prototype it did work. I made a new branch (thanks, git) and converted it to use Redis. Which I’ll cover in part 2, shortly.