Comparing MongoDB and Redis, Part 1

For the new project I’m working on, I did some very simple initial prototyping using MySQL (mainly because I could get from 0 to somewhere very quickly with ActiveScaffold and a few simple migrations), and then started to look at alternate data stores. There are real reasons for this given the type of data being managed, but I have to admit that at least some of it was my desire to get a bit of hands-on experience with some of the new kids on the block, too. After exploring the alternatives, I settled on doing some prototyping with both MongoDB and Redis. There are obviously others that are equally interesting, particularly Cassandra, but there simply isn’t time for everything! I selected Redis because I’d already done some playing with it, understood its basic concepts, and felt that its support for sets would be valuable for what I’m working on. I chose MongoDB as another option after doing some reading on it and finding it to be an interesting combination of key-value storage with some relational-style support. I also thought Mongoid was a nice bit of work that would be pleasant to use.

I want to note that I purposely did not call this “MongoDB vs Redis” — they’re different tools, and have different uses, which is one of the things I hope will be clear from these posts. This isn’t a competition, but just a summary of my experiments in looking at how I might approach my needs using the two.

The “problem” to be solved

I’m not at liberty to divulge the details of what I’m working on, so I have a sort of parallel-world simulation of the problem that replicates the types of issues I have to take care of. The idea, then, is to model a reference library, where we have Books and Authors. A Book can have multiple Authors, while an Author may have written multiple Books, so in a relational schema there would be a many-to-many relationship between them. In addition, a Book can contain references to other Books. We want to build a web app that will:

  • Show all of the Books
  • Show all of the Authors
  • For a Book, show all of the Authors
  • For a Book, show all of the Books that it references
  • For a Book, show all of the Books that reference it
  • For an Author, show all of the Books they’ve authored
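
To make the six views above concrete before any data store enters the picture, here is a minimal in-memory sketch in plain Ruby. SimpleBook, Library, and the sample data are hypothetical names and values used only for illustration; they are not part of the actual app.

```ruby
# Plain-Ruby, in-memory sketch of the six views; no datastore involved.
# SimpleBook and Library are hypothetical names used only for illustration.
SimpleBook = Struct.new(:number, :title, :author_names, :reference_numbers)

class Library
  def initialize(books)
    @books = books
  end

  # Show all of the Books
  def books
    @books.sort_by(&:title)
  end

  # Show all of the Authors
  def authors
    @books.flat_map(&:author_names).uniq.sort
  end

  # For a Book, show all of the Authors (assumes the number exists)
  def authors_of(number)
    find(number).author_names
  end

  # For a Book, show all of the Books that it references
  def referenced_from(number)
    find(number).reference_numbers.map { |n| find(n) }.compact
  end

  # For a Book, show all of the Books that reference it
  def referencing(number)
    @books.select { |b| b.reference_numbers.include?(number) }
  end

  # For an Author, show all of the Books they've authored
  def books_by(author_name)
    @books.select { |b| b.author_names.include?(author_name) }
  end

  private

  def find(number)
    @books.find { |b| b.number == number }
  end
end

# Made-up sample data
LIBRARY = Library.new([
  SimpleBook.new("1", "Book A", ["Ann", "Bob"], ["2"]),
  SimpleBook.new("2", "Book B", ["Bob"], []),
  SimpleBook.new("3", "Book C", ["Cid"], ["1", "2"])
])
```

Note that the two reference views are the same data read in opposite directions, and the same is true of the two author views. That bidirectional access is what makes the problem feel relational.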

MongoDB

As I mentioned above, I liked the look of the mongoid plugin to work with MongoDB, though I did do an initial pass using MongoMapper as well. I just felt that mongoid was a bit smoother, had more support for associations, and had somewhat more documentation, but they both did the job. Using Mongoid, my models looked something like this:

class Book
  include Mongoid::Document

  field :number
  field :title
  field :back_references, :type => Array
  field :forward_references, :type => Array
  index :number
  has_many :authors
end

class Author
  include Mongoid::Document

  field :name
  belongs_to :book, :inverse_of => :authors
end

As you can see, much like with ActiveRecord, you simply specify the fields you want persisted, and use a has_many/belongs_to pair to create an association. Do note that instead of extending a class as you would with AR, for mongoid you simply include Mongoid::Document. When I want to create a Book, it goes something like the following, assuming that I have the book number/title and an array of author names:

    the_book = Book.new(
      :number => book_number,
      :title  => book_title
    )
    authors.each do |a|
      the_book.authors << Author.new(:name => a)
    end
    the_book.save

But what about the references, then? In the Book model above, I have two arrays, back_references (a list of books that this one refers back to) and forward_references (a list of books that reference this one). All it takes for these is to create arrays containing the book numbers, assign them to the instance, and save. That’s one of the nice things about MongoDB, as we’ll see: you can query for items in embedded arrays.
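
The array-query behaviour can be illustrated without a database. The following is a minimal plain-Ruby emulation, assuming the usual MongoDB rule that an equality condition on an array field matches when any element equals the value. The helper names field_matches? and find_matching, and the sample documents, are hypothetical; this mimics the matching rule and is not MongoDB or Mongoid code.

```ruby
# Emulates how an equality condition behaves on an array field: the document
# matches if the field equals the value or, when the field is an array, if
# any element of the array equals the value.
def field_matches?(document, field, value)
  stored = document[field]
  stored.is_a?(Array) ? stored.include?(value) : stored == value
end

# Returns the documents that satisfy every field => value condition.
def find_matching(documents, conditions)
  documents.select do |doc|
    conditions.all? { |field, value| field_matches?(doc, field, value) }
  end
end

# Made-up sample documents, shaped like the stored books
SAMPLE_BOOKS = [
  { "number" => "100", "back_references" => ["200", "300"] },
  { "number" => "200", "back_references" => [] },
  { "number" => "300", "back_references" => ["200"] }
].freeze
```

With this sample, find_matching(SAMPLE_BOOKS, "back_references" => "200") selects books 100 and 300, because "200" appears somewhere in each of their arrays.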

A quick note here: I’ve glossed over the setup and configuration of MongoDB here, somewhat on purpose. Once you’ve installed it, if you’re using mongoid there are very clear instructions on setting up your Rails app to use the db so there’s not much need for me to repeat things here. Let’s just say we’re using a db called “books-development” which will then contain our collection, which is called “books”. Wait, shouldn’t we have another collection called “authors” since we have an Author model? Well, no, because the way we set up the has_many/belongs_to it means that Authors are embedded objects within Books. Let’s see what an entry looks like when we persist it. Running the mongo shell:

> db.books.find({number : "1234567890"});
{ "_id" : "4b58f90c69bef38f8f000720", "number" : "1234567890", "forward_references" : [
        "6215628454",
        "63107472345"
], "back_references" : [
        "39848733434",
        "51895763321",
        "5216434662"
], "authors" : [
        {
                "_id" : "4b58f90569bef38f8f000091",
                "name" : "Matsumoto,Yukihiro",
                "_type" : "Author"
        },
        {
                "_id" : "4b58f90569bef38f8f000092",
                "name" : "Flanagan,David",
                "_type" : "Author"
        }
],  "_type" : "Book", "title" : "The Ruby Programming Language" }

From this, you can see that Mongo has assigned “_id” values to each object, the references are both just arrays of book numbers, and the authors have become embedded objects with their own “_id” and “_type” (used by mongoid). As we’ll see in a bit, the fact that the authors are embedded objects is convenient for some purposes, but problematic for others due to the queries I needed to do. For now, though, let’s see what our queries look like for the various activities listed above.

  # Inside books_controller.rb, index action to list the books
  def index
    @entries = Book.count
    @pager = Paginator.new(@entries, 20) do |offset, per_page|
      Book.criteria.skip(offset).limit(per_page).order_by([[:title, :asc]])
    end
    @books = @pager.page(params[:page]) 
  end

  # show action to display a single book's details
  def show
    @book = Book.find(:first,  :conditions => { :number => params[:number] })
  end

Pretty straightforward stuff, even when bringing Paginator into the picture. Being able to chain the criteria with mongoid is a nice bonus to using it. So when a single book is displayed, the page can show the list of author names by simply iterating the array:

  <tr>
    <td class="label">Authors</td>
    <td class="show">
      <% if (@book.authors)
         @book.authors.each do |author| -%>
        <%= author.name %> |
      <% end -%>
      <% else -%>
        None
      <% end -%>
    </td>
  </tr>

The backward references are displayed in exactly the same way. However, I discovered while writing the data entry scripts that the forward references (i.e. the books that reference the current book) were not available in the data. No problem, I figured, instead of storing them I’ll just query for them:

  def referenced_by
    Book.find(:all, :conditions => { :back_references => number }) 
  end

There’s some nice MongoDB magic. Very simply, that will return any Book entry that contains “number” in its “back_references” attribute, even though that attribute is an array! That ability to query for the contents of an array comes in very handy, needless to say. As an aside, I came across a reference (which I sadly can’t find now to link to) that showed me how to add a super simple search. To make the books searchable, I just took the title and the author, did a split(), and created an array containing each word. I called that “search_words” and made it a new array-type attribute. The search is then a simple query:

  def search_books(search_term)
    Book.find(:all, :conditions => { :search_words => search_term }) 
  end

This is obviously a very simplistic search, but given that it takes about 2 minutes to implement, who’s complaining?
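
For reference, building the word list might look roughly like the sketch below. The helper name build_search_words is hypothetical, and lowercasing the words and splitting on commas as well as whitespace are assumptions added here; the description above only says the title and author were split into words.

```ruby
# Builds the word list for the simple search: split the title and the author
# names into words, lowercase them (an assumption), and drop duplicates.
def build_search_words(title, author_names)
  ([title] + author_names)
    .flat_map { |text| text.split(/[\s,]+/) }
    .map(&:downcase)
    .reject(&:empty?)
    .uniq
end
```

If the stored words are lowercased like this, the search term would need to be lowercased too before it is passed to the query. Only whole-word, exact matches are found.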

The Author problem

So now we come to where I began to find problems with the approach. I wanted to display the list of all authors. Hmm, the authors are embedded documents within the books. Okay, it is possible:

  def get_author_list
    # Fetch only the embedded authors of every book
    results = Book.criteria.only(:authors)
    author_list = Hash.new
    results.each do |book|
      book.authors.each do |a|
        # One extra query per previously unseen author
        unless author_list.has_key?(a)
          author_list[a] = Book.where(:authors => a)
        end
      end
    end
    author_list
  end

Pretty ugly, ain’t it? It queries all of the books and gets just the authors attribute, then iterates each book, then iterates the authors. For each one, it does a query to get the list of books (so our page can show each author followed by their books), and creates a Hash with key=author, value=books array. This obviously doesn’t do any pagination, which would make it even messier, plus the results aren’t sorted yet. Nope, I didn’t like it.
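
One way to at least avoid the query per author is to build the author-to-books mapping in a single pass once the books are loaded. The sketch below does that in plain Ruby over hashes shaped like the stored documents. The name books_by_author and the sample data are hypothetical, and it still loads every book into memory, so it does nothing for pagination.

```ruby
# Groups book titles under each author name in one pass over the book list,
# instead of issuing one query per author. In-memory Ruby, not a Mongoid query.
def books_by_author(books)
  index = Hash.new { |hash, name| hash[name] = [] }
  books.each do |book|
    # A book with no embedded authors simply contributes nothing
    (book["authors"] || []).each do |author|
      index[author["name"]] << book["title"]
    end
  end
  # Sort authors by name, and each author's titles alphabetically
  index.sort_by { |name, _titles| name }
       .map { |name, titles| [name, titles.sort] }
       .to_h
end

# Made-up sample documents
GROUPING_SAMPLE = [
  { "title" => "Title B", "authors" => [{ "name" => "Author Y" }, { "name" => "Author X" }] },
  { "title" => "Title A", "authors" => [{ "name" => "Author X" }] },
  { "title" => "Title C" }
].freeze
```

Grouping by name treats two different authors with the same name as one person, which is another hint that authors want an identity of their own.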

The alternative seems to be to make authors a first-level document, and link explicitly with book numbers, which isn’t horrible but means, again, multiple queries to get our list of authors with their books. This was beginning to look like it might be too relational a problem for MongoDB to make sense.
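
As a rough picture of that alternative layout, the sketch below uses plain hashes in place of two collections. The field name book_numbers, the helper names, and the sample documents are all hypothetical. The point is only that listing an author's books becomes a two-step lookup.

```ruby
# "Authors as first-level documents": each author document carries the
# numbers of the books they wrote. Plain hashes stand in for collections.
AUTHOR_DOCS = [
  { "name" => "Author Two", "book_numbers" => ["200"] },
  { "name" => "Author One", "book_numbers" => ["100", "200"] }
].freeze

BOOK_DOCS = [
  { "number" => "100", "title" => "First Title" },
  { "number" => "200", "title" => "Second Title" }
].freeze

# Step 1: find the author. Step 2: fetch the books with those numbers.
def titles_for_author(name, authors: AUTHOR_DOCS, books: BOOK_DOCS)
  author = authors.find { |a| a["name"] == name }
  return [] unless author
  books.select { |b| author["book_numbers"].include?(b["number"]) }
       .map { |b| b["title"] }
end

# The full "authors with their books" page: one read of each collection,
# joined in application code.
def authors_with_titles(authors: AUTHOR_DOCS, books: BOOK_DOCS)
  title_by_number = books.map { |b| [b["number"], b["title"]] }.to_h
  authors.sort_by { |a| a["name"] }.map do |a|
    [a["name"], a["book_numbers"].map { |n| title_by_number[n] }.compact]
  end
end
```

The join that a relational database would do for you ends up in application code here, which is the trade-off being weighed.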

Update: as noted in the comment below by module0000, using distinct(“authors”) solves this particular problem in a much cleaner way. Thanks for the comment! I’ll still stand by the thought that this is really a relational problem and a document database has shortcomings in that regard (and of course strengths in other ways).
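
What distinct does on an array field can be emulated in plain Ruby, using the sample documents from module0000's comment below. The helper name distinct_values is hypothetical. With the embedded author documents shown earlier, the field would presumably need to be a dotted path such as authors.name, which is an assumption not verified here.

```ruby
# Plain-Ruby emulation of distinct on an array field: collect every element
# across all documents and keep each value once.
def distinct_values(documents, field)
  documents.flat_map { |doc| Array(doc[field]) }.uniq
end

# Sample documents taken from the comment thread
COMMENT_BOOKS = [
  { "name" => "book #1", "authors" => %w[bob mike hank] },
  { "name" => "book #2", "authors" => %w[bob hank] },
  { "name" => "book #3", "authors" => %w[mike mark] }
].freeze
```

Calling distinct_values(COMMENT_BOOKS, "authors") yields the four unique names. The order of the values may differ from what the mongo shell prints.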

So, I set this aside, since as a prototype it did work. I made a new branch (thanks, git) and converted it to use Redis, which I’ll cover in part 2, shortly.

15 comments
  1. If you have to work with authors individually, I don’t think it’s a huge issue to break them out into their own collection. Even in that scenario, which is common, there are still a lot of good reasons to use Mongo.

    Embedded documents, which Mongoid prefers, are often better when the related entity doesn’t need to be handled individually. That said, we have an open case (http://jira.mongodb.org/browse/SERVER-142) for treating embedded collections as first class collection.

    Really enjoyed the article. Thanks!

    • Thanks very much for checking out the article and commenting! I appreciate the feedback from someone with your Mongo experience. I may well get back to digging deeper into Mongo, breaking things out into their own collections more to see how that works out. Definitely, embedded documents weren’t the way to go in this particular case, when “bidirectional” access to the entities is necessary.

  2. Awesome post! I like the flow and the code snippets. Plus you’re really bringing the key questions anyone would care about to the surface.

  3. Gavin Hughes said:

    Very useful. I think the Mongoid docs need to present the limitations (and perhaps workarounds) as well as the benefits of Mongoid. Right now it’s just benefits.

  4. i was just about to start with mongodb or/and redis… nice to know the strengths and limitations before you begin something… nice write up!

  5. Patrick said:

    You might like to try CIPl (cipl.codeplex.com) – it provides an abstraction layer over a number of different stores (including MongoDB, Cassandra and MySql) – it also has support for references allowing you to represent books as something with a reference to an author

  6. Your last problem of listing the authors uniquely was a 1 liner actually… You were looking for: db.books.distinct("author");
    That will return a unique list of the members of the author array.

    …Here is an example, the first 3 lines are a printout of the contents of my ‘books’ collection.
    { "_id" : ObjectId("4f15cd56abbd5576dfb1786d"), "name" : "book #1", "authors" : [ "bob", "mike", "hank" ] }
    { "_id" : ObjectId("4f15cd5fabbd5576dfb1786e"), "name" : "book #2", "authors" : [ "bob", "hank" ] }
    { "_id" : ObjectId("4f15cd69abbd5576dfb1786f"), "name" : "book #3", "authors" : [ "mike", "mark" ] }

    …Now call the distinct() function on your collection:
    > db.books.distinct("authors");
    [ "bob", "hank", "mike", "mark" ]

    There are the unique authors. Hope this helps.

    • I updated the post to reflect your comment, thanks very much for pointing it out to me, since I shouldn’t have missed that solution. I guess I was more trying to illustrate the non-relational aspects of documents, and didn’t choose the best example.

  7. databaseGuy said:

    To me, you just don’t get what MongoDB is. Using a fancy API is nice, but you clearly should learn how to query it in JavaScript first.
    You tried to find a bad point about the relational aspect, but in fact there is no such problem, because you can solve it with the right architecture.
    However, you didn’t remove or cross out your argument after module0000’s comment (you just commented on it), knowing that people probably won’t read your article to the end. Worse, you write “I still stand by the thought that this is really a relational problem” without offering any new reasoning, like “MY FIRST REASONING WAS GOOD”. For me you’re just one of those horrible people who don’t really get to the bottom of things.

    • I’m not sure you quite read the post, since I think I was pretty clear that this was an early experiment. I got to where I felt I had a feel for things, and then I wanted to experiment with Redis as well, which is in part 2. The fact still remains that if you have relational-type data, you can’t really take full advantage of embedded documents without giving up some of the niceties of an RDBMS. I certainly didn’t “try to find” anything negative; MongoDB is pretty great for many systems, but wasn’t an ideal match for what I was trying to do at the time. One of the great things about all of the recent data stores is that each addresses different use cases. For the system I was working on, even after getting to know and absolutely love Redis, I had to face the fact that it was a relational problem.
