Just a quick note that Karl Seguin has created a nice mini-Redis book, available for free download. Check his announcement here and grab the book. He’s also posted the source of the book to GitHub.
Tag Archives: redis
Updated Redis Cheat-sheet
I have finally had the chance to update my Redis cheat-sheet so that it has the latest commands, including the Hash, Multi/Exec, and Pub/Sub stuff. I’m calling this v2.0 of the cheat-sheet since there are lots of changes and it in theory puts it in-sync with Redis 2.0, though I’m sure there will be small changes to come.
I’ve also done the right thing and created a new GitHub repository for it, with the OmniGraffle file there (though not the fonts, yet, pending looking at them to see if that’s okay). So, enjoy, and let me know if I’ve missed/goofed anything, of course. The repo is here and the direct link to the PDF if that’s all you want is here.
Using redis_logger for application logging
I just pushed a new project to github, redis_logger. I decided to give this a go and see if it ended up as potentially useful as I thought it might, and I’m pretty pleased with its initial version, limited though it is.
The idea is that by installing the tiny gem you can add logging into Redis to your application, including the ability to group log entries together, and then browse the groups, including intersections between groups.
As an example: let’s say you do like I did, which is to add request logging to your Rails application. I added this to my application_controller.rb:
before_filter :log_request
def log_request
RedisLogger.debug({ "request" => request.url,
"remote_ip" => request.remote_ip,
"user_id" => current_user.id,
"username" => current_user.email
}, "requests")
end
This will create a log entry as a Hash in Redis, containing the key/value pairs that were passed in. A “timestamp” value is also automatically added for you. This entry will be added into two groups, “debug” and “requests” — the “requests” group is passed into the call as the optional second parameter.
I could also add a call in my controller to log a warning if a user tries to access a page and is redirected to the signin page, or an error if an exception is passed up. I could then view all of the log entries in the intersection of “error” and “requests” to see only errors logged in the “requests” group, and not the debug or warn messages.
One of the great things about this is that it was so easy to do using Redis, once I worked out the approach. Using Sets, the code stores the entries and then adds them to the log groups as sets. Using Hashes, each entry is stored with its key/value pairs intact. I’m thinking about things l ike adding some more standard keys, like the current “timestamp” key, and enabling some additional functionality like adding tags for additional searchability within the groups.
Having the log entries in Redis is great, but browsing them is the fun part. So, there’s also redis_logger-web, a simple Sinatra app that lists the groups and lets you view the log entries. You can click on a group name to see its entries, most recent to oldest, or you can select multiple groups and view the intersection of their entries. Right now that’s limited to just the most recent 100 entries, until I work out the best way to save temporary sets and clean them up in a cron. In reality, though, the most recent 100 entries is what’s generally useful. Adding export functionality is first on my list, because that will be extremely handy for analysis.
Scaling, testing, and adding the essential multi-threading are next, of course. Sound interesting? Please, grab it, fork it, enhance it, let me know what you think.
Redis cheat-sheet v1.0
I put together a cheat-sheet for Redis, so that I didn’t have to keep a browser tab open on the commands page all the time. This is v1.0 — please let me know any suggestions, ideas, critiques, etc. There are a number of exciting new commands coming (hashes! multi!) that I will add as they become available. I hope this proves useful for people.
Using Sunspot for Free-Text Search with Redis
After spending time to get some data into Redis (as documented in some of my previous posts here), I not surprisingly wanted to make the data searchable. After looking around at some of the full-text search solutions available for Ruby, I really liked the look of Sunspot. Well-presented, well-designed, and it even has decent documentation. It uses Solr underneath, which is a very respectable search engine, so that’s all good. Of course, it didn’t take me long to discover that the sunspot_rails plugin makes things drop-and-go when using ActiveRecord, but those of us branching off into alternatives have to put in more effort. Hence, I’ll document my findings here to hopefully make it easier for others.
I won’t bother going into the details of getting things set up, as the Sunspot wiki does a fine job of that. Suffice it to say that we install the gem (and the sunspot_rails gem if you’re going to have some ActiveRecord models as well), start the Solr server, and that’s about it. We’ve got Redis already going, right? So now it’s time to get our model indexed and searchable!
There are a few steps that we need to follow to make this happen. First, we put code in the model to tell Sunspot what fields should be indexed, which ones are just for ordering/filtering, and which ones should be stored if desired for quicker display:
class Book
require 'sunspot'
require 'sunspot_helper'
# Pretend some attributes like number, title, etc are defined here
Sunspot.setup(Book) do
text :number, :boost => 2.0
text :title, :boost => 2.0
text :excerpt
text :authors
string :title, :stored => true
string :number, :stored => true
date :publication_date
end
def save
book_key = "book:#{number}:data"
@redis[book_key] = json_data
@redis.set_add 'books', number
# Make searchable
Sunspot.index( self )
Sunspot.commit
end
def self.find_by_number(redis, number)
redis["book:#{number}:data"]
end
First, note that we need to require 'sunspot' to get access to the Sunspot class. This isn’t required for ActiveRecord models, but since we’re on our own, we have to specify that. Then, we call setup, passing the name of our model. In the code block, we specify a few text fields: the number, title, excerpt, and authors. Those fields will be indexed and searchable. Then we specify title and number again as strings, asking that they be stored for quicker retrieval. This is so we can display just that data without fetching the whole object, if we want — I won’t get into the details of doing that here because, well, fetching the objects in Redis is so fast that I found it didn’t matter. Last, the publication date is also listed, so we can filter and order by it if we want.
In our save() method, after we store a book in Redis, we tell Sunspot to index it, and commit the updated index. So far, so good. In theory, we should be able to create a Book, save it, and then search for it. Alas, if this were an ActiveRecord model we’d be pretty much done (and wouldn’t even have to do the index/commit part because those are automagically triggered on create and update). Unfortunately, we have some harder work ahead of us.
Sunspot uses what it calls “adapters” to tell it what to do when it wants to identify an object, and when it wants to fetch an object given an id. We have to provide the adapters for our model. To give credit where it’s due, this Linux Magazine article helped me figure out what to do, and then reading through the Sunspot adapter source code filled in the blanks. If you look back at our model, you’ll see that it requires ‘sunspot_helper’. That’s where we’ll put our adapters:
/app/helpers/sunspot_helper.rb:
require 'rubygems'
require 'sunspot'
module SunspotHelper
class InstanceAdapter < Sunspot::Adapters::InstanceAdapter
def id
@instance.number # return the book number as the id
end
end
class DataAccessor < Sunspot::Adapters::DataAccessor
def load( id )
Book.new(JSON.parse(Book.find_by_number(Redis.new, id)))
end
def load_all( ids )
redis = Redis.new
ids.map { |id| Book.new(JSON.parse(Book.find_by_number(redis, id))) }
end
end
end
So, what’s going on here? We provide two adapters for Sunspot: the InstanceAdapter, and the DataAccessor. The InstanceAdapter just provides a method that returns the ID of the object. Easy enough, we just return the book’s number, which is the unique identifier. The DataAccessor has to provide two methods, load() and load_all(), that take an id and a list of ids, respectively, and expect objects back. In my case, the objects are serialized JSON, so we just call our find_by_number() method to get each object, call JSON.parse() to get the Hash of data, and construct a new Book object. (Note: obviously this requires having an initializer that can take a Hash and create the object, which I’ll leave as an exercise) Now we just register our adapters, by adding a couple of lines of code right before the call to Sunspot.setup():
Sunspot::Adapters::InstanceAdapter.register(SunspotHelper::InstanceAdapter, Book) Sunspot::Adapters::DataAccessor.register(SunspotHelper::DataAccessor, Book)
Now we should be good to go, right? Okay, we construct a Book object, and call save…then search for it:
b = Book.new({ "number" => 8888888, "title" => "My test title"})
=> #<Book:blahblah...
b.save
=> nil
search = Sunspot.search(Book) { keywords 'test' }
=> <Sunspot::Search:{:rows=>1, blahblah…
r = search.results
=> [#<Book:blahblah...
r[0].title
=> "My test title"
And we’re good! Congratulations. So now we want to add the search capability to our controller, right?
# In a view, put in a search form. I have a little search image, so excuse the image_submit_tag:
<% form_for(:book, :url => { :action => "search" }) do |f| %>
<p>
<%= f.label "Search for:" %>
<input type="text" name="searchterm" id="searchterm" size="20">
<%= image_submit_tag('search.png', :width => '30', :alt => 'Search', :style => 'vertical-align:middle') %>
</p>
<% end %>
# Now in the controller. Note the pagination, which is why we store the search in the session,
# so we can grab it out again if they click forward/back through the pages.
def search
@search_term = params[:searchterm] || session[:searchterm]
if (@search_term)
session[:searchterm] = @search_term
end
page_number = params[:page] || 1
search = Sunspot.search(Book) do |query|
query.keywords @search_term
query.paginate :page => page_number, :per_page => 30
query.order_by :number, :asc
end
@books = search.results
end
# And then in our search view, display the results:
<ul>
<% @books.each do |book| %>
<li><%= book.number %>: <%= book.title %></li>
</ul>
<br />
Found: <%= @books.total_entries %> - <%= will_paginate @books %>
Yes, Sunspot is so cool that it integrates automatically with will_paginate. So, looking through the above, we have a form that posts to our action (assuming you set the routes up, which you did, yes?). The action then takes the searchterm parameter if it’s there, or extracts it from the session if it’s not there. Note that this is not robust code — if it’s called with no parm and nothing in the session, it will end up searching for an empty string, which will return every book. In any case, we store the search term in the session, so that when someone clicks through to page 2, we can re-run the search to get the second page. The more important code here, though, is the call to search.
I will give a thousand thanks to this blog post, specifically the fourth item! I was doing this:
search = Sunspot.search(Book) do
keywords @search_term
end
And it didn’t work — it was fetching every object, even though I knew that @search_term was getting set properly. As that blog post notes, though, the search is done in a new scope, so this didn’t work. The code I showed above, using the query argument, fixes that problem. It certainly took me a while to figure that out, though, because nothing is said about it anywhere in the examples in the Sunspot wiki.
So now you should be all set. Put “test” into the form, submit it, the controller will do the search, return the book, and your view will list it. You are searching! Not so bad, and the fetches from Redis are so fast that the whole thing really speeds along. Pretty simple free-text search against any objects that you put into Redis.
A Warning
I had one other hitch when I was working on this, which mysteriously went away. I hate that. So, in case someone else encounters here, I wanted to document the issue. When I got the adapters in place for the Book model, and tried to work with it, I got an error saying that there was no adapter registered for String. I was very puzzled, wondering if something about the fact that Redis was returning a JSON String was confusing Sunspot. So I made a quick change to the InstanceAdapter:
class InstanceAdapter < Sunspot::Adapters::InstanceAdapter
def id
if (@instance.class.to_s == "String")
@instance
else
@instance.number # return the book number as the id
end
end
end
And changed the register lines in my model:
Sunspot::Adapters::InstanceAdapter.register(SunspotHelper::InstanceAdapter, Book, String) Sunspot::Adapters::DataAccessor.register(SunspotHelper::DataAccessor, Book, String)
And that did the trick. I didn’t like it, and intended to try to figure out what was going on. But after getting all the rest of it working, when I put the code back to its pre-String-adapter state, the error didn’t return. Like I said, I hate that. Hopefully it was just due to something that I was unknowingly doing wrong which I fixed along the way, but…just in case, now the quick-fix is documented here for anyone else who runs into the problem.
Making a quick chart with RaphaelJS and Redis
This afternoon I wanted to add another quick report to the system I’m building, and it was so easy that I thought I’d share some of the details. As I’ve written here before, I’m using the RaphaelJS library for simple charts, and it makes it very simple to create bar charts and pie charts. So I’ve already got a couple of those, using common code as I described in that earlier posting.
When I wanted to create a new chart, then, I knew I could leverage that. First, though, I needed to get at my data. What I was getting was essentially a list of categories, and the number of items in each category. Since this is stored in Redis, the items are key-value entries, and each category is a Set to which items belong. In this particular case, each item belongs to only one set.
So for the sake of an example, let’s say that we have books, divided into categories. We’ll store the books with keys like book:#:title and each category set will be called cat:name:
- book:1234:title => “Technical Book”
- book:5678:title => “Another Tech Book”
- book:9012:title => “Gardening Book”
- cat:technical => 1234, 5678
- cat:gardening => 9012
So, for charting purposes, we want to get a list of the categories, and then for each one we want to fetch the number of books in it. In my reports_controller I have this:
def books_by_category
@chart = params[:chart] || "bar"
@chart_title = "Books by Category"
redis = Redis.new
# Get the list of the categories, which are the labels for the graph
keys = redis.keys("cat:*")
# Now let's iterate through the categories and get the counts
@labels = []
@values = []
keys.sort.each do |cat|
count = redis.set_count(cat)
@values << count
cat_name = cat.scan(/cat:(.*)/)[0][0]
if (@chart == 'pie') # Need to add counts to the labels for pie charts
cat_name << " (#{count})"
end
@labels << cat_name
end
end
First we get a Redis connection, and ask it for all of the category keys, using the pattern chosen: redis.keys("cat:*"). I want to stress something here: if you read the Redis docs (which of course you should, in depth) you’ll see that they say to never use this command in a production app! Obviously, if you have a lot of keys in the database, this is not a good command. In this particular case, I know that the database being used will not have too many keys, so I’m comfortable doing this — but be careful and make sure it’s okay for your case! If not, the solution is to create a new set that contains the names of all of the categories. Grab that set using SORT and work from there, which is simple. I also want to stress that, as with any reporting, if your data set grows (i.e. you start to have lots and lots of categories), you don’t want to run this frequently! Do the count occasionally and cache the results, create roll-up data from which to do your reporting, etc. This is a very simple case, but is a nice example of some tools.
Okay, so then we have the categories, and we iterate through them. For each, we get the count of entries using redis.set_count(cat). The redis-rb library aliases “set_count” to be the Redis command SCARD, which returns the cardinality of the set, i.e. the number of entries. We add that onto the @values array, and then create the category name by taking everything after “cat:” from the key. If we’re making a pie chart, we add the count to the labels, simply because I found that it’s very friendly that way. We add the category name to the labels array, and continue.
That’s pretty much it then! Using the previous reporting code, I just had to create a new view, which includes the partials I created before — the partials expect the @labels and @values arrays, so they’ll just graph whatever they get. Here’s the actual view:
<%= render :partial => "report_chart" %> <br/><br/> <h1>Reports : Books by Category</h1> <%= render :partial => "report_links" %> <br/><br/> <div id="holder"></div>
If you refer back to my earlier post about RaphaelJS graphing, the report_chart partial contains the Javascript to generate the chart. The report_links partial simply has code to create links to the various chart types for this data: pie, bar, and csv. The holder div is where the RaphaelJS Javascript will render the chart.
And that’s all there is to it. Thanks to the ease of Redis sets, getting the data sliced and diced as needed was extremely simple, and thanks to easy Javascript reporting from RaphaelJS, the plain old label/value charting couldn’t be much quicker.
Comparing MongoDB and Redis, Part 2
As discussed in Part 1, I reached a certain point with MongoDB, and decided that rather than fussing with things I’d move over to Redis and see how things went. The first thing was to change the model — rather than using the mongoid “:field” definitions, for Redis the model becomes a simple PORO (Plain Old Ruby Object). I chose to borrow a nice initialization technique from StackOverflow here so that I didn’t have to hand-code all of the attributes, but basically my initialize() method just sets the attributes and then creates a Redis connection via @redis = Redis.new. So the changes to the model were easy. The harder part was working out how the relationships between objects would work.
Rather than the document-style storage of MongoDB, Redis is purely based on key-value, but with more advanced data structures placed on the top. For my purposes, after some fantastic answers from Salvatore (author of Redis) and folks on the redis mailing list, I worked out how to use Sets to access the data in the ways I needed. So let’s say we have three books, ISBN numbers 123, 456, and 789. Book 123 references book 456, and book 789 references both 123 and 456. We have two authors, “Matsumoto,Yukihiro” who wrote 123 and 456, and “Flanagan,David” who wrote 456 and 789. How do we handle this in a key-value store? By using Sets:
- Create entries for each book, with key pattern “book::data”. The value is a JSON string of data like title, price, etc (see below for note on this).
- Create set called “books” which contains the number of every book.
- Create sets called “backrefs:” that contain the numbers of books referenced by book #.
- Create a set called “authors” which contains all of the authors.
- Create sets called “author:” that contains the numbers of books written by the author.
Using the set operations in Redis, then, I can display all of the books by using the “books” set; I can display all of the books by a given author by using the “author:” set; I can display all of the books referenced by a given book by using the “backrefs:” set. In the latter case, you might be thinking that I could just keep an array in the JSON string — and yes, that could work, but I wouldn’t be able to use some of the other interesting set operations, such as intersections to determine references for a given author, for example. Note that right now, since an author is just a name, there actually is no longer any Author model! If I add more meta-data about authors in the future, I can add that easily.
About that JSON string: this has advantages and disadvantages that I’m still considering. Some would say that every individual attribute (or “column” in RDBMS-speak) should be a separate key-value pair. In that approach, for example, if I have a book title and price, I’d have book:123:title => “The Ruby Programming Language” and book:123:price => “39.99″. Obviously I can then do things like add a book to sets like “Under $50″ by adding the price item to the set. The big advantage noted by some is that attributes can be added instantly by just saving a new key. Using a JSON string, adding an attribute requires reading/updating all of the existing keys. On the other hand, it is tidy to have a single key, and working with JSON is easy. For the time being, I’m giving it a try by using “book:123:data” to store the “data” about the book, and separating out certain attributes if it makes sense to use them in other data structures like sets and lists. Is this the best of both worlds or the worst of both? I’m not sure yet.
A quick note here before getting into the code: I did this using the redis-rb plugin, which has a lot of functionality but is definitely lacking in documentation. However, the code is extremely clear and easy to read through, so I strongly recommend reading through it, particularly the main lib/redis.rb file. Using it’s pretty much just a matter of installing the plugin and then calling Redis.new.
So, my save() method looks like this:
def save
book_key = "book:#{number}:data"
@redis[book_key] = json_data # creates JSON string
@redis.set_add "books", number # add to global books set
if (back_references)
back_references.each do |ref|
@redis.set_add "backrefs:#{ref}", number
end
end
if (authors) then
authors.each do |a|
a = CGI::escape(a)
@redis.set_add "authors", a # add to global authors set
@redis.set_add "author:#{a}", number
end
end
end
Improvements to be made here include handling the author names in a better way; doing a CGI::escape works, but a proper hash would be better. During prototyping, the escaping is nice because I can go in with the redis client and see human-readable names, but it makes the keys too long in my opinion.
So now the index() action in the Books controller looks like this:
def index
redis = Redis.new
@entries = redis.set_count 'books'
@pager = Paginator.new(@entries, 20) do |offset, per_page|
redis.sort('books', { :limit => [ offset, per_page ], : order => "alpha asc" })
end
@keys = @pager.page(params[:page])
@books = Hash.new
@keys.each do |k|
@books[k] = redis["book:#{k}:data"]
end
end
Here we get a redis connection, and use Paginator to do its thing — we have to get a count of the set, and then we use sort. This is a big part of the magic, and something that took me some time to work out. The sort command in redis (doc here) is the entry point to doing a lot of the interesting operations once you have things in a set. You’ll notice that in the save() method, all I do is add the book number to the set, not the actual key. That’s much more efficient (Redis is especially good with integers), and is enough. In the case above, all it does is call sort on the “books” set, with the “limit” and “order” options — “limit” as shown takes an offset and number of entries to return, which makes pagination a cinch. For “order” you’ll see that I use “alpha asc” which might seem confusing here since we’re dealing with numbers. In my actual use case the “numbers” can have alphanumerics, and I decided to leave this here because it’s a useful variant to see. In reality, the default for the sort command is ascending numeric so you wouldn’t need to even specify the option here.
Once the keys are retrieved, then I iterate on each one and get the actual data. This is very quick with Redis, but still not ideal. Redis supports an MGET command to retrieve multiple items in a single command, but it doesn’t return the keys, which would mean I’d have to data but not know which book number each one goes with. The redis-rb library provides a great mapped_mget() method, but at the moment it doesn’t support passing in an array. I would have to iterate each key and build a string of them. Presumably a fix can be made to accept an array, in which case this can all be collapsed down to a one-liner: @books = redis.mapped_mget(@keys). (By the way, in case you’re wondering why @keys is an instance variable, it’s because it contains Paginator metadata like page number, to display in my view).
Hopefully it’s pretty obvious that showing a book is pretty straightforward:
book_data = redis["book:#{@book_number}:data"]
if (book_data)
@book = JSON.parse(book_data)
end
Also simple, here’s the code to get the list of books which reference the current book — that is, the books that have the current book as one of their backward references:
begin
references = redis.sort("backrefs:#{number}")
rescue
return Array.new
end
That’s pretty easy, isn’t it? Obviously you can add in an “order” option and even a “limit” if necessary. More interesting, here we get the list of authors, with the list of books written by each:
alist = redis.sort("authors", { : order => "alpha asc" })
@authors = Hash.new
alist.each do |a|
@authors[CGI::unescape(i)] = redis.sort("author:#{a}")
end
First we do an initial call to sort to get the authors, sorted in ascending alphabetical order (note that this will be a little undependable given my current implementation since the names are CGI::escaped). Then we iterate each one and do a further sort to get each one’s books. This is fine, but it just returns the number of each book by the author — they key, not the value. Do we have to iterate yet again and do a third call to get the data for each book? Not at all, and this is one of the magic bits of the Redis sort command. If instead of the above sort call we can ask sort to return the values to us, instead of the keys. Using the redis client, the difference is like so:
$ ./redis-cli sort authors:Smith%3B+Bob limit 0 5
1. 123456789
2. 465768794
3. 344756635
4. 436485606
5. 347634767
$ ./redis-cli sort authors:Smith%3B+Bob limit 0 5 get book:*:data
1. {"title":"My Book","price":"19.99"}
…etc…
The second command, as you can see, adds a “get” option. This is a somewhat magic option that instructs Redis to get the values of the keys matching the pattern provided. So what happens, in a sense, is that Redis does the sort, and gets the keys. It then takes the keys and plugs them into the pattern, and does a get. So the first sort command is augmented with a “get 123456789″ and so on for the others, and the results are returned. This is all done on the Redis side, very quickly indeed. It is, clearly, extremely powerful. So if we change our code to get the data for the list of books, rather than just the keys:
alist = redis.sort("authors", { : order => "alpha asc" })
@authors = Hash.new
alist.each do |a|
books = Array.new
a_data = redis.sort("author:#{a}", { :get => "book:*:data" })
if (a_data)
a_data.each do |data|
books << (JSON.parse(data))
end
end
@authors[CGI::unescape(i)] = books
end
With this, my controller is passing @authors to the view, which is a Hash keyed off the unescaped author names. The value of each entry in the Hash is an Array of data (which is actually another Hash, created by the JSON.parse call). In the view, I can do something like this rather silly example:
<% @authors.keys.sort.each do |author| %>
<% books = @authors[author] %>
<tr class="<%= cycle("even", "odd") -%>">
<td><%= author %></td>
<td>
<% if (books.length > 0) -%>
<%= books.length %> :
<% books.each do |b| -%>
(<%= truncate(b["title"], :length => 25) %>) |
<% end -%>
<% else -%>
0
<% end -%>
</td>
</tr>
This page simply iterates through the authors, and for each one it displays the number of books they’ve written, and the first 25 characters of each title. If they didn’t write any books, it shows a zero.
There is one problem here, and it’s one that I’m working on a solution for: the “sort” with “get” is very cool, but it returns the value of each entry instead of the key. That means that in the above view, I have access to the book’s title, price, etc — but NOT the number! That’s because the number is embodied in the key. This is obviously a problem, since I need to display the book number. Right now, I’m working around this by storing the number in the JSONified data, but that’s not the right thing to do. Ideally, there would be a way to have the “sort get” return the key along with the data, though I’m not certain what that would look like. Alternately, the app can get the keys, and use them to do an MGET for the data. We’ll see.
In any case, we’re now able to display the books and the authors, approaching the objects from either direction to access the others. I’ll post more and/or update this post as I experiment further, but I hope this and the first part serve as a useful introduction to people interested in exploring MongoDB and Redis. For my purposes, I plan to continue forward with Redis rather than MongoDB, but as I’ve shown, they’re not at all the same thing — I can easily see cases where MongoDB might be a better fit. It’s clearly worthwhile to do quick prototyping to make sure you understand your problem set, and then see what the best tool is. One of the most exciting things about the so-called “NoSQL” data stores is that developers now have more tools to work with. If I get the time, I hope to play with Cassandra and Tokyo Cabinet to see how they might fit in. It’s always great to have more options in the tool box.
Comparing MongoDB and Redis, Part 1
For the new project I’m working on, after doing some initial very simple prototyping using MySQL (mainly because I could get from 0 to somewhere very quickly with ActiveScaffold and a few simple migrations), I started to look at alternate data stores. There are real reasons given the type of data being managed, but I have to admit that at least some of it was my desire to get a bit of hands-on experience with some of the new kids on the block, too. After exploring the alternatives, I settled on doing some prototyping with both MongoDB, and Redis. There are obviously others that are equally interesting, particularly Cassandra, but there simply isn’t time for everything! I selected Redis because I’d already done some playing with it, understood its basic concepts, and felt that its support for sets would be valuable for what I’m working on. I chose MongoDB as another option after doing some reading on it and finding it to be an interesting combination of key-value with some relational-style support. I also thought the mongoid was a nice bit of work that would be nice to use.
I want to note that I purposely did not call this “MongoDB vs Redis” — they’re different tools, and have different uses, which is one of the things I hope will be clear from these posts. This isn’t a competition, but just a summary of my experiments in looking at how I might approach my needs using the two.
The “problem” to be solved
I’m not at liberty to divulge the details of what I’m working on, so I have a sort of parallel-world simulation of the problem that replicates the types of issues I have to take care of. The idea, then, is to model a reference library, where we have Books and Authors. A Book can have multiple Authors, while an Author may have written multiple Books, so in a relational schema there would be a many-to-many relationship between them. In addition, a Book can contain references to other Books. We want to build a web app that will:
- Show all of the Books
- Show all of the Authors
- For a Book, show all of the Authors
- For a Book, show all of the Books that it references
- For a Book, show all of the Books that reference it
- For an Author, show all of the Books they’ve authored
MongoDB
As I mentioned above, I liked the look of the mongoid plugin to work with MongoDB, though I did do an initial pass using MongoMapper as well. I just felt that mongoid was a bit smoother, had more support for associations, and had somewhat more documentation, but they both did the job. Using Mongoid, my models looked something like this:
class Book
include Mongoid::Document
field :number
field :title
field :back_references, :type => Array
field :forward_references, :type => Array
index :number
has_many :authors
end
class Author
include Mongoid::Document
field :name
belongs_to :book, :inverse_of => :authors
end
As you can see, much like with ActiveRecord, you simply specify the fields you want persisted, and use a has_many/belongs_to pair to create an association. Do note that instead of extending a class as you would with AR, for mongoid you simply include Mongoid::Document. When I want to create a Book, it goes something like the following, assuming that I have the book number/title and an array of author names:
the_book = Book.new(
:number => book_number,
:title => book_title
)
authors.each do |a|
the_book.authors << Author.new(:name => a)
end
the_book.save
But what about the references, then? In the Book model above, I have two arrays, back_references (a list of books that reference this one) and forward_references (a list of books that are referenced by this one). Actually, all it takes for these is to create arrays containing the book numbers, assign them to the instance, and save. That’s one of the nice things about MongoDB, as we’ll see: you can query for items in embedded arrays.
A quick note here: I’ve glossed over the setup and configuration of MongoDB here, somewhat on purpose. Once you’ve installed it, if you’re using mongoid there are very clear instructions on setting up your Rails app to use the db so there’s not much need for me to repeat things here. Let’s just say we’re using a db called “books-development” which will then contain our collection, which is called “books”. Wait, shouldn’t we have another collection called “authors” since we have an Author model? Well, no, because the way we set up the has_many/belongs_to it means that Authors are embedded objects within Books. Let’s see what an entry looks like when we persist it. Running the mongo shell:
> db.books.find({number : "1234567890"});
{ "_id" : "4b58f90c69bef38f8f000720", "number" : "1234567890", "forward_references" : [
"6215628454",
"63107472345"
], "back_references" : [
"39848733434",
"51895763321",
"5216434662"
], "authors" : [
{
"_id" : "4b58f90569bef38f8f000091",
"name" : "Matsumoto,Yukihiro",
"_type" : "Author"
},
{
"_id" : "4b58f90569bef38f8f000092",
"name" : "Flanagan,David",
"_type" : "Author"
}
], "_type" : "Book", "title" : "The Ruby Programming Language" }
From this, you can see that Mongo has assigned “_id” values to each object, the references are both just arrays of book numbers, and the authors have become embedded objects with their own “_id” and “_type” (used by mongoid). As we’ll see in a bit, the fact that the authors are embedded objects is convenient for some purposes, but problematic for others due to the queries I needed to do. For now, though, let’s see what our queries look like for the various activities listed above.
# Inside books_controller.rb, index action to list the books
def index
@entries = Book.count
@pager = Paginator.new(@entries, 20) do |offset, per_page|
Book.criteria.skip(offset).limit(per_page).order_by([[:title, :asc]])
end
@books = @pager.page(params[:page])
end
# show action to display a single book's details
def show
@book = Book.find(:first, :conditions => { :number => params[:number] })
end
Pretty straightforward stuff, even when bringing Paginator into the picture. Being able to chain the criteria with mongoid is a nice bonus to using it. So when a single book is displayed, the page can show the list of author names by simply iterating the array:
<tr>
<td class="label">Authors</td>
<td class="show">
<% if (@book.authors)
@book.authors.each do |author| -%>
<%= author.name %> |
<% end -%>
<% else -%>
None
<% end -%>
</td>
</tr>
The backward references are exactly the same way. However, I discovered while writing the data entry scripts that the forward references (i.e. the books that reference the current book) were not available. No problem, I figured, instead of storing that I’ll just query it:
def referenced_by
Book.find(:all, :conditions => { :back_references => number })
end
There’s some nice MongoDB magic. Very simply, that will return any Book entry that contains “number” in its “back_references” attribute — even though that attribute is an array! That ability to query for contents of an array comes in very handy, needless to say. As an aside, I came across a reference that I sadly can’t find now to link to it, but it showed me how to add a super simple search. To make the books searchable, I just took the title and the author, did a split(), and created an array containing each word. I called that “search_words” and made it a new array-type attribute. The search is then a simple query:
def search_books(search_term)
Book.find(:all, :conditions => { :search_words => search_term })
end
This is obviously a very simplistic search, but given that it takes about 2 minutes to implement, who’s complaining?
The Author problem
So now we come to where I began to find problems with the approach. I wanted to display the list of all authors. Hmm, the authors are embedded documents within the books. Okay, it is possible:
def get_author_list
results = Books.criteria.only(:authors)
author_list = Hash.new
results.each do |book|
book.authors.each do |a|
if (!author_list.has_key?(a))
author_list[a] = Book.where(:authors => a)
end
end
end
return author_list
end
Pretty ugly, ain’t it? It queries all of the books and gets just the authors attribute, then iterates each book, then iterates the authors. For each one, it does a query to get the list of books (so our page can show each author followed by their books), and creates a Hash with key=author, value=books array. This obviously doesn’t do any pagination, which would make it even messier, plus the results aren’t sorted yet. Nope, I didn’t like it.
The alternative seems to be to make authors a first-level document, and link explicitly with book numbers, which isn’t horrible but means, again, multiple queries to get our list of authors with their books. This was beginning to look like it might be too relational a problem for MongoDB to make sense.
Update: as noted in the comment below by module0000, using distinct(“author”) solves this particular problem in a much cleaner way — thanks for the comment! I’ll still stand by the thought that this is really a relational problem and a document database has shortcomings in that regard (and of course strengths in other ways).
So, I set this aside, since as a prototype it did work. I made a new branch (thanks, git) and converted it to use Redis. Which I’ll cover in part 2, shortly.