Tag Archives: rails

I’ve recently had the chance and excuse to play with Elasticsearch, after reading good things about it. We’ve been using Solr with decent success, but it feels like whenever we try to do anything outside the normal index-and-search it’s more complicated than it should be. The basics are easy thanks to the terrific Sunspot gem, though. So when I had a small project to prototype that involved indexing PDFs as well as database records, I figured it was a good opportunity to try out Elasticsearch.

I quickly reached for the Tire gem, which is very similar to Sunspot if you’re using ActiveRecord. Where Sunspot has you include a “searchable” block, Tire adds a “mapping” block, but the idea is the same — that’s where you tell it what fields to index, and how to do it. For each field you can adjust the data type, boost, and more. You can also tack on a “settings” block to adjust things like the analyzers.

The documentation for Tire is pretty good, but I found that I made a number of mistakes trying to adapt the instructions on the Elasticsearch site to the Tire way of doing things, so I thought I’d write up some of the things I learned in hopes that it can help save time for others. Many thanks to the folks on StackOverflow who answered my questions and pointed me in the right direction.

One starter suggestion is to configure Tire’s debugger, which is really convenient because it will output the request being sent to the ES server as a curl command that you can copy and paste into a terminal for testing. Very handy. I added this to my config/environments/development.rb file:

  Tire.configure do
    logger STDERR, :level => 'debug'
  end

Now on to the model. I’ll call mine Publication, so inside app/models/publication.rb:

class Publication < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  attr_accessible :title, :isbn, :authors, :abstract, :pub_date

  settings :analysis => {
    :filter  => {
      :ngram_filter => {
        :type => "nGram",
        :min_gram => 2,
        :max_gram => 12
      }
    },
    :analyzer => {
      :index_ngram_analyzer => {
        :type  => "custom",
        :tokenizer  => "standard",
        :filter  => ["lowercase", "ngram_filter"]
      },
      :search_ngram_analyzer => {
        :type  => "custom",
        :tokenizer  => "standard",
        :filter  => ["standard", "lowercase", "ngram_filter"]
      }
    }
  } do
    mapping :_source => { :excludes => ['attachment'] } do
      indexes :id, :type => 'integer'
      indexes :isbn
      [:title, :abstract].each do |attribute|
        indexes attribute, :type => 'string', :index_analyzer => 'index_ngram_analyzer', :search_analyzer => 'search_ngram_analyzer'
      end
      indexes :authors
      indexes :pub_date, :type => 'date'
      indexes :attachment, :type => 'attachment'
    end
  end

  def to_indexed_json
    to_json(:methods => [:attachment])
  end

  def attachment
    if isbn.present?
      path_to_pdf = "/Users/foobar/Documents/docs/#{isbn}.pdf"
      # Read in binary mode so the PDF bytes are not altered before encoding
      Base64.encode64(File.open(path_to_pdf, 'rb') { |pdf| pdf.read })
    end
  end
end

Okay, that’s a lot of code, so let’s look things over bit by bit. Naturally the includes at the top are needed to mix-in the Tire methods. There are two includes so that you can include the calls needed for searching without the callbacks if you don’t need them. The callbacks, though, are what make things work auto-magically when you persist ActiveRecord objects. With those, whenever you call save() on an AR model, the object will be indexed into ES for you.

Next up are two blocks of code: settings, and mapping. The settings block defines a filter, and two analyzers, one for indexing and one for searching. I can’t claim to be enough of an expert yet to fully explain the ramifications of the filter/analyzer options, so rather than risk confusion I’ll just note that this code is there to set up the nGram filter and connect it with two analyzers, index and search, which differ slightly in order to ensure that the standard filter is included for searching. You may want to play with the nGram’s min and max settings to get the matching behavior you want. Note that if you don’t need the nGram filter, you can remove the settings block and let the mapping block stand on its own, in which case the default settings will be used (but you’ll have to change the mapping entry for the :title and :abstract fields, as described below).
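To get a feel for what an nGram filter does to a single term, here is a rough plain-Ruby sketch. It is not Elasticsearch's implementation, the ngrams helper is a name I made up for illustration, and the order in which ES emits the grams may differ; it only shows which substrings end up searchable for a given min and max.

```ruby
# Illustrative only: a plain-Ruby approximation of n-gram generation,
# not Elasticsearch's own implementation.
def ngrams(token, min_gram, max_gram)
  grams = []
  (0...token.length).each do |start|
    (min_gram..max_gram).each do |size|
      # Stop once the gram would run past the end of the token
      break if start + size > token.length
      grams << token[start, size]
    end
  end
  grams
end

puts ngrams("rails", 2, 3).inspect
```

With min_gram 2 and max_gram 12, as in the settings block, a long title produces a lot of grams, which is why it is worth experimenting with the two values.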

The mapping block is the more interesting one, as it defines the fields and indexing behavior. The first line took me some searching and StackOverflow questioning to figure out. The issue is that by default, Elasticsearch will put all of the fields you index into its _source storage. Because I’m indexing large PDF documents, the result was that a huge Base64-encoded field was being stored. If I wanted to serve the PDFs out of Elasticsearch that might be okay, but that’s not the plan. The :excludes instruction prevents the attachment field from being stored.

Next are the fields themselves, and I won’t spend much time on them because the Tire documentation does a fine job of explaining them. The only interesting items are the :attachment field and the entry for :title and :abstract; the latter specifies that the custom analyzers defined in the settings block should be used for those fields. The :attachment field is a little bit trickier.

When the indexing is performed, the fields themselves are gathered up by calling the method to_indexed_json(). Normally that will just do a to_json() on your model and then collect the fields. But you can also override it, which we do here. You can see that we add in the method attachment(), which is defined below. So the other fields will be JSONized as normal, as well as the output of the attachment() method. The attachment() method itself uses the ISBN number to open the PDF file, which is read and Base64-encoded. The results of that encoding will be included with the other fields and sent to ES for indexing.
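If you want to see the shape of what gets sent without involving Rails or Tire, the idea can be sketched in plain Ruby. The encoded_attachment and indexed_json helpers below are hypothetical stand-ins, and a temp file with fake bytes plays the role of the PDF.

```ruby
require 'base64'
require 'json'
require 'tempfile'

# Hypothetical helper: reads a file in binary mode and Base64-encodes it,
# much as the attachment() method above does for a PDF.
def encoded_attachment(path)
  Base64.encode64(File.binread(path))
end

# Hypothetical stand-in for to_indexed_json: the ordinary fields plus the
# encoded attachment, serialized as one JSON document.
def indexed_json(fields, path)
  fields.merge('attachment' => encoded_attachment(path)).to_json
end

Tempfile.create(['sample', '.pdf']) do |file|
  file.binmode
  file.write("%PDF-1.4 fake bytes")
  file.flush
  doc = JSON.parse(indexed_json({ 'title' => 'Example', 'isbn' => '123' }, file.path))
  puts doc.keys.inspect
  puts Base64.decode64(doc['attachment'])
end
```

The point is simply that the attachment travels as one more (large) string field in the JSON document, which is why excluding it from _source matters.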

Performing the searching is almost too easy, but there was one bit that threw me off initially, which was getting highlighting to work. The search block in my controller looks like this:

results = Publication.search do
  query { string query_string }
  sort { by :pub_date, 'desc' }
  highlight :title, :options => { :tag => "<strong class='highlight'>" }
end

I was trying to test the highlighting and was thrown off by the field names being case-sensitive (see my question on StackOverflow), but this now works. The other key is that the highlighted fields are returned separately from the plain fields, which was odd to see. This means that to display the highlighting I have to check for the field:

results.each do |r|
  r_title = (r.highlight.nil? ? r.title : r.highlight.title[0])
  puts "Title: #{r_title}"
end

If the highlighting is present then it’s used; if not (because the term isn’t present in that field) then the regular field is used. The other handy thing to note is that you can specify the tag with which to wrap the term. The default is “<em>” but I wanted to specify the “highlight” CSS class, as is shown here. This is a really convenient feature.
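The fallback is easy to pull into a small helper. This is only a sketch: the Highlight and Result Structs are hypothetical stand-ins for what Tire returns, and display_title is a name I made up. It also covers the case where a highlight object exists but has nothing for the title.

```ruby
# Hypothetical stand-ins for Tire result items; real results come from
# Publication.search and are not plain Structs.
Highlight = Struct.new(:title)
Result    = Struct.new(:title, :highlight)

# Prefer the first highlighted fragment, otherwise the plain title.
def display_title(result)
  fragments = result.highlight && result.highlight.title
  (fragments && !fragments.empty?) ? fragments.first : result.title
end

hit  = Result.new('Ruby on Rails',
                  Highlight.new(["Ruby on <strong class='highlight'>Rails</strong>"]))
miss = Result.new('Learning Solr', nil)

[hit, miss].each { |r| puts "Title: #{display_title(r)}" }
```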

That covers the basics, though it’s also probably worth sharing just how nice it is to be able to test using curl. For example, I wanted to check how easy it is to have the search call return just a single field (to speed up certain requests), so I tried it first in curl:

curl -XPOST http://localhost:9200/publications/_search\?pretty\=true -d '{
"query": {"query_string": {"query": "Foobar"}},
"fields": ["title"]
}'

That’s of course assuming that ES is running on your local system on port 9200; if not, adjust accordingly.
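The same request can be put together from Ruby with nothing but the standard library, which is handy in a script or console session. This is a sketch: build_search_request is a hypothetical helper, the host, port and index name are the ones from the curl command, and actually sending the request (commented out) needs a running ES server.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Builds (but does not send) the same search request as the curl command.
def build_search_request(query, fields)
  uri = URI('http://localhost:9200/publications/_search?pretty=true')
  request = Net::HTTP::Post.new(uri.request_uri, 'Content-Type' => 'application/json')
  request.body = { 'query'  => { 'query_string' => { 'query' => query } },
                   'fields' => fields }.to_json
  [uri, request]
end

uri, request = build_search_request('Foobar', ['title'])
puts request.body

# To actually send it (requires a running Elasticsearch server):
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# puts response.body
```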

There you go. I hope this writeup is helpful to folks getting started and it saves you some time.

Accessing Salesforce data from Ruby/Rails

March 28, 2012

I’ve done some work integrating Rails apps with Salesforce over the past few years, and have been very happy to see the new databasedotcom gem take the place of the community’s older activesalesforce gem. Thanks to work from Heroku and Pivotal Labs, it’s now very easy to push and pull data between a Rails app and your Salesforce organization.

I wrote up an article about using the gem, which is now available on the DeveloperForce site. You can go and check it out at http://wiki.developerforce.com/page/Accessing_Salesforce_Data_From_Ruby. I hope it helps get you started if you’re finding a need to do this sort of work. So many companies rely on Salesforce now for at least their sales pipeline that it can be extremely useful to do things like extract data to show in an internal Rails dashboard, or do more complex reporting. In our case we’re also sourcing data from external places and pushing it into our Salesforce organization so that our sales/support folks can have easy access to it within their Salesforce world.

Devise and OmniAuth for Single Sign On

December 31, 2010

Getting user info into a Radiant page

December 31, 2010

This took me a while so I thought I should share the solution — however, see the caveat at the end, because there’s an element I haven’t tested yet.

The first requirement here is integrating Devise into Radiant. For the most part, the information at this page will get you there, though I’ll work on a separate post going through the process in detail. Once you have Devise working, then you have a user object, and naturally you’d like to display a “Logged in as…” element in your Radiant layout, right? Not so easy, it turns out.

In my testing I called the Devise model PortalUser since it has to be differentiated from the User model that Radiant uses. I put the authentication stuff into a custom extension, which we’ll call “my_auth”. So, I end up with my_auth_extension.rb:

class MyAuthExtension < Radiant::Extension
  
  SiteController.class_eval do
    include ContentManagement
    prepend_before_filter {|controller| controller.instance_eval {Thread.current[:current_portal_user] = current_portal_user} }
    prepend_before_filter {|controller| controller.instance_eval {authenticate_portal_user! if radiant_page_request?}}
  end

  # activate() method left out for brevity
end

The filter to call authenticate_portal_user! is needed to get Devise working. The other filter is the important one here, and what it does is get the current_portal_user reference in the controller and place it into the current thread for later access. This is the only way I’ve found (so far) to get something from a controller in Radiant to a tag. I’ve tried various instance variable tricks, all sorts of things, with no luck. If anyone has another solution, please do comment below, because yes, this seems like a hack.
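For anyone who hasn't used the thread-local trick before, here it is in isolation as plain Ruby. Note that this sketch goes one step further than my extension code: it clears the value afterwards in an ensure, on the assumption that a server may reuse the same thread for a later request. The helper names and the Hash standing in for the user are hypothetical.

```ruby
# Hypothetical helper: stash a value in the current thread for the duration
# of a block, then always clear it so it cannot leak into a later request
# that happens to reuse the same thread.
def with_current_portal_user(user)
  Thread.current[:current_portal_user] = user
  yield
ensure
  Thread.current[:current_portal_user] = nil
end

# Stand-in for code far away from the controller, such as a Radiant tag.
def logged_in_email
  user = Thread.current[:current_portal_user]
  user ? user[:email] : 'guest'
end

with_current_portal_user({ email: 'reader@example.com' }) do
  puts logged_in_email
end
puts logged_in_email
```

In a controller, something like an around filter would be the natural place for this kind of wrapper, though I have not tried that inside Radiant.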

Now we go create a new tag to display the logged-in user’s email address. In our extension we have lib/user_tags.rb:

module UserTags
  include Radiant::Taggable

  desc "Outputs the user email address"
  tag "user_email" do |tag|
    current_user = Thread.current[:current_portal_user]
    @user_email = current_user.email
    parse_template 'usertags/_email_template'
  end

  private

    def parse_template(filename)
      require 'erb'
      template = ''
      File.open("#{MyAuthExtension.root}/app/views/" + filename + '.html.erb', 'r') { |f|
        template = f.read
      }
      ERB.new(template).result(binding)
    end
end

First, let me give credit for the parse_template() method to Chris Parrish in this post. This tag simply gets the user object from the thread, and sets @user_email accordingly, which can then be used by the ERB template. parse_template() grabs the partial using the filename passed in, and renders it, which ends up being output by the tag. The partial, which lives in your extension as app/views/usertags/_email_template.html.erb, is simply:

<%= @user_email %>

So there’s nothing to that, really. If you modify your Radiant layout to include Logged in as: <r:user_email /> then you should be all set.
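The reason the template can see @user_email is that result(binding) evaluates the ERB in the context of the object that called it. Here is that mechanism on its own, outside Radiant; the EmailTagSketch class is a hypothetical stand-in for the tag module, and the template is passed in as a string rather than read from a file.

```ruby
require 'erb'

# Hypothetical miniature of the tag: the template sees @user_email because
# result(binding) evaluates it in the context of this object.
class EmailTagSketch
  def initialize(email)
    @user_email = email
  end

  def render(template_source)
    ERB.new(template_source).result(binding)
  end
end

puts EmailTagSketch.new('reader@example.com').render('Logged in as: <%= @user_email %>')
```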

At the beginning I mentioned a caveat. I have not tested this yet to see what the effects of Radiant’s caching are — I am assuming that the tag contents will not be cached and thus all is well, but we will see. I’ve been bitten by the caching before in unexpected ways.

Anyway, I hope this helps someone out.

Rails and forms using accepts_nested_attributes_for

July 23, 2010

I recently had to put together a particularly ugly web form, with dynamically-expanding multi-nested elements, and came up against some rather odd behavior from Rails’ nested forms, using Rails 2.3.8. First off, I have to give thanks to Railscasts for saving me a bunch of time creating the dynamic portion of the nested form — see that episode and the following one for a great solution which got me started. Secondly, note that in Rails 2.3.5, some nested forms behavior was simply broken, and when I upgraded to 2.3.8 it fixed a number of small issues.

Unfortunately, for the form I was working on I had to nest two levels deep, which complicated things. The relationship was something like this: the form was created to update an event, which can involve multiple companies. For each involved company, there is additional metadata. So there are three models involved: Event, CompanyEvent, and Company. CompanyEvent is more than just a join model, since it contains metadata about the relationship. In theory, the nested-nested models weren’t a problem, simply by putting the proper directive in each model:

class Event < ActiveRecord::Base

  has_many :company_events
  accepts_nested_attributes_for :company_events, :allow_destroy => true
end

class CompanyEvent < ActiveRecord::Base
  belongs_to :company
  accepts_nested_attributes_for :company
  belongs_to :event
end

class Company < ActiveRecord::Base
  has_many :company_events
  has_many :events, :through => :company_events
end

Thanks to the accepts_nested_attributes_for directives, the form for an event can easily incorporate entries for company events, which in turn incorporate companies. As an example:

<% form_for(@event, :url => event_path(@event)) do |f| %>
  <%= f.text_field :event_title %>
  <% f.fields_for :company_events do |builder| %>
      <%= builder.text_field :company_role %>
      <% builder.fields_for(:company) do |company_form| -%>
          <%= company_form.text_field :name %>
      <% end -%>
  <% end -%>
<% end -%>

The above is a slimmed-down form, of course, but serves to demonstrate the nested form approach. The event form is the outer one, which contains fields_for the company_events model — each current instance of a company_event associated with the event will be rendered with its company_role (for example, what role the company plays at the event in question). Within that nested form, another fields_for is included for the company, which will pull in the company name as a field.
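It helps to picture what a form like this submits. As far as I can tell, the params arrive roughly in the shape below, with a company_events_attributes hash keyed by index and a company_attributes hash nested inside each entry. The values here are made up (the ids echo the ones in the error message quoted later in this post), and company_names is just a hypothetical helper to walk the structure.

```ruby
# Illustrative params only; the values are invented for this sketch.
params = {
  'event' => {
    'event_title' => 'Annual Conference',
    'company_events_attributes' => {
      '0' => {
        'id' => '6789',
        'company_role' => 'Sponsor',
        'company_attributes' => { 'id' => '12345', 'name' => 'Acme Corp' }
      }
    }
  }
}

# Walk the nesting the same way the two fields_for blocks are nested.
def company_names(params)
  params['event']['company_events_attributes'].values.map do |company_event|
    company_event['company_attributes']['name']
  end
end

puts company_names(params).inspect
```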

As shown in the Railscast linked above, you can also include a field to mark a nested entry as deleted, such as a hidden _destroy field. Check out the Railscast for a full demonstration, since there’s no need to repeat it here.

The issue I encountered, though, was mysterious: when I tried to change the name of a company in the nested form, I got an error: “Couldn’t find Company with ID=12345 for CompanyEvent with ID=6789”. This didn’t make much sense, since obviously there wouldn’t be a matching entry, because I was changing the company and thus the company id would have also changed! It was a mystery why the code would be trying to find a matching row using both ids. I actually had to go into the code for nested_attributes.rb and look into the assign_nested_attributes_for_one_to_one_association method to see what was going on. The key to it seemed to be the use of the :update_only option on the accepts_nested_attributes_for directive. I added that to the CompanyEvent model:

class CompanyEvent < ActiveRecord::Base
  belongs_to :company
  accepts_nested_attributes_for :company, :update_only => true
  belongs_to :event
end

And suddenly it worked. The slim documentation for the :update_only option wasn’t very helpful (see http://api.rubyonrails.org/classes/ActiveRecord/NestedAttributes/ClassMethods.html), as it says that “an existing record may only be updated” and “A new record may only be created when there is no existing record.” Which seems rather obvious, but almost implies that a record can’t be deleted, since it’s “update only”. Otherwise, why on earth would you not want an existing record to be updated? Perhaps this should be the default behavior, though I haven’t tested to figure out what the alternative really means. I need to look at Rails 3 and see what’s changed about the nested forms behavior, and perhaps this is mapped out more clearly there.

In any case, perhaps this will save someone else some pain, since it took me some time to work out what was going on. And I realize that I’ve skimmed over a lot of details of how to do nested forms, since this isn’t intended to be a how-to but more of a watch-out post. If anyone thinks that a general nested-forms how-to post would be useful, let me know and I can put one together.

Displaying Ruby and environment details

May 29, 2010

AJAX-driven in-place add/delete with nice-looking confirmations

February 20, 2010

I wanted to build a page with nice in-place add/delete of some items, and I wanted something nicer-looking than the usual Javascript alert popup for confirmations. Here’s what I came up with; this is in a Rails-powered app using (of course) JQuery. I’ve used Impromptu in the past as a powerful JQuery-based popup library, and I used it again in this case. I won’t go through the setup steps for it, since the site provides a good walk-through.

I started with a basic table showing the items; for the sake of this example, let’s say these are books in my library, displayed in a table. I want to remove books and add books via AJAX, and update the table dynamically.

<table id="books" width="90%" class="booklist">
  <tr>
    <th>Book #</th>
    <th>Title</th>
    <th>Remove</th>
  </tr>
  <% @library.books.each do |book| -%>
    <tr id="<%= book.number %>" class="<%= cycle("even", "odd") -%>">
      <td><%= book.number %></td>
      <td><%= book.title %></td>
      <td align="center">
        <a href="#" onClick="removeBook('<%= book.number %>');">
        <%= image_tag('delete.png', :width => '20') %>
        </a>
      </td>
    </tr>
  <% end -%>
</table>

<form id="add_form" action="#">
  <input type="text" id="newbook" size="20" />
  <%= image_tag 'button-add.png', :id => 'add_button', :width => '20', :style => 'vertical-align:middle' %>
</form>

Pretty straightforward: After the header row, we loop through the library, and for each book we output a row. Note that we use the book’s number (pretend with me that it’s something like the ISBN that has no spaces or weird characters) as the id of each row — this number is also passed in to the removeBook() function when someone clicks the delete button, which allows us to dynamically remove the table row.

After the table, we have a little form for adding a new book — this assumes that we already have a database of books, and we’re just going to look up the new book by number, and add it to our library. All we have here is a button image, to which we will attach a JQuery click action. It would be nice to use the JQuery click action on the remove button as well, but it complicates things when it comes to determining which row was clicked on (yes, we could put the number in as an id on the image and look it up, which I’ll probably look at doing later).

So, when someone clicks on the remove button, what happens? The removeBook() function is called:

  function removeBook(number) {
    $.prompt('Are you sure you want to remove book #' + number +
        ' from the library?<input type="hidden" name="num" id="num" value="' + number + '"/>',{
      callback: removeCallback,
      buttons: { Yes: 'Remove', No: 'Cancel' }
    });
  }

  function removeCallback(button, msg, formvals) {
      if (button == 'Remove') {
        // Call server to remove the book
        $.ajax({
            type: "PUT",
            url: "/libraries/<%= @library.id %>.js",
            data: ({remove : 1, book_num : formvals.num}),
            success: function(msg){
                // Remove the row from the table
                $('#' + formvals.num).remove();
            },
            error: function(response, textStatus, errorThrown){
                var msg = (response.status == 403) ? response.responseText : "unknown error.";
                $.prompt('Error removing book from library: ' + msg);
            }
        });
      }
  }

The removeBook() function uses the Impromptu library to display a message via $.prompt(), specifying the removeCallback() callback function and two buttons. If the user clicks the “No” button, the callback won’t do anything. If they click the “Yes” button, then the callback will make a JQuery AJAX call to the server.

The Impromptu callbacks receive three parameters: the button clicked, the message if any, and the values of any form fields, if any. We were tricky and placed a hidden form field into the popup in the removeBook() function, called “num”. Note that Impromptu uses the NAME of the field, not the ID, for looking up the values; in this case both are “num” to make life easy. We then check the button value to see if it’s “Remove”, and we get the book’s number from the form using formvals.num.

Being somewhat RESTful, we do a PUT to the libraries controller with the ID of the library we’re changing. Note that we specify “.js” for the call, so that our controller will respond appropriately, since this is calling the update() action. As the AJAX call’s data, we pass in “remove” as a flag, and “book_num” to tell it which book to remove. If everything goes well, the success function is called. All it does is to remove the row from the table, using the fact (as noted earlier) that the row id is the book number.

The relevant lines from the controller’s update() method:

  # Have we been called to remove a book?
  if (params[:remove])
    success = @library.remove_book(params[:book_num])
  elsif (params[:add])
    # We've been called to add a book
    success = @library.add_book(params[:book_num])
    if (success)
      # Now let's look up the book so we can get the title for display
      book = Book.find(params[:book_num])
      if (book)
        result = { "number" => book.number, "title" => book.title }
      else
        # It's a book we don't have info about, but that's okay
        result = { "number" => params[:book_num], "title" => "Unknown" }
      end
    end
  end

  . . .  # Do other work as needed

respond_to do |format|
  if (success)
    flash[:notice] = 'Library was successfully updated.'
    format.html { redirect_to(@library) }
    format.xml  { head :ok }
    format.js   { render :text => "OK" }
    format.json { render :json => result }
  else
    format.html { render :action => "edit" }
    format.xml  { render :xml => @library.errors, :status => :unprocessable_entity }
    format.js   { render :text => "Failed to save update", :status => 403 }
    format.json { render :json => { "msg" => "Failed to save update" }, :status => 403 }
  end
end

This is treating a book removal as a specialized variation of an update; an argument could certainly be made that instead it should be a new action. My jury’s still out, but there are things I like about doing it this way, including keeping the routes simple and encapsulating all library update-related activities in one place. Regardless, if the removal goes well, we return an “OK” — if not, we return a failure message, via the format.js within the respond block. Notice that in the Javascript above, the error-handler will display response.responseText if the response code is 403, and here we set the status to 403 if an application error occurred during the update. In that case the text we return will be displayed.

While you’re looking at this, check out the JSON responses, because we’ll be using those for adding a new book to the library, below.

That’s it. When a user clicks on the Remove button, it will invoke the removeBook() function, which will display a nice-looking confirmation popover using Impromptu. If the user confirms, then the callback function will make the AJAX call. The controller will remove the book from the library, and return a text “OK” if all goes well. The success function will then remove the associated row from the table, and we’re done!

On to adding a new book, which is trickier, though it uses much the same ideas of course. A user enters a book number into the form and clicks the button. What makes something happen then? The JQuery click function we’ve associated with it. Here’s the Javascript:

  $('#add_button').click(function() {
      var add_number = $('#newbook').val();
      if ((!add_number) || (add_number == '')) {
          return false;
      }
      // Call server to add the book to the library
      $.ajax({
          type: "PUT",
          url: "/libraries/<%= @library.id %>.json",
          data: ({add : 1, book_num : add_number}),
          success: function(data){
              // Add a new row to the table
              addTableRow('#books', data['number'], data['title']);
              $('#newbook').val('');   // Clear the input field
          },
          error: function(response, textStatus, errorThrown){
              var msg = (response.status == 403) ? response.responseText : "unknown error.";
              $.prompt('Error adding book to library: ' + msg);
          }
      });
  });

  function addTableRow(book_table, book_number, book_title){
      var row_class = $('tr:last', book_table).attr("class");
      var new_class = (row_class == 'odd' ? 'even' : 'odd');
      var tds = '<tr id="' + book_number + '" class="' + new_class + '">';
      tds += '<td>' + book_number + '</td>';
      tds += '<td>' + book_title + '</td>';
      tds += '<td align="center">' +
        '<a href="#" onClick="removeBook(\'' + book_number +
        '\');"><img src="/images/delete.png" width="20"></a></td></tr>';
      if($('tbody', book_table).length > 0){
          $('tbody', book_table).append(tds);
      }else {
          $(book_table).append(tds);
      }
  }

Whew, that’s a fair amount of Javascript; but it’s pretty straightforward, nonetheless. The first block is of course the click function that we attach to the button. When a user enters a number into the text field and clicks the button, this function will be called, and the first thing it does is to grab the value from the text field. And excuse me while I take a brief moment to rant in a minor way about the fact that the function is called val() instead of value(). Really, are two more letters too much to ask in exchange for better clarity? Every single time I have to write something like this I start with value() and waste a few minutes reminding myself that it’s shortened for no apparent reason. Okay, rant over, sorry.

We do a quick check to make sure that there’s actually something in the text field and return if not. Otherwise, we make our AJAX call, to the same URL as the remove function but with parameters “add” and “book_num”. If you go back now and look at the controller code again, you’ll see where it checks for params[:add] and, if it’s there, it adds the book to the library. After that it does a quick query to grab the book, so that it can return the title. It actually makes a Hash called “result” with the number and the title, which is then used in the render :json => result line. Within render, it will actually JSONize the Hash and send it back to the caller.
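Here is the add branch boiled down to plain Ruby so you can see the status and JSON body the JavaScript ends up with. Everything in it is a stand-in: LibrarySketch, CATALOG and add_response are hypothetical names, the catalog is an in-memory Hash, and rejecting a duplicate book is my assumption about what a failed add might look like.

```ruby
require 'json'

# Hypothetical in-memory stand-in for the book catalog.
CATALOG = { '111' => 'Programming Ruby' }.freeze

# Hypothetical stand-in for the library model.
class LibrarySketch
  attr_reader :book_numbers

  def initialize
    @book_numbers = []
  end

  # Assumption: adding fails for a blank number or a duplicate.
  def add_book(number)
    return false if number.to_s.empty? || @book_numbers.include?(number)
    @book_numbers << number
    true
  end
end

# Mirrors the add branch: returns an HTTP-style status and a JSON body.
def add_response(library, params)
  if library.add_book(params[:book_num])
    title = CATALOG.fetch(params[:book_num], 'Unknown')
    [200, { 'number' => params[:book_num], 'title' => title }.to_json]
  else
    [403, { 'msg' => 'Failed to save update' }.to_json]
  end
end

library = LibrarySketch.new
puts add_response(library, { book_num: '111' }).inspect
puts add_response(library, { book_num: '111' }).inspect
```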

Just to prevent any confusion, I’ll take a second here to note that in most cases you probably won’t need to do the Book.find call that’s shown here, because you’ll likely be using ActiveRecord and you might be able to look in the library instance or use some other workaround to have the book available. In my case (remembering that this is a ‘cleansed’ version of my real work, which isn’t about books and libraries at all), I’m using Redis as the data store so adding a book to a library is actually a matter of adding its key to a set, so I don’t have the actual object until I do the find — which is super-fast anyway.

Okay, so if all goes well, the success function is called, which needs to add the new row to the table (you can see that it also clears the value of the text field so it’s empty again). To add the row, it calls the addTableRow() function, passing in the table, book number, and title. To give credit where it’s due, this function is a stripped-down version of the one shown in this blog post, which was quite helpful. This version does some things specifically for this purpose, and isn’t as generic as the original.

The first two lines determine what the class should be for the row, because the table alternates ‘even’ and ‘odd’ in order to have a nice striped appearance. Here we grab the class of the last row, and then set our new_class to be whatever the current last row isn’t. That is, if the last row is ‘even’ then our new one will be ‘odd’ and vice-versa. Then we build the HTML for our new row, carefully setting the id of the row to the book number, inserting the number and title, and then adding on the somewhat ugly final table element to have the Remove button with its proper onClick. The last few lines determine whether the table has a tbody or not, and append our new row accordingly to the body or the table itself, so this works if we start with an empty library.

And that is that — when a user enters a number and clicks the add button, JQuery has attached the click event to it, so our “add” function is called, which gets the value, does the AJAX call which does the server-side adding and returns the book number and title as JSON. Our function takes that and gets a row added to the bottom of the table accordingly.

There’s only one minor improvement which I will make to this shortly, though it’s a minor use case: when a user deletes a book, the corresponding table row is removed but the classes of the remaining rows aren’t being updated. That means we lose our nice even/odd striping. As an exercise for the reader, I’ll let you add the code to the success function inside removeCallback() to iterate the table rows and set the class attributes after removing the row.

I hope this proves useful to some folks out there. The combination of Impromptu for pretty confirmation popovers (and error messaging), with in-place add/remove, provides a really nice user experience. And the patterns here end up being useful all over the place, so you’ll likely want to do the next step, which is to make this code more generic and put it into partials that you can re-use in multiple pages. Enjoy.

Getting form_for() to work with non-ActiveRecord models

February 16, 2010

Using Sunspot for Free-Text Search with Redis

February 6, 2010

After spending time to get some data into Redis (as documented in some of my previous posts here), I not surprisingly wanted to make the data searchable. After looking around at some of the full-text search solutions available for Ruby, I really liked the look of Sunspot. Well-presented, well-designed, and it even has decent documentation. It uses Solr underneath, which is a very respectable search engine, so that’s all good. Of course, it didn’t take me long to discover that the sunspot_rails plugin makes things drop-and-go when using ActiveRecord, but those of us branching off into alternatives have to put in more effort. Hence, I’ll document my findings here to hopefully make it easier for others.

I won’t bother going into the details of getting things set up, as the Sunspot wiki does a fine job of that. Suffice it to say that we install the gem (and the sunspot_rails gem if you’re going to have some ActiveRecord models as well), start the Solr server, and that’s about it. We’ve got Redis already going, right? So now it’s time to get our model indexed and searchable!

There are a few steps that we need to follow to make this happen. First, we put code in the model to tell Sunspot what fields should be indexed, which ones are just for ordering/filtering, and which ones should be stored if desired for quicker display:

class Book
  require 'sunspot'
  require 'sunspot_helper'

  # Pretend some attributes like number, title, etc are defined here

  Sunspot.setup(Book) do
    text :number, :boost => 2.0
    text :title, :boost => 2.0
    text :excerpt
    text :authors
    string :title, :stored => true
    string :number, :stored => true
    date :publication_date
  end

  def save
    book_key = "book:#{number}:data"
    @redis[book_key] = json_data
    @redis.set_add 'books', number
    # Make searchable
    Sunspot.index( self )
    Sunspot.commit
  end

  # Returns the raw JSON string stored for the given book number
  def self.find_by_number(redis, number)
    redis["book:#{number}:data"]
  end
end

First, note that we need to require 'sunspot' to get access to the Sunspot class. This isn’t required for ActiveRecord models, but since we’re on our own, we have to specify that. Then, we call setup, passing the name of our model. In the code block, we specify a few text fields: the number, title, excerpt, and authors. Those fields will be indexed and searchable. Then we specify title and number again as strings, asking that they be stored for quicker retrieval. This is so we can display just that data without fetching the whole object, if we want — I won’t get into the details of doing that here because, well, fetching the objects in Redis is so fast that I found it didn’t matter. Last, the publication date is also listed, so we can filter and order by it if we want.

In our save() method, after we store a book in Redis, we tell Sunspot to index it, and commit the updated index. So far, so good. In theory, we should be able to create a Book, save it, and then search for it. If this were an ActiveRecord model we’d be pretty much done (and wouldn’t even have to do the index/commit part because those are automagically triggered on create and update). Alas, it isn’t, so we have some harder work ahead of us.

Sunspot uses what it calls “adapters” to tell it what to do when it wants to identify an object, and when it wants to fetch an object given an id. We have to provide the adapters for our model. To give credit where it’s due, this Linux Magazine article helped me figure out what to do, and then reading through the Sunspot adapter source code filled in the blanks. If you look back at our model, you’ll see that it requires ‘sunspot_helper’. That’s where we’ll put our adapters:

/app/helpers/sunspot_helper.rb:

require 'rubygems'
require 'json'
require 'redis'
require 'sunspot'

module SunspotHelper

  class InstanceAdapter < Sunspot::Adapters::InstanceAdapter
    def id
      @instance.number  # return the book number as the id
    end
  end

  class DataAccessor < Sunspot::Adapters::DataAccessor
    def load( id )
      Book.new(JSON.parse(Book.find_by_number(Redis.new, id)))
    end

    def load_all( ids )
      redis = Redis.new
      ids.map { |id| Book.new(JSON.parse(Book.find_by_number(redis, id))) }
    end
  end

end

So, what’s going on here? We provide two adapters for Sunspot: the InstanceAdapter and the DataAccessor. The InstanceAdapter just provides a method that returns the ID of the object. Easy enough: we return the book’s number, which is the unique identifier. The DataAccessor has to provide two methods, load() and load_all(), which take an id and a list of ids, respectively, and are expected to return the corresponding objects. In my case, the objects are stored as serialized JSON, so we call our find_by_number() method to get each one, call JSON.parse() to get the Hash of data, and construct a new Book object. (Note: obviously this requires having an initializer that can take a Hash and create the object, which I’ll leave as an exercise.) Now we just register our adapters, by adding a couple of lines of code right before the call to Sunspot.setup():

  Sunspot::Adapters::InstanceAdapter.register(SunspotHelper::InstanceAdapter, Book)

  Sunspot::Adapters::DataAccessor.register(SunspotHelper::DataAccessor, Book)
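
For anyone who would rather not start that initializer exercise from scratch, here is a minimal standalone sketch of a Book that can be built from a Hash. The attribute list is my assumption, based on the fields indexed above, and json_data is just one way the serialization used in save() could look; adapt both to your real model.

```ruby
require 'json'

class Book
  # Assumed attribute names, taken from the fields indexed earlier.
  ATTRIBUTES = %w[number title excerpt authors publication_date].freeze
  attr_accessor(*ATTRIBUTES)

  # Accepts a Hash with String or Symbol keys; unknown keys are ignored.
  def initialize(attrs = {})
    attrs.each do |key, value|
      name = key.to_s
      public_send("#{name}=", value) if ATTRIBUTES.include?(name)
    end
  end

  # Serialize the attributes to a JSON string for storage in Redis.
  def json_data
    data = ATTRIBUTES.each_with_object({}) do |name, hash|
      hash[name] = public_send(name)
    end
    JSON.generate(data)
  end
end

# Round trip: JSON string -> Hash -> Book
book = Book.new(JSON.parse('{"number": 8888888, "title": "My test title"}'))
puts book.title
```

Ignoring unknown keys keeps the constructor tolerant of extra data in older records, at the cost of silently dropping typos in key names.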

Now we should be good to go, right? Okay, we construct a Book object, and call save…then search for it:

b = Book.new({ "number" => 8888888, "title" => "My test title"})
=> #<Book:blahblah...
b.save
=> nil
search = Sunspot.search(Book) { keywords 'test' }
=> <Sunspot::Search:{:rows=>1, blahblah…
r = search.results
=> [#<Book:blahblah...
r[0].title
=> "My test title"

And we’re good! Congratulations. So now we want to add the search capability to our controller, right?

# In a view, put in a search form. I have a little search image, so excuse the image_submit_tag:
<% form_for(:book, :url => { :action => "search" }) do |f| %>
    <p>
      <%= f.label "Search for:" %>
      <input type="text" name="searchterm" id="searchterm" size="20">
      <%= image_submit_tag('search.png', :width => '30', :alt => 'Search', :style => 'vertical-align:middle') %>
    </p>
<% end %>

# Now in the controller. Note the pagination, which is why we store the search in the session,
# so we can grab it out again if they click forward/back through the pages.
  def search
    @search_term = params[:searchterm] || session[:searchterm]
    if (@search_term)
      session[:searchterm] = @search_term
    end
    page_number = params[:page] || 1
    search = Sunspot.search(Book) do |query|
      query.keywords @search_term
      query.paginate :page => page_number, :per_page => 30
      query.order_by :number, :asc
    end

    @books = search.results
  end

# And then in our search view, display the results:
<ul>
<% @books.each do |book| %>
    <li><%= book.number %>: <%= book.title %></li>
<% end %>
</ul>
<br />
Found: <%= @books.total_entries %> - <%= will_paginate @books %>

Yes, Sunspot is so cool that it integrates automatically with will_paginate. So, looking through the above, we have a form that posts to our action (assuming you set the routes up, which you did, yes?). The action then takes the searchterm parameter if it’s there, or extracts it from the session if it’s not. Note that this is not robust code: if it’s called with no parameter and nothing in the session, it will end up searching with an empty search term, which will return every book. In any case, we store the search term in the session, so that when someone clicks through to page 2, we can re-run the search to get the second page. The more important code here, though, is the call to search.
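
On the robustness point, one option is to treat blank input as "no search" before calling Sunspot at all. Here is a sketch; resolve_search_term is a hypothetical helper name, and plain hashes stand in for Rails' params and session.

```ruby
# Hypothetical helper: returns the first non-blank search term from the
# request or the session, or nil when neither holds a usable value.
def resolve_search_term(params, session)
  [params[:searchterm], session[:searchterm]]
    .map { |candidate| candidate.to_s.strip }
    .find { |candidate| !candidate.empty? }
end

p resolve_search_term({ :searchterm => ' test ' }, {})
p resolve_search_term({ :searchterm => '' }, { :searchterm => 'earlier' })
p resolve_search_term({}, {})
```

In the action you could then skip the search and show an empty result list whenever the helper returns nil, instead of listing every book.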

I will give a thousand thanks to this blog post, specifically the fourth item! I was doing this:

    search = Sunspot.search(Book) do
      keywords @search_term
    end

And it didn’t work. It was fetching every object, even though I knew that @search_term was getting set properly. As that blog post notes, though, the block is evaluated in a new scope, where @search_term no longer refers to the controller’s instance variable, so the keywords never reached the query. The code I showed above, using the query argument, fixes that problem. It certainly took me a while to figure that out, though, because nothing is said about it anywhere in the examples in the Sunspot wiki.
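
To see the scoping issue in isolation, here is a toy illustration in plain Ruby. TinyDSL and FakeController are made up for this example and are not Sunspot's code; I'm assuming the DSL uses instance_eval for blocks that take no argument, which matches the behaviour I saw.

```ruby
# Hypothetical stand-in for a query DSL; it only records keywords.
class TinyDSL
  attr_reader :terms

  def initialize
    @terms = []
  end

  def keywords(term)
    @terms << term
  end

  # Zero-argument blocks are instance_eval'd, so self becomes the DSL
  # object. Blocks that take an argument are simply called with the DSL
  # object and keep the caller's self.
  def self.build(&block)
    dsl = new
    if block.arity == 1
      block.call(dsl)
    else
      dsl.instance_eval(&block)
    end
    dsl.terms
  end
end

class FakeController
  def initialize(term)
    @search_term = term
  end

  # Inside instance_eval, @search_term is looked up on the DSL object.
  def without_argument
    TinyDSL.build { keywords @search_term }
  end

  # With a block argument, @search_term is still the controller's.
  def with_argument
    TinyDSL.build { |query| query.keywords @search_term }
  end
end

controller = FakeController.new('test')
p controller.without_argument  # => [nil]
p controller.with_argument     # => ["test"]
```

Local variables don't have this problem, since blocks close over them either way; it is only instance variables (and self) that change under instance_eval.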

So now you should be all set. Put “test” into the form, submit it, the controller will do the search, return the book, and your view will list it. You are searching! Not so bad, and the fetches from Redis are so fast that the whole thing really speeds along. Pretty simple free-text search against any objects that you put into Redis.

A Warning

I had one other hitch when I was working on this, which mysteriously went away. I hate that. So, in case someone else encounters it, I wanted to document the issue here. When I got the adapters in place for the Book model, and tried to work with it, I got an error saying that there was no adapter registered for String. I was very puzzled, wondering if something about the fact that Redis was returning a JSON String was confusing Sunspot. So I made a quick change to the InstanceAdapter:

  class InstanceAdapter < Sunspot::Adapters::InstanceAdapter
    def id
      if @instance.is_a?(String)
        @instance
      else
        @instance.number  # return the book number as the id
      end
    end
  end

And changed the register lines in my model:

  Sunspot::Adapters::InstanceAdapter.register(SunspotHelper::InstanceAdapter, Book, String)

  Sunspot::Adapters::DataAccessor.register(SunspotHelper::DataAccessor, Book, String)

And that did the trick. I didn’t like it, and intended to try to figure out what was going on. But after getting all the rest of it working, when I put the code back to its pre-String-adapter state, the error didn’t return. Like I said, I hate that. Hopefully it was just due to something that I was unknowingly doing wrong which I fixed along the way, but…just in case, now the quick-fix is documented here for anyone else who runs into the problem.

Making a quick chart with RaphaelJS and Redis

February 5, 2010

Uncategorized

2 Comments

This afternoon I wanted to add another quick report to the system I’m building, and it was so easy that I thought I’d share some of the details. As I’ve written here before, I’m using the RaphaelJS library for simple charts, and it makes it very simple to create bar charts and pie charts. So I’ve already got a couple of those, using common code as I described in that earlier posting.

When I wanted to create a new chart, then, I knew I could leverage that. First, though, I needed to get at my data. What I was getting was essentially a list of categories, and the number of items in each category. Since this is stored in Redis, the items are key-value entries, and each category is a Set to which items belong. In this particular case, each item belongs to only one set.

So for the sake of an example, let’s say that we have books, divided into categories. We’ll store the books with keys like book:#:title and each category set will be called cat:name:

book:1234:title => “Technical Book”
book:5678:title => “Another Tech Book”
book:9012:title => “Gardening Book”
cat:technical => 1234, 5678
cat:gardening => 9012

So, for charting purposes, we want to get a list of the categories, and then for each one we want to fetch the number of books in it. In my reports_controller I have this:

  def books_by_category
    @chart = params[:chart] || "bar"
    @chart_title = "Books by Category"

    redis = Redis.new
    # Get the list of the categories, which are the labels for the graph
    keys = redis.keys("cat:*")

    # Now let's iterate through the categories and get the counts
    @labels = []
    @values = []
    keys.sort.each do |cat|
      count = redis.set_count(cat)
      @values << count
      cat_name = cat.scan(/cat:(.*)/)[0][0]
      if (@chart == 'pie')  # Need to add counts to the labels for pie charts
        cat_name << " (#{count})"
      end
      @labels << cat_name
    end
  end

First we get a Redis connection, and ask it for all of the category keys, using the pattern chosen: redis.keys("cat:*"). I want to stress something here: if you read the Redis docs (which of course you should, in depth) you’ll see that they say to never use this command in a production app! Obviously, if you have a lot of keys in the database, this is not a good command. In this particular case, I know that the database being used will not have too many keys, so I’m comfortable doing this — but be careful and make sure it’s okay for your case! If not, the solution is to create a new set that contains the names of all of the categories. Grab that set using SORT and work from there, which is simple. I also want to stress that, as with any reporting, if your data set grows (i.e. you start to have lots and lots of categories), you don’t want to run this frequently! Do the count occasionally and cache the results, create roll-up data from which to do your reporting, etc. This is a very simple case, but is a nice example of some tools.
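
Here is a rough sketch of that index-set approach. The 'categories' key and the helper names are my own inventions, and InMemorySets is a tiny stand-in that mimics just three set commands so the example runs without a server. With a real connection you would pass Redis.new instead. I'm using the sadd/smembers/scard method names from current redis-rb rather than the older set_add/set_count aliases, and sorting in Ruby rather than with SORT.

```ruby
require 'set'

# In-memory stand-in so the sketch runs without a Redis server. It
# mimics only the three set commands used below: SADD, SMEMBERS, SCARD.
class InMemorySets
  def initialize
    @sets = Hash.new { |hash, key| hash[key] = Set.new }
  end

  def sadd(key, member)
    @sets[key].add(member.to_s)
  end

  def smembers(key)
    @sets[key].to_a
  end

  def scard(key)
    @sets[key].size
  end
end

# 'categories' is a hypothetical key name for the index set.
def add_book_to_category(redis, category, book_number)
  redis.sadd('categories', category)          # remember the category name
  redis.sadd("cat:#{category}", book_number)  # file the book under it
end

# Returns [[name, count], ...] sorted by name, without calling KEYS.
def category_counts(redis)
  redis.smembers('categories').sort.map do |name|
    [name, redis.scard("cat:#{name}")]
  end
end

redis = InMemorySets.new
add_book_to_category(redis, 'technical', 1234)
add_book_to_category(redis, 'technical', 5678)
add_book_to_category(redis, 'gardening', 9012)
p category_counts(redis)  # => [["gardening", 1], ["technical", 2]]
```

The trade-off is one extra write per book, in exchange for never having to scan the whole keyspace at report time.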

Okay, so then we have the categories, and we iterate through them. For each, we get the count of entries using redis.set_count(cat). The redis-rb library aliases “set_count” to be the Redis command SCARD, which returns the cardinality of the set, i.e. the number of entries. We add that onto the @values array, and then create the category name by taking everything after “cat:” from the key. If we’re making a pie chart, we add the count to the labels, simply because I found that it’s very friendly that way. We add the category name to the labels array, and continue.

That’s pretty much it then! Using the previous reporting code, I just had to create a new view, which includes the partials I created before — the partials expect the @labels and @values arrays, so they’ll just graph whatever they get. Here’s the actual view:

<%= render :partial => "report_chart" %>

<br/><br/>

<h1>Reports : Books by Category</h1>
<%= render :partial => "report_links" %>

<br/><br/>

<div id="holder"></div>

If you refer back to my earlier post about RaphaelJS graphing, the report_chart partial contains the Javascript to generate the chart. The report_links partial simply has code to create links to the various chart types for this data: pie, bar, and csv. The holder div is where the RaphaelJS Javascript will render the chart.

And that’s all there is to it. Thanks to the ease of Redis sets, getting the data sliced and diced as needed was extremely simple, and thanks to easy Javascript reporting from RaphaelJS, the plain old label/value charting couldn’t be much quicker.

—While I Pondered…

Over many a quaint and curious volume of forgotten lore.
