Spree n+1 Queries

Recently while working on a project that uses the Spree Commerce Platform, my team began to notice that an image heavy view was running way too slowly. We initially thought that the slowdown was a result of rendering a large amount of images on the view, but after checking the logs, we noticed that the slowdown was largely caused by an n+1 query.

A n+1 query is a problem that arises with the use of an Object Relational Mappper or ORM to select a dataset (the 1), and then while iterating over that dataset, unintentionally make another query for each record in the dataset you are iterating over (the n). Here is an example of some code that causes an n+1 query:

# an example of a n+1 query
# A User has many Posts
users = User.all

users.each { |user| puts user.posts.count }

This example is a n+1 query because after the initial query for all of the users, it then iterates over the users and prints the count of all of their posts. While the code looks harmless enough, the ORM masks the fact that it is running a query each time user.posts.count is called. When working with a relational database, it is much faster to make a query for 100 records than it is to make 100 queries for 1 record, and because of this, a n+1 query can really become a major performance issue if not reconciled.

So now that we know what a n+1 query is and how it affects your systems, how do you go about fixing them. In this case we can use eager loading to prevent an n+1 query from happening. Eager loading is when you load all of the needed records ahead of time instead of sending a new query for each record when it is called. If you are working with Active Record, it provides a few methods of eager loading the records that you need. Here is an example of how we can prevent the above example from becoming a n+1 query:

# an example using eager load
# A User has many Posts
users = User.all.eager_load(:posts)

users.each { |user| puts user.posts.count }

It's a pretty simple change to use eager loading, and the performance improvements if you have a large n+1 query would be pretty impressive. In this case Active Record will do a single query to retrieve all of the Users and their associated posts instead of one query to get all of the users and n queries to get their posts count.

Now that we have gone over what a n+1 query is and how to fix them, let's get back to the original problem I was having with Spree. The code that we had looked similar to this:

# retrieves the products
products = products.all

# in the view
<% products.each do |product| %>
  <%= image_tag(product.image) %>
<% end %>

In order to fix this issue, we used eager loading to load all of the images for each product, but it was not as simple as adding eager_load(:images) to our query. The way spree works is that each product has many variants, and each variant has a picture. This caused us to not just be doing an n+1 query, but it was actually a 2n+1 query because 2 queries were being made for each product. Our solution to this problem was to use nested eager loading, and to use only the master variant for each product. The resulting query looked like this:

products = products.all.eager_load(master: [:images])

Making this change reduced thousands of queries to just a single query, and reduced the loading time for the page considerably.

Conclusion

Whenever you are working with an ORM, make sure that you know when you are issuing multiple queries, and make sure that you know how to fix it if you end up doing so.

Legacy Code No Time Have To Change

Outside of work I have continued reading Michael Feathers' book Working Effectively with legacy code, and storing this knowledge away for the day when I encounter a piece of legacy code that I have to work with. In the section I read last night, Michael goes over how to make a quick change to some code without breaking dependencies or breaking the functionality of the legacy code.

Sprout Method

The first technique that Michael brings up is called Sprout Method, and it involves replacing an existing local variable that the method uses with the return of a new method you wrote. This allows you to write a tested method that returns a new value to the legacy method. Here is an example: Lets say that you have an untested method that iterates over an array of Car objects, and then counts how many have fully inflated tires. That looks something like this:

def tire_pressure_counter(car_array)
  counter = 0
  car_array.each do | car |
    if car.full_tires?
      counter += 1
  end
end

Now lets say that the requirements for this method have changed, and you have been tasked with making the method now only count cars that are the color yellow. Without any tests surrounding this method how would you make that change?

def tire_pressure_counter(car_array)
  counter = 0
  car_array.each do | car |
    if car.full_tires? && car.color == yellow
      counter += 1
  end
end

While this seems like it was a pretty easy fix, it was actually an invasive change that does not have any tests that back it up, but we can do this another way that will get some of the code under test.

def tire_pressure_counter(car_array)
  car_array = yellow_cars(car_array)
  counter = 0
  car_array.each do | car |
    if car.full_tires? && car.color == yellow
      counter += 1
  end
end

def yellow_cars(car_array)
  car_array.select { |car| car.color == yellow }
end

This last example is using the sprout method technique to add a new feature to a legacy method. Since the new behavior is encapsulated in its own method, we can test the new behavior and insert it into the old method without changing its behavior. This may seem like overkill for such a small method, but if there is a time when you need to insert new behavior into a legacy method quickly and effectively, this technique may save you a lot of headache.

Wrap Method

Another technique that Michael describes in his book is called "wrap method," and it is used like the Sprout Method technique: when you have little time to make the change without breaking the application. For this example your boss's boss wants to be alerted every time someone completes a transaction at the drive through window. Currently the application keeps track of the transactions in a single method that does not have any tests around it. How would you make this change?

def drive_through_transaction(purchase)
  total = 0
  purchase.each { |purchase| total += purchase.price }
  payment = recieve_payment
  change = payment - total 
  return change
end

Like in the Sprout Method example you could make a change like this in order to alert your boss's boss:

def drive_through_transaction(purchase)
  total = 0
  purchase.each { |purchase| total += purchase.price }
  payment = recieve_payment
  change = payment - total 
  alert_supreme_boss
  return change
end

def alert_supreme_boss
 # alert code
end

This change tightly couples the alert code to the drive through transaction code, and it would make it more difficult for someone else to come along and make a change to this method. There is a different and more effective way to make this change in a short amount of time:

def complete_transaction(purchase)
  total = 0
  purchase.each { |purchase| total += purchase.price }
  payment = recieve_payment
  change = payment - total 
  alert_supreme_boss
  return change
end

def alert_supreme_boss
 # alert code
end

def drive_through_transaction(purchase)
  complete_transaction(purchase)
  alert_supreme_boss
end

This change is done using the Wrap Method technique that Michael uses in his book, and it is a more effective change because it decouples the transaction code from the alert code while maintaining the same interface as before. This wrap method change could be extended even further by separating the alerted transaction with the normal transaction in case the boss gets annoyed with all of the alerts.

def complete_transaction(purchase)
  total = 0
  purchase.each { |purchase| total += purchase.price }
  payment = recieve_payment
  change = payment - total 
  alert_supreme_boss
  return change
end

def alert_supreme_boss
 # alert code
end

def alerted_drive_through_transaction(purchase)
  complete_transaction(purchase)
  alert_supreme_boss
end

def drive_through_transaction(purchase)
  complete_transaction(purchase)
end

conclusion

While these methods are great to use when you are short on time and do not have a test suite of any kind to support you, they are not always the best choice because they can make code more difficult to read because of too much abstraction. This risk is acceptable for small changes, but in order to make a large change in a system, you will need to break some dependencies and work to bring more of the legacy system into test coverage.

Squashing, Not Only for Bugs

When you think of the word squash in a software context, is bug the first thing that comes to your mind? That is definitely the first thing that comes to my mind too, but today I learned about another thing that can be squashed, git commits. After I submitted a pull request for the project that I am currently working on, the pull request reviewer asked me to squash my commits. At first I had no idea what squashing was, so I did the usual and searched "squash commit" in google. The result of that search informed me that by using the git interactive rebase tool, I could combine many commits into a single commit.

Well that answered one question, but it presented me we another one: why do I want to squash all my commits into just a single one? I did some more googling, and I think I came up with a very good reason: keeping your master branch clean and providing a clear path of what features are being merged. While it is important to keep detailed track of your progress while building a feature, pulling many small commits into the master branch can pollute the log with unneeded commits. Is it better to have a log that is succinct or overly detailed to the point of being unreadable?

So now that I know why it is sometimes a good reason to squash your commits, how do I go about doing so? If I wanted to squash my last 2 commits, first I would enter:

$ git rebase -i HEAD~2

pick f7f3f6d Add new feature
pick 310154e Fix some spelling

# Rebase 710f0f8..310154e onto 710f0f8
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out

After you enter the commad "git rebase -i ", you are taken to the interactive rebase tool, shown under the command. This tool shows you the commit range that you have selected with oldest at the top and newest at the bottom. Now to get to the squashing! In order to squash your new erroneous commit into the older more important commit, all you have to do is change the word "pick" to "squash." After changing the word, save and quit your editor, and a new screen will come up in your editor. This screen is the verbose commit screen, and this is where you will write a new commit message for the squashed commit. Thats it!

Active Record Scopes

Recently while I was working on a rails project, I came across a method called Active Record Scope. An active record scope allows you to assign a method to an active record model to return a subset of records that match the conditions of the scope. This gives you a very simple way to reach a group of records, and it prevents you from having to write the query in multiple locations. Using a scope shortens something like this:

Your::Model.where(name: "Joe", model: true)

to

# on models/model.rb
scope :joe?, -> { where(name: "Joe", model: true) }
# where ever you call the query

Your::Model.joe?

Other than preventing you from having to write the same query over and over again, scopes have some other benefits than just being syntactic sugar for a class method. One benefit is that a scope will always return an active record relation. If you were to write a class method instead of a scope that returned with no matched records, the class method would return nil and break the chainability of the method. This is not a problem that would have to worried about while using a scope because it will always return an active record relation.

Scopes have been very helpful in reducing the amount of queries I need to write, and they are something I always look to use in my rails projects.

The Seam Model

As I continued reading Michael Feathers' book Working Effective With Legacy code, I reached a chapter on something called The Seam Model. Michael starts off this chapter with an idea that I have run into before, and that is that existing code is poorly suited to testing. I have run into the issue that if I do not plan ahead of time or do not practice TDD, that I will get to a piece of code that is poorly suited for testing. One reason that this code is not suited to being tested is that certain dependencies do not work in a test environment, or that the classes intended behavior does not suit a test environment.

In order to get around this issue, Michael brings up the idea of the code seam. A code seam is a place where you can alter behavior in your program without editing its source code. Immediately I thought of the dependency inversion principle because relying on an interface instead of a dependency is a code seam. Because there is an interface, we can pass in any object we want that implements the interface. This is the code seam that I am most familiar with in OOP, but Michael goes into great detail about code seams in C++ as well.

The next few chapters start go into a FAQ style, and I look forward into finding sections that I can apply to my everyday programming toolkit.