06 Feb 2015
Recently while working on a project that uses the Spree Commerce Platform, my
team began to notice that an image heavy view was running way too slowly. We
initially thought that the slowdown was a result of rendering a large amount of images
on the view, but after checking the logs, we noticed that the slowdown was largely
caused by an n+1 query.
A n+1 query is a problem that arises with the use of an Object Relational
Mappper or ORM to select a dataset (the 1), and then while iterating over that
dataset, unintentionally make another query for each record in the dataset you
are iterating over (the n). Here is an example of some code that causes an n+1
query:
# an example of a n+1 query
# A User has many Posts
users = User.all
users.each { |user| puts user.posts.count }
This example is a n+1 query because after the initial query for all of the
users, it then iterates over the users and prints the count of all of their
posts. While the code looks harmless enough, the ORM masks the fact that it is
running a query each time user.posts.count is called. When working with a
relational database, it is much faster to make a query for 100 records than it is to make 100 queries for 1
record, and because of this, a n+1 query can really become a major performance
issue if not reconciled.
So now that we know what a n+1 query is and how it affects your systems, how do
you go about fixing them. In this case we can use eager loading to prevent an
n+1 query from happening. Eager loading is when you load all of the needed
records ahead of time instead of sending a new query for each record when it is
called. If you are working with Active Record, it provides a few methods of
eager loading the records that you need. Here is an example of how we can
prevent the above example from becoming a n+1 query:
# an example using eager load
# A User has many Posts
users = User.all.eager_load(:posts)
users.each { |user| puts user.posts.count }
It's a pretty simple change to use eager loading, and the performance
improvements if you have a large n+1 query would be pretty impressive. In this
case Active Record will do a single query to retrieve all of the Users and their
associated posts instead of one query to get all of the users and n queries to
get their posts count.
Now that we have gone over what a n+1 query is and how to fix them, let's get
back to the original problem I was having with Spree. The code that we had
looked similar to this:
# retrieves the products
products = products.all
# in the view
<% products.each do |product| %>
<%= image_tag(product.image) %>
<% end %>
In order to fix this issue, we used eager loading to load all of the images for
each product, but it was not as simple as adding eager_load(:images) to our
query. The way spree works is that each product has many variants, and each
variant has a picture. This caused us to not just be doing an n+1 query, but it
was actually a 2n+1 query because 2 queries were being made for each product.
Our solution to this problem was to use nested eager loading, and to use only
the master variant for each product. The resulting query looked like this:
products = products.all.eager_load(master: [:images])
Making this change reduced thousands of queries to just a single query, and
reduced the loading time for the page considerably.
Conclusion
Whenever you are working with an ORM, make sure that you know when you are
issuing multiple queries, and make sure that you know how to fix it if you end
up doing so.
29 Jan 2015
Outside of work I have continued reading Michael Feathers' book Working
Effectively with legacy code, and storing this knowledge away for the day when I
encounter a piece of legacy code that I have to work with. In the section I read
last night, Michael goes over how to make a quick change to some code without
breaking dependencies or breaking the functionality of the legacy code.
Sprout Method
The first technique that Michael brings up is called Sprout Method, and it
involves replacing an existing local variable that the method uses with the
return of a new method you wrote. This allows you to write a tested method that
returns a new value to the legacy method. Here is an example: Lets say that you
have an untested method that iterates over an array of Car objects, and then counts
how many have fully inflated tires. That looks something like this:
def tire_pressure_counter(car_array)
counter = 0
car_array.each do | car |
if car.full_tires?
counter += 1
end
end
Now lets say that the requirements for this method have changed, and you have
been tasked with making the method now only count cars that are the color
yellow. Without any tests surrounding this method how would you make that
change?
def tire_pressure_counter(car_array)
counter = 0
car_array.each do | car |
if car.full_tires? && car.color == yellow
counter += 1
end
end
While this seems like it was a pretty easy fix, it was actually an invasive
change that does not have any tests that back it up, but we can do this another
way that will get some of the code under test.
def tire_pressure_counter(car_array)
car_array = yellow_cars(car_array)
counter = 0
car_array.each do | car |
if car.full_tires? && car.color == yellow
counter += 1
end
end
def yellow_cars(car_array)
car_array.select { |car| car.color == yellow }
end
This last example is using the sprout method technique to add a new feature to a
legacy method. Since the new behavior is encapsulated in its own method, we can
test the new behavior and insert it into the old method without changing its
behavior. This may seem like overkill for such a small method, but if there is a
time when you need to insert new behavior into a legacy method quickly and
effectively, this technique may save you a lot of headache.
Wrap Method
Another technique that Michael describes in his book is called "wrap method,"
and it is used like the Sprout Method technique: when you have little time to
make the change without breaking the application. For this example your boss's
boss wants to be alerted every time someone completes a transaction at the drive
through window. Currently the application keeps track of the transactions in a
single method that does not have any tests around it. How would you make this
change?
def drive_through_transaction(purchase)
total = 0
purchase.each { |purchase| total += purchase.price }
payment = recieve_payment
change = payment - total
return change
end
Like in the Sprout Method example you could make a change like this in order to
alert your boss's boss:
def drive_through_transaction(purchase)
total = 0
purchase.each { |purchase| total += purchase.price }
payment = recieve_payment
change = payment - total
alert_supreme_boss
return change
end
def alert_supreme_boss
# alert code
end
This change tightly couples the alert code to the drive through transaction
code, and it would make it more difficult for someone else to come along and
make a change to this method. There is a different and more effective way to
make this change in a short amount of time:
def complete_transaction(purchase)
total = 0
purchase.each { |purchase| total += purchase.price }
payment = recieve_payment
change = payment - total
alert_supreme_boss
return change
end
def alert_supreme_boss
# alert code
end
def drive_through_transaction(purchase)
complete_transaction(purchase)
alert_supreme_boss
end
This change is done using the Wrap Method technique that Michael uses in his
book, and it is a more effective change because it decouples the transaction
code from the alert code while maintaining the same interface as before. This
wrap method change could be extended even further by separating the alerted
transaction with the normal transaction in case the boss gets annoyed with all
of the alerts.
def complete_transaction(purchase)
total = 0
purchase.each { |purchase| total += purchase.price }
payment = recieve_payment
change = payment - total
alert_supreme_boss
return change
end
def alert_supreme_boss
# alert code
end
def alerted_drive_through_transaction(purchase)
complete_transaction(purchase)
alert_supreme_boss
end
def drive_through_transaction(purchase)
complete_transaction(purchase)
end
conclusion
While these methods are great to use when you are short on time and do not have
a test suite of any kind to support you, they are not always the best choice
because they can make code more difficult to read because of too much
abstraction. This risk is acceptable for small changes, but in order to make a
large change in a system, you will need to break some dependencies and work to
bring more of the legacy system into test coverage.
28 Jan 2015
When you think of the word squash in a software context, is bug the first thing
that comes to your mind? That is definitely the first thing that comes to my
mind too, but today I learned about another thing that can be squashed, git
commits. After I submitted a pull request for the project that I am currently
working on, the pull request reviewer asked me to squash my commits. At first I
had no idea what squashing was, so I did the usual and searched "squash commit"
in google. The result of that search informed me that by using the git
interactive rebase tool, I could combine many commits into a single commit.
Well that answered one question, but it presented me we another one: why do I
want to squash all my commits into just a single one? I did some more googling,
and I think I came up with a very good reason: keeping your master branch clean
and providing a clear path of what features are being merged. While it is important to keep
detailed track of your progress while building a feature, pulling many small
commits into the master branch can pollute the log with unneeded commits. Is it
better to have a log that is succinct or overly detailed to the point of being
unreadable?
So now that I know why it is sometimes a good reason to squash your commits, how
do I go about doing so? If I wanted to squash my last 2 commits, first I would
enter:
$ git rebase -i HEAD~2
pick f7f3f6d Add new feature
pick 310154e Fix some spelling
# Rebase 710f0f8..310154e onto 710f0f8
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out
After you enter the commad "git rebase -i ", you are taken to the
interactive rebase tool, shown under the command. This tool shows you the commit
range that you have selected with oldest at the top and newest at the bottom.
Now to get to the squashing! In order to squash your new erroneous commit into
the older more important commit, all you have to do is change the word "pick" to
"squash." After changing the word, save and quit your editor, and a new screen
will come up in your editor. This screen is the verbose commit screen, and this
is where you will write a new commit message for the squashed commit. Thats it!
23 Jan 2015
Recently while I was working on a rails project, I came across a method called
Active Record Scope. An active record scope allows you to assign a method to an
active record model to return a subset of records that match the conditions of
the scope. This gives you a very simple way to reach a group of records, and it
prevents you from having to write the query in multiple locations. Using a scope
shortens something like this:
Your::Model.where(name: "Joe", model: true)
to
# on models/model.rb
scope :joe?, -> { where(name: "Joe", model: true) }
# where ever you call the query
Your::Model.joe?
Other than preventing you from having to write the same query over and over
again, scopes have some other benefits than just being syntactic sugar for a
class method. One benefit is that a scope will always return an active record
relation. If you were to write a class method instead of a scope that returned with
no matched records, the class method would return nil and break the chainability
of the method. This is not a problem that would have to worried about while
using a scope because it will always return an active record relation.
Scopes have been very helpful in reducing the amount of queries I need to write,
and they are something I always look to use in my rails projects.
13 Jan 2015
As I continued reading Michael Feathers' book Working Effective With Legacy
code, I reached a chapter on something called The Seam Model. Michael starts off
this chapter with an idea that I have run into before, and that is that existing
code is poorly suited to testing. I have run into the issue that if I do not
plan ahead of time or do not practice TDD, that I will get to a piece of code
that is poorly suited for testing. One reason that this code is not
suited to being tested is that certain dependencies do not work in a test
environment, or that the classes intended behavior does not suit a test
environment.
In order to get around this issue, Michael brings up the idea of the code seam.
A code seam is a place where you can alter behavior in your program without
editing its source code. Immediately I thought of the dependency inversion
principle because relying on an interface instead of a dependency is a code
seam. Because there is an interface, we can pass in any object we want that
implements the interface. This is the code seam that I am most familiar with in
OOP, but Michael goes into great detail about code seams in C++ as well.
The next few chapters start go into a FAQ style, and I look forward into finding
sections that I can apply to my everyday programming toolkit.