How to systematically eliminate N+1 queries from a Rails application

Kyle d'Oliveira
Clio Labs
Published in
7 min readNov 10, 2017

--

At Clio, we’ve managed to remove approximately 80% of N+1 queries overnight, and have the tools to remove the more complicated ones whenever we detect them. The solution is, at the end of the day, pretty simple.

Let’s begin this post with the pretty standard question:

What is an N+1 query?

Consider a situation where you have users that each have many tasks. We may have models in a Rails application that looks like this:

class User < ActiveRecord::Base
has_many :tasks
end
class Task < ActiveRecord::Base
belongs_to :user
end

Now, imagine we want to get a list of tasks for each user. Naively, we may write some code that does this:

User.all.each do |user|
user.tasks.each do |task|
#Do something with each task
end
end

Do you see the problem here? This generates an N+1 query.

Every time you access a user’s tasks, it goes back to the database to load them. If you looked at the SQL that was generated, you would see:

SELECT * FROM users
SELECT * FROM tasks WHERE user_id = 1
SELECT * FROM tasks WHERE user_id = 2
SELECT * FROM tasks WHERE user_id = 3
SELECT * FROM tasks WHERE user_id = 4
SELECT * FROM tasks WHERE user_id = 5
SELECT * FROM tasks WHERE user_id = 6
SELECT * FROM tasks WHERE user_id = 7
SELECT * FROM tasks WHERE user_id = 8

Notice that in this case, we had 8 users, so we executed 9 queries (1 for the user, 8 for the tasks). If we had N users, we would have executed N+1 queries (hence the name).

N+1 queries are not a new problem, but they are often silent killers of performance. They also do not influence behavior, so it is really easy for tests to let them slip through. Further, the individual queries are all fast, so they will not show up in any slow query log either.

How does Rails solve this?

As mentioned, this is not a new problem. There are solutions in ActiveRecord that can remove these problems. Their solution is to eagerly load the data, and this is accomplished with a simple method includes. We could very simply change the above code to this:

User.includes(:tasks).all.each do |user|
user.tasks.each do |task|
#Do something with each task
end
end

And now the queries that get executed are:

SELECT * FROM users
SELECT * FROM tasks WHERE user_id IN (1, 2, 3, 4, 5, 6, 7, 8)

Huzzah! No more N+1 query. Instead we always make 2 queries: one for the users, and one for all tasks that belong to that user.

Aside: includes uses either eager_load or preload. Most of the time, for large tables, preload is far superior. Check out this blog post for further reading on some of the differences.

Why is that not sufficient?

For very trivial applications, it is likely sufficient. However, for larger applications it is not because:

  1. The place where the includes must be added is detached from the code that actually uses the objects.

1.1) If the code using the objects needs access to a new association, an N+1 query will be reintroduced

User.includes(:tasks).all.each do |user|
user.tasks.each do |task|
# Do something with each task
end
user.reminders.each do |reminder|
# Do something with each reminder
end

end
SELECT * FROM users
SELECT * FROM tasks WHERE user_id IN (1, 2, 3, 4, 5, 6, 7, 8)
SELECT * FROM reminders WHERE user_id = 1
SELECT * FROM reminders WHERE user_id = 2
SELECT * FROM reminders WHERE user_id = 3
SELECT * FROM reminders WHERE user_id = 4
SELECT * FROM reminders WHERE user_id = 5
SELECT * FROM reminders WHERE user_id = 6
SELECT * FROM reminders WHERE user_id = 7
SELECT * FROM reminders WHERE user_id = 8

That’s not good.

1.2) If the code using the objects stops using an association then it will still make the eager load query. This is not an N+1 query, but it is a SQL query that is being executed and then not being used.

1.3) If the querying code changes and removes one of the associations from the list of associations to eager load, those N+1 queries will come back.

2) It can only handle associations. It cannot deal with aggregations, or if a filtered result set needs to be used.

With our users & tasks example, imagine wanting to get a count of how many tasks each user had, or wanting to know the today’s tasks for each user.

User.includes(:tasks).all.each do |user|
user.tasks.each do |task|
# Do something with the tasks
end
puts users.tasks.count
puts users.tasks.where(date: Date.today).to_a

end
SELECT * FROM users
SELECT * FROM tasks WHERE user_id IN (1, 2, 3, 4, 5, 6, 7, 8)
SELECT COUNT(*) FROM tasks WHERE user_id = 1
SELECT * FROM tasks WHERE user_id = 1 AND date = ‘2017–11–08’
SELECT COUNT(*) FROM tasks WHERE user_id = 2
SELECT * FROM tasks WHERE user_id = 2 AND date = ‘2017–11–08’
SELECT COUNT(*) FROM tasks WHERE user_id = 3
SELECT * FROM tasks WHERE user_id = 3 AND date = ‘2017–11–08’
SELECT COUNT(*) FROM tasks WHERE user_id = 4
SELECT * FROM tasks WHERE user_id = 4 AND date = ‘2017–11–08’
SELECT COUNT(*) FROM tasks WHERE user_id = 5
SELECT * FROM tasks WHERE user_id = 5 AND date = ‘2017–11–08’
SELECT COUNT(*) FROM tasks WHERE user_id = 6
SELECT * FROM tasks WHERE user_id = 6 AND date = ‘2017–11–08’
SELECT COUNT(*) FROM tasks WHERE user_id = 7
SELECT * FROM tasks WHERE user_id = 7 AND date = ‘2017–11–08’
SELECT COUNT(*) FROM tasks WHERE user_id = 8
SELECT * FROM tasks WHERE user_id = AND date = ‘2017–11–08’

What other tools exist?

  1. Bullet is an neat little gem. It does help detect N+1 queries and unused eager loading in development and test environments. Unfortunately, it only really helps with detection of N+1 queries and is not intended to be run in a production environment. It is also a manual effort to hunt and fix every N+1 query.
  2. Batch-loader (Blog post) allows for the elimination of N+1 but also requires lots of custom methods to be written. In an ideal situation, we should only write code when we are doing something non-standard.
  3. Jit preloader. (More about this below!)

How can we eliminate N+1 queries?

We need to do preloading, but we need to do it at the time of access instead of at the time of the query. This is the important distinction, and one that requires some base code to start with. To do this, we need to hold a reference to all of the objects that we want to group together and when we load the data for one object in that group, we will load the data for all of them. This is best explained in code so let’s look at an example:

class User < ActiveRecord::Base
has_many :tasks
def tasks(records = [self])
return association(:tasks).target if association(:tasks).loaded?
ids = records.map(&:id)
tasks = Task.where(user_id: ids).group_by(&:user_id)

records.each do |record|
record.association(:tasks).target = tasks[record.id] || []
end
association(:tasks).target
end
end

This code is complicated. We’d first need to return the value of the association if it was loaded. If it isn’t loaded, we can manually load the tasks we are interested in into memory and assign them to the associations of the records. Then finally return the contents of the association. However, it would let us do this:

users = User.all
users.each do |user|
user.tasks(users).each do |task|
# Do something with each task
end
end
SELECT * FROM users
SELECT * FROM tasks WHERE user_id IN (1, 2, 3, 4, 5, 6, 7, 8)

There are three things you may notice here:

1) The tasks method is super ugly, and we need to pass the user group around to the task method. (Booo. However, we can abstract all of that work away)

2) There is no N+1 query here. (Yay!)

3) The task query does not get executed until at least one of the task methods on the user model is called. (Yaaay!)

This is just the general strategy though. This is where we bring in some tools to take care of the heavy lifting for us. Enter: jit_preloader

This gem us gives us the tools we need to tackle all kinds of N+1 queries. In fact, out of the box, it automatically preloads associations at the time of access instead of at the time of the initial query.

And this gives the tools to handle some custom logic as well. We had an example above that that wanted to load task counts and today’s tasks. These can be preloaded using this general pattern.

class User < ActiveRecord::Base  attr_writer :todays_tasks, :task_count  def todays_tasks
return @todays_tasks if @todays_tasks
records = jit_preloader&.records || [self]
tasks = Task
.where(date: Date.today, user_id: records.map(&:id))
.group_by(&:user_id)
records.each do |record|
record.todays_tasks = tasks[record.id] || []
end
@todays_tasks
end
def task_count
return @task_count if @task_count
records = jit_preloader&.records || [self]
tasks = Task
.where(user_id: records.map(&:id)
.group(:user_id)
.count
records.each do |record|
record.task_count = tasks[record.id] || []
end
@task_count
end
end

These methods are a little complicated, but they all follow the same pattern and can easily be extended to handle endless custom logic. Finally, we can write:

User.all.each do |user|
user.tasks.each do |task|
# Do something with the tasks
end
puts users.task_count
puts users.todays_tasks
end

Which gives us this SQL:

SELECT * FROM users
SELECT * FROM tasks WHERE user_id IN (1, 2, 3, 4, 5, 6, 7, 8)
SELECT * FROM tasks WHERE date = ‘2017–11–10’ AND user_id IN (1, 2, 3, 4, 5, 6, 7, 8)
SELECT COUNT(*) AS count_all, `tasks`.`user_id` AS tasks_user_id FROM `tasks` WHERE `tasks`.`user_id` IN (1, 2, 3, 4, 5, 6, 7, 8) GROUP BY `tasks`.`user_id`

Now notice:

1) This code is free of N+1 queries

2) The query (User.all) doesn’t need to know about how it is being used at all

3) If we stopped using a method likeUser#task_count, that query would just go away. We are no longer paying the cost every time

4) The cost of ensuring there is no N+1 query is paid at the time the method was written not when the method is used. It is must easier to catch this in code review. One less thing to worry about!

The jit_preloader gem has a couple other goodies such as N+1 query events via ActiveSupport::Notifications, and helps for working with aggregation that need to be preloaded.

Check it out and remove the N+1 queries from your application!

--

--