The Real Cost of ActiveRecord

Rails developers love ActiveRecord, and for good reason. It turns database access into something that feels like plain Ruby. You write User.where(active: true) and you get back objects you can call methods on, mutate, validate, and save. The SQL stays out of your way.

But that convenience isn't free. ActiveRecord is an abstraction, and every abstraction trades away some control for ease of use. The trouble is that the bill for that trade rarely arrives when you make the call. It shows up later, in a slow endpoint, a memory spike, or a background job that quietly takes ten minutes longer than it should.

This article is about making that cost visible, so you can decide when it's worth paying and when it isn't. None of this is an argument against ActiveRecord. It's an argument for using it with your eyes open.

A mental model: ActiveRecord is two libraries

Before getting into specifics, it helps to hold a clearer mental model. ActiveRecord is really doing two distinct jobs at once.

The first job is query building: turning method chains like where, joins, and order into SQL. This part is genuinely cheap. A relation is lazy — it doesn't touch the database until you force it to — and the SQL it generates is usually fine.

The second job is object mapping: taking the rows the database returns and turning them into model instances. This is the expensive part. It's also the part developers forget is happening, because it's invisible. You see the query in your logs; you don't see the 50,000 object allocations that followed it.

Almost every performance technique in this article comes down to one idea: control the second job. Decide deliberately how many objects you instantiate, how heavy they are, and whether you need them at all.

What happens when you call `.all`

Consider a deceptively simple line:

users = User.all.to_a

When this query runs, ActiveRecord doesn't just hand you rows. For every single row returned, it does roughly the following:

Allocates a new model object.
Reads the raw column values from the database result.
Type-casts each value to its Ruby equivalent — strings, integers, BigDecimal, Time objects, booleans, parsed JSON.
Populates the attributes hash on the instance.
Sets up dirty tracking, so the object can later report which attributes changed.
Wires up the machinery for callbacks, validations, and association proxies, whether or not you use them.

For a query returning 10 rows, none of this matters. For one returning 50,000, you've just asked Ruby to allocate tens of thousands of fairly heavy objects, each with its own attributes hash and change-tracking state, and you've handed the garbage collector a large mess to clean up afterward.

This is the key insight: the SQL query might be fast. A well-indexed SELECT over 50,000 rows can return in tens of milliseconds. The cost is in everything ActiveRecord does after the database responds. That work happens in Ruby, single-threaded, on your application server, and it scales linearly with the number of rows.

It's worth getting a feel for the magnitude. Instantiating a model object is on the order of a few microseconds, but if each one takes, say, 5 microseconds, then 50,000 of them cost about a quarter of a second of pure CPU before you've rendered anything or done any actual work. And the GC pressure has a second-order cost: more allocations mean more frequent garbage collection pauses, which affect every request the process handles, not just the slow one.
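One way to get a feel for that magnitude is to count allocations directly. The following is a minimal plain-Ruby sketch, not ActiveRecord itself: FakeRecord is a hypothetical stand-in that type-casts a raw row, fills an attributes hash, and keeps a snapshot for change tracking. It compares building an object per row against reading the one value you wanted straight from the raw row.

```ruby
# Hypothetical stand-in for a model instance; NOT ActiveRecord's implementation.
class FakeRecord
  def initialize(raw_row)
    # Type-cast the raw strings and store them in an attributes hash.
    @attributes = {
      "id"     => Integer(raw_row["id"]),
      "email"  => raw_row["email"].dup,
      "active" => raw_row["active"] == "t"
    }
    # Keep a snapshot so later changes could be detected (crude dirty tracking).
    @original = @attributes.dup
  end

  def email
    @attributes["email"]
  end
end

# Returns how many objects Ruby allocated while the block ran.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

ROW_COUNT = 50_000
raw_rows = Array.new(ROW_COUNT) do |i|
  { "id" => i.to_s, "email" => "user#{i}@example.com", "active" => "t" }
end

# Build a full object per row, then read one attribute off each.
object_path = allocations { raw_rows.map { |row| FakeRecord.new(row) }.map(&:email) }

# Read the wanted value directly from each raw row.
value_path = allocations { raw_rows.map { |row| row["email"] } }

puts "objects: #{object_path} allocations, values: #{value_path} allocations"
```

The exact counts depend on the Ruby version, but the object path allocates several objects per row while the value path allocates little beyond the result array. A real model instance is considerably heavier than this stand-in, so treat the gap as a lower bound rather than a benchmark.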

The fix isn't to stop using ActiveRecord. It's to notice when you're instantiating objects you don't actually need, and the rest of this article is a tour of the ways to do that.

The dirty tracking tax

Every ActiveRecord model instance keeps track of which of its attributes have changed since it was loaded. That's what makes user.changed?, user.name_changed?, and user.name_was work, and it's genuinely essential when you intend to save the record back — it's how Rails knows which columns to put in the UPDATE statement.

But step back and think about how often you actually save a record you've loaded. In most applications, the overwhelming majority of records loaded in a request are read and then discarded. A JSON API endpoint loads records, serializes them, and returns. A reporting page loads records, sums some numbers, and renders. A search results page loads records, shows titles, and moves on. In none of these cases does the dirty tracking state get used. It was built up, kept in memory, and thrown away.

For a single request, this overhead is negligible — not worth a moment's thought. The reason it's worth knowing about is aggregate load. On an endpoint serving thousands of requests a minute, every small per-object cost is multiplied by the number of objects and again by the request rate. Things that are invisible in isolation become a measurable slice of CPU time in aggregate.

When a path is genuinely read-only, you have a few escape hatches. Marking a relation as readonly doesn't eliminate dirty tracking, but it does signal intent, and saving a record loaded that way raises ActiveRecord::ReadOnlyRecord. More importantly, the techniques later in this article (pluck, select, and dropping to raw values) shrink or sidestep the cost by building lighter instances or none at all. The cheapest dirty tracking is the kind that never gets set up.
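To make the idea of change tracking concrete, here is a deliberately simplified plain-Ruby model of it. TrackedRecord and its method names are hypothetical, and Rails' real implementation is more sophisticated. The point is only that something has to remember the loaded values for every record, whether or not the record is ever saved.

```ruby
# A simplified model of change tracking; Rails' real implementation differs.
class TrackedRecord
  def initialize(attributes)
    @attributes = attributes.dup
    @original   = attributes.dup # snapshot taken at load time, paid for on every load
  end

  def [](name)
    @attributes[name]
  end

  def []=(name, value)
    @attributes[name] = value
  end

  # Names of attributes whose current value differs from the loaded value.
  def changed
    @attributes.keys.select { |name| @attributes[name] != @original[name] }
  end

  def changed?
    !changed.empty?
  end

  # The value an attribute had when the record was loaded.
  def was(name)
    @original[name]
  end

  # Only the changed columns would need to appear in an UPDATE statement.
  def changes_to_save
    changed.to_h { |name| [name, @attributes[name]] }
  end
end

user = TrackedRecord.new("name" => "Ada", "email" => "ada@example.com")
user["name"] = "Grace"

puts user.changed?           # true
puts user.changed.inspect    # ["name"]
puts user.was("name")        # Ada
```

On a read-only path, none of changed, was, or changes_to_save is ever called, yet the snapshot was still built for every record loaded.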

Select only what you need

By default, ActiveRecord generates SELECT *. Every column comes back, every time.

This is easy to forget about until you're working with a wide table — one with dozens of columns, large text fields, or serialized blobs. A posts table with a body column holding long-form articles is the classic example. If you're rendering a list of post titles, fetching the full body of every post is pure waste twice over: once on the wire between the database and your app, and again in Ruby memory where every one of those strings has to live.

# Pulls every column, including that big body field
Post.where(published: true)


# Pulls only what the index page needs
Post.where(published: true).select(:id, :title, :slug, :published_at)

The difference is not subtle on a wide table. A row that might be several kilobytes with the body included can drop to a couple hundred bytes when you select only the display columns. Multiply across a page of 50 posts and you've cut the data the request handles by an order of magnitude.

The tradeoff with select is that you get partial objects. Calling a method that touches an unselected column raises ActiveModel::MissingAttributeError. This is genuinely a feature: it forces every code path to be honest about what it depends on. But it does mean select is best used in narrow, well-understood paths — a specific index action, a specific export — rather than sprinkled everywhere. A partial object that escapes into a shared helper or a serializer expecting full records will fail at runtime, and possibly only for some inputs.

A related point: select can also compute columns. select("posts.*, COUNT(comments.id) AS comments_count") combined with a join and group("posts.id") lets the database do aggregation that you'd otherwise do with an N+1 of post.comments.count calls. Use left_joins(:comments) rather than joins(:comments) if posts with zero comments should still appear in the list. The computed value is available as a method on the returned objects. This is one of the cleaner ways to attach a count to a list without a separate query per row.

`pluck`, `pick`, and skipping objects entirely

Sometimes you don't need objects at all. You need values.

If you want a list of user email addresses to feed into another query, a CSV, or an external API call, instantiating a User object for each one is a detour you can skip entirely:

# Builds a User object per row, then reads one attribute off each and discards the rest
User.where(active: true).map(&:email)


# Goes straight from the database result to an array of strings
User.where(active: true).pluck(:email)

pluck runs the query and returns raw, type-cast values — no model instances, no attributes hash, no dirty tracking, no callbacks, no association machinery. For a result set of any real size, the difference in allocations and memory is dramatic. The map(&:email) version allocates one full object per row and then throws all of them away. The pluck version allocates only the array and its values.

pluck takes multiple columns and gives you back arrays of arrays:

User.where(active: true).pluck(:id, :email)
# => [[1, "alice@example.com"], [2, "bob@example.com"], ...]

pick is the single-row companion, added in Rails 6. It's effectively pluck plus limit(1), returning just the value (or values) for the first row:

User.where(active: true).pick(:email)
# => "alice@example.com"

The rule of thumb: if you're loading records only to read a column or two off them, pluck and pick are almost always the better call. The moment you find yourself needing model methods, validations, or several columns' worth of behavior, that's the signal to go back to full objects.

One caveat worth internalizing: because pluck returns plain values, it bypasses any logic you've defined in attribute readers or methods. If your model overrides email to downcase it, pluck(:email) gives you the raw database value, not the result of your method. That's usually what you want for performance-sensitive paths, but it's a sharp edge if you forget.

N+1 queries: the cost that compounds

No discussion of ActiveRecord's cost is complete without N+1 queries, because they're the single most common performance problem in Rails applications, and they're a direct consequence of how pleasant the abstraction is to use.

The pattern looks completely innocent:

posts = Post.where(published: true).limit(25)
posts.each do |post|
  puts post.author.name
end

One query loads 25 posts. Then, for each post, post.author triggers another query to load that post's author. That's 1 + 25 = 26 queries where 2 would do. At 25 rows the page is merely slower than it should be. On a page showing 200 rows, it's 201 queries, and the latency is now firmly in "users notice" territory — each query carries its own round-trip overhead to the database, and they run one after another.

The fix is includes, which tells ActiveRecord to load the association up front:

posts = Post.where(published: true).includes(:author).limit(25)
posts.each do |post|
  puts post.author.name
end

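Now it's 2 queries: one for the posts, one for all the needed authors at once. includes chooses a loading strategy for you: by default it issues a separate query for the association (the preload strategy), and it switches to a single LEFT OUTER JOIN (the eager_load strategy) when the query's conditions reference the included table.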
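The query arithmetic can be reproduced without a database. The following plain-Ruby sketch uses a hypothetical FakeDatabase that counts every call it serves. It models the idea behind preloading (collect the foreign keys, fetch them in one query, index the results by id), not ActiveRecord's actual internals.

```ruby
# Hypothetical in-memory "database" that counts every query it serves.
class FakeDatabase
  attr_reader :query_count

  def initialize(posts:, authors:)
    @posts = posts
    @authors = authors
    @query_count = 0
  end

  def posts(limit:)
    @query_count += 1
    @posts.first(limit)
  end

  # One author per call, like a lazy post.author lookup.
  def author(id)
    @query_count += 1
    @authors.find { |author| author[:id] == id }
  end

  # All requested authors in one call, like WHERE id IN (...).
  def authors_where_id_in(ids)
    @query_count += 1
    @authors.select { |author| ids.include?(author[:id]) }
  end
end

def build_db
  authors = (1..5).map { |id| { id: id, name: "Author #{id}" } }
  posts   = (1..25).map { |id| { id: id, author_id: (id % 5) + 1 } }
  FakeDatabase.new(posts: posts, authors: authors)
end

# N+1: one query for the posts, then one more per post.
lazy_db = build_db
lazy_names = lazy_db.posts(limit: 25).map { |post| lazy_db.author(post[:author_id])[:name] }

# Preloading: one query for the posts, one for every distinct author id.
eager_db = build_db
posts = eager_db.posts(limit: 25)
author_ids = posts.map { |post| post[:author_id] }.uniq
authors_by_id = eager_db.authors_where_id_in(author_ids).to_h { |author| [author[:id], author] }
eager_names = posts.map { |post| authors_by_id.fetch(post[:author_id])[:name] }

puts "lazy: #{lazy_db.query_count} queries, preloaded: #{eager_db.query_count} queries"
# prints "lazy: 26 queries, preloaded: 2 queries"
```

Both paths produce the same list of names. Only the number of round-trips differs, which is why nothing in the output hints at the problem.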

The reason N+1s are so persistent isn't that the fix is hard. It's that the broken code works. It returns correct results, it passes tests, it looks clean, and nothing about reading it screams "this is 200 queries." The problem is invisible at the point you write it and only shows up under real data volumes. That invisibility is the recurring theme of this entire article, and it's why the section near the end on instrumentation matters as much as the techniques themselves.

A couple of related sharp edges. First, includes combined with a where that references the included table (a hash condition on that table, or a SQL string plus references) switches to eager_load (a JOIN), which can change behavior in subtle ways: the condition now also filters which associated rows get loaded, not just which parent rows come back. Be deliberate when you do this. Second, includes eager-loads the whole association; if you only need a count, a counter cache or a select with a grouped count is lighter than loading every associated row into memory.

Batch processing and memory cliffs

Here's a mistake that hides well in development and bites hard in production:

User.where(active: true).each do |user|
  user.send_weekly_digest
end

In development, with a few hundred users in your database, this is completely fine. In production, with 200,000 active users, where(...).each loads all 200,000 records into memory at once, as a single array of full objects, before the loop body runs even once. That's a memory cliff, and it's one of the most common reasons background jobs get killed by the OOM killer or leave a process's memory usage permanently elevated (Ruby doesn't always return freed memory to the OS).

ActiveRecord provides find_each and find_in_batches for exactly this situation:

User.where(active: true).find_each do |user|
  user.send_weekly_digest
end

find_each loads records in batches — 1,000 at a time by default, configurable with batch_size: — and yields them to your block one at a time. From your code's perspective it looks identical to each, but memory stays flat regardless of whether the table has 10,000 rows or 10,000,000. Only one batch is in memory at any moment.

find_in_batches does the same batching but yields each batch as an array rather than yielding individual records. That's the one you want when you can hand a whole batch to something that works in bulk:

User.where(active: true).find_in_batches(batch_size: 500) do |batch|
  emails = batch.map(&:email)
  ExternalService.bulk_notify(emails)
end

Two gotchas are worth knowing. First, find_each and find_in_batches impose their own ordering by primary key, because that's how they paginate efficiently (each batch is "rows with id greater than the last one I saw"). If you pass a custom order, it is ignored: Rails logs a warning, or raises if the application has opted into treating an ignored order as an error, but your ordering never takes effect. When ordering matters for correctness, batch iteration this way is the wrong tool, and you need cursor-based pagination on your sort column instead. Second, the batching adds query round-trips: 200,000 rows at a batch size of 1,000 is roughly 200 queries. That's almost always a worthwhile trade against running out of memory, but it's not free, and very small batch sizes on very large tables can make it noticeable.
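The "rows with id greater than the last one I saw" mechanism is easy to sketch in plain Ruby. FakeTable and each_in_batches below are hypothetical names, and the sketch assumes positive integer ids. It illustrates the pagination idea and its query count, not Rails' implementation.

```ruby
# Hypothetical in-memory table that counts queries; stands in for a real database.
class FakeTable
  attr_reader :query_count

  def initialize(ids)
    @rows = ids.sort.map { |id| { id: id } }
    @query_count = 0
  end

  # Models: SELECT * FROM t WHERE id > last_id ORDER BY id LIMIT batch_size
  def rows_after(last_id, limit)
    @query_count += 1
    @rows.select { |row| row[:id] > last_id }.first(limit)
  end
end

# Yields rows one at a time while holding only one batch in memory.
# Assumes ids are positive integers, so 0 is a safe starting point.
def each_in_batches(table, batch_size: 1000)
  last_id = 0
  loop do
    batch = table.rows_after(last_id, batch_size)
    batch.each { |row| yield row }
    break if batch.size < batch_size # a short batch means the table is exhausted
    last_id = batch.last[:id]
  end
end

table = FakeTable.new((1..2_500).to_a)
seen = []
each_in_batches(table, batch_size: 1000) { |row| seen << row[:id] }

puts "rows: #{seen.size}, queries: #{table.query_count}"
# prints "rows: 2500, queries: 3"
```

Because each query resumes from the last id seen, the loop has no room for a caller-supplied sort order, which is why a custom order cannot be honored. In this sketch, a row count that is an exact multiple of the batch size costs one extra query to discover that nothing is left.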

When to skip ActiveRecord entirely

For bulk writes, full ActiveRecord can be the wrong tool altogether. Saving 10,000 records by calling create or save in a loop means 10,000 separate INSERT statements, plus validations running 10,000 times, plus callbacks firing 10,000 times, plus 10,000 objects instantiated. Even wrapped in a transaction, the per-row overhead dominates.

Rails 6 introduced insert_all and upsert_all for this:

Product.insert_all(
  [
    { name: "Widget", price_cents: 999, created_at: Time.current, updated_at: Time.current },
    { name: "Gadget", price_cents: 1499, created_at: Time.current, updated_at: Time.current }
  ]
)

This produces a single bulk INSERT statement. The performance difference against a row-by-row loop is not incremental — it's often one or two orders of magnitude for large imports.

The catch, and it's a real one, is what gets skipped: insert_all and upsert_all do not run validations, do not fire callbacks, and do not instantiate models. On Rails 6.x they also don't set created_at / updated_at for you, so you pass those yourself, as above; Rails 7.0 and later fill in the timestamp columns by default, so check the behavior of the version you run. For data you've already validated, or imports where you fully control the input shape, that's usually a fair trade for a massive speedup. The danger is when a validation or callback was load-bearing: a callback that maintained a counter, normalized a field, or enqueued a downstream job. Skipping it silently means the data lands but the side effects don't. Before reaching for bulk inserts, it's worth taking sixty seconds to list what the model's callbacks actually do.

Then there's the read side: reports, dashboards, and analytics. Aggregations, multi-table joins, window functions, common table expressions. Trying to express these through ActiveRecord method chains often produces something that is both slower and harder to read than the equivalent SQL — and sometimes it can't be expressed in the chain at all. ActiveRecord::Base.connection.execute and select_all are there for exactly this. Dropping to SQL for a genuinely SQL-shaped problem isn't a failure of Rails knowledge; it's a sign of it. The instinct to keep everything in ActiveRecord chains for consistency's sake is the thing to question, not the SQL.

One middle ground worth knowing: select_all returns an ActiveRecord::Result, a lightweight wrapper over rows as hashes, with none of the model-instantiation cost. It's a good fit for report queries where you want raw SQL but still want a tidy object to iterate.

Seeing the cost before it hurts

If there's a thread running through every section here, it's this: the expensive thing is almost always invisible at the point you write it. The N+1 returns correct results. The SELECT * works fine on a narrow table in development. The where(...).each runs instantly against a seed database of 200 rows. Every one of these is a problem that only exists at production data volume, which is exactly where you're not looking when you write the code.

That's why instrumentation isn't a nice-to-have on top of these techniques — it's what makes them usable. Knowing that pluck is cheaper than map(&:email) is useless if you don't know which endpoint is the one allocating half a million objects. You can't optimize a query you don't know is slow, and you can't fix an N+1 you can't see.

Concretely, the things worth having visibility into are: which endpoints are slowest and why; which actions are firing N+1 query patterns; which requests allocate the most objects or hold the most memory; and how all of that changes after you deploy a fix. With that feedback loop in place, performance work stops being guesswork — "the app feels slow" becomes "the dashboard endpoint loads 12,000 comment records it never renders" — and a fix becomes a measurable before-and-after rather than a hopeful change.
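As an illustration of what "seeing" an N+1 can mean, here is one naive heuristic in plain Ruby: collapse the literal values in each SQL string and flag any query shape that repeats many times within a single request. The function names and the threshold are assumptions made for this sketch, and real monitoring tools do considerably more. In a Rails app the log of SQL strings could be collected by subscribing to the sql.active_record notification; here it is simply built by hand.

```ruby
# Collapse literal values so structurally identical queries compare equal.
def normalize(sql)
  sql.gsub(/'[^']*'/, "?").gsub(/\b\d+\b/, "?")
end

# Returns a hash of query shape => count for shapes repeated at least `threshold` times.
def suspected_n_plus_one(sql_log, threshold: 5)
  sql_log.map { |sql| normalize(sql) }
         .tally
         .select { |_shape, count| count >= threshold }
end

# A hand-built log resembling one request: one posts query, then one author query per post.
request_log = ['SELECT "posts".* FROM "posts" WHERE "posts"."published" = 1 LIMIT 25']
25.times do |i|
  request_log << %(SELECT "authors".* FROM "authors" WHERE "authors"."id" = #{i + 1} LIMIT 1)
end

suspects = suspected_n_plus_one(request_log)
suspects.each { |shape, count| puts "#{count}x #{shape}" }
```

Run against this log, the heuristic reports the authors lookup 25 times and ignores the single posts query. A repeated shape is only a hint, since some code legitimately runs the same query many times, so the output is a list of places to look rather than a verdict.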

This is the niche DeadBro is built for: Rails-focused application performance monitoring, with N+1 query detection, error tracking, and real-time dashboards, aimed at developers who want this visibility without an enterprise-scale budget. Whatever tool you reach for, the principle holds — the techniques in this article only pay off when you can see where to apply them.

The takeaway

ActiveRecord is not the enemy. For the vast majority of CRUD work — loading a record, updating it, saving it, following an association once — it's the right tool, and rewriting that work as raw SQL would be a slow, error-prone step backward. The abstraction earns its keep every day.

The point is to use it deliberately. Every call you make has a cost: objects allocated, columns type-cast, change-tracking wired up, callbacks queued, associations made ready. Most of the time that cost is trivial and the convenience is overwhelmingly worth it. But in the hot paths, the wide tables, the large batch jobs, and the heavy reports, that cost is no longer noise — it's the actual thing slowing your application down.

Mid-level Rails developers tend to learn ActiveRecord as a set of things you can do. The shift that comes with experience is learning it as a set of things that each have a price, and developing the judgment to know when that price is worth paying. pluck over map, select over SELECT *, includes over N+1, find_each over each, insert_all over a save loop, raw SQL over a tortured chain — none of these are tricks. They're just the same judgment applied over and over: instantiate what you need, and nothing more.

A fast Rails app isn't one that avoids ActiveRecord. It's one that knows when not to use it.