Debugging Thread Safety in Legacy Rails Applications

Working on legacy software often feels like archaeology. You dig through layers of history, trying to understand how things worked. You never know what you’re going to find, until you stumble upon something unexpected.

While maintaining a Rails 5 application recently, I ran into what looked like a race condition: authentication headers set on ActiveResource requests would vanish intermittently. This post details the debugging process and why cross-cutting concerns like authentication require extra care in multi-threaded environments.

At first glance, everything appeared normal: the credentials were valid, the token hadn’t expired, and the request logic hadn’t changed. Yet the same requests succeeded at some times and failed at others. That inconsistency suggested something deeper: either concurrency or a shared-state issue.

Tracing the Failure Pattern

The failures appeared in only two places, and both run multiple threads. That pointed to a possible thread-safety problem. We traced the issue to where authentication headers were being set before each ActiveResource call, a standard around_action pattern:

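# Registered in the controller with: around_action :with_auth_headers
# (the method name is illustrative; auth_headers builds the header hash)
def with_auth_headers
  ActiveResource::Base.headers.merge!(auth_headers)
  yield
ensure
  ActiveResource::Base.headers.clear
end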

This pattern should isolate headers per request and clean up afterwards. However, stepping through the header assignment in a debugger revealed unexpected behavior:

ActiveResource::Base.headers returned a new, empty hash on each call.

That meant our merge! was being applied to a temporary hash that wasn’t used by subsequent requests.
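The effect can be reproduced without ActiveResource. The sketch below uses two hypothetical stand-in classes (EphemeralHeaders and MemoizedHeaders are illustrative names, not part of any gem) to contrast an accessor that hands back a fresh hash with one that stores the hash on first access:

```ruby
# Hypothetical stand-ins for illustration; this is not ActiveResource code.
class EphemeralHeaders
  class << self
    # Returns a fresh hash whenever nothing has been stored yet,
    # so any mutation made by the caller is lost.
    def headers
      @headers || {}
    end
  end
end

class MemoizedHeaders
  class << self
    # Stores the hash on first access, so later calls see the same object.
    def headers
      @headers ||= {}
    end
  end
end

EphemeralHeaders.headers.merge!("X-SESSION-TOKEN" => "abc")
MemoizedHeaders.headers.merge!("X-SESSION-TOKEN" => "abc")

puts EphemeralHeaders.headers.empty?                 # true: the merge was discarded
puts MemoizedHeaders.headers.key?("X-SESSION-TOKEN") # true: the merge persisted
```

The only difference between the two accessors is `||` versus `||=`, which is why this kind of regression is easy to miss in review.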

The Root Cause

We checked the gem version: activeresource 5.1.1. A dive into the gem’s GitHub history for this older version led us to the source of the problem: a pull request that had refactored the headers method. The intention was likely to clean up the implementation, but a side effect was that it no longer returned the persistent, thread-local hash we relied on. It created an ephemeral one on every call.

This meant our merge! was adding a header to a temporary hash that was immediately discarded. The next ActiveResource call would get another new, empty hash, sending a request with no authentication.

Here is an example demonstrating how this behavior breaks the expectation of shared state under concurrency:

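require 'active_resource'
require 'concurrent' # concurrent-ruby gem, for the thread-safe array

ActiveResource::Base.headers["X-THREAD"] = "A"

results = Concurrent::Array.new

# Record which hash object each thread receives from the accessor.
threads = 2.times.map do
  Thread.new do
    results << ActiveResource::Base.headers.object_id
  end
end

threads.each(&:join)
results.uniq.size
# Expected: 1
# Actual: 2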

The result was subtle but critical: this regression only surfaced in multi-threaded scenarios, which explained why it appeared inconsistent.
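When checking for this kind of bug, it helps to separate it from ordinary thread-locality by testing stability within a single thread. The sketch below assumes a hypothetical ThreadLocalHeaders module (not ActiveResource’s implementation) and shows what a working per-thread store should look like: stable within a thread, isolated across threads.

```ruby
# Hypothetical per-thread header store for illustration only.
module ThreadLocalHeaders
  KEY = :sketch_headers

  # Thread.current[] storage is local to the current fiber; in a thread
  # that creates no extra fibers it behaves as per-thread storage.
  def self.headers
    Thread.current[KEY] ||= {}
  end
end

same_thread = Thread.new do
  ThreadLocalHeaders.headers["X-THREAD"] = "A"
  # Two calls in one thread should return the same object and keep the value.
  [ThreadLocalHeaders.headers.equal?(ThreadLocalHeaders.headers),
   ThreadLocalHeaders.headers["X-THREAD"]]
end.value

# A different thread starts with its own empty hash.
other_thread = Thread.new { ThreadLocalHeaders.headers["X-THREAD"] }.value

puts same_thread.inspect   # [true, "A"]
puts other_thread.inspect  # nil
```

If the first check fails, meaning two calls in the same thread return different objects, the accessor is discarding state rather than scoping it.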

Fixing the Problem Correctly

Part 1: A Safe Monkey Patch

We restored the correct behavior with a monkey patch guarded by version checks. One key detail was handling subclass inheritance — naïvely sharing a single headers hash across classes could cause cross-resource contamination.

The patch duplicates a parent’s headers for subclasses on first access:

if defined?(ActiveResource) && ActiveResource::VERSION::STRING == "5.1.1"
  module ActiveResource
    class Base
      class << self
        def headers
          if _headers_defined?
            # Already initialized: return the stored hash.
            _headers
          elsif superclass != Object && superclass.headers
            # This correctly handles inheritance by duplicating the parent's headers
            # on first access, allowing subclasses to have their own distinct headers.
            self._headers = superclass.headers.dup
          else
            # First access on the base class: store an empty hash so it persists.
            self._headers = {}
          end
        end
      end
    end
  end
end

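This ensured that a header set in one call was still present on the next, and that each subclass worked on its own copy of its parent’s headers.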
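The copy-on-first-access behavior can be illustrated in isolation. This is a minimal sketch with hypothetical classes (SketchResource and SketchUser); it uses a plain class-level instance variable, whereas the real patch goes through ActiveResource’s _headers attribute.

```ruby
# Hypothetical stand-in classes for illustration only.
class SketchResource
  class << self
    def headers
      return @headers if instance_variable_defined?(:@headers)

      # First access: copy the parent's headers, or start empty at the root.
      @headers = superclass.respond_to?(:headers) ? superclass.headers.dup : {}
    end
  end
end

class SketchUser < SketchResource; end

SketchResource.headers["X-SESSION-TOKEN"] = "abc"
SketchUser.headers["X-USER-ONLY"] = "1" # first access copies the parent's hash

puts SketchUser.headers.key?("X-SESSION-TOKEN") # true: inherited at first access
puts SketchResource.headers.key?("X-USER-ONLY") # false: the parent is not contaminated
```

One trade-off of copying on first access is that headers added to the parent afterwards do not reach a subclass that has already taken its copy.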

Part 2: Handling Background Jobs

Sidekiq workers were also affected, but they don’t pass through controller callbacks such as around_action. We introduced a custom Sidekiq server middleware to set and clear headers per job. This centralized the logic and removed the duplicated header manipulation code from workers.

module Middleware
  class SidekiqAuthMiddleware
    def call(worker, job, queue)
      # Auth headers travel as the job's last argument, identified by the session token key.
      last_arg = job['args'].last
      auth_headers = last_arg if last_arg.is_a?(Hash) && last_arg.key?('X-SESSION-TOKEN')

      ActiveResource::Base.headers.merge!(auth_headers) if auth_headers

      yield
    ensure
      # Runs even when the job raises, so headers never leak into the next job.
      ActiveResource::Base.headers.clear if auth_headers
    end
  end
end

This acts exactly like the controller logic: intercept each job, set the headers from the job’s arguments, run the job, and then reliably clear the headers, even if the job fails.
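The cleanup guarantee can be checked without Sidekiq or ActiveResource. The sketch below is an assumption-laden stand-in: SketchHeaders replaces the shared header store and SketchAuthMiddleware mirrors the shape of the middleware above; both names are hypothetical. It runs a job that raises and confirms the headers are cleared anyway.

```ruby
# Hypothetical stand-in for the shared header store (not ActiveResource).
module SketchHeaders
  def self.headers
    @headers ||= {}
  end
end

# Same shape as a Sidekiq server middleware: call(worker, job, queue) with a block.
class SketchAuthMiddleware
  def call(_worker, job, _queue)
    last_arg = job["args"].last
    auth = last_arg if last_arg.is_a?(Hash) && last_arg.key?("X-SESSION-TOKEN")
    SketchHeaders.headers.merge!(auth) if auth
    yield
  ensure
    SketchHeaders.headers.clear if auth
  end
end

job = { "args" => [42, { "X-SESSION-TOKEN" => "abc" }] }
seen_during_job = nil

begin
  SketchAuthMiddleware.new.call(nil, job, "default") do
    seen_during_job = SketchHeaders.headers["X-SESSION-TOKEN"]
    raise "job failed"
  end
rescue RuntimeError
  # The job's error still propagates; the ensure block has already run.
end

puts seen_during_job              # abc
puts SketchHeaders.headers.empty? # true
```

In a real application, the middleware would be registered inside a Sidekiq.configure_server block by calling chain.add with the middleware class within config.server_middleware.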

Conclusion

This experience was a harsh reminder that thread safety is not optional. Code that appears safe in a single-threaded development environment can fail unpredictably under concurrent load, making it critical to assume your code will run in threaded environments like Puma or Sidekiq. It also highlighted the risks of relying on library internals rather than public APIs—our implicit reliance on ActiveResource’s internal implementation broke with a minor version change. Ultimately, we learned that execution context defines state scope. Whether handling web requests or background jobs, any management of global or thread-local state requires ensuring it is correctly initialized and torn down in every execution context your application uses.
