Shipping When You're Not Sure

I recently had to write a query planner for a graph store. The query planner wasn’t terribly complicated - it used some simple heuristics and estimates to decide whether it would be better to execute a query term against the underlying data store or perform in-memory filtering. The hope was that this would improve performance for the vast majority of queries without introducing any regressions.

Normally in a situation like this, I would ship the query planner behind a feature flag and instrument the code with some metric or trace that I could use to compare query performance from before and after the query planner was enabled. The trouble is, our product was not yet generally available, so query traffic was sporadic and highly variable. Some queries would take 200 ms without the planner, and some would time out at 30 seconds. This was not an ideal dataset for testing performance improvements.

What I wanted was to compare the query planner’s performance against a baseline using the same queries under similar conditions. This required a different approach. Luckily, I had a couple of advantages: load wasn’t a concern, and all of the queries were read-only.

The pattern

I figured it would be pretty easy to execute both code paths in parallel: the new query planner and the baseline. Go makes this especially easy - I wrapped the actual query logic in a method that takes a flag as a parameter indicating whether to use the new planner or not:

func (u *store) execQuery(ctx context.Context, query string, usePlanner bool) *queryResult {
    // Run the requested variant synchronously and return it to the caller.
    result := u.execQueryVariant(ctx, query, usePlanner)
    // Run the other variant in the background with its own timeout, detached
    // from the request context so it can outlive the request.
    go func(primary *queryResult, primaryIsPlanner bool) {
        ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancel()
        baseline := u.execQueryVariant(ctx, query, !primaryIsPlanner)
        // log results, compare primary vs baseline
    }(result, usePlanner)
    return result
}

The primary path returns immediately. The comparison runs in a goroutine, logs both results, and records the delta. I also set up an email alert so that I would be notified if results diverged - latency differences could be inspected manually, but I wanted to know immediately if there were correctness issues.

This only works because of the two advantages mentioned earlier: load wasn’t a concern (dual execution doubles the read load), and all of the queries were read-only and therefore free from side effects. When those constraints hold, this approach gives you a true apples-to-apples comparison.

The payoff

I deployed this with the feature flag turned off at first. User traffic was processed using the old query execution path and the query planner was executed asynchronously. This paid off immediately - I spotted several cases where the planner was making suboptimal decisions. Because I was comparing against baseline in real time, I could tune the planner incrementally, deploy, and watch the metrics improve.

This isn’t shadow testing in the traditional sense. Shadow testing typically routes traffic to a secondary system to validate it can handle load. This is closer to a continuous correctness check: same input, both paths, compared outputs.

Cleaning up

After a couple of weeks of running the system with the query planner enabled and seeing good results, I removed the dual execution path so nobody tripped over it. These experiments have a tendency to become permanent complexity, so it’s important to remember to remove the experiment code.

A Complementary Technique

As I was doing this work, I remembered a similar situation. I’ve written before about shipping on a spent error budget – using traffic isolation to keep iterating when you can’t afford to break things. Swimlanes and parallel runs solve similar problems differently:

  • Swimlanes work when you can route traffic separately and tolerate the new path being broken (no real users yet).
  • Parallel runs work when you need to validate against real production queries and can afford the extra compute and rule out side effects.

Both are tools for building confidence - hopefully others find them helpful!