Just enough abstraction

Take a look at the code below. Do you notice anything wrong?

async function getBestSellers() {
  const BANNED_CATEGORIES = ["drugs"];
  const resp = await fetch("https://amazon.com/api/best-sellers");
  const data = (await resp.json()) as BestSeller[];
  return data.filter((i) => !BANNED_CATEGORIES.includes(i.category));
}

It's not the lack of error handling, the use of as, the hard-coded endpoint URL, or the non-configurable list of banned categories. It's the fetch call.

Abstractions are often associated with unnecessary complexity. However, I'd argue that targeted, domain-specific abstractions can make code significantly simpler. The hard part is striking the right balance between over- and under-abstraction:

  • Over-abstraction: This often happens because a piece of code is designed to fulfill a broad range of use cases. A simple abstraction would not be configurable enough.
  • Under-abstraction: I think this is much more common. It's easy to jump into implementation too early without spending enough time thinking about proper abstraction boundaries.

In this post, I walk through an example using the snippet above as the starting point. Before explaining why the fetch call is much worse than it seems, let's set the scene with some context on what we're trying to build.

Context

For the sake of this post, let's say we are responsible for building an endpoint for rendering the front page of Amazon's store. Specifically, let's focus on a single feature on the front page: a list of best-selling products.

To obtain the list of best-selling products, we need to use another API endpoint (/api/best-sellers). This endpoint responds with a list of currently best-selling products that we render on the front page. We don't need to concern ourselves with how the endpoint is implemented. However, there are a few things to keep in mind:

  • The endpoint has a varying response time, typically ranging between 0.1 and 2 seconds.
  • Approximately 1% of the time, a load balancer in front of the endpoint responds with a gateway timeout error.
  • The database behind the endpoint is outdated and cannot handle excessive load.

The website is already up and running. Here's the current implementation:

async function getBestSellers() {
  const BANNED_CATEGORIES = ["drugs"];
  const resp = await fetch("https://amazon.com/api/best-sellers");
  const data = (await resp.json()) as BestSeller[];
  return data.filter((i) => !BANNED_CATEGORIES.includes(i.category));
}

app.get("/", async (request, reply) => {
  const bestSellers = await getBestSellers();
  const html = renderToStaticMarkup(<FrontPage bestSellers={bestSellers} />);
  reply.send(html);
});

Even though the implementation above works, it's far from optimal. With a bit of thought, a few key improvements will make the front page much more reliable.

Caching

It's fairly pointless to re-fetch the best-selling products every time someone visits the store. They likely don't change that often. The first logical step is to add simple caching logic so that we only occasionally fetch the best-selling products from the API.

Let's implement a naive in-memory cache:

let bestSellersCache: {
  data: BestSeller[];
  expiresAt: Date;
} | null = null;

async function getBestSellers() {
  if (bestSellersCache && Date.now() < bestSellersCache.expiresAt.getTime()) {
    return bestSellersCache.data;
  }
  const BANNED_CATEGORIES = ["drugs"];
  const resp = await fetch("https://amazon.com/api/best-sellers");
  const allBestSellers = (await resp.json()) as BestSeller[];
  const bestSellers = allBestSellers.filter(
    (i) => !BANNED_CATEGORIES.includes(i.category),
  );
  bestSellersCache = {
    data: bestSellers,
    expiresAt: new Date(Date.now() + 60 * 1000),
  };
  return bestSellers;
}

This greatly reduces the number of requests the /api/best-sellers endpoint receives. However, we still have a problem. The getBestSellers function's execution time is quite unpredictable. Most of the time it returns immediately, but occasionally, it might take up to 2 seconds.

One fix for this would be to add a "soft" expiration time for the cached value. The getBestSellers function could then use the cached value until its hard expiration time is reached, but re-fetch a new value in the background after the soft expiration time.

Here's one way to implement this:

let bestSellersCache: {
  data: BestSeller[];
  staleAt: Date;
  expiresAt: Date;
} | null = null;

async function _getBestSellers() {
  const BANNED_CATEGORIES = ["drugs"];
  const resp = await fetch("https://amazon.com/api/best-sellers");
  const allBestSellers = (await resp.json()) as BestSeller[];
  const bestSellers = allBestSellers.filter(
    (i) => !BANNED_CATEGORIES.includes(i.category),
  );
  bestSellersCache = {
    data: bestSellers,
    staleAt: new Date(Date.now() + 60 * 1000),
    expiresAt: new Date(Date.now() + 10 * 60 * 1000),
  };
  return bestSellers;
}

async function getBestSellers() {
  if (bestSellersCache && Date.now() < bestSellersCache.expiresAt.getTime()) {
    if (Date.now() >= bestSellersCache.staleAt.getTime()) {
      void Promise.resolve()
        .then(() => _getBestSellers())
        .catch((error) => console.log(error));
    }
    return bestSellersCache.data;
  }
  return _getBestSellers();
}

It's already much better than what we started with. There are still issues, though. What happens when we go past the staleAt threshold? Once we're past it, every call to getBestSellers fires a background HTTP request against the /api/best-sellers endpoint until a fresh value lands in the in-memory cache. If our store gets 1000 page views per second and the endpoint can take up to 2 seconds to respond, we can "leak" up to 2000 in-flight requests against /api/best-sellers before the cache is refreshed.

So, how do we fix this? There are many options, but let's go with a simple one for now:

let bestSellersCache: {
  data: BestSeller[];
  staleAt: Date;
  expiresAt: Date;
  isReFetching: boolean;
} | null = null;

async function _getBestSellers() {
  const BANNED_CATEGORIES = ["drugs"];
  const resp = await fetch("https://amazon.com/api/best-sellers");
  const allBestSellers = (await resp.json()) as BestSeller[];
  const bestSellers = allBestSellers.filter(
    (i) => !BANNED_CATEGORIES.includes(i.category),
  );
  bestSellersCache = {
    data: bestSellers,
    staleAt: new Date(Date.now() + 60 * 1000),
    expiresAt: new Date(Date.now() + 10 * 60 * 1000),
    isReFetching: false,
  };
  return bestSellers;
}

async function getBestSellers() {
  if (bestSellersCache && Date.now() < bestSellersCache.expiresAt.getTime()) {
    if (
      Date.now() >= bestSellersCache.staleAt.getTime() &&
      !bestSellersCache.isReFetching
    ) {
      bestSellersCache.isReFetching = true;
      void Promise.resolve()
        .then(() => _getBestSellers())
        .catch((error) => console.log(error))
        .finally(() => {
          if (bestSellersCache) {
            bestSellersCache.isReFetching = false;
          }
        });
    }
    return bestSellersCache.data;
  }
  return _getBestSellers();
}

Alright, looking better. What if the endpoint is down and our cached value has expired, though? We would bombard the struggling endpoint with traffic. It's even worse than it sounds, because the endpoint is most likely provisioned to handle only the regular load it sees with the cache in place.

Let's not implement a circuit breaker for now and come back to it later on. Now is a good time to write some tests for our changes.

Testing

How do we test our code when it's written in this way? We should probably test at least the following scenarios:

  • No products from the drugs category are shown on the front page.
  • When the in-memory cache is empty, the request is sent to the API.
  • If the value is cached, no request is sent to the API.
  • The cache (both time-to-stale and time-to-live) works as expected.
  • We never have more than one in-flight request against the API.

If you look closely, only one of these test cases is actually related to the feature we are implementing. The others test implementation-specific details that have nothing to do with what getBestSellers is supposed to be doing. Our code is quite a mess, and it's really hard to write tests for each of these scenarios in isolation.

At the very beginning, I mentioned that the biggest problem was the direct fetch call. We can now see why. If we wanted to write even one test, we would need to mock a global function. That's really bad.

Here's one option for mocking the fetch call:

describe("getBestSellers", () => {
  let BEST_SELLERS: BestSeller[] = [];
  const unmockedFetch = global.fetch;

  beforeAll(() => {
    global.fetch = jest.fn(() =>
      Promise.resolve({
        json: () => Promise.resolve(BEST_SELLERS),
      }),
    ) as unknown as typeof fetch; // the mock isn't a real Response, so TypeScript needs the cast
  });

  afterAll(() => {
    global.fetch = unmockedFetch;
  });

  it("returns an empty list", async () => {
    BEST_SELLERS = [];
    const bestSellers = await getBestSellers();
    expect(bestSellers).toEqual([]);
  });
});

Mocking global functions like this should honestly involve jail time. It's the absolute worst possible way to write tests (or rather, the real crime is writing code so bad that it can only be tested by mocking global functions). Why? Here are a few reasons:

  • You cannot run your tests in parallel anymore. Every test case needs its own mocked fetch that responds with exactly the data that test case expects.
  • Mocking global functions (or the runtime's internals) often involves some extremely questionable and shady mechanisms. Take a look at nock or Jest's timer mocks.
  • It's way too easy to leave mocked functions behind. You likely end up with flaky tests.
  • The opposite is also true. It's way too easy to forget to mock some functions. What happens if you forget to mock the global fetch? Your tests will make real network calls.

If writing tests like this is bad, what would be better? We need to introduce just enough abstraction. Let's introduce the concept of a Fetcher:

interface Fetcher {
  <T>(url: string, init?: RequestInit): Promise<T>;
}

function fetchFetcher(): Fetcher {
  return async <T>(url: string, init?: RequestInit): Promise<T> => {
    const resp = await fetch(url, init);
    return (await resp.json()) as T;
  };
}

We can now replace the direct fetch call with a call to an injected Fetcher:

async function _getBestSellers(fetcher: Fetcher) {
  const BANNED_CATEGORIES = ["drugs"];
  const allBestSellers = await fetcher<BestSeller[]>(
    "https://amazon.com/api/best-sellers",
  );
  const bestSellers = allBestSellers.filter(
    (i) => !BANNED_CATEGORIES.includes(i.category),
  );
  bestSellersCache = {
    data: bestSellers,
    staleAt: new Date(Date.now() + 60 * 1000),
    expiresAt: new Date(Date.now() + 10 * 60 * 1000),
    isReFetching: false,
  };
  return bestSellers;
}

Writing tests for this implementation is much easier. We don't need to mock any global functions in our tests anymore. Here's how the example test would look:

describe("getBestSellers", () => {
  it("returns an empty list", async () => {
    const bestSellers = await getBestSellers(async <T>() => [] as unknown as T);
    expect(bestSellers).toEqual([]);
  });
});

So much better. We still have the same problem as before, though: we're testing implementation details. It doesn't make much sense to test caching in this context. In fact, caching shouldn't be a concern of getBestSellers at all; it should be implemented elsewhere.

Abstractions

Even in our tiny example, we had to implement quite a bit of ad-hoc logic just to get an extremely simple cache with the most trivial issues fixed (and plenty of bugs still left behind). It also made our business logic noisier. Let's revert to the initial version and re-examine how we could implement the same features in a better way:

async function getBestSellers(fetcher: Fetcher) {
  const BANNED_CATEGORIES = ["drugs"];
  const data = await fetcher<BestSeller[]>(
    "https://amazon.com/api/best-sellers",
  );
  return data.filter((i) => !BANNED_CATEGORIES.includes(i.category));
}

app.get("/", async (request, reply) => {
  const bestSellers = await getBestSellers(fetchFetcher());
  const html = renderToStaticMarkup(<FrontPage bestSellers={bestSellers} />);
  reply.send(html);
});

On top of the implementation above, we need the following features:

  • Caching: We want to cache the responses from the API and re-use them as much as possible. For the best-selling products, it doesn't matter if we show slightly stale data.
  • Request coalescing: There's no reason to send multiple parallel identical requests to the API. We want to only send one request and re-use its response for every caller.
  • Circuit breaking: If the upstream API goes down, we want to prevent excessive load against it. Hence, we need a circuit breaker which blocks requests if the API seems to be down.

Do these features seem like the responsibility of getBestSellers? I don't think so. They are much better treated as internal implementation details of the Fetcher interface.

  • getBestSellers: Implements the business logic of getting, filtering, and manipulating the best-selling products. It is not concerned with the implementation details of actually making the HTTP requests.
  • Fetcher: The "how" in getting a resource from an API. It is not concerned with how the response is used afterwards.

With this separation of concerns in mind, let's implement a cached fetcher:

type CachedFetcherOpts = {
  tts: number;
  ttl: number;
};

function cachedFetcher(fetcher: Fetcher, opts: CachedFetcherOpts): Fetcher {
  type CachedValue = { data: unknown; staleAt: Date; expiredAt: Date };
  const cache = new Map<string, CachedValue>();
  return async <T>(url: string, init?: RequestInit): Promise<T> => {
    if ((init?.method ?? "GET") !== "GET") {
      return fetcher<T>(url, init);
    }
    const cachedResp = cache.get(url);
    if (cachedResp) {
      if (Date.now() < cachedResp.expiredAt.getTime()) {
        if (Date.now() >= cachedResp.staleAt.getTime()) {
          void fetcher(url, init)
            .then((data) => {
              const staleAt = new Date(Date.now() + opts.tts);
              const expiredAt = new Date(Date.now() + opts.ttl);
              cache.set(url, { data, staleAt, expiredAt });
            })
            .catch((error) => {
              console.log(error);
            });
        }
        return cachedResp.data as T;
      }
      cache.delete(url);
    }
    const data = await fetcher<T>(url, init);
    const staleAt = new Date(Date.now() + opts.tts);
    const expiredAt = new Date(Date.now() + opts.ttl);
    cache.set(url, { data, staleAt, expiredAt });
    return data;
  };
}

Because this Fetcher implementation has a single responsibility (caching responses), writing relevant tests for it is fairly easy:

describe("cachedFetcher", () => {
  it("caches the response", async () => {
    let count = 0;
    const fetcher = cachedFetcher(
      async <T>() => {
        count += 1;
        return {} as unknown as T;
      },
      { tts: 1000, ttl: 1000 },
    );
    await fetcher("https://amazon.com/api/ping");
    await fetcher("https://amazon.com/api/ping");
    expect(count).toEqual(1);
  });
});

With the minimal Fetcher abstraction, it's now much easier to build composable logic. Let's implement the circuit breaker Fetcher next; a minimal sketch of request coalescing follows below:

type CircuitBreakerFetcherOpts = {
  count: number;
  delay: number;
};

function circuitBreakerFetcher(
  fetcher: Fetcher,
  opts: CircuitBreakerFetcherOpts,
): Fetcher {
  const errors = new Map<string, { count: number; lastErrorAt: Date }>();
  return async <T>(url: string, init?: RequestInit): Promise<T> => {
    if ((init?.method ?? "GET") !== "GET") {
      return fetcher<T>(url, init);
    }
    const error = errors.get(url);
    if (error && error.count >= opts.count) {
      const isRecent = Date.now() - error.lastErrorAt.getTime() < opts.delay;
      if (isRecent) {
        throw new Error("upstream unavailable");
      }
    }
    try {
      const data = await fetcher<T>(url, init);
      errors.delete(url);
      return data;
    } catch (error) {
      errors.set(url, {
        count: (errors.get(url)?.count ?? 0) + 1,
        lastErrorAt: new Date(),
      });
      throw error;
    }
  };
}
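
Just like with cachedFetcher, the circuit breaker is easy to test in isolation. Here's a quick sketch of a test that checks the breaker opens after enough consecutive failures:

describe("circuitBreakerFetcher", () => {
  it("opens after enough consecutive failures", async () => {
    const fetcher = circuitBreakerFetcher(
      // A fetcher that always fails.
      async <T>(): Promise<T> => {
        throw new Error("boom");
      },
      { count: 2, delay: 60 * 1000 },
    );
    await expect(fetcher("https://amazon.com/api/ping")).rejects.toThrow("boom");
    await expect(fetcher("https://amazon.com/api/ping")).rejects.toThrow("boom");
    // The breaker is now open, so the underlying fetcher is no longer called.
    await expect(fetcher("https://amazon.com/api/ping")).rejects.toThrow(
      "upstream unavailable",
    );
  });
});

As for the request coalescing we skipped earlier: since requestCoalescingFetcher appears in the final composition below, here's a minimal sketch of what it could look like. The idea is to keep each in-flight GET request in a map keyed by URL and hand the same promise to every concurrent caller:

function requestCoalescingFetcher(fetcher: Fetcher): Fetcher {
  // In-flight GET requests keyed by URL.
  const inFlight = new Map<string, Promise<unknown>>();
  return async <T>(url: string, init?: RequestInit): Promise<T> => {
    if ((init?.method ?? "GET") !== "GET") {
      return fetcher<T>(url, init);
    }
    const pending = inFlight.get(url);
    if (pending) {
      // Re-use the request that's already in flight for this URL.
      return pending as Promise<T>;
    }
    const promise = fetcher<T>(url, init).finally(() => {
      inFlight.delete(url);
    });
    inFlight.set(url, promise);
    return promise;
  };
}

Note that if the shared request fails, every waiting caller sees the same rejection, which is exactly what we want: they would all have failed anyway.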

Finally, composing the different fetchers to work together could look something like this:

const bestSellerFetcher = cachedFetcher(
  requestCoalescingFetcher(
    circuitBreakerFetcher(fetchFetcher(), {
      count: 8,
      delay: seconds(60),
    }),
  ),
  {
    tts: seconds(10),
    ttl: minutes(15),
  },
);

app.get("/", async (request, reply) => {
  const bestSellers = await getBestSellers(bestSellerFetcher);
  const html = renderToStaticMarkup(<FrontPage bestSellers={bestSellers} />);
  reply.send(html);
});
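
The seconds and minutes helpers used in the composition are just trivial duration-to-milliseconds converters, something along these lines:

// Tiny duration helpers; any equivalent utility works just as well.
const seconds = (n: number) => n * 1000;
const minutes = (n: number) => n * 60 * 1000;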

Conclusions

If we don't spend time thinking about proper abstractions, our code can quickly become unmaintainable: bugs become more likely, and extending or modifying functionality becomes difficult. Our code becomes more complex, not simpler.

Even in the simple and naive example, we saw how introducing a small, targeted Fetcher abstraction allowed us to incrementally improve the quality and resilience of our code without adding unnecessary complexity. By keeping our abstractions focused and composable, we created a more robust and maintainable system.

Striking the right balance when it comes to abstractions is crucial. While the wrong abstraction can be worse than no abstraction at all, it's equally important to recognize that a lack of abstractions can lead to bugs, code duplication, tight coupling, and difficulty in testing and maintaining the codebase over time.

The key is to strive for "just enough abstraction" – identifying the right level of abstraction for the domain-specific problem at hand. By doing so, we can create more modular, reusable, and easier-to-reason-about code, ultimately leading to a more maintainable and bug-free system.