Markus Sosnowski

Caching Strategies for REST APIs: From Client, CDN, Gateway, to the Server

A deep dive into how best to cache your REST API across the network: from HTTP headers, shared CDNs, and reverse proxies to application-aware edge caching. Learn when to use each strategy.


REST APIs are intended to be cacheable by default. However, in modern applications dealing with dynamic, authorized data, developers often hit a wall. Driven by the fear of data inconsistency—serving stale data to the wrong user—caching is frequently disabled entirely. The result? Unnecessary latency and avoidable server load.

To solve this, we need to move beyond simple Cache-Control headers and look at the entire network topology. This article analyzes the different layers of API caching—from the Client to Proxies, CDNs, Gateways, and the Server. We will explore when standard HTTP protocols suffice, when application logic is required, and how modern Edge Computing and Tag-Based Invalidation can deliver dynamic content globally. In this article, we will focus only on write-through caching.

The Core Challenge: Standard vs. Application Caching

When designing API caching, we generally deal with two different approaches that often get confused.

  1. Standard HTTP Caching (RFC 9111): This relies on headers like Cache-Control, Vary, and ETag. The cache (whether a browser or a proxy) is “agnostic.” It doesn’t understand your business logic; it simply follows rules like “keep this file for 1 hour.” This is excellent for static assets or public data (like a generic weather API) but risky for private user data.
  2. Application-aware Caching: This is driven by code and configuration. The cache understands User IDs, JWT claims, and database states. For example, a frontend framework deciding that data is “stale” because a mutation occurred, or an Edge Worker checking a token before serving a cached response.

Let’s break down how these strategies apply across four layers of the network: the client, in-network solutions, gateways, and the application server.

Layer 1: The Client (Browser & Application)

The “Client” layer consists of two distinct parts: the browser’s internal engine and your JavaScript application code.

The Browser Cache (RFC 9111)

The private browser cache is persistent and aggressive.

  • The Trap: If you accidentally send Cache-Control: public, max-age=604800 for a user-specific endpoint (e.g., GET /user/profile), that JSON is locked in the user’s disk cache for a week. Even if they update their profile, the browser won’t ask the server for new data.
  • Heuristic Caching: Even if you send no headers, browsers might guess a cache duration based on the Last-Modified date. Always send no-store or no-cache if you want to be safe.
  • Hard to clear: Once a browser has cached a response, you cannot easily force it to be purged from the server side.
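To make these traps concrete, here is a minimal sketch of choosing safe Cache-Control values per endpoint type. The policy names and TTLs are illustrative assumptions, not part of any framework:

```typescript
// Hypothetical helper: pick a safe Cache-Control header per endpoint type.
// Policy names and TTL values are illustrative.
type CachePolicy = "public-static" | "public-api" | "private-user";

function cacheControlFor(policy: CachePolicy): string {
  switch (policy) {
    case "public-static":
      // Immutable build artifacts: safe to keep for a week in any cache.
      return "public, max-age=604800, immutable";
    case "public-api":
      // Generic public data (e.g. a weather API): short TTL, shared caches allowed.
      return "public, max-age=60";
    case "private-user":
      // User-specific JSON (e.g. GET /user/profile): never store it anywhere.
      return "no-store";
  }
}
```

Sending `no-store` for user-specific endpoints also disables the heuristic caching described above.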

Client-Side Application

Modern SPAs (Single Page Applications) rarely rely on the raw browser cache for API data. Instead, they use state management libraries like Tanstack Query, SWR, or Apollo.

  • Control: These libraries allow for strategies like stale-while-revalidate (showing the user cached data immediately while fetching updates in the background) and fine-granular invalidation. They usually work via user-defined cache keys (e.g., [“user”, 42]) and invalidation of single keys or whole groups of keys (e.g., all keys starting with “user”).
  • Lifecycle Management: Unlike the HTTP cache, this state is in memory (or IndexedDB). A crucial implementation detail is ensuring this state is cleared on Logout. Otherwise, the next user on the same device might have access to the previous user’s sensitive data.
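The mechanics behind such libraries can be sketched in a few lines: keyed entries, a staleness flag for stale-while-revalidate reads, prefix-based invalidation, and a clear-on-logout method. This is an illustrative toy, not Tanstack Query's or SWR's real API:

```typescript
// Toy client-side query cache in the spirit of Tanstack Query / SWR.
// All names are illustrative.
type Entry = { data: unknown; updatedAt: number; staleAfterMs: number };

class QueryCache {
  private store = new Map<string, Entry>();

  // Serialize array keys like ["user", 42] into a stable string key.
  private keyOf(key: (string | number)[]): string {
    return JSON.stringify(key);
  }

  set<T>(key: (string | number)[], data: T, staleAfterMs = 30_000): void {
    this.store.set(this.keyOf(key), { data, updatedAt: Date.now(), staleAfterMs });
  }

  // Returns cached data plus a flag telling the caller to refetch in the background.
  get<T>(key: (string | number)[]): { data: T | undefined; isStale: boolean } {
    const e = this.store.get(this.keyOf(key));
    if (!e) return { data: undefined, isStale: true };
    return { data: e.data as T, isStale: Date.now() - e.updatedAt > e.staleAfterMs };
  }

  // Invalidate a group of keys, e.g. every key starting with ["user"].
  invalidatePrefix(prefix: (string | number)[]): void {
    const p = JSON.stringify(prefix).slice(0, -1); // drop closing bracket to match prefixes
    for (const k of this.store.keys()) if (k.startsWith(p)) this.store.delete(k);
  }

  // Clear everything on logout so the next user sees no leftover data.
  clear(): void {
    this.store.clear();
  }
}
```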

Layer 2: In-Network (CDN, Edge & Web Proxies)

This layer sits between the user and your infrastructure. Historically, transparent web proxies (often run by ISPs) handled this caching.

Fun fact: Steam still delivers game downloads over plain HTTP by default. This allows LAN party organizers or ISPs to cache massive game files transparently.

However, the ubiquitous adoption of HTTPS means transparent proxies can no longer inspect or cache traffic, especially for APIs. The “In-Network” layer has moved to Managed Caches (CDNs and Edge Compute) explicitly configured by the developer.

Shared Caches (i.e., CDNs)

Shared caches work similarly to the browser cache, but the stored content is served to multiple users. This makes them great for static assets, like your HTML and CSS files, but not ideal for dynamic or user-specific data (like APIs). Dedicated Cache-Control directives apply to shared caches: if you add private, no shared cache will store the response, and the s-maxage directive lets you define a TTL that applies only to shared caches. Such caches can also collapse multiple concurrent requests into a single one, reducing the number of requests to the origin server. You can additionally configure a shared cache to always revalidate stored responses with your API server via ETag headers, potentially saving bandwidth and expensive computations. The Mozilla HTTP Caching guide is a great resource to learn more, including how to handle localized content with the Vary header.
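The ETag revalidation flow can be sketched from the origin's point of view: the cache echoes the stored ETag in If-None-Match, and an unchanged entity yields a body-less 304. The hashing scheme below is an illustrative assumption, not a prescribed format:

```typescript
// Sketch of origin-side ETag revalidation. A 304 response carries no body,
// saving bandwidth when the cached copy is still valid.
import { createHash } from "node:crypto";

// Derive a strong ETag from the response body (scheme is illustrative).
function etagOf(body: string): string {
  return `"${createHash("sha256").update(body).digest("hex").slice(0, 16)}"`;
}

function revalidate(
  body: string,
  ifNoneMatch: string | undefined
): { status: 200 | 304; body?: string; etag: string } {
  const etag = etagOf(body);
  if (ifNoneMatch === etag) {
    return { status: 304, etag }; // cached copy unchanged: no body sent
  }
  return { status: 200, body, etag }; // changed (or first request): full response
}
```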

Modern CDNs sometimes provide advanced caching features like tag-based invalidation with cache tags or surrogate keys. Basically, the origin adds tags to its responses via an HTTP header, and later, specific tags can be purged via a CDN-specific API. Cloudflare and Fastly state their caches can be purged within 150 ms worldwide.
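The tag index such a CDN maintains can be pictured as a small mapping from tags to cached URLs. The following in-memory sketch is illustrative; class and method names are not any CDN's real API:

```typescript
// Toy model of surrogate-key / cache-tag purging, as a CDN edge might keep it.
class TaggedCache {
  private responses = new Map<string, string>();      // url -> cached body
  private tagIndex = new Map<string, Set<string>>();  // tag -> urls carrying it

  // Store a response together with the tags the origin attached via header.
  store(url: string, body: string, tags: string[]): void {
    this.responses.set(url, body);
    for (const t of tags) {
      if (!this.tagIndex.has(t)) this.tagIndex.set(t, new Set());
      this.tagIndex.get(t)!.add(url);
    }
  }

  get(url: string): string | undefined {
    return this.responses.get(url);
  }

  // Purge every cached URL that carried the given tag, in one operation.
  purgeTag(tag: string): number {
    const urls = this.tagIndex.get(tag) ?? new Set<string>();
    for (const u of urls) this.responses.delete(u);
    this.tagIndex.delete(tag);
    return urls.size;
  }
}
```

One purge of the tag `articles` invalidates both the detail view and every list view that embedded the article, which is exactly what plain TTLs cannot express.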

For APIs, this method has two issues. Imagine a user in Sydney changes a resource; the server, located in Europe, issues a purge request to the CDN. If the user immediately requests the resource again, the CDN node in Sydney might not have received the purge request yet, and the user receives outdated data. Additionally, access control cannot be enforced with this method; hence, we need a more application-aware solution.

Edge Computing

This is where the biggest innovation in API caching is happening. By running code at the Edge (e.g., in Serverless Functions like Cloudflare Workers), we can implement:

  • Distributed Authentication: Custom functions on the Edge can validate a JWT signature and check permissions before serving from the cache. This can even allow caching and serving authenticated data.
  • Tag-Based Invalidation: We directly issue a purge request for the modified resource at the datacenter close to the end user, so they immediately see their own changes.

At Aproxymade, we want to assist developers in using these methods and provide two core abstractions:

  1. Dynamic Cache Scopes. A user defines part of the JWT as the scope; ideally that part contains permissions like "admin": true or, more fine-granular, "read-access": ["tenant-42"]. We then shard the cache based on these scopes. As a result, users with the same permissions share the same cache entries; if a user lacks the permission, nothing is cached, the lookup is a miss, and the request is forwarded to the origin for detailed error handling.
  2. Entity-based tagging. Inspired by the JSON:API standard, a REST API response is interpreted to contain entities of a certain type and id. The Edge Worker can then tag the resource with these values on GET requests and invalidate the affected entities on PATCH, DELETE, and similar requests.
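Both abstractions can be sketched as pure functions. The claim names and the JSON:API-style shape below are illustrative, and we assume the JWT signature has already been verified elsewhere:

```typescript
// Sketch of the two abstractions, under the assumption that the JWT payload
// has already been signature-verified. Claim names are illustrative.

// 1) Dynamic cache scope: users with identical permissions share one cache shard.
function cacheScope(claims: { "read-access"?: string[]; admin?: boolean }): string {
  const perms = claims.admin ? ["admin"] : claims["read-access"] ?? [];
  return [...perms].sort().join(","); // stable key regardless of claim order
}

// 2) Entity-based tagging: derive cache tags from a JSON:API-style response body.
function entityTags(body: { data: { type: string; id: string }[] }): string[] {
  return body.data.map((e) => `${e.type}:${e.id}`);
}
```

On a GET, the worker stores the response under the derived scope and tags it with `entityTags`; on a PATCH or DELETE it purges those same tags.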

This abstraction has another benefit: we can use AI to automatically derive the configuration from user traffic.

Layer 3: The Gateway

Closer to your backend, we encounter the (optional) Gateway layer.

Reverse Proxies (Nginx / Varnish)

These are often used for protecting the application server from traffic spikes, load-balancing, and TLS termination. They can apply caching based on the standard HTTP headers intended for shared caches. While powerful, configuring complex caching rules here often involves writing obscure configuration files.

API Gateways (Kong / Apisix)

In microservice architectures, API Gateways often handle policies like rate-limiting, versioning, and authorization. While they can cache, it is often policy-based (e.g., “Cache this route for 10 seconds”). They lack the deep application awareness required for complex invalidation strategies, leading developers to set very short cache times.
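Such a policy-based approach boils down to a static route-to-TTL mapping with no knowledge of the data behind the route. A minimal sketch (route patterns and TTLs are invented for illustration):

```typescript
// Gateway-style, policy-based caching: a fixed TTL per route pattern.
// The gateway knows nothing about the data, so TTLs stay conservatively short.
const routePolicies: [RegExp, number][] = [
  [/^\/catalog\//, 10], // "cache this route for 10 seconds"
  [/^\/status$/, 1],    // near-real-time endpoint: 1 second
];

function ttlFor(path: string): number {
  for (const [pattern, ttlSeconds] of routePolicies) {
    if (pattern.test(path)) return ttlSeconds;
  }
  return 0; // unmatched routes are not cached at all
}
```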

Layer 4: The Application Server

Once a request hits the server, the goal of caching shifts from saving bandwidth to saving CPU and Database I/O.

  • Object Caching: We can use key-value stores like Redis or Memcached to save the result of expensive database queries or calculations.
  • Framework Support: Tools like Spring Boot (@Cacheable) or similar annotations in NestJS make this easy to implement.
  • Scope: Unlike the other layers, this cache is part of your backend infrastructure. It doesn’t help with network latency, but it is essential for scaling the database.
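The object-caching pattern described above is usually implemented as cache-aside: check the cache, and only on a miss run the expensive query and store the result. In this sketch a Map stands in for Redis, the loader is synchronous for brevity (in real code it would be async), and all names are illustrative:

```typescript
// Cache-aside sketch: a Map stands in for Redis/Memcached, and `loadFromDb`
// is a hypothetical expensive database query.
const objectCache = new Map<string, { value: unknown; expiresAt: number }>();

function cached<T>(key: string, ttlMs: number, loadFromDb: () => T): T {
  const hit = objectCache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value as T; // hit: skip the database entirely
  }
  const value = loadFromDb(); // miss: run the expensive query once
  objectCache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}
```

This is essentially what Spring Boot's @Cacheable annotation wraps around a method call.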

Conclusion: Abstracting the Complexity with Aproxymade

Normally, you would start by implementing caching on the client and the server, but when that is not enough, things get complex. Building a globally distributed caching strategy is challenging: you not only need to implement it but also monitor its effectiveness.

Aproxymade is designed to solve this by providing Managed API-Edge-Caching as a Service.

Instead of manually stitching together these layers, Aproxymade acts as a smart, application-aware shield in front of your API. It abstracts the complexity of:

  • Cache Monitoring: Ensuring that your cache is working as expected and can be scaled accordingly.
  • Tag-Based Invalidation: Allowing you to cache dynamic content for days, not seconds, with instant purging when data changes.
  • Security Integration: Ensuring that cached content respects your authentication rules.

By offloading the network complexity to a specialized service, you can keep your API implementation simple, focus on your core product, and let us deal with caching nuances.
