
Magento-provided Varnish VCL causes high Transient memory usage #40641

@dooblem

Description

Preconditions and environment

  • Full Page Cache enabled, with Varnish
  • Magento version: all versions

On several of our production environments, we have experienced memory problems with Varnish.
Varnish Transient memory usage keeps increasing until it takes all of the server memory and varnishd is killed.

Limiting the Transient storage is possible, but we have read that it is not recommended: when the Transient storage is full, there can be stability issues. See here for basic info on Transient storage.

We think that limiting the Transient storage is still better than having Varnish OOM-killed by Linux, but only if you are able to monitor Transient usage carefully. If you do not monitor it, you may experience unexpected errors without knowing it. At least, when server memory usage increases, a system probe alerts us.
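A minimal sketch of such monitoring, assuming the default malloc Transient storage whose usage counter is SMA.Transient.g_bytes (the same counter watched in the reproduction steps). The hypothetical check_transient function reads varnishstat -1 output on standard input and exits non-zero when usage exceeds a byte limit, so a cron job or monitoring probe could alert on it. If you do decide to cap Transient instead, varnishd accepts a storage explicitly named Transient, for example -s Transient=malloc,512M.

```shell
# Hypothetical helper: exit non-zero when Transient usage exceeds a byte limit.
# Reads `varnishstat -1` output on stdin; assumes the default malloc Transient
# store, whose counter is named SMA.Transient.g_bytes.
check_transient() {
  limit=$1
  awk -v limit="$limit" '
    # Field 1 is the counter name, field 2 its current value.
    $1 == "SMA.Transient.g_bytes" { found = 1; bytes = $2 }
    END {
      if (!found) {
        print "SMA.Transient.g_bytes not found in input" > "/dev/stderr"
        exit 2
      }
      if (bytes + 0 > limit + 0) {
        print "Transient usage " bytes " bytes exceeds limit " limit
        exit 1
      }
      print "Transient usage " bytes " bytes (limit " limit ")"
    }'
}

# Usage on the Varnish server, alerting above 1 GiB:
# varnishstat -1 | check_transient 1073741824
```

Exit status 1 means the limit is exceeded and 2 means the counter was not found (for example, a different storage backend), so a probe can tell the two cases apart.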

We propose a patch to help fix this issue.

Steps to reproduce

We were able to reproduce the Transient memory increase using the following.

You will need Magento running behind Varnish. You can use a Docker stack for that.

The responses that end up in Transient storage are those returned by Magento with Cache-Control: no-store, no-cache, must-revalidate, max-age=0.

To reproduce the Varnish VCL problem more quickly, we cheated a bit and added this at the top of pub/index.php:

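// Send the same Cache-Control header as the uncacheable Magento responses described above
header("Cache-Control: no-store, no-cache, must-revalidate, max-age=0");

// Add 20 headers of 1500 bytes each (~30 KB per response) so that each response is large
for ($i = 1; $i <= 20; $i++) {
  header("x-test-$i: " . str_repeat("A", 1500));
}

echo "mytest";
die();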

Then, from the Varnish server, run the following to send requests endlessly, 100 in parallel. Each request targets a distinct URL: $RANDOM is expanded once per loop iteration and {} once per request.

while true; do seq 1 10000 | xargs -P 100 -I {} curl -s -o /dev/null "localhost/$RANDOM-{}"; done

Open another window, and watch the Transient memory (SMA.Transient.g_bytes) increase:

watch 'varnishstat -1|grep -i transient'
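To put a number on that increase, two SMA.Transient.g_bytes readings taken a known number of seconds apart can be turned into an average growth rate. The transient_growth function below is a hypothetical helper for that; it uses POSIX shell integer arithmetic, so the rate is rounded toward zero.

```shell
# Hypothetical helper: average Transient growth in bytes per second between
# two SMA.Transient.g_bytes readings taken INTERVAL seconds apart.
transient_growth() {
  first=$1     # g_bytes at the first reading
  second=$2    # g_bytes at the second reading
  interval=$3  # seconds between the two readings
  if [ "$interval" -le 0 ]; then
    echo "interval must be positive" >&2
    return 1
  fi
  # Shell arithmetic is integer-only, so the result is rounded toward zero.
  echo $(( (second - first) / interval ))
}

# Usage on the Varnish server, sampling one minute apart:
# a=$(varnishstat -1 | awk '$1 == "SMA.Transient.g_bytes" { print $2 }')
# sleep 60
# b=$(varnishstat -1 | awk '$1 == "SMA.Transient.g_bytes" { print $2 }')
# transient_growth "$a" "$b" 60
```

A rate that stays positive over several samples, instead of dropping to around zero after a couple of minutes, indicates objects are not expiring from Transient.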

You can also take a look at htop and see the memory increase after a few minutes.

Expected result

The Transient memory should stay stable, and not exhaust system resources.

The varnishd process should keep a reasonable memory footprint.

Actual result

Transient memory usage increases endlessly, and so does varnishd's memory usage.

At some point, Varnish is killed by the Linux OOM killer, possibly causing a production outage.

Additional information

Our patch is below.

Without the patch, pass objects are kept for around 3 days, because the 3d grace is added to their ttl.

After applying the patch, you will still see Transient usage increase at first, but after 2 minutes (the 120s ttl) the objects expire and Transient usage stays at the same level.
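The figures above follow from an uncacheable object staying in storage for ttl + grace + keep. The sketch below works through the 120s branch of the VCL, assuming keep is left at Varnish's default of 0s, and then uses Little's law (objects held ≈ request rate × lifetime) for a rough steady-state estimate of Transient usage. The function names, the 100 requests/s rate and the ~30 KB object size (taken from the 20 × 1500-byte headers in the reproduction script) are illustrative assumptions, and Varnish's per-object overhead is ignored.

```shell
# Seconds an object stays in storage: ttl + grace + keep.
object_lifetime() {
  echo $(( $1 + $2 + $3 ))
}

# Rough steady-state bytes held (Little's law):
# requests per second x lifetime in seconds x bytes per object.
steady_state_bytes() {
  rate=$1; lifetime=$2; size=$3
  echo $(( rate * lifetime * size ))
}

three_days=$(( 3 * 86400 ))                   # beresp.grace = 3d
before=$(object_lifetime 120 "$three_days" 0) # keep assumed to be 0s
after=$(object_lifetime 120 0 0)              # patched: grace = 0s
echo "lifetime before: ${before}s, after: ${after}s"

# Hypothetical load: 100 uncacheable requests/s, 20 headers x 1500 bytes each.
size=$(( 20 * 1500 ))
echo "steady state before: $(steady_state_bytes 100 "$before" "$size") bytes"
echo "steady state after:  $(steady_state_bytes 100 "$after" "$size") bytes"
```

With these hypothetical inputs the block prints a lifetime of 259320s before and 120s after, and steady-state figures of 777960000000 bytes (about 778 GB) versus 360000000 bytes (360 MB). In practice the server runs out of memory long before the first figure is reached, which is consistent with the OOM kills described above.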

--- varnish6.vcl	2026-04-02 16:13:02.309521560 +0200
+++ varnish6_grace_patch.vcl	2026-04-02 16:18:06.289198734 +0200
@@ -148,6 +148,11 @@
 
 sub vcl_backend_response {
 
+    # grace is set to 3d: this is ok for cached objects, but bad for uncacheable objects
+    # pass responses are stored in the Transient storage for ttl+grace+keep
+    # grace is useless for uncacheable (pass) objects
+    # below, we set grace = 0s when uncacheable
+
     set beresp.grace = 3d;
 
     if (beresp.http.content-type ~ "text") {
@@ -165,7 +170,10 @@
     # cache only successfully responses and 404s that are not marked as private
     if ((beresp.status != 200 && beresp.status != 404) || beresp.http.Cache-Control ~ "private") {
         set beresp.uncacheable = true;
-        set beresp.ttl = 86400s;
+        set beresp.grace = 0s;
+        #set beresp.ttl = 86400s;
+        # why 24h ? reduce it to 4h
+        set beresp.ttl = 4h;
         return (deliver);
     }
 
@@ -182,6 +190,7 @@
          && beresp.http.set-cookie ~ "X-Magento-Vary=") {
            set beresp.ttl = 0s;
            set beresp.uncacheable = true;
+           set beresp.grace = 0s;
         }
         unset beresp.http.set-cookie;
     }
@@ -195,12 +204,14 @@
         # Mark as Hit-For-Pass for the next 2 minutes
         set beresp.ttl = 120s;
         set beresp.uncacheable = true;
+        set beresp.grace = 0s;
     }
 
     # If the cache key in the Magento response doesn't match the one that was sent in the request, don't cache under the request's key
     if (bereq.url ~ "/graphql" && bereq.http.X-Magento-Cache-Id && bereq.http.X-Magento-Cache-Id != beresp.http.X-Magento-Cache-Id) {
         set beresp.ttl = 0s;
         set beresp.uncacheable = true;
+        set beresp.grace = 0s;
     }
 
     return (deliver);

The patch also reduces the ttl from 24h to 4h for uncacheable responses, i.e. those whose status is neither 200 nor 404 or that are marked private.
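Because varnishd refuses to load a VCL that does not compile (a single missing semicolon is enough), a compile check before deployment is cheap insurance. The command varnishd -C -f FILE compiles the VCL, prints the generated C code and exits without starting Varnish. The vcl_check wrapper below is a hypothetical convenience around that command; it returns 2 when varnishd is not installed. Pass it the exact VCL file you load into Varnish.

```shell
# Hypothetical wrapper: compile a VCL file with `varnishd -C` to catch syntax
# errors before deployment. Returns 2 if varnishd is not installed, otherwise
# varnishd's own exit status (non-zero when compilation fails).
vcl_check() {
  vcl_file=$1
  if ! command -v varnishd >/dev/null 2>&1; then
    echo "varnishd not found; cannot check $vcl_file" >&2
    return 2
  fi
  # -C prints the generated C code and exits; discard it, keep compiler errors.
  varnishd -C -f "$vcl_file" >/dev/null
}

# Usage (the path is only an example):
# vcl_check /etc/varnish/default.vcl && echo "VCL compiles"
```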

Release note

No response

Triage and priority

  • Severity: S0 - Affects critical data or functionality and leaves users without workaround.
  • Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
  • Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
  • Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
  • Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.
