Varnish provided VCL causes high Transient memory usage #40641
Description
Preconditions and environment
- Full Page Cache enabled, with Varnish
- Magento version : all versions
On several of our production environments, we experienced memory problems with Varnish.
Varnish's Transient storage keeps growing until it consumes all of the server's memory and varnishd is killed.
The Transient storage can be limited, but we read that this is not recommended: when the Transient storage is full, Varnish can run into stability issues. See here for basic info on Transient Storage.
We think that limiting the Transient storage is still better than having Varnish OOM-killed by Linux, but only if you are able to monitor Transient usage carefully. If you do not monitor it, you may hit unexpected errors without knowing it. With unlimited Transient storage, at least a system probe alerts us when the server's memory usage increases.
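For reference, the cap discussed above is set with a named storage argument when varnishd starts. A minimal sketch, assuming a hypothetical 512M cap; the helper name and the value are illustrative, not a recommendation:

```shell
# Build the varnishd storage argument that caps Transient storage.
# The size passed in is an assumed example value; derive yours from monitoring data.
transient_storage_arg() {
    printf '%s\n' "-s Transient=malloc,$1"
}

# Print the argument to append to your existing varnishd command line.
transient_storage_arg 512M
```

Without such an argument, Varnish uses an unbounded malloc storage for Transient, which is why the growth described here can exhaust the server's memory.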
We propose a patch to help fix this issue.
Steps to reproduce
We were able to reproduce the Transient memory increase using the following.
You will need a Magento instance running behind Varnish. A Docker stack works fine for that.
The responses that end up in Transient storage are those Magento returns with Cache-Control: no-store, no-cache, must-revalidate, max-age=0.
To reproduce the Varnish VCL problem more quickly, we cheated a bit and added this at the top of pub/index.php:
header("Cache-Control: no-store, no-cache, must-revalidate, max-age=0");
for ($i = 1; $i <= 20; $i++) {
header("x-test-$i: ". str_repeat("A", 1500));
}
echo "mytest";
die();
Then, from the Varnish server, run the following, in order to launch queries endlessly (100 in parallel):
while true; do seq 1 10000 | xargs -n 1 -P 100 -I {} curl -s -o /dev/null localhost/$RANDOM; done
Open another window, and watch the Transient memory (SMA.Transient.g_bytes) increase:
watch 'varnishstat -1|grep -i transient'
You can also take a look at htop and see the memory increase after a few minutes.
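Since limiting Transient storage is only safe when usage is monitored, the same counter can feed a simple probe. A rough sketch, assuming the usual `varnishstat -1` line layout (name, value, rate, description); the function names and the threshold are hypothetical:

```shell
# Extract the SMA.Transient.g_bytes value from `varnishstat -1` output on stdin.
transient_bytes() {
    awk '$1 == "SMA.Transient.g_bytes" { print $2 }'
}

# Compare usage against a limit in bytes given as $1.
# Returns 0 when below the limit, 1 when above, 3 when the counter is missing.
check_transient() {
    limit="$1"
    used="$(transient_bytes)"
    if [ -z "$used" ]; then
        echo "UNKNOWN: SMA.Transient.g_bytes not found"
        return 3
    fi
    if [ "$used" -gt "$limit" ]; then
        echo "WARNING: Transient uses $used bytes (limit $limit)"
        return 1
    fi
    echo "OK: Transient uses $used bytes"
}

# Usage on the Varnish server (assumed 1 GiB threshold):
#   varnishstat -1 | check_transient 1073741824
```

The exit codes follow the common monitoring-plugin convention, so the function could be wrapped in whatever probe system is already in place.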
Expected result
The Transient memory should stay stable, and not exhaust system resources.
Varnishd should keep a reasonable memory footprint.
Actual result
Transient memory increases endlessly, and varnishd memory as well.
At some point Varnish will be killed by Linux, and possibly cause a production outage.
Additional information
Here is our patch below.
Without the patch, pass objects are kept for around 3 days.
After applying the patch, Transient usage still rises at first, but once the objects start expiring after 2 minutes (the ttl of 120s), the Transient storage levels off.
--- varnish6.vcl 2026-04-02 16:13:02.309521560 +0200
+++ varnish6_grace_patch.vcl 2026-04-02 16:18:06.289198734 +0200
@@ -148,6 +148,11 @@
sub vcl_backend_response {
+ # grace is set to 3d: this is fine for cached objects, but bad for uncacheable ones
+ # pass objects are kept in the Transient storage for ttl + grace + keep
+ # grace is useless for an uncacheable (pass) object
+ # below we set grace = 0s wherever the response is marked uncacheable
+
set beresp.grace = 3d;
if (beresp.http.content-type ~ "text") {
@@ -165,7 +170,10 @@
# cache only successfully responses and 404s that are not marked as private
if ((beresp.status != 200 && beresp.status != 404) || beresp.http.Cache-Control ~ "private") {
set beresp.uncacheable = true;
- set beresp.ttl = 86400s;
+ set beresp.grace = 0s;
+ #set beresp.ttl = 86400s;
+ # why 24h ? reduce it to 4h
+ set beresp.ttl = 4h;
return (deliver);
}
@@ -182,6 +190,7 @@
&& beresp.http.set-cookie ~ "X-Magento-Vary=") {
set beresp.ttl = 0s;
set beresp.uncacheable = true;
+ set beresp.grace = 0s;
}
unset beresp.http.set-cookie;
}
@@ -195,12 +204,14 @@
# Mark as Hit-For-Pass for the next 2 minutes
set beresp.ttl = 120s;
set beresp.uncacheable = true;
+ set beresp.grace = 0s;
}
# If the cache key in the Magento response doesn't match the one that was sent in the request, don't cache under the request's key
if (bereq.url ~ "/graphql" && bereq.http.X-Magento-Cache-Id && bereq.http.X-Magento-Cache-Id != beresp.http.X-Magento-Cache-Id) {
set beresp.ttl = 0s;
set beresp.uncacheable = true;
+ set beresp.grace = 0s;
}
return (deliver);
We are also reducing a ttl from 24h to 4h for some uncacheable objects.
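The "around 3 days" and "2 minutes" figures follow from the ttl + grace + keep rule noted in the patch comments. A small worked sketch of that arithmetic, assuming keep is left at 0s (an assumption, it is not set in the VCL shown); the function name is illustrative:

```shell
# Lifetime of an object in storage, in seconds: ttl + grace + keep.
retention_seconds() {
    echo $(( $1 + $2 + $3 ))
}

GRACE_3D=$(( 3 * 86400 ))   # beresp.grace = 3d from the provided VCL
KEEP=0                      # assumption: keep is left at 0s

# Hit-for-pass branch (ttl = 120s), before and after the patch.
echo "unpatched: $(retention_seconds 120 "$GRACE_3D" "$KEEP")s"
echo "patched:   $(retention_seconds 120 0 "$KEEP")s"
```

Under these assumptions the unpatched lifetime is 259320s, a little over 3 days, while the patched lifetime is the 120s ttl alone.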
Release note
No response
Triage and priority
- Severity: S0 - Affects critical data or functionality and leaves users without workaround.
- Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
- Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
- Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
- Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.