
Fixing awful.spawn read_lines closure leak #4105

Open
soderstrom-rikard wants to merge 1 commit into awesomeWM:master from soderstrom-rikard:bugfx/awful-spawn-read_lines-closure-leak

@soderstrom-rikard

Affects: awesome v4.3 and likely all prior versions
Tested on: awesome 4.3-6, lua-lgi 0.9.2-14, Lua 5.3.6

PROBLEM

awful.spawn.read_lines() has a memory leak caused by a mutual closure reference
cycle between start_read and finish_read that survives multiple Lua GC cycles.

OBSERVED SYMPTOM
awesome memory grows from ~400 MB at startup to 4+ GB over 1-2 days when any
widget using awful.widget.watch (or anything that calls easy_async on a timer)
is in use.

Forcing two GC cycles from awesome-client confirms the leak is entirely in
Lua-tracked objects, not native memory:

awesome-client 'return string.format("%.1f MB", collectgarbage("count")/1024)'
-- returns e.g. "2429.8 MB"

awesome-client 'collectgarbage("collect"); collectgarbage("collect"); \
  return string.format("%.1f MB", collectgarbage("count")/1024)'
-- returns e.g. "336.3 MB"  (2.1 GB freed)
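
The same before/after measurement can be wrapped in a single function. This is a minimal sketch; `gc_report` is a hypothetical helper name, not part of awesome, and it only sees memory tracked by Lua's allocator:

```lua
-- Hypothetical helper (not part of awesome): report the Lua heap size before
-- and after two forced full collections.
local function gc_report()
    local before_kb = collectgarbage("count")   -- KB tracked by Lua's allocator
    collectgarbage("collect")
    collectgarbage("collect")
    local after_kb = collectgarbage("count")
    return string.format("before %.1f MB, after %.1f MB, freed %.1f MB",
        before_kb / 1024, after_kb / 1024, (before_kb - after_kb) / 1024)
end

print(gc_report())
```

A large "freed" figure suggests that the growth was collectable garbage waiting for a GC cycle rather than memory that is still referenced.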

ROOT CAUSE

In spawn.read_lines(), start_read and finish_read are declared as locals and
then each captures the other as an upvalue:

local start_read, finish_read
start_read = function()
    stream:read_line_async(GLib.PRIORITY_DEFAULT, nil, finish_read)
end
finish_read = function(obj, res)
    ...
    start_read()   -- closes over start_read
end

This creates a mutual reference cycle:
finish_read <--> start_read (each is an upvalue of the other)
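
The shape of this cycle can be reproduced in plain Lua. The sketch below is only an illustration: `registry` is a hypothetical stand-in for the strong reference LGI holds, and `probe` is a weak table used to observe the closures. It does not reproduce LGI's guard timing; it only shows that the pair stays alive for as long as anything outside it references finish_read.

```lua
-- Weak-valued table: lets us observe the closures without keeping them alive.
local probe = setmetatable({}, { __mode = "v" })
-- Hypothetical stand-in for the strong reference held on the LGI side.
local registry = {}

do
    local start_read, finish_read
    start_read = function() return finish_read end   -- captures finish_read
    finish_read = function() return start_read end   -- captures start_read
    probe.start_read, probe.finish_read = start_read, finish_read
    registry.target_ref = finish_read
end

collectgarbage("collect")
-- Both closures are still reachable through registry.target_ref.
print("alive while referenced:", probe.start_read ~= nil, probe.finish_read ~= nil)

registry.target_ref = nil
collectgarbage("collect")
collectgarbage("collect")
-- With the outside reference gone, nothing reaches the pair any more.
print("collected after release:", probe.start_read == nil, probe.finish_read == nil)
```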

Additionally, LGI's ffi closure for finish_read (created by read_line_async)
holds a Lua registry reference (target_ref) to finish_read. This reference is
released only after:

Step 1: LGI's autodestroy guard is finalized by Lua GC
(guard was created in closure_callback when the callback fired)
Step 2: Lua GC detects finish_read is now unreachable

Because GI_SCOPE_TYPE_ASYNC callbacks always use LGI's autodestroy guard
mechanism (see lgi/marshal.c:marshal_2c_callable), there is an unavoidable
one-cycle delay between the callback firing and the Lua reference being
released. After that, the mutual cycle between start_read and finish_read
requires ANOTHER cycle to detect as unreachable.

In practice this means the closures and their upvalues (stream, line_callback,
done_callback, accumulated stdout strings, etc.) survive 3+ GC cycles per
easy_async call. With awful.widget.watch running at 1 Hz (volume widget
default), Lua objects accumulate ~50-200 KB per second faster than Lua's
incremental GC (with default gcpause=200) reclaims them. At the low end of
that range (50 KB/s) this is ~180 MB per hour, or 4.32 GB per day, of retained
memory.
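
The hourly and daily figures follow from the per-second rate by plain unit conversion. A small sketch of that arithmetic, assuming decimal units (1 MB = 1000 KB); `projected_growth` is a hypothetical name used only here:

```lua
local SECONDS_PER_HOUR = 3600
local HOURS_PER_DAY = 24

-- Convert a retention rate in KB/s into MB per hour and GB per day
-- (decimal units, matching the figures quoted above).
local function projected_growth(kb_per_second)
    local mb_per_hour = kb_per_second * SECONDS_PER_HOUR / 1000
    local gb_per_day = mb_per_hour * HOURS_PER_DAY / 1000
    return mb_per_hour, gb_per_day
end

-- Both ends of the ~50-200 KB/s range.
for _, rate in ipairs({ 50, 200 }) do
    local mb_h, gb_d = projected_growth(rate)
    print(string.format("%d KB/s -> %.0f MB/hour, %.2f GB/day", rate, mb_h, gb_d))
end
```

The quoted 180 MB per hour and 4.32 GB per day correspond to the 50 KB/s end of the range; the 200 KB/s end would be four times higher.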

The ffi closures themselves (via ffi_closure_alloc) are NOT tracked by Lua's
allocator, so growing ffi memory does not increase GC pressure, making the
problem worse.

FIX

Break all upvalue references inside done() before calling done_callback. This
eliminates both the start_read/finish_read mutual cycle and all other upvalue
chains (stream, line_callback, done_callback) in a single step. After this:

  • LGI's guard finalizes -> releases target_ref -> finish_read unreachable
  • finish_read is collected in the very next GC cycle (no cycle detection
    needed since the mutual reference was already broken)
  • stream, line_callback, done_callback follow in the same sweep

This reduces the delay before collection from 3+ GC cycles to 2, and eliminates
the mutual-cycle detection overhead entirely.
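
The idea can be sketched outside awesome as follows. This is not the actual patch: `fake_stream` is a hypothetical stand-in that invokes its callback synchronously (real Gio streams do not), the line handling is simplified, and done_callback is called directly where the description says the real code goes through protected_call.

```lua
-- Hypothetical stand-in for a Gio input stream; it calls back synchronously.
local function fake_stream(lines)
    local i = 0
    local self = {}
    function self:read_line_async(_priority, _cancellable, callback)
        callback(self, nil)
    end
    function self:read_line_finish(_res)
        i = i + 1
        return lines[i]          -- nil once the input is exhausted
    end
    return self
end

-- Simplified read_lines showing only the upvalue-clearing step in done().
local function read_lines(stream, line_callback, done_callback)
    local start_read, finish_read
    local function done()
        -- Keep a local copy of the one value still needed, then clear every
        -- captured reference so the closures stop keeping each other alive.
        local cb = done_callback
        start_read, finish_read = nil, nil
        stream, line_callback, done_callback = nil, nil, nil
        if cb then cb() end
    end
    start_read = function()
        stream:read_line_async(0, nil, finish_read)   -- 0 = default priority
    end
    finish_read = function(obj, res)
        local line = obj:read_line_finish(res)
        if line == nil then
            done()
        else
            line_callback(line)
            start_read()
        end
    end
    start_read()
end

-- Usage: collect two lines, then observe that done_callback ran.
local seen, finished = {}, false
read_lines(fake_stream({ "a", "b" }),
    function(line) seen[#seen + 1] = line end,
    function() finished = true end)
print(#seen, finished)
```

Taking the local copy before clearing matters: once done_callback is set to nil, the upvalue can no longer be used to invoke it.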

SUGGESTED GC TUNING (companion change, rc.lua or early in awesome config)

Even with this patch, Lua's default GC settings (gcpause=200, meaning GC starts
a new cycle when memory reaches 200% of the post-collection size) allow large
amounts of short-lived objects to accumulate before collection. For a desktop
WM with frequent async spawns, more aggressive settings are advisable:

-- In rc.lua, near the top:
collectgarbage("setpause",   110)  -- start new cycle at 110% (near-continuous)
collectgarbage("setstepmul", 400)  -- GC does 4x work per allocation step

This is independent of the spawn.lua fix and provides a safety margin for any
other sources of short-lived object accumulation.


This patch fixes the leak in awful.spawn.read_lines() by breaking the
start_read <-> finish_read mutual cycle. It does so by clearing all upvalues
after making a local copy of done_callback (for use in the protected_call).