
rt-app: Add memrun event with read, write, and chase workload types#147

Open
sarav wants to merge 2 commits into scheduler-tools:master from sarav:memset

Conversation

@sarav sarav commented Apr 13, 2026

Add a new "memrun" event that supports three memory workload types beyond the existing "mem" (memset) and "iorun" (write syscall):

  • "write": byte-at-a-time volatile stores over a sized buffer
  • "read": byte-at-a-time volatile reads over a sized buffer
  • "chase": pointer-chasing over a linked list for memory latency measurement, inspired by lmbench's lat_mem_rd

The read and write implementations use nested loops to avoid modulo overhead — the outer loop counts full buffer passes, the inner loop does sequential access. This gives symmetric read/write performance (unlike the legacy "mem" event which uses vectorized memset).
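As a sketch of the nested-loop pattern described above (function and parameter names here are illustrative, not rt-app's actual code): the per-access modulo is replaced by one divide up front, with the remainder handled in a short tail loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: `count` total byte accesses over a `size`-byte
 * buffer, split into full passes plus a remainder tail, so the hot
 * inner loop is a plain sequential walk with no modulo per access. */
static void mem_write_loop(volatile uint8_t *buf, size_t size, uint64_t count)
{
	uint64_t passes = count / size;   /* full sweeps over the buffer */
	uint64_t rem = count % size;      /* leftover accesses */

	for (uint64_t p = 0; p < passes; p++)
		for (size_t i = 0; i < size; i++)
			buf[i] = (uint8_t)i;      /* volatile store: not optimized away */
	for (size_t i = 0; i < rem; i++)
		buf[i] = (uint8_t)i;
}

static uint8_t mem_read_loop(volatile const uint8_t *buf, size_t size,
			     uint64_t count)
{
	uint8_t sink = 0;                 /* keeps the loads live */
	uint64_t passes = count / size;
	uint64_t rem = count % size;

	for (uint64_t p = 0; p < passes; p++)
		for (size_t i = 0; i < size; i++)
			sink ^= buf[i];           /* volatile load each iteration */
	for (size_t i = 0; i < rem; i++)
		sink ^= buf[i];
	return sink;
}
```

Because both sides do one byte per inner-loop iteration through a volatile pointer, read and write throughput come out symmetric, unlike memset's vectorized stores.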

The chase workload supports configurable buffer size, stride, and pattern. The "random" pattern uses bit-reversed pointer ordering (from lmbench) to defeat hardware prefetchers, while "sequential" uses a simple stride-based chain. Buffer size controls which cache level is stressed.
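A minimal sketch of how such a chain can be built, assuming power-of-two size and stride (and stride large enough to hold a pointer); the helper names are invented for illustration, and the bit-reversed index ordering follows the lmbench idea of making consecutive hops land far apart:

```c
#include <stdint.h>
#include <stdlib.h>

/* Reverse the low `bits` bits of v, e.g. bit_reverse(1, 4) == 8. */
static size_t bit_reverse(size_t v, unsigned bits)
{
	size_t r = 0;
	for (unsigned i = 0; i < bits; i++)
		r = (r << 1) | ((v >> i) & 1);
	return r;
}

/* Link size/stride elements into one closed cycle, visiting them in
 * bit-reversed index order so a stride prefetcher cannot predict the
 * next hop. Each element's first word stores the next pointer. */
static void **chase_init(void *buf, size_t size, size_t stride)
{
	size_t nelem = size / stride;
	unsigned bits = 0;

	while (((size_t)1 << bits) < nelem)
		bits++;

	char *base = buf;
	void **head = (void **)(base + bit_reverse(0, bits) * stride);
	void **prev = head;
	for (size_t i = 1; i < nelem; i++) {
		void **cur = (void **)(base + bit_reverse(i, bits) * stride);
		*prev = cur;          /* link previous element to this one */
		prev = cur;
	}
	*prev = head;                 /* close the cycle */
	return head;
}

/* Each hop is a load whose address depends on the previous load, so
 * latency cannot be hidden by out-of-order execution. */
static void *chase_run(void **p, uint64_t count)
{
	while (count--)
		p = (void **)*p;
	return p;                     /* returned so the chain isn't dead code */
}
```

With size selecting the cache level (e.g. 256 KiB to overflow a typical L1/L2) and stride at the cache-line size, each hop pays roughly one full load latency.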

JSON syntax:
"memrun": { "type": "chase", "size": 262144, "stride": 64,
"pattern": "random", "count": 50000000 }
"memrun": { "type": "read", "size": 10485760, "count": 10485760 }
"memrun": { "type": "write", "size": 10485760, "count": 10485760 }

The existing "mem" integer event is unchanged for backward compatibility.

Verified on ARM64 with simpleperf showing clear differentiation across workload types in instruction and bus-access stats.

Comment thread on src/rt-app.c (outdated)
add_cgroups();

/* Initialize mem_chase pointer chains for all resources */
for (i = 0; i < opts.resources->nresources; i++) {
Member
Why is this not part of init_resources?

Member
We want to keep all resource init in the same place.

Author
Yes, this should be moved. But should I move it to parse_task_event_data, or does it have to be init_resources_data()? The problem with the latter is that we don't have the event details (buffer size, stride, pattern, etc.) in there.

So I either have to put it in parse... or pass these args (a union of per-resource args) into init_resource_data().

Which approach do you prefer?

Member
Passing the union of per-resource args into init_resource_data is the cleanest solution IMO.
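A hypothetical sketch of what that refactor could look like; all type and field names are invented for illustration and are not rt-app's actual types:

```c
#include <stddef.h>
#include <stdlib.h>

enum memrun_type { MEMRUN_READ, MEMRUN_WRITE, MEMRUN_CHASE };

/* Per-resource arguments parsed from JSON, passed down so that
 * memrun-specific fields are available at resource-init time. */
union resource_args {
	struct {
		enum memrun_type type;
		size_t size;            /* buffer size in bytes */
		size_t stride;          /* chase only */
		int random_pattern;     /* chase only: 1 = bit-reversed */
	} memrun;
	/* ... other resource kinds would add their own members ... */
};

struct resource {
	void *buf;
	union resource_args args;
};

/* Init now takes the parsed args, so the buffer (and, for chase, the
 * pointer chain) can be set up here instead of in a post-parse loop. */
static int init_resource_data(struct resource *res,
			      const union resource_args *args)
{
	res->args = *args;
	res->buf = malloc(args->memrun.size);
	return res->buf ? 0 : -1;
}
```

This keeps all resource initialization in one place while still giving init the event details (size, stride, pattern) that previously only the parser saw.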

Author
Ok let me do that

@vingu-linaro
Member

Also, I forgot to mention that you should update doc/tutorial.txt with a description of the new events.

Saravana Kannan added 2 commits April 24, 2026 10:50
Add a new "memrun" event that supports three memory workload types
beyond the existing "mem" (memset) and "iorun" (write syscall):

  - "write": byte-at-a-time volatile stores over a sized buffer
  - "read": byte-at-a-time volatile reads over a sized buffer
  - "chase": pointer-chasing over a linked list for memory latency
    measurement, inspired by lmbench's lat_mem_rd

The read and write implementations use nested loops to avoid modulo
overhead — the outer loop counts full buffer passes, the inner loop
does sequential access. This gives symmetric read/write performance
(unlike the legacy "mem" event which uses vectorized memset).

The chase workload supports configurable buffer size, stride, and
pattern. The "random" pattern uses bit-reversed pointer ordering
(from lmbench) to defeat hardware prefetchers, while "sequential"
uses a simple stride-based chain. Buffer size controls which cache
level is stressed.

JSON syntax:
  "memrun": { "type": "chase", "size": 262144, "stride": 64,
              "pattern": "random", "count": 50000000 }
  "memrun": { "type": "read", "size": 10485760, "count": 10485760 }
  "memrun": { "type": "write", "size": 10485760, "count": 10485760 }

The existing "mem" integer event is unchanged for backward
compatibility.

Verified on ARM64 with simpleperf showing clear differentiation
across workload types in instruction and bus-access stats.

Signed-off-by: Saravana Kannan <saravanak@kernel.org>
Add POSIX semaphore-based synchronization events. Unlike suspend/resume
and signal/wait which are stateless (wakeups are lost if the target
thread isn't blocked yet), semaphores accumulate posts — a sem_post
increments the counter and a sem_wait decrements it or blocks if zero.

This enables reliable cross-thread wakeup patterns where the poster
may run ahead of the waiter, such as pipeline stages where an upstream
thread can complete multiple frames before a preempted downstream
thread gets to run.

JSON syntax:
  "sem_post": "sem_name"   // increment, never blocks
  "sem_wait": "sem_name"   // decrement or block until > 0

Both sem_post and sem_wait events sharing the same name reference
the same underlying semaphore, initialized to 0 at startup.

Verified on ARM64 with two test cases:
  - Basic producer/consumer with synchronized handoff
  - Early-post test where waker posts 5 times before sleeper starts
    (3s delay), confirming sleeper consumes accumulated posts
    without blocking

Signed-off-by: Saravana Kannan <saravanak@kernel.org>
@sarav
Author

sarav commented Apr 24, 2026

Found a bunch of bugs in the previous implementation when I tested esoteric combinations of these memrun events. Fixed them all.

  • Memory leaks in memrun read/write.
  • Wrong buffers when the names collide but the params are different.
  • Unnecessary math in memrun read slowing it down.
  • Refactored to address review comments.
