rt-app: Add memrun event with read, write, and chase workload types#147
sarav wants to merge 2 commits into scheduler-tools:master
Conversation
    add_cgroups();

    /* Initialize mem_chase pointer chains for all resources */
    for (i = 0; i < opts.resources->nresources; i++) {
Why is this not part of init_resources?
We want to keep all resource init in the same place.
Yes, this should be moved. But should I move it to parse_task_event_data, or does it have to be init_resources_data()? The problem with the latter is that we don't have the event details (buffer size, stride, pattern, etc.) there.
So I either have to put it in parse_task_event_data, or pass those args (a union of per-resource args) into init_resource_data().
Which approach do you prefer?
Which approach do you prefer?
Passing the union of per-resource args into init_resource_data() is the cleanest solution IMO.
Also, I forgot to mention that you should update doc/tutorial.txt with descriptions of the new events.
Add a new "memrun" event that supports three memory workload types
beyond the existing "mem" (memset) and "iorun" (write syscall):
- "write": byte-at-a-time volatile stores over a sized buffer
- "read": byte-at-a-time volatile reads over a sized buffer
- "chase": pointer-chasing over a linked list for memory latency
measurement, inspired by lmbench's lat_mem_rd
The read and write implementations use nested loops to avoid modulo
overhead — the outer loop counts full buffer passes, the inner loop
does sequential access. This gives symmetric read/write performance
(unlike the legacy "mem" event which uses vectorized memset).
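The nested-loop scheme can be sketched roughly as follows (a hypothetical illustration, not the actual rt-app code; the function name memrun_write and its parameters are made up here):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch: the outer loop counts complete passes over the
 * buffer and the inner loop walks it sequentially, so no per-access
 * modulo is needed.  The volatile qualifier keeps the compiler from
 * eliding or vectorizing the byte stores. */
static void memrun_write(volatile uint8_t *buf, size_t size, size_t count)
{
	size_t passes = count / size;	/* complete sweeps of the buffer */
	size_t rem = count % size;	/* leftover accesses on the last pass */
	size_t p, i;

	for (p = 0; p < passes; p++)
		for (i = 0; i < size; i++)
			buf[i] = (uint8_t)i;

	for (i = 0; i < rem; i++)
		buf[i] = (uint8_t)i;
}
```

The read variant would be symmetric, accumulating volatile loads into a sink variable instead of storing.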
The chase workload supports configurable buffer size, stride, and
pattern. The "random" pattern uses bit-reversed pointer ordering
(from lmbench) to defeat hardware prefetchers, while "sequential"
uses a simple stride-based chain. Buffer size controls which cache
level is stressed.
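A chain built in bit-reversed order can be sketched like this (an illustration of the lmbench-style technique, not rt-app's actual setup; chase_init_random and bit_reverse are invented names, and the cell count is assumed to be a power of two):

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the low `bits` bits of v.  Bit reversal is a permutation of
 * [0, 2^bits), so linking rev(i) -> rev(i+1) visits every cell exactly
 * once while making successive loads land far apart in memory, which
 * defeats stride-based hardware prefetchers. */
static size_t bit_reverse(size_t v, unsigned bits)
{
	size_t r = 0;

	while (bits--) {
		r = (r << 1) | (v & 1);
		v >>= 1;
	}
	return r;
}

/* Link `ncells` pointer cells into one cycle in bit-reversed order. */
static void chase_init_random(void **cells, size_t ncells)
{
	unsigned bits = 0;
	size_t i;

	while (((size_t)1 << bits) < ncells)	/* ncells assumed 2^n */
		bits++;

	for (i = 0; i < ncells; i++)
		cells[bit_reverse(i, bits)] =
			&cells[bit_reverse((i + 1) % ncells, bits)];
}

/* The chase itself is one dependent load per iteration, so each step
 * pays the full load-to-use latency of whichever cache level the
 * buffer size selects. */
static void *chase(void **start, size_t count)
{
	void **p = start;

	while (count--)
		p = (void **)*p;
	return p;
}
```

The sequential pattern would simply link cell i to cell i+1 at the chosen stride instead of bit-reversing the order.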
JSON syntax:
"memrun": { "type": "chase", "size": 262144, "stride": 64,
"pattern": "random", "count": 50000000 }
"memrun": { "type": "read", "size": 10485760, "count": 10485760 }
"memrun": { "type": "write", "size": 10485760, "count": 10485760 }
The existing "mem" integer event is unchanged for backward
compatibility.
Verified on ARM64 with simpleperf showing clear differentiation
across workload types in instruction and bus-access stats.
Signed-off-by: Saravana Kannan <saravanak@kernel.org>
Add POSIX semaphore-based synchronization events. Unlike suspend/resume
and signal/wait which are stateless (wakeups are lost if the target
thread isn't blocked yet), semaphores accumulate posts — a sem_post
increments the counter and a sem_wait decrements it or blocks if zero.
This enables reliable cross-thread wakeup patterns where the poster
may run ahead of the waiter, such as pipeline stages where an upstream
thread can complete multiple frames before a preempted downstream
thread gets to run.
JSON syntax:
"sem_post": "sem_name" // increment, never blocks
"sem_wait": "sem_name" // decrement or block until > 0
Both sem_post and sem_wait events sharing the same name reference
the same underlying semaphore, initialized to 0 at startup.
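The accumulation behavior can be demonstrated with plain POSIX semaphores (a standalone sketch, not rt-app code; posts_accumulate is a made-up name):

```c
#include <semaphore.h>

/* Sketch: three posts made before any wait are banked in the counter,
 * so three subsequent waits all return without blocking. */
static int posts_accumulate(void)
{
	sem_t sem;
	int i, val;

	if (sem_init(&sem, 0, 0))	/* thread-shared, initial value 0 */
		return -1;

	for (i = 0; i < 3; i++)
		sem_post(&sem);		/* counter: 1, 2, 3 */

	for (i = 0; i < 3; i++)
		if (sem_wait(&sem))	/* decrements; never blocks here */
			return -1;

	if (sem_getvalue(&sem, &val))
		return -1;
	sem_destroy(&sem);
	return val;			/* 0: all posts consumed */
}
```

This is exactly the property suspend/resume and signal/wait lack: with those, a wakeup delivered before the target blocks is simply lost.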
Verified on ARM64 with two test cases:
- Basic producer/consumer with synchronized handoff
- Early-post test where waker posts 5 times before sleeper starts
(3s delay), confirming sleeper consumes accumulated posts
without blocking
Signed-off-by: Saravana Kannan <saravanak@kernel.org>
Found a bunch of bugs in the previous implementation when I tested esoteric combinations of these memrun events. Fixed them all.