Re: [RESEND RFC PATCH 0/3] Provide fast access to thread specific data

From: Prakash Sangappa
Date: Tue Sep 14 2021 - 13:25:03 EST

> On Sep 9, 2021, at 10:42 AM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> On Wed, Sep 8, 2021 at 5:02 PM Prakash Sangappa
> <prakash.sangappa@xxxxxxxxxx> wrote:
>>
>> Resending RFC. This patchset is not final. I am looking for feedback on
>> this proposal to share thread-specific data for use in latency-sensitive
>> code paths.
>>
>> (patchset based on v5.14-rc7)
>>
>> Cover letter previously sent:
>> ----------------------------
>>
>> Some applications, like databases, require reading thread-specific stats
>> frequently from the kernel in latency-sensitive code paths. The overhead of
>> reading these stats from the kernel via system calls affects performance.
>> One use case is reading a thread's scheduler stats from the /proc schedstat
>> file (/proc/pid/schedstat) to collect the time the thread spent executing
>> on a CPU (sum_exec_runtime) and the time it spent waiting on the runqueue
>> (run_delay). These scheduler stats, read several times per transaction in
>> latency-sensitive code paths, are used to measure the time taken by DB
>> operations.
>>
>> This patch proposes to introduce a mechanism for the kernel to share thread
>> stats through a per-thread structure shared between user space and the
>> kernel. The per-thread structure is allocated on a page that is mapped
>> shared between user space and the kernel, providing a fast communication
>> channel between the two. The kernel publishes stats in this shared
>> structure, and application threads can read them from user space without
>> making system calls.
>
>
> Can these use cases be addressed by creating a perf event
> (perf_event_open) and mmapping it?


As I understand it, perf events are sampling based and intended for
profiling? If so, they would not be suitable for the use case we are
looking at.

Also, it would require every thread to open a perf event and create its own
mapping. I am not sure how well this would scale, given that the requirement
is for use by a large number of threads.

The proposal here is to provision a per-thread memory space shared between
user space and the kernel to share thread-specific info, such as the
per-thread sched stats here. It needs a small memory footprint because it
has to be allocated from pinned memory. This should have low overhead and
scale better.