Re: [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation
From: Alexey Budankov
Date: Wed Oct 14 2020 - 08:16:11 EST
On 13.10.2020 19:20, Arnaldo Carvalho de Melo wrote:
> On Mon, Oct 12, 2020 at 11:50:29AM +0300, Alexey Budankov wrote:
>>
>> Patch set provides threaded trace streaming for base perf record
>> operation. Provided streaming mode (--threads) mitigates profiling
>> data losses and resolves scalability issues of serial and asynchronous
>> (--aio) trace streaming modes on multicore server systems. The patch
>> set is based on the prototype [1], [2] and most closely relates
>> to mode 3) "mode that creates thread for every monitored memory map".
>>
>> The threaded mode maps trace streaming threads one-to-one to mapped
>> data buffers and streams into per-CPU trace files located in the data
>> directory. The data buffers and threads are affined to NUMA nodes and
>> monitored CPUs according to system topology. The --cpu option can be
>> used to specify the exact CPUs to be monitored.
>>
>> Basic analysis of data directories is provided for perf report mode.
>> Raw dump (-D) and aggregated reports are available for data directories,
>> still with no memory consumption optimizations. However, data directories
>> collected with the --compression-level option enabled can be analyzed with
>> slightly less memory because trace files are unmapped from tool process
>> memory after loading collected data.
>>
>> Provided streaming mode is available with Zstd compression/decompression
>> (--compression-level) and handling of external commands (--control).
>> AUX area tracing, related and derived modes like --snapshot or
>> --aux-sample are not enabled. --switch-output, --switch-output-event,
>> --switch-max-files and --timestamp-filename options are not enabled.
>
> Would be interesting to spell out what are the difficulties to have
> those options working with this threaded mode, as I expect that once
> this is all reviewed and tested we should switch to it by default, no?
At the moment I am not sure about this as the default mode. It all depends
on the specifics of the HW configuration and the workload to be monitored
and analyzed. On small and mid-sized systems --aio could still fit better
from a HW/OS resource consumption perspective.
The initial intent to enable AUX area tracing ran into the need to define
some (optimal?) way to store index data in the data directory, so it was left
out of this first step of bringing threaded trace streaming into the Perf tool.
--switch-output-* and --timestamp-filename use cases are not yet clear
for data directories and thus look like second-order features.
Addressing all those issues in a single patch set looks like too much. The
proper way to get it all in is step by step. I should also say that this is
outside the scope of current Intel VTune specific needs.
Alexei