Re: [PATCH v5 07/10] perf record: implement -z,--compression_level=n option and compression

From: Alexey Budankov
Date: Thu Mar 07 2019 - 10:26:56 EST



On 07.03.2019 15:14, Jiri Olsa wrote:
> On Thu, Mar 07, 2019 at 11:39:46AM +0300, Alexey Budankov wrote:
>>
>> On 05.03.2019 15:25, Jiri Olsa wrote:
>>> On Fri, Mar 01, 2019 at 06:58:32PM +0300, Alexey Budankov wrote:
>>>
>>> SNIP
>>>
>>>>
>>>> /*
>>>> * Increment md->refcount to guard md->data[idx] buffer
>>>> @@ -350,7 +357,7 @@ int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx,
>>>> md->prev = head;
>>>> perf_mmap__consume(md);
>>>>
>>>> - rc = push(to, &md->aio.cblocks[idx], md->aio.data[idx], size0 + size, *off);
>>>> + rc = push(to, md->aio.data[idx], size0 + size, *off, &md->aio.cblocks[idx]);
>>>> if (!rc) {
>>>> *off += size0 + size;
>>>> } else {
>>>> @@ -556,13 +563,15 @@ int perf_mmap__read_init(struct perf_mmap *map)
>>>> }
>>>>
>>>> int perf_mmap__push(struct perf_mmap *md, void *to,
>>>> - int push(struct perf_mmap *map, void *to, void *buf, size_t size))
>>>> + int push(struct perf_mmap *map, void *to, void *buf, size_t size),
>>>> + perf_mmap__compress_fn_t compress, void *comp_data)
>>>> {
>>>> u64 head = perf_mmap__read_head(md);
>>>> unsigned char *data = md->base + page_size;
>>>> unsigned long size;
>>>> void *buf;
>>>> int rc = 0;
>>>> + size_t mmap_len = perf_mmap__mmap_len(md);
>>>>
>>>> rc = perf_mmap__read_init(md);
>>>> if (rc < 0)
>>>> @@ -574,7 +583,10 @@ int perf_mmap__push(struct perf_mmap *md, void *to,
>>>> buf = &data[md->start & md->mask];
>>>> size = md->mask + 1 - (md->start & md->mask);
>>>> md->start += size;
>>>> -
>>>> + if (compress) {
>>>> + size = compress(comp_data, md->data, mmap_len, buf, size);
>>>> + buf = md->data;
>>>> + }
>>>> if (push(md, to, buf, size) < 0) {
>>>> rc = -1;
>>>> goto out;
>>>
>>> when we discussed the compress callback should be another layer
>>> in perf_mmap__push I was thinking more of the layered/fifo design,
>>> like:
>>>
>>> normally we call:
>>>
>>> perf_mmap__push(... push = record__pushfn ...)
>>> -> reads mmap data and calls push(data), which translates as:
>>>
>>> record__pushfn(data);
>>> - which stores the data
>>>
>>>
>>> for compressed it'd be:
>>>
>>> perf_mmap__push(... push = compressed_push ...)
>>>
>>> -> reads mmap data and calls push(data), which translates as:
>>>
>>> compressed_push(data)
>>> -> reads data, compresses it and calls the next push callback in line:
>>>
>>> record__pushfn(data)
>>> - which stores the data
>>>
>>>
>>> there'd need to be the logic for compressed_push to
>>> remember the 'next push' function
>>
>> That is suboptimal for AIO. Also, compression is an independent operation that
>> could be applied at any of the push stages you mean.
>
> not sure what you mean by suboptimal, but I think
> that it can still happen in subsequent push callback
>
>>
>>>
>>> but I think this was the original idea behind the
>>> perf_mmap__push -> it gets the data and pushes them for
>>> the next processing.. it should stay as simple as that
>>
>> Agreed on keeping it simple. At the moment there is no push to a next processing
>> stage in the code, so the provided implementation fits both the serial and the AIO
>> case while staying as simple as possible. If you see something that would fit
>> better, please speak up and share.
>
> I have to insist that perf_mmap__push stays untouched
> and we do other processing in the push callbacks

What about perf_mmap__aio_push()?

Without compression it does
memcpy(), memcpy(), aio_push()

With compression it does
memcpy_with_compression(), memcpy_with_compression(), aio_push()

Any deviation that increases the number of copy operations, i.e. an implementation
with three or more of them, is suboptimal in terms of both runtime overhead and
reducing data loss.
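
To make the copy-count argument concrete, here is a minimal userspace sketch, not the perf code. It assumes a power-of-two ring and a contiguous staging buffer that stands in for md->aio.data[idx]. The names ring_stage(), copy_fn_t, plain_copy() and rle_copy() are hypothetical, and the toy run-length encoder is only a stand-in for a real compressor. The point is that the compressing variant replaces each of the two memcpy() calls instead of adding a third pass over the data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "copy" step: writes into dst and returns the bytes produced. */
typedef size_t (*copy_fn_t)(void *dst, size_t dst_size,
			    const void *src, size_t src_size);

/* Uncompressed case: a plain memcpy() into the staging buffer. */
static size_t plain_copy(void *dst, size_t dst_size,
			 const void *src, size_t src_size)
{
	assert(src_size <= dst_size);
	memcpy(dst, src, src_size);
	return src_size;
}

/* Toy stand-in for a compressor: emits (run length, byte) pairs. */
static size_t rle_copy(void *dst, size_t dst_size,
		       const void *src, size_t src_size)
{
	const unsigned char *in = src;
	unsigned char *out = dst;
	size_t i = 0, o = 0;

	while (i < src_size) {
		unsigned char byte = in[i];
		size_t run = 1;

		while (i + run < src_size && in[i + run] == byte && run < 255)
			run++;
		assert(o + 2 <= dst_size);
		out[o++] = (unsigned char)run;
		out[o++] = byte;
		i += run;
	}
	return o;
}

/*
 * Stage the ring region [start, end) into a contiguous buffer using at most
 * two copy() calls: one up to the end of the ring and one for the wrapped
 * remainder. Assumes end - start <= mask + 1. Returns the staged byte count.
 */
static size_t ring_stage(const unsigned char *ring, size_t mask,
			 size_t start, size_t end,
			 unsigned char *stage, size_t stage_size,
			 copy_fn_t copy)
{
	size_t size = end - start;
	size_t off = start & mask;
	size_t first = size;
	size_t staged;

	if (off + size > mask + 1)
		first = mask + 1 - off;	/* chunk up to the end of the ring */

	staged = copy(stage, stage_size, ring + off, first);
	if (size > first)		/* wrapped chunk from the ring start */
		staged += copy(stage + staged, stage_size - staged,
			       ring, size - first);
	return staged;
}
```

With either copy function the staged buffer would then be handed to a single asynchronous write, so the number of passes over the data is the same with and without compression. The sketch encodes the two chunks independently. A real streaming compressor would carry state across them.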

Compression for serial streaming can be implemented in the push() callback.
The AIO case would take compression as a parameter of aio_push().
That way both trace writing schemes can be extended optimally.
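
For the serial case, the layered callback could look roughly like the userspace sketch below. It is an illustration under assumptions, not the perf code: ring_push() stands in for an unmodified perf_mmap__push(), store_push() for record__pushfn(), and struct comp_stage, struct sink and the toy run-length encoding are hypothetical. The compressing stage remembers the next push function and forwards its output to it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*push_fn_t)(void *to, const void *buf, size_t size);

/* Hypothetical final stage: appends the data to an in-memory sink. */
struct sink {
	unsigned char data[64];
	size_t len;
};

static int store_push(void *to, const void *buf, size_t size)
{
	struct sink *s = to;

	if (s->len + size > sizeof(s->data))
		return -1;
	memcpy(s->data + s->len, buf, size);
	s->len += size;
	return 0;
}

/* Hypothetical compressing stage that remembers the next push in line. */
struct comp_stage {
	push_fn_t next;
	void *next_to;
	unsigned char scratch[64];
};

static int compressed_push(void *to, const void *buf, size_t size)
{
	struct comp_stage *st = to;
	const unsigned char *in = buf;
	size_t i = 0, o = 0;

	assert(st->next != NULL);
	/* Toy run-length encoding as a stand-in for a real compressor. */
	while (i < size) {
		unsigned char byte = in[i];
		size_t run = 1;

		while (i + run < size && in[i + run] == byte && run < 255)
			run++;
		if (o + 2 > sizeof(st->scratch))
			return -1;
		st->scratch[o++] = (unsigned char)run;
		st->scratch[o++] = byte;
		i += run;
	}
	return st->next(st->next_to, st->scratch, o);
}

/*
 * Reader that knows nothing about compression: it hands each contiguous
 * chunk of the ring region [start, end) to whatever push it was given.
 * Assumes a power-of-two ring and end - start <= mask + 1.
 */
static int ring_push(const unsigned char *ring, size_t mask,
		     size_t start, size_t end, void *to, push_fn_t push)
{
	size_t size = end - start;
	size_t off = start & mask;

	if (off + size > mask + 1) {
		size_t first = mask + 1 - off;

		if (push(to, ring + off, first) < 0)
			return -1;
		off = 0;
		size -= first;
	}
	return push(to, ring + off, size);
}
```

The caller picks the pipeline by choosing which callback and context to pass: store_push() with a sink for plain output, or compressed_push() with a comp_stage whose next points at store_push(). The reader itself stays unchanged in both cases.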

~Alexey

>
> jirka
>