Re: [PATCH] perf record: Add snapshot mode support for perf's regular events

From: Adrian Hunter
Date: Wed Nov 25 2015 - 04:09:10 EST


On 25/11/15 10:43, Wangnan (F) wrote:
>
>
> On 2015/11/25 16:27, Adrian Hunter wrote:
>> On 25/11/15 09:47, Wangnan (F) wrote:
>>>
>>> On 2015/11/25 15:22, Adrian Hunter wrote:
>>>> On 25/11/15 05:50, Wangnan (F) wrote:
>>>>> On 2015/11/24 23:20, Arnaldo Carvalho de Melo wrote:
>>>>>> Em Tue, Nov 24, 2015 at 08:06:41AM -0700, David Ahern escreveu:
>>>>>>> On 11/24/15 7:00 AM, Yunlong Song wrote:
>>>>>>>> +static int record__write(struct record *rec, void *bf, size_t size)
>>>>>>>> +{
>>>>>>>> + if (rec->memory.size && memory_enabled) {
>>>>>>>> + if (perf_memory__write(&rec->memory, bf, size) < 0) {
>>>>>>>> + pr_err("failed to write memory data, error: %m\n");
>>>>>>>> + return -1;
>>>>>>>> + }
>>>>>>>> + } else {
>>>>>>>> + if (perf_data_file__write(rec->session->file, bf, size) < 0) {
>>>>>>>> + pr_err("failed to write perf data, error: %m\n");
>>>>>>>> + return -1;
>>>>>>>> + }
>>>>>>>> + rec->bytes_written += size;
>>>>>>>> }
>>>>>>>>
>>>>>>>> - rec->bytes_written += size;
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>>
>>>>>>>> @@ -86,6 +214,8 @@ static int record__mmap_read(struct record *rec, int
>>>>>>>> idx)
>>>>>>>> if (old == head)
>>>>>>>> return 0;
>>>>>>>>
>>>>>>>> + memory_enabled = 1;
>>>>>>>> +
>>>>>>>> rec->samples++;
>>>>>>>>
>>>>>>>> size = head - old;
>>>>>>>> @@ -113,6 +243,7 @@ static int record__mmap_read(struct record *rec,
>>>>>>>> int
>>>>>>>> idx)
>>>>>>>> md->prev = old;
>>>>>>>> perf_evlist__mmap_consume(rec->evlist, idx);
>>>>>>>> out:
>>>>>>>> + memory_enabled = 0;
>>>>>>>> return rc;
>>>>>>>> }
>>>>>>>>
>>>>>>> So you are basically ignoring all samples until SIGUSR2 is received.
>>>>>>> That
>>>>>> No, he is not, its just that his code is difficult to follow, has to be
>>>>>> rewritten, but he is ignoring just PERF_RECORD_SAMPLE events, so it
>>>>>> will..
>>>>>>
>>>>>>> means the resulting data file will have limited history of task
>>>>>>> events for
>>>>>> ... have a complete history of task events, since PERF_RECORD_FORK, etc
>>>>>> are not being ignored.
>>>>>>
>>>>>> No?
>>>>> Actually, we are discussing this problem.
>>>>>
>>>>> For such tracking events (PERF_RECORD_FORK...), we have the dummy event, so
>>>>> it is possible for us to receive tracking events from a separate
>>>>> channel; therefore we don't have to parse every event to pick those
>>>>> events out. Instead, we can process tracking events differently, and then
>>>>> more interesting things can be done. For example, squashing those tracking
>>>>> events if they take too much memory...
>>>>>
>>>>> Furthermore, there's another problem being discussed: if the userspace
>>>>> ring buffer is byte based, parsing events is unavoidable. Without parsing
>>>>> events we are unable to find the new 'head' pointer when overwriting.
>>>> Have you considered trying to find the head by trial-and-error at the time
>>>> you make the snapshot i.e. look at the first 8 bytes (event records are 8
>>>> byte aligned) and see if it is a valid record header, if not try the next 8
>>>> bytes. When you find a real event record it should parse without error and
>>>> the subsequent events should all parse without error too, all the way to
>>>> the
>>>> tail. Then you can use timestamps and compare the events byte-by-byte to
>>>> avoid overlaps between 2 snapshots.
>>> That does not seem to work. Now that we have the BPF output event, a BPF
>>> program can output anything through that event. Even if we have a magic
>>> at the head of each event, we can't prevent a BPF output event from
>>> emitting that magic, unless we introduce some 'escape' method to keep BPF
>>> output events from emitting certain data patterns. So although it might
>>> work in real life, this solution is logically incorrect. Or am I missing
>>> something?
>> When you find the head, all the events will parse correctly. It seems to me
>> highly unlikely that would happen if you guessed the head wrongly.
>> It is only incorrect if it gives the wrong result.
>
> Right, that is why I said it might work in real life. However, I think we
> had better try to provide a logically correct solution.
> Also, 'guessing' implies some sort of intelligence; how do we
> deal with a guessing error? Simply drop them?

It is not "intelligence"; it is a linear search. If it gives more than one
answer, it is a fatal error. You can mitigate that by adding more
validation of the event records.

But it is only a suggestion.
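
To make the suggestion concrete, here is a rough sketch of that linear
search. It is not code from perf or from the patch. It assumes the snapshot
has already been copied into a linear buffer whose last byte is the tail,
and it uses a hypothetical struct rec_hdr modelled on the 8-byte record
header. REC_TYPE_MAX and the "more than one answer" test (a second
candidate that parses to the tail but is not a record boundary of the first
chain) are my assumptions about what reasonable validation would look like.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified header, modelled on the 8-byte record header. */
struct rec_hdr {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* total record size, header included */
};

#define REC_ALIGN	8
#define REC_TYPE_MAX	64	/* assumed bound, used for sanity checking only */

/* Return the record size if a plausible header sits at 'off', else 0. */
static size_t rec_size_at(const unsigned char *buf, size_t len, size_t off)
{
	struct rec_hdr h;

	if (len - off < sizeof(h))
		return 0;
	memcpy(&h, buf + off, sizeof(h));
	if (h.type == 0 || h.type >= REC_TYPE_MAX)
		return 0;
	if (h.size < sizeof(h) || h.size % REC_ALIGN)
		return 0;
	if (h.size > len - off)
		return 0;
	return h.size;
}

/* True if records starting at 'off' parse back to back up to exactly 'len'. */
static int parses_to_tail(const unsigned char *buf, size_t len, size_t off)
{
	while (off < len) {
		size_t sz = rec_size_at(buf, len, off);

		if (!sz)
			return 0;
		off += sz;
	}
	return off == len;
}

/*
 * Linear search for the head at 8-byte steps.
 * Returns the head offset, -1 if nothing parses, -2 if ambiguous.
 */
static long find_head(const unsigned char *buf, size_t len)
{
	size_t off, next = 0;
	long head = -1;

	for (off = 0; off + sizeof(struct rec_hdr) <= len; off += REC_ALIGN) {
		if (!parses_to_tail(buf, len, off))
			continue;
		if (head < 0) {
			head = (long)off;
			next = off;
		}
		/* Walk the first chain forward; 'off' must be one of its records. */
		while (next < off)
			next += rec_size_at(buf, len, next);
		if (next != off)
			return -2;	/* a second, conflicting answer */
	}
	return head;
}
```

The search is quadratic in the worst case, which should not matter because
it only runs when a snapshot is taken. A payload that happens to contain
something that looks like a header and also lines up with the tail yields
-2, which is the fatal error case; stricter checks in rec_size_at() make
that less likely.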

> And what's your opinion on the bucket-based ring buffer? With that
> design we only need to maintain a ring buffer of pointers. It should
> be much simpler. The only drawback I can imagine is the waste of memory,
> because we have to allocate buckets pessimistically. Do you think
> that method has other problems I haven't considered?

The drawback is that you have to copy all the events all the time instead of
letting the kernel ring buffer wrap around without any userspace involvement
until you make a snapshot.
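
To illustrate the cost, here is a minimal sketch of how I understand the
bucket idea. None of these names exist in perf; struct bucket_ring,
BUCKET_SIZE and NR_BUCKETS are made up, and I am assuming "pessimistically"
means every bucket is sized for the largest event you expect. The
bytes_copied counter is only there to show that every event is copied
whether or not a snapshot is ever taken.

```c
#include <stddef.h>
#include <string.h>

#define BUCKET_SIZE	64	/* assumed worst-case event size */
#define NR_BUCKETS	4	/* assumed ring depth */

/* Hypothetical bucket-based ring: a ring of pointers to fixed-size buckets. */
struct bucket_ring {
	unsigned char storage[NR_BUCKETS][BUCKET_SIZE];
	unsigned char *slot[NR_BUCKETS];	/* ring of bucket pointers */
	size_t len[NR_BUCKETS];			/* bytes used in each bucket */
	size_t next;				/* slot to overwrite next */
	size_t count;				/* valid events, at most NR_BUCKETS */
	size_t bytes_copied;			/* total copy cost so far */
};

static void bucket_ring_init(struct bucket_ring *r)
{
	size_t i;

	memset(r, 0, sizeof(*r));
	for (i = 0; i < NR_BUCKETS; i++)
		r->slot[i] = r->storage[i];
}

/* Copy one event into the ring, overwriting the oldest when full. */
static int bucket_ring_push(struct bucket_ring *r, const void *ev, size_t size)
{
	if (size > BUCKET_SIZE)
		return -1;	/* does not fit the pessimistic bucket size */

	memcpy(r->slot[r->next], ev, size);
	r->len[r->next] = size;
	r->next = (r->next + 1) % NR_BUCKETS;
	if (r->count < NR_BUCKETS)
		r->count++;
	r->bytes_copied += size;
	return 0;
}

/* Index of the oldest valid event; only meaningful when count > 0. */
static size_t bucket_ring_oldest(const struct bucket_ring *r)
{
	return r->count < NR_BUCKETS ? 0 : r->next;
}
```

Finding the head is trivial here since each slot holds exactly one event,
but bucket_ring_push() runs for every single event, and a bucket holding a
small event wastes the rest of its BUCKET_SIZE bytes.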
