Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode

From: Liang, Kan
Date: Mon Oct 29 2018 - 14:20:21 EST

On 10/29/2018 1:48 PM, David Miller wrote:
From: "Liang, Kan" <kan.liang@xxxxxxxxxxxxxxx>
Date: Mon, 29 Oct 2018 13:42:56 -0400

On 10/29/2018 1:40 PM, David Miller wrote:
From: "Liang, Kan" <kan.liang@xxxxxxxxxxxxxxx>
Date: Mon, 29 Oct 2018 10:33:06 -0400

I just realized that the problem in KNL will be back if we switch
back to non-overwrite mode.
What is KNL?

Intel Xeon Phi Processor, Knights Landing.

I don't understand how a specific piece of hardware directly leads to
ring buffer processing timeouts, or multi-minute thread map processing

Perf top processes all samples serially. As the number of CPUs increases under heavy load, the number of samples increases dramatically, and the processing time increases significantly as well.
When the processing time is longer than the display refresh time, only stale data is shown.
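
To make the scaling argument concrete, here is a rough back-of-the-envelope sketch. It is a toy model, not perf code: the function name, the per-CPU sample rate, the per-sample cost, and the 2 second refresh are all made-up numbers for illustration. It only assumes that one thread handles every sample, so the work per refresh grows linearly with the CPU count.

```python
def processing_lag(n_cpus, samples_per_cpu_per_sec, cost_per_sample_us, refresh_sec):
    """Toy model: one thread processes every sample from every CPU.

    Returns (microseconds needed to process one refresh interval's worth
    of samples, True if that exceeds the refresh interval itself).
    All parameters are hypothetical; nothing here is measured data.
    """
    samples = n_cpus * samples_per_cpu_per_sec * refresh_sec
    processing_us = samples * cost_per_sample_us
    return processing_us, processing_us > refresh_sec * 1_000_000


# Assumed values: 4000 samples/s per CPU, 2 us per sample, 2 s refresh.
for n_cpus in (8, 72, 288):
    processing_us, stale = processing_lag(n_cpus, 4000, 2, 2)
    state = "falls behind" if stale else "keeps up"
    print(f"{n_cpus:3d} CPUs: {processing_us / 1e6:.3f} s of work per 2 s refresh, {state}")
```

With these assumed numbers the single consumer keeps up at 8 and 72 CPUs but needs more than one refresh interval of work at 288 CPUs, which is the point at which the display can only show stale data.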

I use KNL as an example because the problem is even worse there: perf top outputs nothing at all.

In theory, it's a problem for all large-scale platforms.

You'll have to explain all of the details of your test scenario, and
the exact problem triggers, which

My test was the same as yours, just running a parallel kernel build on KNL.

caused you to write these patches
which causes serious regressions for what I consider a core simple use
case of perf top.

I agree that the warning message is annoying, and I will try to find another way to deliver it. But I think we do need the warning.

You didn't see any warning before the patch. I think it is just because perf top hides the problem.


And that's running perf top during a parallel kernel build.