Re: [PATCHES/RFC] Re: A concern about overflow ring buffer mode
From: Arnaldo Carvalho de Melo
Date: Mon Oct 29 2018 - 14:33:02 EST
On Mon, Oct 29, 2018 at 02:20:15PM -0400, Liang, Kan wrote:
> On 10/29/2018 1:48 PM, David Miller wrote:
> > From: "Liang, Kan" <kan.liang@xxxxxxxxxxxxxxx>
> > Date: Mon, 29 Oct 2018 13:42:56 -0400
> > >
> > >
> > > On 10/29/2018 1:40 PM, David Miller wrote:
> > > > From: "Liang, Kan" <kan.liang@xxxxxxxxxxxxxxx>
> > > > Date: Mon, 29 Oct 2018 10:33:06 -0400
> > > >
> > > > > I just realized that the problem in KNL will be back if we switch
> > > > > back to non-overwrite mode.
> > > > What is KNL?
> > > >
> > > Intel Xeon Phi Processor, Knights Landing.
> > I don't understand how a specific piece of hardware directly leads to
> > ring buffer processing timeouts, or multi-minute thread map processing
> > times...
> Perf top processes all samples serially. As the number of CPUs
> increases under heavy load, the number of samples increases
> dramatically, and the processing time also increases significantly.
> When the processing time is longer than display refresh time, only the stale
> data is shown.
> I use KNL as an example because the problem is even worse there: perf top
> outputs nothing at all.
> In theory, it's a problem for all large scale platforms.
> > You'll have to explain all of the details of your test scenario, and
> > the exact problem triggers, which
> My test was the same as yours, just running a parallel kernel build on KNL.
> > caused you to write these patches
> > which cause serious regressions for what I consider a core simple use
> > case of perf top.
> I agree that the warning message is annoying. I will try to find another way
> to deliver the message. But I think we do need the warning message.
There is no problem with the message; the problem is the thread in which
the message is being displayed. Just signal the display thread to
display the warning instead of doing that in the event processing thread.
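
A minimal sketch of that handoff, assuming a shared atomic counter is
enough to carry the signal. Every name here (warn_state,
event_thread_note_lost, display_thread_poll) and the message text are
hypothetical; none of it is taken from the perf tree:

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical shared state between the two threads. */
struct warn_state {
	atomic_ulong lost;	/* events noted by the event thread, not yet reported */
};

/*
 * Event processing thread: only record that something went wrong.
 * No printing and no UI work happens here, so the hot path stays cheap.
 */
static void event_thread_note_lost(struct warn_state *ws, unsigned long n)
{
	atomic_fetch_add(&ws->lost, n);
}

/*
 * Display thread: called once per refresh. Atomically takes whatever
 * has accumulated and formats the warning into buf. Returns the number
 * of events reported, or 0 if there was nothing to warn about (buf is
 * left untouched in that case).
 */
static unsigned long display_thread_poll(struct warn_state *ws,
					 char *buf, size_t len)
{
	unsigned long n = atomic_exchange(&ws->lost, 0);

	if (n == 0)
		return 0;

	/* Hypothetical wording, not the actual perf top message. */
	snprintf(buf, len, "warning: %lu events could not be processed in time", n);
	return n;
}
```

The event thread calls event_thread_note_lost() wherever it would
otherwise have printed; the display thread calls display_thread_poll()
in its refresh loop and shows buf when the return value is non-zero. A
condition variable or a pipe would also work if the display thread has
to be woken immediately rather than at its next refresh.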
> You didn't see any warning before the patch. I think it is just because perf
> top hides the problem.
> > And that's running perf top during a parallel kernel build.