Re: [PATCH RFC] sched: Add a per-thread core scheduling interface
From: Joel Fernandes
Date: Thu May 21 2020 - 16:40:48 EST
Hi Linus,
On Thu, May 21, 2020 at 11:31:38AM -0700, Linus Torvalds wrote:
> On Wed, May 20, 2020 at 3:26 PM Joel Fernandes (Google)
> <joel@xxxxxxxxxxxxxxxxx> wrote:
> >
> > ChromeOS will use core-scheduling to securely enable hyperthreading.
> > This cuts down the keypress latency in Google docs from 150ms to 50ms
> > while improving the camera streaming frame rate by ~3%.
>
> I'm assuming this is "compared to SMT disabled"?
Yes, this is compared to SMT disabled. I'll improve the commit message.
> What is the cost compared to "SMT enabled but no core scheduling"?
With SMT enabled and no core scheduling, it is around 40ms in the higher
percentiles. One more thing I wanted to mention: the numbers above are the
90th percentile.
> But the real reason I'm piping up is that your latency benchmark
> sounds very cool.
>
> Generally throughput benchmarks are much easier to do, how do you do
> this latency benchmark, and is it perhaps something that could be run
> more widely (ie I'm thinking that if it's generic enough and stable
> enough to be run by some of the performance regression checking
> robots, it would be a much more interesting test-case than some of the
> ones they run right now...)
Glad you like it! The metric is the time from when the driver says the key
was pressed until the GPU says we've drawn pixels in response.
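To make the definition concrete, here is a minimal sketch, assuming we had per-event (keypress timestamp, pixels-drawn timestamp) pairs in milliseconds. In practice Chrome computes this internally; the function names and input format below are hypothetical, and the nearest-rank percentile definition used here may differ from how Chrome reports it.

```python
def keypress_latencies_ms(events):
    """Per-event latency: pixels-drawn time minus key-pressed time.

    events: iterable of (key_pressed_ts_ms, pixels_drawn_ts_ms) pairs.
    Hypothetical input format, for illustration only.
    """
    return [drawn - pressed for pressed, drawn in events]


def nearest_rank_percentile(samples, pct):
    """Smallest sample such that at least pct% of samples are <= it.

    pct is an integer in 0..100; integer math avoids float rounding.
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct * n / 100), 1-based
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    events = [(0, 40), (100, 150), (200, 230), (300, 360), (400, 445)]
    lat = keypress_latencies_ms(events)
    print(lat, nearest_rank_percentile(lat, 90))
```

With only five events the 90th percentile is simply the worst one; the percentile only becomes meaningful once many keypresses are collected.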
The test mostly only requires the Chrome browser. It opens some pre-existing
test URLs (a Google doc, a window that opens a camera stream, and another
window that decodes video). The metric is already calculated in Chrome; we
just scrape it from chrome://histograms/Event.Latency.EndToEnd.KeyPress. If
you install Chrome, you can go to this link and see the histogram. We open a
Google Docs window and synthetically input keys into it while a camera stream
and video decoding run in other windows, which gives the CPUs a good beating.
Then we collect roughly the 90th percentile keypress latency from the above
histogram and the FPS of the camera and decoded video, among other things. My
colleagues are also writing a test that runs the full Google Hangouts video
chat stack to stress the system more (versus just the camera stream). I guess
if the robots can somehow input keys into Google Docs and open the right
windows, then it is just a matter of scraping the histogram.
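Because chrome://histograms reports bucketed counts rather than raw samples, the percentile we pull from it is only approximate. A minimal sketch of that step, assuming the buckets have already been scraped into (lower_bound_ms, count) pairs sorted by lower bound; the scraping and parsing are not shown, and the function name and input format are hypothetical.

```python
def approx_percentile_from_buckets(buckets, pct):
    """Return the lower bound of the bucket holding the pct-th percentile.

    buckets: list of (lower_bound_ms, count) pairs sorted by lower bound.
    pct: integer in 0..100. Resolution is limited to the bucket width.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        raise ValueError("empty histogram")
    target = max(-(-pct * total // 100), 1)  # nearest rank, 1-based
    seen = 0
    for lower, count in buckets:
        seen += count
        if seen >= target:
            return lower
    return buckets[-1][0]  # only reached if pct > 100


if __name__ == "__main__":
    hist = [(0, 10), (20, 30), (40, 40), (60, 15), (80, 5)]
    print(approx_percentile_from_buckets(hist, 90))
```

Returning the bucket's lower bound slightly underestimates the true value; interpolating within the bucket is an alternative if finer resolution matters.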
> I'm looking at that "threaded phoronix gzip performance regression"
> thread due to a totally unrelated scheduling change ("sched/fair:
> Rework load_balance()"), and then I see this thread and my reaction is
> "the keypress latency thing sounds like a much more interesting
> performance test than threaded gzip from clear linux".
>
> But the threaded gzip test is presumably trivial to script, while your
> latency test is perhaps very specific to one particular platform and
> setup?
Yes, it is specifically ChromeOS running on a Pixelbook with a 7th Gen Intel
Core i7 with 4 hardware threads.
https://store.google.com/us/product/google_pixelbook
I could try to make it a synthetic test, but it might be difficult for a robot
to run it if it does not have graphics support and a camera connected to it.
It would then need a fake/emulated camera. These robots run Linux in a non-GUI
environment in QEMU instances, right?
thanks,
- Joel