Re: [PATCHSET RFC] sched: Implement BPF extensible scheduler class

From: Barret Rhoden
Date: Wed Dec 14 2022 - 18:20:21 EST


On 12/14/22 17:23, Tejun Heo wrote:
> Google guys probably have a lot to say here too and there may be many
> commonalities, but here's how things are on our end.

your email pretty much captures my experiences from the google side. in fact, i think i'll save it for the next time someone asks me to summarize the challenges with both kernel rollouts and testing changes on workloads. =)

> > I was given to believe this was a fairly rapid process.
>
> Going back to the first phase where we're experimenting in a more controlled
> environment. Yes, that is a faster process but only in comparison to the
> second phase. Some controlled experiments, the faster ones, usually take
> several hours to obtain a meaningful result. It just takes a while for
> production workloads to start, jit-compile all the hot code paths, warm up
> caches and so on. Others, unfortunately, take a lot longer to ramp up to the
> degree where they can be compared against production numbers. Some of the
> benchmarks stretch multiple days.

> With SCX, we can just keep hotswapping and tuning the scheduler
> behavior, getting results in tens of minutes instead of multiple hours and
> without worrying about crashing the test machines

for testing sched policies on one of our bigger apps, the O(hours) kernel reboot, compared to the O(minutes) reload of a BPF scheduler, is a pain. but that's only for a single machine; it can be much worse on a full cluster.

full-cluster tests are a different beast. we are one of many groups that want to do testing, and we have to reserve time on the cluster. it actually took us weeks to coordinate a kernel change on the app's large testing cluster - essentially, since we were using an unqualified kernel, we 'blocked' all of the other testing.

> it's way easier and faster to have a running test environment setup and
> iterate through scheduling behavior changes without worrying about crashing
> the machine than having to cycle and re-setup test setup for each iteration.

i'm a newcomer to BPF, but for me the "interaction with live machine" is a major BPF feature, both in SCX and also more broadly with the various tracing tools and other BPF uses. (not to mention the per-workload or per-machine customization that BPF enables, but that's a separate discussion).

thanks,

barret