Re: [PATCH v2 0/8] scheduler tinification
From: Nicolas Pitre
Date: Wed Jun 07 2017 - 13:09:46 EST
On Wed, 7 Jun 2017, Ingo Molnar wrote:
>
> * Nicolas Pitre <nicolas.pitre@xxxxxxxxxx> wrote:
>
> > Many embedded systems don't need the full scheduler support. Most of the
> > time, user space is tightly controlled and many of the scheduler facilities
> > are simply unused.
>
> Sorry, NAK:
>
> > 23 files changed, 3190 insertions(+), 2897 deletions(-)
>
> That's a lot of extra code plus churn for a code base that is already pretty
> #ifdef heavy.
>
> Also, the savings are marginal, even with significant functionality disabled:
>
> >    text    data     bss     dec     hex filename
> >   28623    3404     128   32155    7d9b kernel/sched/built-in.o
> >
> > With this series and dl and rt classes disabled:
> >
> >    text    data     bss     dec     hex filename
> >   20734    3334      40   24108    5e2c kernel/sched/built-in.o
>
> With 1GHz + 1GB RAM SoCs being well below $10 in bulk we worry about code
> complexity, predictability, testability, behavioral and ABI uniformity a lot more
> than about the last 10-20k of kernel text footprint...
>
> So I think the 'tiny' efforts are fundamentally misguided and are shooting for an
> ever shrinking market of RAM/ROM starved products whose share is shrinking every
> month.

I'm rather seeing the opposite: an ever growing market of
internet-connected coin-cell-battery-powered tiny devices where the
amount of RAM is counted in kilobytes rather than megabytes.

Let me repeat some background as to what my fundamental motivation is,
and then maybe you'll understand why I'm doing this.

What is the biggest buzzword in the IT industry besides AI right now?
It is IoT.

Most IoT targets are so small that people are writing new operating
systems from scratch for them. Lots of fragmentation already exists.
We're talking about systems with less than one megabyte of RAM,
sometimes much less. Still, those things are being connected to the
internet. And this is going to be a total security nightmare.

I wish to be able to leverage the Linux ecosystem for as much of the IoT
space as possible to avoid the worst of those nightmares. The Linux
ecosystem has a *lot* of knowledgeable people around it, a lot of
testing infrastructure and tooling available already, etc. If a
security issue turns up on Linux, it has a greater chance of being
caught early, or fixed quickly otherwise, and finding people with the
right knowledge is easier on Linux than it could be on any RTOS out
there. Still with me so far?

Yes, we have tools that can automatically reduce the kernel size. We can
use LTO with the compiler, etc. LTO is pretty good already. It can
typically reduce the kernel size by 20%. If all system calls are
disabled except for a few, then LTO can get rid of another 20%. The
minimal kernel I get is still 400-500 KB in size. That's still too big.
There is this 120 KB of VFS code that is always there even though there
is no real filesystem at all configured in the kernel. There is that
other 100 KB of core driver support code despite the fact that the set
of drivers I'm using are very simple and make no use of most of that
core driver code. Etc.
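To put those figures side by side, here is a quick back-of-the-envelope
sketch. It assumes the 120 KB of VFS code and the 100 KB of core driver
code are both counted inside the 400-500 KB minimal kernel; the constant
and function names are made up for illustration.

```python
# All figures are taken from the text above, in KB.
# Assumption: VFS and core driver code are part of the 400-500 KB image.
KERNEL_KB = (400, 500)   # minimal kernel size range after LTO
VFS_KB = 120             # VFS code present even with no real filesystem
DRIVER_CORE_KB = 100     # core driver support code


def fixed_overhead_share(kernel_kb):
    """Fraction of the kernel image taken by VFS plus core driver code."""
    return (VFS_KB + DRIVER_CORE_KB) / kernel_kb


for kernel_kb in KERNEL_KB:
    share = fixed_overhead_share(kernel_kb)
    print(f"{kernel_kb} KB kernel: {share:.0%} is VFS + core driver code")
```

Under that assumption, those two blocks alone make up roughly half of
the minimal kernel.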

There comes a point where there is no option but to explicitly trim out
parts of the kernel, as such decisions cannot be automated, hence this
patch series. Bringing the scheduler under 20 KB in size is therefore
very useful in that context. Alternatively, I could push for a parallel
implementation as I did with the TTY layer, where I obtained a 6x size
reduction. But in the scheduler case I obtained only a 2x size reduction,
so I thought it could be more profitable to get about the same saving by
reworking the existing code instead, and eventually contributing a very
bare scheduler class that would be a smaller alternative to the fair
scheduler for deployments where that makes sense. Unless you have actually
changed your mind about alternative whole-scheduler implementations, that
is...
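For reference, the saving from the `size` output quoted earlier in this
mail can be worked out as follows. This is only a sketch over those two
quoted lines; the dictionary and function names are made up for
illustration.

```python
# `size` output for kernel/sched/built-in.o as quoted earlier in this mail.
BEFORE = {"text": 28623, "data": 3404, "bss": 128}
AFTER = {"text": 20734, "data": 3334, "bss": 40}  # dl and rt classes disabled


def saving(before, after):
    """Return (bytes saved, fraction saved) between two sizes."""
    saved = before - after
    return saved, saved / before


def total(sizes):
    """Sum of text + data + bss, i.e. the `dec` column."""
    return sum(sizes.values())


text_saved, text_frac = saving(BEFORE["text"], AFTER["text"])
all_saved, all_frac = saving(total(BEFORE), total(AFTER))

print(f"text:  {text_saved} bytes saved ({text_frac:.1%})")   # 7889 bytes (27.6%)
print(f"total: {all_saved} bytes saved ({all_frac:.1%})")     # 8047 bytes (25.0%)
```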

For Linux to be suitable for small IoT, it has to be small, damn small.
My target is 256 KB of RAM. And if you look at the kind of application
those 256 KB systems are running, it's basically one main task, typically
acquiring sensor data and sending it in some encrypted protocol over a
wireless network on the internet, and possibly accepting commands back.
So what do you need from the OS to achieve that? A few system calls, a
minimal scheduler, minimal memory management, minimal filesystem
structure and minimal network stack. And your user app.

So, why not have each of those blocks be created using the existing
Linux syscall interface and internal API? At that point, it should be
possible to take your standard full-featured Linux workstation and
develop your user app on it, run it there using all the existing native
debugging tools, etc. In the end you just pick the mini version of
everything for the final target and you're done. And you don't have to
learn a whole new OS, development environment, programming model, etc.

Next on my list would be a cache-less, completely serialized VFS bypass
that has only what's needed to make the link between the read/write
syscalls, a filesystem driver and a block driver while preserving the
existing kernel APIs. And by being really small, the maintenance cost of
a "parallel" implementation isn't very high, certainly much less than
trying to maintain a single code path that can scale to both extremes
in that case.

PS: As far as I remember, Linus didn't condemn the idea last time I
brought up this topic in his presence. I therefore hope we can
find ways to allow Linux into the largest computing device
deployment to come.

Nicolas