Re: [PATCH 0/3] minitty: a minimal TTY layer alternative for embedded systems

From: Nicolas Pitre
Date: Fri Mar 24 2017 - 13:50:02 EST


On Fri, 24 Mar 2017, Greg Kroah-Hartman wrote:

> On Fri, Mar 24, 2017 at 08:31:45AM -0400, Nicolas Pitre wrote:
> > That's the crux of the argument: touching the current TTY layer is NOT
> > going to help keeping it stable. Here, not only I did remove features,
> > but the ones I kept were reimplemented to be much smaller and
> > potentially less scalable and performant too. The ultimate goal here is
> > to have the smallest code possible with very simple locking and not
> > necessarily the most scalable code. That in itself is contradictory with
> > the regular TTY code and warrants a separate implementation. And because
> > it is so small, it is much easier to understand and much easier to
> > maintain.
>
> So, what you are really saying here is "the current tty layer is too
> messy, too complex, too big, and not understandable, so I'm going to
> route around it by rewriting the whole thing just for my single-use-case
> because I don't want to touch it."

That's not exactly what I'm saying.

Yes, the current TTY code is big. It has to be, given that it is
extremely flexible, it can scale up and still be robust, and it covers a
large number of use cases. Because of those characteristics, it
fundamentally cannot be made small. You just can't have it all.

I'm not saying that the current code is not understandable. I spent a
considerable amount of my time understanding it, first and foremost to
get to know what I'm talking about, and initially to find ways to shrink
its memory footprint. It is certainly complex because of the flexibility
and robustness it provides. My code most likely wouldn't perform as well
in the presence of multiple high-throughput channels, for example. But
that's not my concern.

I'm concerned about small embedded systems where 85% of that code is
useless. In some cases the ability to change baudrate is also unneeded
so I intend to make that part configurable too.

But in the end there is simply no way I could achieve the same footprint
reduction with the existing code. This is clearly impossible.

For example, my code performs line discipline handling in the very same
buffer where the RX interrupt is storing new data. The existing TTY code
has up to 3 buffering layers because of the needed modularisation to
support swappable line discipline modules, etc. It is simply
unreasonable to expect that the latter can be turned into the former
without either breaking things or severely restricting its scope.
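
To make the single-buffer idea concrete, here is a minimal userspace
sketch. It is not the code from this series: the names mini_rx,
mini_rx_char() and mini_rx_read() are hypothetical, the buffer size is
arbitrary, and only erase handling and CR-to-NL mapping are shown. The
point is just that the receive path can do its line editing directly in
the buffer that the reader later consumes, with no intermediate copy.

```c
#include <stddef.h>
#include <string.h>

#define RX_BUF_SIZE 64          /* arbitrary size chosen for the sketch */

/* Hypothetical single-buffer receiver; not a structure from the patch. */
struct mini_rx {
	char buf[RX_BUF_SIZE];
	size_t len;             /* bytes currently held in buf */
	int line_ready;         /* set once a newline has been stored */
};

/* Called once per received byte, as an RX interrupt handler might do. */
static void mini_rx_char(struct mini_rx *rx, char c)
{
	if (rx->line_ready)
		return;         /* previous line not consumed yet: drop input */

	if (c == '\b' || c == 0x7f) {
		if (rx->len > 0)
			rx->len--;      /* erase the last byte in place */
		return;
	}

	if (c == '\r')
		c = '\n';       /* map CR to NL, similar to ICRNL */

	/* Keep the last slot free so the terminating newline always fits. */
	if (c != '\n' && rx->len >= RX_BUF_SIZE - 1)
		return;

	rx->buf[rx->len++] = c;
	if (c == '\n')
		rx->line_ready = 1;
}

/* Copy up to max bytes of a completed line out of the same buffer. */
static size_t mini_rx_read(struct mini_rx *rx, char *dst, size_t max)
{
	size_t n;

	if (!rx->line_ready)
		return 0;

	n = rx->len < max ? rx->len : max;
	memcpy(dst, rx->buf, n);

	/* Keep any unread remainder at the start of the buffer. */
	memmove(rx->buf, rx->buf + n, rx->len - n);
	rx->len -= n;
	if (rx->len == 0)
		rx->line_ready = 0;

	return n;
}
```

A real implementation would also need locking between the interrupt and
the reader, echo, signal characters and so on. The sketch leaves all of
that out on purpose.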

Let's be honest here: the existing code _could_ possibly be reduced, of
course. That would require a lot of effort to gain maybe a 50% reduction?
What I'm looking at with my proposal here is a 6x reduction factor and
I'm still not done with it. There is no way I could do that with the
existing code.

Let me give you some background as to what my fundamental motivation is,
and then maybe you'll understand why I'm doing this.

What is the biggest buzzword in the IT industry right now? It is IOT.

Most IOT targets are so small that people are rewriting new operating
systems from scratch for them. Lots of fragmentation already exists.
We're talking about systems with less than one megabyte of RAM,
sometimes much less. Still, those things are being connected to the
internet. And this is going to be a total security nightmare.

I wish to be able to leverage the Linux ecosystem for as much of the IOT
space as possible to avoid the worst of those nightmares. The Linux
ecosystem has a *lot* of knowledgeable people around it, a lot of
testing infrastructure and tooling available already, etc. If a
security issue turns up on Linux, it has a greater chance of being
caught early, or fixed quickly otherwise, and finding people with the
right knowledge is easier on Linux than it could be on any RTOS out
there. Still with me so far?

Yes, we have tools that can automatically reduce the kernel size. We can
use LTO with the compiler, etc. LTO is pretty good already. It can
typically reduce the kernel size by 20%. If all system calls are
disabled except for a few, then LTO can get rid of another 20%.
The minimal kernel I get is still 400-500 KB in size. That's still too
big. Part of the size is this 60 KB of TTY + serial driver code just to
send some debugging messages out or do simple shell interactions! Now
with this mini TTY and one of the existing UART drivers I'm down to 20
KB and there is still room for more reduction.

There is also this 120 KB of VFS code that is always there even though
there is no real filesystem at all configured in the kernel. There is
that other 100 KB of core driver support code despite the fact that the
set of drivers I'm using are very simple and basic. Etc.

For Linux to be suitable, it has to be small, damn small. My target is
256 KB of RAM. And if you look at the kind of application those 256 KB
systems are running, it's basically one main task typically acquiring
sensor data and sending it in some encrypted protocol over a wireless
network on the internet, and possibly accepting commands back. So what
do you need from the OS to achieve that? A few system calls, a minimal
scheduler, minimal memory management, minimal filesystem structure and
minimal network stack. And your user app.

So, why not have each of those blocks be created using the existing
Linux syscall interface and internal API? At that point, it should be
possible to take your standard full-featured Linux workstation and
develop your user app on it, run it there using all the existing native
debugging tools, etc. Also, it should be possible to swap some of those
kernel blocks for the tiny alternative in your kernel config and still
be able to boot such a kernel on your PC workstation and validate them
there, test them with the existing fuzzers, etc. That's what I have here
with this mini TTY implementation. In the end you just take the mini
version of everything for the final target and you're done. And you
don't have to learn a whole new development environment and programming
model, etc.

I hope you'd agree with me that for such a goal, I cannot just try to
shrink the existing code. There has to be a parallel implementation of
some blocks alongside the main one that preserves the existing API but
that provides much less scalability and fewer features. Next on my list
would be a cache-less, completely serialized VFS alternative that has
only what's needed to make the link between the read/write syscalls, a
filesystem driver and a block driver. And by being really small, the
maintenance cost of a parallel implementation isn't very high, certainly
much less than trying to maintain a single version that can scale to
both extremes.

Hence this series, which I hope could be the beginning of a trend for
allowing Linux into the largest computing device deployment to come.


Nicolas