Re: RT patch acceptance

From: Elladan
Date: Tue May 31 2005 - 14:16:45 EST


On Mon, May 30, 2005 at 07:32:10PM -0400, James Bruce wrote:
> Nick Piggin wrote:
> >Sorry James, we were talking about hard realtime. Read the thread.
>
> hard realtime = mathematically provable maximum latency
>
> Yes, you'll want a nanokernel for that, you're right. That's because
> one has to analyze every line of code, and protect against introduced
> regressions, which is almost impossible given the pace that Linux-proper
> is developed. Then there's the other 95% of applications, for which a
> "statistical RT" approach such as the one used in the RT patch suffices. So
> arguing for a nanokernel for (provable) hard realtime is orthogonal to
> the discussion of this patch, and we apparently don't actually disagree.

In the real world, that isn't really achievable. Ideally you'd like to offer
some proof of correctness for the software, but even that won't get you a
provable maximum latency, because you can't prove the hardware.

Even with perfect software, the hardware is subject to cosmic rays, bad design,
and so on. Even if you tightly control the hardware for latency, e.g. turn off
the cache and try to make sure everything is measurable, in the end the real
proof that your device does what it says it does is measurement. If the RTOS
guarantees aren't violated during testing, or at best over a test period long
enough that any remaining violation rate would be comparable to the failure
rate of the hardware itself, that's "good enough."

Given that hardware is always subject to failure or flakiness, the more
practical distinction between "hard" and "soft" realtime is whether the failure
rate is measurable or is lost in the noise of other failure modes, such as
hardware faults. "Soft" RT typically means that the failure rate is measurable
but may still be low enough for particular tasks; by comparison, "hard" means
the software is believed correct to within your ability to measure.
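
To make "lost in the noise" concrete, here's a minimal sketch (mine, not
from the patch or this thread) of the arithmetic involved: if a device runs
N deadline cycles with zero observed misses, the statistical "rule of three"
puts an approximate 95% upper confidence bound of 3/N on the per-cycle miss
probability, which you can then compare against an assumed hardware failure
rate. All the numbers below are illustrative, not measured:

	#include <stdio.h>

	int main(void)
	{
		double cycles  = 1e9;          /* deadline cycles observed, zero misses */
		double bound   = 3.0 / cycles; /* ~95% upper bound on per-cycle miss rate */
		double hw_fail = 1e-7;         /* assumed per-cycle hardware failure rate */

		printf("miss rate <= %g per cycle (~95%% confidence)\n", bound);
		printf("that is %s the hardware noise floor (%g)\n",
		       bound < hw_fail ? "below" : "above", hw_fail);
		return 0;
	}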

Certainly there's a lot of value, for some applications, in controlling the
software well enough that all the latencies can be understood and characterized
by inspection, but on any sort of commodity consumer hardware this really isn't
going to buy you much. There are so many potential delays due to wacky hardware
that even a "perfect" RTOS is going to be subject to all sorts of weird
latencies and bizarre issues, e.g. with interrupt routing, CPU thermal control,
and the like.

Showing that the application works as intended really comes down to showing
that, on a particular system, the latency requirements are met under load.
That is precisely a statistical approach, and for almost all "PC" applications
that need realtime it's exactly what's desired.

And clearly, the ultimate test of any RT system is exactly a "statistical"
one: can it be measured to fail, and if so, why and how often?

For limited embedded applications, a "hard" nanokernel approach can certainly
lead to higher confidence that the device works as intended, but for anything
outside of embedded products it's really not very practical. Nobody's going to
run their desktop OS under a nanokernel just to make their DVD software work
right.

-J