Re: rt20 patch question
From: Mark Hounschell
Date: Wed May 10 2006 - 11:34:27 EST
Steven Rostedt wrote:
> On Wed, 10 May 2006, Mark Hounschell wrote:
>
>> Steven Rostedt wrote:
>>> (It is expected on LKML to not touch the CC list, and especially keep the
>>> one you are replying to)
>>>
>> Ok. I'm on so many it's hard to remember what each one wants.
>
> :) I've read that in other lists it's impolite to CC others. I still do
> it :} I find that, especially if I'm on lots of lists, if I'm on a thread,
> I prefer to be emailed directly, that way I know about a topic that I
> might need to quickly respond to. I never pay attention to policies
> about stripping CC lists, because I don't ever want to be stripped from a
> thread I'm interested in. The LKML has 300 to 700 emails a day, so you
> really do need to CC those, otherwise you'll be lost in the noise.
>
>
> [snip]
>
>> Thank you. That is exactly what I wanted to know. I ask because when I
>> run my app in complete preemption mode I have random periods where the
>> machine stops for many seconds at a time. Only in complete preemption
>> mode does this happen. In Voluntary and Preempt modes this does not
>> occur. I'm having a hard time trying to determine if the problem is in
>> my application.
>>
>
> OK, now you got my attention. What do you mean by your machine stops?
>
> Are you playing with priorities? You might want to turn on latency
> tracing, although it could be a PI leak. But I really need to know more,
> since I suspect that your app isn't written properly to work with a
> true RT environment.
>
> RT means that you can easily freeze the machine if you have a high prio
> task that runs more than you expect it to. With this power comes great
> responsibility, as well as understanding.
>
> Is this SMP or UP?
>
> Could you explain your app a little, and which tasks are RT?
>
> Thanks,
>
> -- Steve
>
>
Ok, I'll try to explain the application. It is an emulation of some old
legacy hardware (SEL-32) that ran a proprietary RTOS (MPX-32). We
emulate the hardware, not the software. We have some specialized PCI
cards that emulate some of that hardware, e.g. a card that has some
timers and external interrupt capabilities (the RTOM). All our drivers
are GPL, BTW.

This app can only run in an SMP environment, BTW, and the more power the
better.

Anyway, the legacy CPU is emulated by the main thread/process, and each
legacy I/O device, whether virtual or real, is emulated or driven by one
or more other threads. As far as real-time behavior is concerned, the
most important part of this is the determinism/latency in the delivery
of interrupts (external or timer) from the RTOM card to the main CPU
thread. Determinism of I/O operations is also important, however. We
achieve the best results by using both process and IRQ affinity.

The CPU thread/process and the IRQ of the RTOM PCI card are bound to a
single processor, and all other 'user' processes and IRQs are forced off
that processor onto the other processor. The app's I/O threads are bound
to that other processor as well.
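
A minimal C sketch of the affinity setup described above, assuming CPU 0
is the processor reserved for the emulated CPU thread and the RTOM IRQ,
and CPU 1 takes everything else. pin_thread_to_cpu(), format_irq_mask()
and route_irq_to_cpu() are hypothetical helper names; the IRQ number
would come from /proc/interrupts. pthread_setaffinity_np() is a GNU
extension, and writing /proc/irq/<n>/smp_affinity requires root.

```c
#define _GNU_SOURCE /* for pthread_setaffinity_np() and the CPU_* macros */
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Pin thread t to exactly one CPU. Returns 0 or an errno value. */
int pin_thread_to_cpu(pthread_t t, int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return EINVAL;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(t, sizeof(set), &set);
}

/* Build the hex mask that /proc/irq/<irq>/smp_affinity accepts.
 * Limited to CPUs 0..31 so the mask fits in a single 32-bit group.
 * Returns the snprintf() length, or -1 for an unsupported CPU. */
int format_irq_mask(char *buf, size_t len, int cpu)
{
    if (cpu < 0 || cpu > 31)
        return -1;
    return snprintf(buf, len, "%lx\n", 1UL << cpu);
}

/* Route one IRQ to one CPU (hypothetical helper; needs root). */
int route_irq_to_cpu(int irq, int cpu)
{
    char path[64], mask[16];
    FILE *f;
    int ok;

    if (format_irq_mask(mask, sizeof(mask), cpu) < 0)
        return -1;
    snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fputs(mask, f) != EOF;
    if (fclose(f) != 0) /* a rejected write shows up when the buffer is flushed */
        ok = 0;
    return ok ? 0 : -1;
}
```

In this layout the emulated CPU thread would call
pin_thread_to_cpu(pthread_self(), 0), each I/O thread would pin itself to
CPU 1, and route_irq_to_cpu() would send the RTOM's IRQ to CPU 0 and the
other IRQs to CPU 1. From the shell, taskset does the same job for
processes.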

The CPU process does not relinquish its processor. It sits in a loop
fetching and executing legacy machine-language instructions and only
comes out of that loop upon receiving an interrupt from the RTOM card or
an I/O completion event from one of the I/O threads. You know, kind of
like a real CPU would do.
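
The loop described above can be sketched in C roughly as follows. This
is an illustration, not the emulator's actual code: the pending-interrupt
bitmask, cpu_state, execute_one() and run_cpu() are all hypothetical, and
the max_steps bound exists only so the loop can be exercised in
isolation. It assumes the RTOM signal handler and the I/O threads raise
interrupts through post_interrupt(), and that _Atomic uint32_t is
lock-free (true on mainstream Linux targets), which is what makes the
atomic OR usable from a signal handler.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical: bit n set means legacy interrupt level n is pending. */
static _Atomic uint32_t pending_irqs;

/* Hypothetical emulated-CPU state. */
typedef struct {
    uint32_t pc;
    unsigned long executed;
} cpu_state;

/* Raise legacy interrupt `level` (must be < 32). Called by the RTOM
 * signal handler or an I/O thread; assumes the atomic is lock-free. */
void post_interrupt(unsigned level)
{
    atomic_fetch_or_explicit(&pending_irqs, UINT32_C(1) << level,
                             memory_order_release);
}

/* Atomically claim every pending interrupt at once. */
uint32_t take_interrupts(void)
{
    return atomic_exchange_explicit(&pending_irqs, 0, memory_order_acquire);
}

/* Stand-in for decoding and executing one legacy instruction. */
static void execute_one(cpu_state *cpu)
{
    cpu->pc += 4;
    cpu->executed++;
}

/* Run until an interrupt is pending or max_steps instructions have run
 * (the real loop would have no limit). Returns the claimed mask, or 0. */
uint32_t run_cpu(cpu_state *cpu, unsigned long max_steps)
{
    for (unsigned long i = 0; i < max_steps; i++) {
        /* Cheap relaxed check first; only pay for the exchange when needed. */
        if (atomic_load_explicit(&pending_irqs, memory_order_relaxed) != 0)
            return take_interrupts();
        execute_one(cpu);
    }
    return 0;
}
```

Note that a thread running this loop never blocks, so at a SCHED_FIFO
priority nothing of lower priority on its CPU runs while it spins.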

In the past, the CPU process/thread ran at FIFO priority 99 and the I/O
threads at lower FIFO or RR priorities. With rt20, all our priorities
are now set below the range used for hardirqs.
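
A sketch of assigning those priorities with the POSIX API. The ceiling
argument is a hypothetical stand-in for the lowest priority the kernel
gives its hardirq threads; check the actual value on the running kernel
(for example with ps) rather than trusting any constant.
rt_priority_below() and set_rt_priority() are illustrative names.

```c
#include <pthread.h>
#include <sched.h>

/* Priority `offset` steps below `ceiling` (a hypothetical hardirq
 * priority floor), clamped to the valid range for `policy`. */
int rt_priority_below(int policy, int ceiling, int offset)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);
    int prio = ceiling - 1 - offset;

    if (prio > hi)
        prio = hi;
    if (prio < lo)
        prio = lo;
    return prio;
}

/* Apply a real-time policy (SCHED_FIFO or SCHED_RR) and priority.
 * Needs root or CAP_SYS_NICE; returns 0 or an errno value. */
int set_rt_priority(pthread_t t, int policy, int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    return pthread_setschedparam(t, policy, &sp);
}
```

For example, with a ceiling of c the CPU thread would get
rt_priority_below(SCHED_FIFO, c, 0) and the I/O threads offsets 1, 2,
and so on. On Linux the FIFO and RR ranges are 1 to 99, so the helper
clamps to that range instead of wrapping.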

All this has worked well in the past as long as we controlled what else
ran on the system. We want to be able to use the machine for other
things and still have reasonable determinism in the application, so we
are looking at the rt patch. We have some other tools that give us a
fairly good indication of the determinism of any given box, and we can
see that the rt patch in complete preemption mode does in fact make a
difference.
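
One common way to get such an indication is a cyclictest-style
wakeup-latency measurement. The sketch below is not the tool mentioned
above; it assumes a periodic thread that sleeps to absolute
CLOCK_MONOTONIC deadlines and records how late each wakeup is. For
meaningful numbers it would run at a SCHED_FIFO priority on the CPU of
interest. max_wakeup_latency_ns() and ts_add_ns() are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L /* clock_nanosleep(), clock_gettime() */
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

static int64_t ts_to_ns(const struct timespec *t)
{
    return (int64_t)t->tv_sec * NSEC_PER_SEC + t->tv_nsec;
}

/* Advance t by ns (0 <= ns < 1 s), keeping tv_nsec normalized. */
void ts_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Sleep until each absolute deadline, period_ns apart, and record how
 * late each wakeup was. Returns the worst lateness in ns, or -1 on error. */
int64_t max_wakeup_latency_ns(long period_ns, int loops)
{
    struct timespec next, now;
    int64_t worst = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;
    for (int i = 0; i < loops; i++) {
        int rc;

        ts_add_ns(&next, period_ns);
        do
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        while (rc == EINTR); /* restart if a signal interrupts the sleep */
        if (rc != 0 || clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;
        int64_t late = ts_to_ns(&now) - ts_to_ns(&next);
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Sleeping to absolute deadlines with TIMER_ABSTIME, rather than for
relative intervals, keeps the period from drifting as latencies
accumulate.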

So, to my problem. What I mean by "the machine stops" is just that: all
signs of activity from the mouse, keyboard, and video stop. Then, after
a few seconds, it usually continues. At first I only saw problems when
using Ethernet in the emulation. I would telnet into the emulation from
the Linux box and do the equivalent of cat'ing a very large file. The
machine would always "stop" at some random point in the display, and
then maybe continue or maybe not. So I thought I might have a problem
with my Ethernet module. Then I noticed similar things with the SCSI
module when accessing legacy SCSI devices from within the emulation.
Sometimes the whole machine doesn't stop; it appears that only some
things have stopped, like one or more of my I/O threads?

I can only say for sure that I do not get these "stops" when running
any other kernel, or when running the rt20 kernel in any of the
non-complete preemption modes.

The only change we had to make to this app for it to run at all on the
rt20 kernel was ensuring that the RTOM IRQ thread was at a higher
priority than the CPU process/thread. Otherwise, no signals were
received from the RTOM.
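
A sketch of that priority fix in C, assuming the PID of the RTOM's IRQ
thread has already been found (for example with ps) and the emulated
CPU thread runs SCHED_FIFO at cpu_prio. irq_prio_above() and
ensure_irq_thread_above() are hypothetical helpers; chrt -f -p <prio>
<pid> does the same from the shell. Either way it needs root or
CAP_SYS_NICE.

```c
#include <sched.h>
#include <sys/types.h>

/* Priority the IRQ thread needs to preempt a SCHED_FIFO thread at
 * cpu_prio, or -1 if cpu_prio is already the maximum. */
int irq_prio_above(int cpu_prio)
{
    int max = sched_get_priority_max(SCHED_FIFO);

    return cpu_prio < max ? cpu_prio + 1 : -1;
}

/* Make sure thread irq_pid runs SCHED_FIFO above cpu_prio, raising it
 * only if needed. Returns 0 on success, -1 on failure. */
int ensure_irq_thread_above(pid_t irq_pid, int cpu_prio)
{
    struct sched_param sp;
    int want = irq_prio_above(cpu_prio);

    if (want < 0)
        return -1;
    if (sched_getparam(irq_pid, &sp) != 0)
        return -1;
    if (sched_getscheduler(irq_pid) == SCHED_FIFO && sp.sched_priority >= want)
        return 0; /* already high enough */
    sp.sched_priority = want;
    return sched_setscheduler(irq_pid, SCHED_FIFO, &sp);
}
```

Under SCHED_FIFO a runnable higher-priority thread preempts a
lower-priority one on the same CPU, so if the IRQ thread shares a
processor with a CPU thread that never blocks and sits below it, the IRQ
thread never gets to run there.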
Mark