Re: HZ=1000 & kernel profiling?

David S. Miller (davem@jenolan.rutgers.edu)
Sun, 19 Jan 1997 16:14:31 -0500


Date: Sun, 19 Jan 1997 21:29:17 +0100 (MET)
From: Ingo Molnar <mingo@pc5829.hil.siemens.at>

duh, this will rock.

Starting to sound nice right?

nice scheme. And softening cli/sti makes both profiling easier, and RT
Linux too ...

btw, when I coded up my cli/sti latency profiler, I had a hard time
identifying all the >real< cli/sti points. Assembly glue logic for
example, all the zillion trap handler entry points, scheduling code
... interrupt handlers. Many places to watch for, and unfortunately
each platform needs its own fixes. Ah, and those pesky traps from
kernel space when accessing user-space pointers ... fun stuff

Ingo, awaiting those patches eagerly =P

I'm hitting some stumbling blocks right now, and some other things
are side-tracking this work a bit. So I'll brain dump what Linus and
I have come up with so far, so that others can think about it and
perhaps come up with some ideas.

Essentially, the two-tiered scheme looks like this:

save_flags(), cli(), sti(), and restore_flags() modify some global
kernel variables used to indicate across the system whether the kernel
is in a "cli() state" or not.

I'm convinced it can be done with just the intr_count variable we
already have, using some atomic operations and clever encodings;
Linus thinks this is asking too much and that some more state needs
to be there to get it all right. I plan to prove him wrong ;-)

When an interrupt _does_ come in (remember, cli() doesn't touch the
real CPU interrupt enables etc.), the interrupt entry code does
something like:

irq_trap:
	if (kernel_is_in_cli_state(&intr_count)) {
		irqs_to_service |= (1 << irq_level);
		regs.eflags &= ~(EFLAGS_IF);
		iret();
	}
	normal_processing();

Ok, then restore_flags() and sti() do something like:

if (exit_cli_state(&intr_count))
	if (irqs_to_service)
		handle_pending_irqs();
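
For completeness, the cli() side would be roughly the mirror image.
This is only a sketch: enter_cli_state() is a name I'm making up here
to match the exit_cli_state() above, with the real work being whatever
encoding of intr_count we settle on.

	/* Sketch only: enter_cli_state() is a made-up counterpart to
	 * exit_cli_state().  With the intr_count encoding it would do
	 * the atomic operations and fail while interrupt handlers are
	 * still in flight or someone else owns the cli() state.
	 */
	void cli(void)
	{
		while (!enter_cli_state(&intr_count))
			/* spin until we own the software cli() state */ ;
	}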

The idea is that since we are doing this all in software, we will see
many code streams that never take an intervening interrupt at all
between the cli() and the sti(); this also makes recursive cli()'s
efficient.

The observant will notice something important for the SMP case, and
the following is why I am pushing so hard for the Intel irq forwarding
stuff to be implemented.

What should the irq_trap code above do if the IRQ arrives on a
processor that is not in kernel mode? Clear interrupts and go back to
user mode? Of course not. If IRQs are only sent to the kernel lock
holder (like on the Sparc) this case is dismissed entirely. I guess
on the Intel you could go:

	if (in_cli_state) {
		if (from_user) {
			spin_until_cli_state_is_left();
			goto rest;
		}
	}
rest:
	normal_processing();

When you think about the implementation of all of this you need to
keep in your head that "a cli() is a cli() is a cli()". This just
means be careful to handle _all_ of the cases properly.

One idea was to make (intr_count == -1) mean "safely in cli() state":
cli() does atomic_dec_and_test(&intr_count), and interrupt entry/exit
just do atomic_inc(&intr_count) and atomic_dec(&intr_count) like
normal.

So a cli()'er will wait until all currently executing interrupt
handlers return, and it is the first to get intr_count to -1. Looks
not too bad so far.
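
To make that concrete, here is roughly how I picture the encoding.
This is only a sketch: cas() stands in for some compare-and-swap style
atomic primitive, and exactly which primitive each port can provide
cheaply is part of the open "clever encodings" question.

	/* Sketch of the (intr_count == -1) encoding.  cas(ptr, old, new)
	 * stands for an atomic compare-and-swap that returns the value
	 * it found; I am glossing over how this interleaves with the
	 * atomic_inc() done by the irq_trap path above.
	 */
	void cli(void)
	{
		/* Wait until no interrupt handlers are in flight and
		 * nobody else is in cli() state (intr_count == 0), then
		 * take it to -1 == "safely in cli() state".
		 */
		while (cas(&intr_count, 0, -1) != 0)
			/* spin */ ;
	}

	void sti(void)
	{
		atomic_inc(&intr_count);  /* -1 -> 0, leave cli() state */
		if (irqs_to_service)
			handle_pending_irqs();
	}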

Now here's a fun case that scheme doesn't handle. What does an
"interrupt handler" itself do to enter a cli()? Of course it doesn't
work: the handler's own entry increment keeps intr_count from ever
reaching -1 while it sits there waiting. I've tried to contrive all
sorts of schemes where intr_count is "normalized" when we are in an
interrupt handler so that the scheme works in this scenario too. It
became gross, and the issue is that adding extra overhead to the
software state code so that cli() can check "am I in an interrupt
handler?" begins to defeat all the performance gains this scheme is
supposed to allow for.

The next idea, which Linus had at the Thai restaurant at Usenix, was
to have logically two intr_counts: the normal one we have now plus a
"per cpu" intr_count array. Each processor, when it enters an
interrupt, goes:

	intr_count++;		/* the global count we have today */
	intr_count[cpuid]++;	/* the per-cpu array              */

Then cli() can check whether the two are equal before it allows
itself to enter a cli() section. Clever, but I still think some
cases won't be handled properly. For instance, how does the above
work for save_flags() and restore_flags()?
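
Here is how I read that idea as code. Again just a sketch, ignoring
the open questions above: I'm writing the per-cpu array as
cpu_intr_count[] so the two counters have distinct names, and the
increments and the comparison of course have to be done with whatever
atomic primitives each port provides.

	/* Two-counter idea: the global intr_count we have today plus a
	 * per-cpu interrupt nesting count (cpu_intr_count[] is just a
	 * name for this sketch).
	 */
	int intr_count;			/* global, as today           */
	int cpu_intr_count[NR_CPUS];	/* per-cpu interrupt nesting  */

	/* Interrupt entry on processor `cpuid' bumps both counts
	 * (and interrupt exit decrements both):
	 */
	intr_count++;
	cpu_intr_count[cpuid]++;

	/* cli() on processor `cpuid' may enter its cli() section only
	 * once every interrupt handler still in flight is one of our
	 * own, i.e. the two counts match:
	 */
	while (intr_count != cpu_intr_count[cpuid])
		/* some other cpu is inside an interrupt handler */ ;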

The things to keep in mind for any proposed implementation of all of
this seem to be:

1) Only one thread of execution can be in the cli() state at a time.

2) The current interface the C code uses must work as is no matter
how it is implemented; that is, save_flags(), cli(), sti(),
and restore_flags() must "appear", to the code which uses
them now, to do the same thing they do today (see the example
after this list).

3) Things should work just as expected from within an
interrupt handler.

4) All of the global state to implement this must be operated
on atomically so it all works on SMP, as intended. This
means the atomic capabilities of the various architectures
must be considered when the implementation is designed, for
the purposes of doability and efficiency. (ie. don't pick
an atomic primitive that is only found on a few processors
or that takes 100 instructions to implement on a particular
processor ;-)
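
To make point 2 concrete, this is the sort of code scattered all over
the drivers today, and it has to behave exactly the same whether the
primitives really touch the CPU interrupt flag or just flip the
software state:

	unsigned long flags;

	save_flags(flags);
	cli();
	/* touch data that the driver's interrupt handler also touches */
	restore_flags(flags);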

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s ////
ethernet. Beat that! ////
-----------------------------------------////__________ o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><