Re: [PATCH RFC/RFB] x86_64, i386: interrupt dispatch changes
From: Alexander van Heukelum
Date: Tue Nov 04 2008 - 10:47:53 EST
On Tue, 4 Nov 2008 18:07:29 +0300, "Cyrill Gorcunov"
<gorcunov@xxxxxxxxx> said:
> [Alexander van Heukelum - Tue, Nov 04, 2008 at 01:28:39PM +0100]
> | Hi all,
> |
> | An x86 processor handles an interrupt (from an external
> | source, software generated or due to an exception),
> | depending on the contents of the IDT. Normally the IDT
> | contains mostly interrupt gates. Linux points each
> | interrupt gate to a unique function. Some are specific
> | to a particular task (handling traps, IPIs, ...); the others
> | are stubs that push the interrupt number to the stack
> | and jump to 'common_interrupt'.
> |
> | This patch removes the need for the stubs.
> |
> | An interrupt gate contains a FAR pointer to the interrupt
> | handler, meaning that the code segment of the interrupt
> | handler is also reloaded. Instead of pointing each (non-
> | specific) interrupt gate to a unique handler, we set a
> | unique code segment and use a common handler. When the
> | handler finishes, the code segment is restored to the
> | 'normal'/previous one.
> |
> | In order to have a unique code segment for each interrupt
> | vector, the GDT is extended to 512 entries (1 full page),
> | and the last half of the page describes identical code
> | segments (identical, except for the number in the cs
> | register), which are referred to by the 256 interrupt
> | gates.
> |
> | In this version, even the specialized handlers get run
> | with their code segment switched. This is not necessary,
> | but I like the fact that in a register dump one can now
> | see from the code segment that the code is running because of
> | a (hard) interrupt. The one exception I made is int 0x80
> | (syscall), which runs with the normal kernel code segment.
> |
> |
> | In conclusion: handling interrupts this way
> | removes quite a bit of source code. It also removes the
> | need for the interrupt stubs and, on i386, pointers to
> | them. This saves a few kilobytes of code. The page
> | reserved for the GDT is now fully used. Having the cs
> | register indicate directly that code is executing on behalf
> | of a (hardware) interrupt is a nice debugging aid. This way
> | of handling interrupts also leads to cleaner code: this
> | patch already gets rid of some 'ugly' macro magic in
> | entry_32.S and irqinit_64.c.
> |
> | More cleanup is certainly possible, but I have tried to
> | keep the changes local and small. If switching code
> | segments is too expensive for some paths, that can be
> | fixed by not doing that ;).
> |
> | I'd welcome numbers from a few benchmarks on real
> | hardware (I only tested on qemu: Debian runs without
> | noticeable differences before/after this patch).
> |
> | Greetings,
> | Alexander
> |
> | P.S. Just in case someone thinks this is a great idea and
> | testing and benchmarking go well...
> |
> ...
>
> Hi Alexander, great work!
>
> Leaving aside the cost of reading cs (which I don't expect
> to be that expensive, unlike writing it), I guess walking
> the GDT entries may not be that cheap, especially with the
> new segments you propose. I guess the CPU internally checks
> whether the segment is the same and does not reload it even
> if it is described by a FAR pointer, but I could be wrong,
> so Andi is CC'ed :)
Thanks! And indeed Andi might know more about this.
I wonder how the time needed for reading the segment descriptors
from the GDT balances against the time spent on the extra
redirection through the stubs. I'd be interested to know whether
the difference can be measured with the current implementation.
(I really need to hijack a machine to do some measurements; I
hoped someone would do it before I got to it ;) )
Even if some CPUs have an internal optimization for the case
where the gate segment is the same as the current one, I wonder
whether it really matters... Interrupts that occur while the
processor is running userspace already cause a segment change.
They are more likely to be in cache, maybe.
Greetings,
Alexander
> A small nit in implementation:
>
> entry_32.S:
> + push %eax
> + push %eax
> + mov %cs,%eax
> + shr $3,%eax
> + and $0xff,%eax
> + not %eax
> + mov %eax,4(%esp)
> + pop %eax
>
> CFI_ADJUST_CFA_OFFSET missed?
Sure, I did just enough to make it work for me ;).
> - Cyrill -
--
Alexander van Heukelum
heukelum@xxxxxxxxxxx