Re: [BUG] perf: bogus correlation of kernel symbols

From: Ingo Molnar
Date: Fri May 20 2011 - 14:35:31 EST



* Dan Rosenberg <drosenberg@xxxxxxxxxxxxx> wrote:

> On Fri, 2011-05-20 at 15:11 +0200, Ingo Molnar wrote:
>
> > We need to allocate the IDT dynamically: just kmalloc() it, update idt_descr
> > and do a load_idt(). Double check places that modify idt_descr or use
> > idt_table.
> >
> > Note, you could do this as a side effect of a nice performance optimization:
> > would you be interested in allocating it in the percpu area, using
> > alloc_percpu()? That way the IDT is distributed between CPUs - this has
> > scalability advantages on NUMA systems and maybe even on SMP.
> >
>
> Any suggestions on when this allocation should take place? I'm hesitant to
> touch anything in arch/x86/kernel/head_32.S, where the IDT is set up and lidt
> idt_descr is called (on x86-32 anyway). That means at some point I'd have to
> copy the table into a region allocated with alloc_percpu() and set up a new
> descriptor. Seems like this should happen before IRQs are enabled, but I'm not
> sure about the best place.

I think there's a static percpu area that can be used pretty early on.

The boot IDT can be marked __initdata so its space won't be wasted.

The thing is, the boot IDT can be kept until SMP is initialized. So i'd
suggest allocating the per-CPU IDTs after memory management has been
initialized. A pretty good place for that is trap_init(): there we already
have the page allocator initialized and probably the percpu allocator too.
IDT allocation is also pretty naturally done in trap_init().
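
To make the copy-and-repoint step concrete, here is a rough userspace model
of it, not kernel code. Everything in it is an assumption made for
illustration: malloc() stands in for the percpu allocator, and the names
model_alloc_percpu_idts(), struct idt_descr_model, MODEL_NR_CPUS and the
simplified struct gate_desc are hypothetical. The only hardware facts relied
on are that x86 has 256 vectors and that the descriptor limit is the table
size in bytes minus one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define IDT_ENTRIES   256  /* x86 has 256 interrupt vectors */
#define MODEL_NR_CPUS 4    /* hypothetical CPU count for this model */

/* Simplified 16-byte gate, the size used on x86-64 (placeholder layout). */
struct gate_desc { uint64_t lo, hi; };

/* Stand-in for the descriptor that lidt would be pointed at. */
struct idt_descr_model {
	uint16_t size;     /* limit: table size in bytes minus one */
	void *address;     /* base of the table */
};

struct gate_desc boot_idt[IDT_ENTRIES];             /* the static boot IDT */
struct gate_desc *percpu_idt[MODEL_NR_CPUS];        /* one copy per CPU */
struct idt_descr_model percpu_descr[MODEL_NR_CPUS]; /* one descriptor per CPU */

/*
 * Copy the boot IDT into a freshly allocated table for each CPU and fill
 * in a matching descriptor. Returns 0 on success, -1 if an allocation
 * fails (any copies made so far are released).
 */
int model_alloc_percpu_idts(void)
{
	int cpu;

	/* The limit field is 16 bits wide, so the table must fit in it. */
	assert(sizeof(boot_idt) - 1 <= UINT16_MAX);

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		struct gate_desc *idt = malloc(sizeof(boot_idt));

		if (!idt) {
			while (cpu-- > 0) {
				free(percpu_idt[cpu]);
				percpu_idt[cpu] = NULL;
			}
			return -1;
		}

		memcpy(idt, boot_idt, sizeof(boot_idt));
		percpu_idt[cpu] = idt;
		percpu_descr[cpu].size = (uint16_t)(sizeof(boot_idt) - 1);
		percpu_descr[cpu].address = idt;
	}
	return 0;
}
```

In the real kernel each CPU would then load its own descriptor with
load_idt(), and the boot table could be discarded afterwards; the model only
shows the allocation, copy and descriptor setup.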

> Also, I'd still welcome suggestions on generating entropy so early in the
> boot process as to randomize the location at which the kernel is
> decompressed.
>
> On a related note, would there be obstacles to marking the IDT as read-only?

The cost is that the TLB entry covering it may change from a 2MB entry to a
4K one. We generally try to keep critical data structures in 2MB mapped areas.

But this is really hard to measure (you'd have to have a borderline workload
where the cost of a single extra 4K TLB entry is measurable) so i'd suggest
splitting this from the randomization step.
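
For a sense of the sizes involved, the arithmetic can be sketched as below.
This is only an illustration: the helper names and MODEL_* constants are
hypothetical, and the gate sizes assumed are 8 bytes on x86-32 and 16 bytes
on x86-64. With 256 gates, the 64-bit IDT fills exactly one 4K page, which is
why giving it its own protection means mapping it with a 4K page instead of
leaving it inside a 2MB mapping.

```c
#include <assert.h>

#define MODEL_PAGE_4K     4096UL
#define MODEL_PAGE_2M     (2UL * 1024 * 1024)
#define MODEL_IDT_ENTRIES 256UL

/* Bytes occupied by an IDT whose gates are gate_size bytes each. */
unsigned long model_idt_bytes(unsigned long gate_size)
{
	return MODEL_IDT_ENTRIES * gate_size;
}

/* Number of 4K pages needed to hold a region of the given size. */
unsigned long model_pages_needed(unsigned long bytes)
{
	return (bytes + MODEL_PAGE_4K - 1) / MODEL_PAGE_4K;
}

/* Number of 4K pages that one 2MB mapping covers. */
unsigned long model_4k_pages_per_2m(void)
{
	return MODEL_PAGE_2M / MODEL_PAGE_4K;
}
```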

Thanks,

Ingo