Re: [PATCH 17/18] 2.6.17.9 perfmon2 patch for review: modified x86_64 files
From: Stephane Eranian
Date: Thu Aug 24 2006 - 05:11:42 EST
Andi,
On Wed, Aug 23, 2006 at 12:09:25PM +0200, Andi Kleen wrote:
> Stephane Eranian <eranian@xxxxxxxxxxxxxxxxx> writes:
>
> In general this stuff would be much easier to review if you
> really split it into logical pieces: this means not modified/new,
> but one patch doing one thing. Then the hooks could be reviewed
> together with the code.
>
Yes, I think that would be nice but it is very hard to generate
such patches from the source tree that I have now. I would have to
manually edit the new/mod patches to group things based on
functionalities.
> > @@ -934,6 +935,7 @@ void setup_threshold_lvt(unsigned long l
> > void smp_local_timer_interrupt(struct pt_regs *regs)
> > {
> > profile_tick(CPU_PROFILING, regs);
> > + pfm_handle_switch_timeout();
>
> It is still unclear why you can't use an ordinary add_timer() ?
>
The hook is used to decrement a timeout value used for event set switching.
Set switching is upported for both per-thread and system-wide contexts. For
per-thread, the timeout must be "saved/restored" when the thread is context
switched. The timeout must be handled in the context of the monitored thread.
I am not sure add_timer() is a good fit for this. The add_timer looks good but
del_timer() does not as for an active timer, it would need to return the
leftover duration so it can be reactivated via a new add_timer() on context
switch in.
> > #include <asm/atomic.h>
> > @@ -589,6 +590,8 @@ void __init init_IRQ(void)
> > /* IPI vectors for APIC spurious and error interrupts */
> > set_intr_gate(SPURIOUS_APIC_VECTOR, spurious_interrupt);
> > set_intr_gate(ERROR_APIC_VECTOR, error_interrupt);
> > +
> > + pfm_vector_init();
>
> I think I would prefer those to be expanded here so it's all in one place.
>
Done.
> > put_cpu();
> > }
> > + pfm_exit_thread(me);
> > }
> >
> > void flush_thread(void)
> > @@ -462,6 +464,8 @@ int copy_thread(int nr, unsigned long cl
> > asm("mov %%es,%0" : "=m" (p->thread.es));
> > asm("mov %%ds,%0" : "=m" (p->thread.ds));
> >
> > + pfm_copy_thread(p);
> > +
>
> AFAIK there was some work in -mm* for generic notifiers for exit/copy hooks. Can those
> be used?
>
I have not yet looked at the -mm* tree.
>
> > if (unlikely(test_tsk_thread_flag(me, TIF_IO_BITMAP))) {
> > p->thread.io_bitmap_ptr = kmalloc(IO_BITMAP_BYTES, GFP_KERNEL);
> > if (!p->thread.io_bitmap_ptr) {
> > @@ -532,6 +536,10 @@ static noinline void __switch_to_xtra(st
> > */
> > memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
> > }
> > +
> > + if (test_tsk_thread_flag(next_p, TIF_PERFMON)
> > + || test_tsk_thread_flag(prev_p, TIF_PERFMON))
> > + pfm_ctxsw(prev_p, next_p);
> > }
> >
> > /*
> > @@ -620,13 +628,12 @@ __switch_to(struct task_struct *prev_p,
> > unlazy_fpu(prev_p);
> > write_pda(kernelstack,
> > task_stack_page(next_p) + THREAD_SIZE - PDA_STACKOFFSET);
> > -
> > - /*
> > - * Now maybe reload the debug registers and handle I/O bitmaps
> > - */
> > - if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW))
> > - || test_tsk_thread_flag(prev_p, TIF_IO_BITMAP))
> > - __switch_to_xtra(prev_p, next_p, tss);
> > + /*
> > + * Now maybe reload the debug registers and handle I/O bitmaps
> > + */
> > + if (unlikely((task_thread_info(next_p)->flags & _TIF_WORK_CTXSW)
> > + || (task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW)))
> > + __switch_to_xtra(prev_p, next_p, tss);
>
>
> This should be a separate patch for once (creating _TIF_WORK_CTXSW)
The _TIF_WORK_CTXSW is already in a separate patch which you have accepted
into your tree if I recall. It was part of the TIF_DEBUG/TIF_IO_BITMAP patch.
Unless you are repeating the first point you have at the top of this message
about group by functionality.
> >
> > return prev_p;
> > }
> > diff -urp linux-2.6.17.9.base/arch/x86_64/kernel/signal.c linux-2.6.17.9/arch/x86_64/kernel/signal.c
> > --- linux-2.6.17.9.base/arch/x86_64/kernel/signal.c 2006-08-18 09:26:24.000000000 -0700
> > +++ linux-2.6.17.9/arch/x86_64/kernel/signal.c 2006-08-21 03:37:46.000000000 -0700
> > @@ -24,6 +24,7 @@
> > #include <linux/stddef.h>
> > #include <linux/personality.h>
> > #include <linux/compiler.h>
> > +#include <linux/perfmon.h>
> > #include <asm/ucontext.h>
> > #include <asm/uaccess.h>
> > #include <asm/i387.h>
> > @@ -493,6 +494,8 @@ void do_notify_resume(struct pt_regs *re
> > clear_thread_flag(TIF_SINGLESTEP);
> > }
> >
> > + pfm_handle_work(current);
>
> At least the if () should be expanded, and most likely it wants
> merging with other flags too similar to the context switch (if (any work) { check which work })
> In separate patches again please.
>
The situation is a little bit more complicated here. The pfm_handle_work()
can be invoked for one of 3 reasons:
- BLOCK: we want to block the thread before it returns to user on its
way back from a PMU interrupt. Why is this useful? As a consequence
of a counter overflow, the sampling buffer may be full at which
point monitoring stops and a user level notification is generated.
In this situation and when a tool is monitoring another thread, it
may be interesting to block that thread to avoid blind spots. This
is never systematic and must be requested. It only works when
non-self monitoring. By default, the monitored thread keeps on
running with monitoring turned off. Depending on the measurement
and workload, you may want to block to avoid blind spot or you want
to keep caches/TLB warm.
- RESET: we need to reset the counters. This is needed when monitoring
another thread. You cannot remotely update the PMU of a thread unless
it is OFF any CPU.
- ZOMBIE: Monitoring must be stoped and resources must be freed. This is the
case when a tool is monitoring another thread and the tool dies
first. At that time, the context is still attached to the monitored
thread. We cannot easily and synchronously stopped the other
thread on the exit path of the tool. Instead we use the
TIF_NOTIFY_RESUME mechanism to force the monitored thread to cleanup
itself.
to get to pfm_handle_work(), we set TIF_NOTIFY_RESUME. Once in pfm_handle_work()
with the context properly locked, we check the reason for coming here. To mimic,
what we do with TIF flags in __switch_to(). I would have to add 3 new TIF flags.
The TIF_PERFMON flag means something different. When you come to notify_resume()
for a signal in a monitored thread, you may not need to go into pfm_handle_work().
But what is sure, is that if you do not have TIF_PERFMON set you never need to
get into pfm_handle_work(). So one thing I could do if to check for TIF_PERFMON
to miinize the number of useless calls to pfm_handle_work().
>
> > +
> > /* deal with pending signal delivery */
> > if (thread_info_flags & _TIF_SIGPENDING)
> > do_signal(regs,oldset);
> > Only in linux-2.6.17.9/arch/x86_64: perfmon
> > diff -urp linux-2.6.17.9.base/include/asm-x86_64/hw_irq.h linux-2.6.17.9/include/asm-x86_64/hw_irq.h
> > --- linux-2.6.17.9.base/include/asm-x86_64/hw_irq.h 2006-08-18 09:26:24.000000000 -0700
> > +++ linux-2.6.17.9/include/asm-x86_64/hw_irq.h 2006-08-21 03:37:46.000000000 -0700
> > @@ -67,6 +67,7 @@ struct hw_interrupt_type;
> > * sources per level' errata.
> > */
> > #define LOCAL_TIMER_VECTOR 0xef
> > +#define LOCAL_PERFMON_VECTOR 0xee
> >
> > /*
> > * First APIC vector available to drivers: (vectors 0x30-0xee)
> > @@ -74,7 +75,7 @@ struct hw_interrupt_type;
> > * levels. (0x80 is the syscall vector)
> > */
> > #define FIRST_DEVICE_VECTOR 0x31
> > -#define FIRST_SYSTEM_VECTOR 0xef /* duplicated in irq.h */
> > +#define FIRST_SYSTEM_VECTOR 0xee /* duplicated in irq.h */
>
> We still had one up free iirc so there is no need to move the system vectors.
>
Done.
Thanks for your useful feedback. I am slowly going through all your messages.
--
-Stephane
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/