Re: [Patch 00/11] Hardware Breakpoint interfaces

From: K.Prasad
Date: Fri Mar 27 2009 - 18:07:51 EST


On Wed, Mar 25, 2009 at 03:48:31PM -0400, Alan Stern wrote:

Hi Alan,
Thanks for the thorough review. It has helped uncover bugs that might
have been discovered after much delay. Please find my responses inline.

> There are some serious issues involving userspace breakpoints and the
> legacy ptrace interface. It all comes down to this: At what point
> is a breakpoint registered for a ptrace caller?
>
> Remember, to set up a breakpoint a debugger needs to call ptrace
> twice: once to put the address in one of the DR0-DR3 registers and
> once to set up DR7. So when does the task own the breakpoint?
>
> Logically, we should wait until DR7 gets set, because until then the
> breakpoint is not active. But then how do we let the caller know that
> one of his breakpoints conflicts with a kernel breakpoint?
>
> If we report an error during an attempt to set DR0-DR3 then at least
> it's unambiguous. But then how do we know when the task is _finished_
> using the breakpoint? It's under no obligation to set the register
> back to 0.
>
> Related to this is the question of how to store the task's versions of
> DR0-DR3 when there is no associated active breakpoint. Maybe it would
> be best to keep the existing registers in the thread structure.
>

These are profound questions and I believe that it is upto the process in
user-space to answer them.

What we could ensure from the kernel-space is to retain the
existing behaviour of ptrace i.e. return success when a write is done on
DR0-DR3 (almost unconditionally) and reserve error returns only when DR7
is written into.

The patch in question could possibly return an -ENOMEM (even when write
is done on DR0-DR3) but I will change the behaviour as stated above.


A DR0 - DR3 return will do a:
thread->debugreg[n] = val;
return 0;

while all error returns are reserved for:
rc = ptrace_write_dr7(tsk, val);

> > +++ linux-2.6-tip/kernel/hw_breakpoint.c
> > @@ -0,0 +1,367 @@
> ...
> > +struct task_struct *last_debugged_task;
>
> Is this variable provided only for use by the hw_breakpoint_handler()
> routine, for detecting lazy debug-register switching? It won't work
> right on SMP systems. You need to use a per-CPU variable instead.
>

Thanks for pointing it out. Here's what it will be made:
DEFINE_PER_CPU(struct task_struct *, last_debugged_task);

That also re-introduces the put_cpu_no_sched() into
switch_to_thread_hw_breakpoint() function.

void switch_to_thread_hw_breakpoint(struct task_struct *tsk)
{
/* Set the debug register */
arch_install_thread_hw_breakpoint(tsk);
per_cpu(last_debugged_task, get_cpu()) = current;
put_cpu_no_resched();
}

> > +/*
> > + * Install the debug register values for just the kernel, no thread.
> > + */
> > +void switch_to_none_hw_breakpoint(void)
> > +{
> > + arch_install_none();
> > +}
>
> Even though "arch_install_none" was my own name, I don't like it very
> much. "arch_remove_user_hw_breakpoints" would be better.
>

How about arch_uninstall_thread_hw_breakpoint()? (Being the opposite
of arch_install_thread_hw_breakpoint()).

> > +/*
> > + * Erase all the hardware breakpoint info associated with a thread.
> > + *
> > + * If tsk != current then tsk must not be usable (for example, a
> > + * child being cleaned up from a failed fork).
> > + */
> > +void flush_thread_hw_breakpoint(struct task_struct *tsk)
> > +{
> > + int i;
> > + struct thread_struct *thread = &(tsk->thread);
> > +
> > + mutex_lock(&hw_breakpoint_mutex);
> > +
> > + /* The thread no longer has any breakpoints associated with it */
> > + clear_tsk_thread_flag(tsk, TIF_DEBUG);
> > + for (i = 0; i < HB_NUM; i++) {
> > + if (thread->hbp[i]) {
> > + hbp_user_refcount[i]--;
> > + kfree(thread->hbp[i]);
>
> Ugh! In general you shouldn't deallocate memory you didn't allocate
> originally. What will happen when there is a utrace interface in
> addition to the ptrace interface?
>

I can't see how I can invoke ptrace related code here to free memory
here, although I agree that __unregister_user_hw_breakpoint() code need not
mess with it.
I will retain the kfree() in flush_thread_hw_breakpoint(), but remove it move
it from the latter to ptrace related code.

> > + thread->hbp[i] = NULL;
> > + }
> > + }
> > + thread->hbp_num_installed = 0;
>
> This variable doesn't seem to serve any particularly useful purpose.
> Eliminate it.
>

Done.

> > +/*
> > + * Validate the settings in a hw_breakpoint structure.
> > + */
> > +static int validate_settings(struct hw_breakpoint *bp, struct task_struct *tsk)
> > +{
> > + int ret;
> > + unsigned int align;
> > +
> > + if (!bp)
> > + return -EINVAL;
> > +
> > + ret = arch_validate_hwbkpt_settings(bp, &align, tsk);
> > + if (ret < 0)
> > + goto err;
> > +
> > + /*
> > + * Check that the low-order bits of the address are appropriate
> > + * for the alignment implied by len.
> > + */
> > + if (bp->info.address & align)
> > + return -EINVAL;
>
> I sort of think this test belongs in the arch-specific code also.
> After all, some types of CPU might not have alignment constraints.
>

Moved. It also helps eliminate passing a parameter back and forth.

> > +/*
> > + * Actual implementation of unregister_user_hw_breakpoint.
> > + */
> > +void __unregister_user_hw_breakpoint(int pos, struct task_struct *tsk,
> > + struct hw_breakpoint *bp)
>
> What happened to unregister_user_hw_breakpoint? It doesn't seem to
> exist any more.
>

I thought it would be more convenient to introduce them after we have
virtualised debug registers. But thinking further, I think we can adopt
a simple 'first-fit' approach can be used to allocate debug registers
for user-space too. I will include a
(un)register_user_hw_breakpoint(task, hw_breakpoint)
in the next iteration and export them too.

> In general, will the caller know the value of pos? Probably not,
> unless the caller is ptrace. It shouldn't be one of the parameters.
>

Given that ptrace contains the debugreg number, I chose to use it. The
proposed interfaces (as discussed above) will help users who cannot
provide the info.

> > +{
> > + struct thread_struct *thread = &(tsk->thread);
> > +
> > + if (!bp)
> > + return;
> > +
> > + hbp_user_refcount[pos]--;
> > + thread->hbp_num_installed--;
> > +
> > + arch_unregister_user_hw_breakpoint(pos, bp, tsk);
> > +
> > + if (tsk == current)
> > + switch_to_thread_hw_breakpoint(tsk);
> > + kfree(tsk->thread.hbp[pos]);
>
> Once again, memory should be deallocated by the same module that
> allocated it.
>

Moved to ptrace_write_dr7() where __unregister_user_hw_breakpoint() is
invoked.

> > +/**
> > + * unregister_kernel_hw_breakpoint - unregister a HW breakpoint for kernel space
> > + * @bp: the breakpoint structure to unregister
> > + *
> > + * Uninstalls and unregisters @bp.
> > + */
> > +void unregister_kernel_hw_breakpoint(struct hw_breakpoint *bp)
> > +{
> > + int i, j;
> > +
> > + mutex_lock(&hw_breakpoint_mutex);
> > +
> > + /* Find the 'bp' in our list of breakpoints for kernel */
> > + for (i = hbp_kernel_pos; i < HB_NUM; i++)
> > + if (bp == hbp_kernel[i])
> > + break;
>
> If you would store the register number in the arch-specific part of
> struct hw_breakpoint then this loop wouldn't be needed.
>

It's just a tiny loop (theoretical max is HB_NUM). It could save a member in
'struct hw_breakpoint' - a significant saving if there are
multiple number of threads that use debug registers.

The code is already guilty of storing the address of the breakpoint
twice i.e. once in thread->hbp->info.address and again in
thread->debugreg[n]. Adding this would be another. What do you say?

> > + /*
> > + * We'll shift the breakpoints one-level above to compact if
> > + * unregistration creates a hole
> > + */
> > + if (i > hbp_kernel_pos)
> > + for (j = i; j == hbp_kernel_pos; j--)
> > + hbp_kernel[j] = hbp_kernel[j-1];
>
> The loop condition "j == hbp_kernel_pos" is wrong. It should be
> "j > hbp_kernel_pos". And making this change shows that the preceding
> "if" statement is redundant.
>

I missed it, will correct.

> > +
> > + /*
> > + * Delete the last kernel breakpoint entry after compaction and update
> > + * the pointer to the lowest numbered kernel entry after updating its
> > + * corresponding value in kdr7
> > + */
> > + hbp_kernel[hbp_kernel_pos] = 0;
> > + arch_unregister_kernel_hw_breakpoint();
>
> Even though it was part of my original design, there's no good reason
> for making arch_register_kernel_hw_breakpoint and
> arch_unregister_kernel_hw_breakpoint be separate functions. There
> should just be a single function: arch_update_kernel_hw_breakpoints.
> The same is true for arch_update_user_hw_breakpoints. In each case,
> all that is needed is to recalculate the DR7 mask and value.
>

This and a few other suggestions below can be taken only if we chose to
update all kernel related breakpoint registers, irrespective of the
change. In return for saving a few lines of code (+simpler code +
increased readability) we should take some runtime overhead during
(un)register_<> calls.

I'm not sure about the overhead of processing an IPI (which you've cited
as being much larger than the actual code being executed), but a little
reluctant to remove code that is tuned for more specific tasks. Consider
a large system where the number of CPUs is huge (say three digits or so),
and we want to install a breakpoint for the last register hb_num. It would
invoke a write on all hb_num-1 registers for 'n' CPUs. I'm not sure if it's
worthwhile for saving a few lines of code.

I can change it if you insist. Let me know what you think.

> > + hbp_kernel_pos++;
>
> And this should be moved up one line, so that the arch-specific code
> knows how many kernel breakpoints to register.
>

This is done after invoking arch_unregister_kernel_hw_breakpoint() just
so that the corresponding values in kdr7 are updated.

> > +static int __init init_hw_breakpoint(void)
> > +{
> > + int i;
> > +
> > + hbp_kernel_pos = HB_NUM;
> > + for (i = 0; i < HB_NUM; i++)
> > + hbp_user_refcount[i] = 0;
>
> This loop is unnecessary, since uninitialized static values are set to
> 0 in any case.
>
> > + load_debug_registers();
>
> Hmm, I suspect this line can safely be omitted.
>

Done.

> > +
> > + return register_die_notifier(&hw_breakpoint_exceptions_nb);
> > +}
>
>
> > +/*
> > + * Install the kernel breakpoints in their debug registers.
> > + * If 0 <= pos < HB_NUM, then set the debug register corresponding to that number
> > + * If 'pos' is negative, then all debug registers are updated
> > + */
> > +void arch_install_kernel_hw_breakpoint(void *idx)
>
> I don't like this design decision. Why not simply install all the
> kernel breakpoints every time? The extra effort would be invisible
> compared to the overhead of an IPI.
>

Response as above.

> > +{
> > + int pos = *(int *)idx;
> > + unsigned long dr7;
> > + int i;
> > +
> > + get_debugreg(dr7, 7);
> > +
> > + /* Don't allow debug exceptions while we update the registers */
> > + set_debugreg(0UL, 7);
> > +
> > + for (i = hbp_kernel_pos; i < HB_NUM; i++) {
> > + if ((pos >= 0) && (i != pos))
> > + continue;
> > + dr7 &= ~(dr7_masks[i]);
> > + if (hbp_kernel[i])
> > + set_debugreg(hbp_kernel[i]->info.address, i);
> > + }
>
> For example, this loop could be written more simply as follows:
>
> switch (hbp_kernel_pos) {
> case 0:
> set_debugreg(hbp_kernel[0]->info.address, 0);
> case 1:
> set_debugreg(hbp_kernel[1]->info.address, 1);
> ...
> }
>
With above code
i)it uses fewer lines of code ii)Although when coding is completely
done, hbp_kernel[i] cannot be NULL here, we have a check for the same
just in case. It helped me several times during the course of development
to have the check as above and prevent crashes.

> > +
> > + dr7 |= kdr7;
>
> Of course, you would also have to mask out the user bits from DR7.
> You could do something like this:
>
> dr7 = (dr7 & dr7_user_mask[hbp_kernel_pos]) | kdr7;
>
> where dr7_user_mask is a static array containing the five appropriate
> mask values.
>

These are functionally equivalent changes and hope you wouldn't mind if
I choose to skip them.

> > + /* No need to set DR6 */
> > + set_debugreg(dr7, 7);
> > +}
> > +
> > +void arch_load_debug_registers()
> > +{
> > + int pos = -1;
> > + /*
> > + * We want all debug registers to be initialised for this
> > + * CPU so pos = -1
> > + */
> > + arch_install_kernel_hw_breakpoint((void *)&pos);
> > +}
>
> If you follow my suggestion above then this routine isn't needed at
> all. Callers can invoke arch_install_kernel_hw_breakpoints instead.
>

Response as earlier.

> > +/*
> > + * Install the thread breakpoints in their debug registers.
> > + */
> > +void arch_install_thread_hw_breakpoint(struct task_struct *tsk)
> > +{
> > + int i;
> > + struct thread_struct *thread = &(tsk->thread);
> > +
> > + for (i = 0; (i < hbp_kernel_pos) && hbp_user_refcount[i]; i++)
> > + if (thread->hbp[i])
> > + set_debugreg(thread->hbp[i]->info.address, i);
>
> The loop condition is wrong. But since this routine is on the hot
> path we should avoid using a loop at all. In fact, if the DR0-DR3
> register values are added back into the thread structure, we could
> simply do this:
>
> switch (hbp_kernel_pos) {
> case 4:
> set_debugreg(thread->dr3, 3);
> case 3:
> set_debugreg(thread->dr2, 2);
> ...
> }
>

The above loop will now become (after inclusion of debug registers in
thread_struct), with fewer indirections.

for (i = 0; i < hbp_kernel_pos; i++)
if (thread->hbp[i])
set_debugreg(thread->debugreg[i], i);

It is better because i)contains fewer lines of code compared to a switch case
ii)doesn't write onto 'dont-care' debug registers iii)If considered an
overhead, the compiler can always unroll the loop for optimisation.

Will change if you insist.

> > +
> > + /* No need to set DR6 */
> > +
> > + set_debugreg((kdr7 | thread->dr7), 7);
> > +}
>
> > +/*
> > + * Check for virtual address in kernel space.
> > + */
> > +int arch_check_va_in_kernelspace(unsigned long va, u8 hbp_len)
> > +{
> > + unsigned int len;
> > +
> > + len = get_hbp_len(hbp_len);
> > +
> > + return ((va >= TASK_SIZE) && ((va + len) >= TASK_SIZE));
>
> In theory this should be (va + len - 1).
>

You mean check for?
return ((va >= TASK_SIZE) && ((va + (len - 1)) >= TASK_SIZE));

> > +}
> > +
> > +/*
> > + * Store a breakpoint's encoded address, length, and type.
> > + */
> > +void arch_store_info(struct hw_breakpoint *bp)
> > +{
> > + /*
> > + * User-space requests will always have the address field populated
> > + * For kernel-addresses, either the address or symbol name can be
> > + * specified.
> > + */
> > + if (bp->info.address)
> > + return;
> > + if (bp->info.name)
> > + bp->info.address = (unsigned long)
> > + kallsyms_lookup_name(bp->info.name);
> > +}
>
> I still think the address and name fields shouldn't be arch-specific.
> After all, won't _every_ arch need to have a copy of exactly this same
> function?
>

It is the thought of having breakpoints for I/O (which cannot possibly
have a name) and breakpoints over a range of addresses (unlike having
just one address field) that makes me think that these are best kept in
arch-specific structure.

If at a later point in time, they appear on all arch-specific structures
(that have an implementation), I will move the fields into generic
structure.

> > +/*
> > + * Modify an existing user breakpoint structure.
> > + */
> > +int arch_modify_user_hw_breakpoint(int pos, struct hw_breakpoint *bp,
> > + struct task_struct *tsk)
> > +{
> > + struct thread_struct *thread = &(tsk->thread);
> > +
> > + /* Check if the register to be modified was enabled by the thread */
> > + if (!(thread->dr7 & (1 << (pos * DR_ENABLE_SIZE))))
> > + return -EINVAL;
> > +
> > + thread->dr7 &= ~dr7_masks[pos];
> > + thread->dr7 |= encode_dr7(pos, bp->info.len, bp->info.type);
> > +
> > + return 0;
> > +}
>
> It might be possible to eliminate this rather awkward code, once the
> DR0-DR3 values are added back into the thread structure.
>

I'm sorry I don't see how. Can you explain?

> > +/*
> > + * Copy out the debug register information for a core dump.
> > + *
> > + * tsk must be equal to current.
> > + */
> > +void dump_thread_hw_breakpoint(struct task_struct *tsk, int u_debugreg[8])
> > +{
> > + struct thread_struct *thread = &(tsk->thread);
> > + int i;
> > +
> > + memset(u_debugreg, 0, sizeof u_debugreg);
> > + for (i = 0; i < thread->hbp_num_installed && thread->hbp[i]; ++i)
> > + u_debugreg[i] = thread->hbp[i]->info.address;
>
> The loop condition is wrong, since you don't compact userspace
> breakpoints. But it could be unrolled into:
>
> u_debugreg[0] = thread->dr0;
> ...
> u_debugreg[3] = thread->dr3;
>

I agree that some of the code in the patch were based on the assumption
that the registers by user-space users would be consumed in an
increasing fashion, but it should be changed.

The above code will become:
for (i = 0; i < HB_NUM; ++i)
if (thread->hbp[i])
u_debugreg[i] = thread->debugreg[i];

Also note that "unsinged long debugreg[HB_NUM]" is embedded in
thread_struct and not as shown below for using them in loops
conveniently.

unsigned long debugreg0;
unsigned long debugreg1;
...

> > + u_debugreg[7] = thread->dr7;
> > + u_debugreg[6] = thread->dr6;
> > +}
> > +
> > +/*
> > + * Handle debug exception notifications.
> > + */
> > +int __kprobes hw_breakpoint_handler(struct die_args *args)
> > +{
> > + int i, rc = NOTIFY_DONE;
> > + struct hw_breakpoint *bp;
> > + /* The DR6 value is stored in args->err */
> > + unsigned long dr7, dr6 = args->err;
>
> Please change this. (I should have changed it long ago, but I never
> got around to it.) Instead of passing the DR6 value in args->err,
> pass a pointer to the dr6 variable in do_debug(). That way the
> handler routines can turn off bits in that variable and do_debug() can
> see which bits remain set at the end.
>
> Of course, this will require a corresponding change to the
> post_kprobe_handler() routine.
>

Ok.

> > +
> > + if (dr6 & DR_STEP)
> > + return NOTIFY_DONE;
>
> This test is wrong. Why did you change it? It should be:
>
> if (!(dr6 & (DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
>
> In theory it's possible to have both the Single-Step bit and a Debug-Trap
> bit set at the same time.
>

This code is in hw_breakpoint_handler() which we don't intend to enter
if single-stepping bit is set (through kprobes) and hence the
NOTIFY_DONE. Given that HW_BREAKPOINT_EXECUTE is not
allowed over kernel-space addresses, we cannot have a kprobe and a HW
breakpoint over the same address causing simultaneous exceptions.

However when the patch once had support for Instructions breakpoints +
post_handler(), it was a different case then.

Is there a reason why you think this check and/or return condition
should be different?

> > +
> > + get_debugreg(dr7, 7);
> > +
> > + /* Disable breakpoints during exception handling */
> > + set_debugreg(0UL, 7);
> > +
> > + /*
> > + * Assert that local interrupts are disabled
> > + * Reset the DRn bits in the virtualized register value.
> > + * The ptrace trigger routine will add in whatever is needed.
> > + */
> > + current->thread.dr6 &= ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);
> > +
> > + /* Lazy debug register switching */
> > + if (last_debugged_task != current)
> > + switch_to_none_hw_breakpoint();
> > +
> > + /* Handle all the breakpoints that were triggered */
> > + for (i = 0; i < HB_NUM; ++i) {
> > + if (likely(!(dr6 & (DR_TRAP0 << i))))
> > + continue;
> > + /*
> > + * Find the corresponding hw_breakpoint structure and
> > + * invoke its triggered callback.
> > + */
> > + if (hbp_user_refcount[i])
> > + bp = current->thread.hbp[i];
> > + else if (i >= hbp_kernel_pos)
> > + bp = hbp_kernel[i];
> > + else /* False alarm due to lazy DR switching */
> > + continue;
> > +
> > + if (!bp)
> > + break;
>
> This logic is wrong. It should go like this:
>
> if (i >= hbp_kernel_pos)
> bp = hbp_kernel[i];
> else {
> bp = current->thread.hbp[i];
> if (!bp) {
> /* False alarm due to lazy DR switching */
> continue;
> }
> }
>

I agree. The 'break' following the "if (!bp)" in my patch would have ignored a
few genuine exceptions and it should have been:
if (!bp)
continue;

Your suggested code is elegant and I'll adopt it.

> > +
> > + switch (bp->info.type) {
> > + case HW_BREAKPOINT_WRITE:
> > + case HW_BREAKPOINT_RW:
> > + if (bp->triggered)
>
> Do you really need to test bp->triggered?
>

It's a carriage from old code. Will remove.

> > + (bp->triggered)(bp, args->regs);
> > +
> > + if (arch_check_va_in_userspace(bp->info.address,
> > + bp->info.len))
> > + rc = NOTIFY_DONE;
> > + else
> > + rc = NOTIFY_STOP;;
> > + goto exit;
>
> What on Earth is the reason for all this? What happens if two
> breakpoints get triggered at the same time?
>

The hw_breakpoint_handler() will be modified like this:
(without the modifications to dr6). Note that the 'goto exit' has
changed to 'continue' to allow handling of multiple exceptions.

int __kprobes hw_breakpoint_handler(struct die_args *args)
{
int i, rc = NOTIFY_DONE;
struct hw_breakpoint *bp;
/* The DR6 value is stored in args->err */
unsigned long dr7, dr6 = args->err;

if (dr6 & DR_STEP)
return NOTIFY_DONE;

get_debugreg(dr7, 7);

/* Disable breakpoints during exception handling */
set_debugreg(0UL, 7);

/*
* Assert that local interrupts are disabled
* Reset the DRn bits in the virtualized register value.
* The ptrace trigger routine will add in whatever is needed.
*/
current->thread.debugreg6 &= \
~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3);

/* Lazy debug register switching */
if (per_cpu(last_debugged_task, get_cpu()) != current) {
switch_to_none_hw_breakpoint();
put_cpu_no_resched();
}

/* Handle all the breakpoints that were triggered */
for (i = 0; i < HB_NUM; ++i) {
if (likely(!(dr6 & (DR_TRAP0 << i))))
continue;
/*
* Find the corresponding hw_breakpoint structure and
* invoke its triggered callback.
*/
if (i >= hbp_kernel_pos)
bp = hbp_kernel[i];
else {
bp = current->thread.hbp[i];
if (!bp) {
/* False alarm due to lazy DR switching */
continue;
}
}

switch (bp->info.type) {
case HW_BREAKPOINT_WRITE:
case HW_BREAKPOINT_RW:
(bp->triggered)(bp, args->regs);

if (arch_check_va_in_userspace(bp->info.address,
bp->info.len))
rc = NOTIFY_DONE;
else
rc = NOTIFY_STOP;;
continue;
case HW_BREAKPOINT_EXECUTE:
/*
* Presently we allow instruction breakpoints
* only in
* user-space when requested through ptrace.
*/
if (arch_check_va_in_userspace(bp->info.address,
bp->info.len)) {
(bp->triggered)(bp, args->regs);
/*
* do_debug will notify user through a
* SIGTRAP
* signal. So we are not requesting a
* NOTIFY_STOP here
*/
rc = NOTIFY_DONE;
continue;
}
}
}

set_debugreg(dr7, 7);
return rc;
}

> > + case HW_BREAKPOINT_EXECUTE:
> > + /*
> > + * Presently we allow instruction breakpoints only in
> > + * user-space when requested through ptrace.
> > + */
> > + if (arch_check_va_in_userspace(bp->info.address,
> > + bp->info.len)) {
> > + (bp->triggered)(bp, args->regs);
>
> Why do you need this test?
>
> > + /*
> > + * do_debug will notify user through a SIGTRAP
> > + * signal So we are not requesting a
> > + * NOTIFY_STOP here
> > + */
> > + rc = NOTIFY_DONE;
> > + goto exit;
> > + }
> > + }
>
> In fact, why do you distinguish between data breakpoints and code
> breakpoints in the first place? Shouldn't they be treated the same?
>

We would receive breakpoint exceptions with type HW_BREAKPOINT_EXECUTE
only for user-space (through ptrace) and this is a double-check (the
first one being done at arch_validate_hwbkpt_settings(). Also, we don't
want to invoke bp->triggered (which would be ptrace_triggered) without
checking if the causative IP is in user-space. Hence the above code.

> > + }
> > +
> > + /* Stop processing further if the exception is a stray one */
>
> That comment is wrong. It should say something like this:
>
> /* Stop processing if there's nothing more to do */
>
> > + if (!(dr6 & ~(DR_TRAP0|DR_TRAP1|DR_TRAP2|DR_TRAP3)))
> > + rc = NOTIFY_STOP;
>
> On the other hand, I'm not sure that this NOTIFY_STOP will help much
> anyway. All it does is provide an early exit from the notifier chain
> when a hardware breakpoint occurs. But if there wasn't also a
> Single-Step exception, the kprobes handler shouldn't take long to
> run. Hence an early exit doesn't provide much advantage.
>

Yes, I'm for removing this check too.

> > +exit:
> > + set_debugreg(dr7, 7);
> > + return rc;
> > +}
>
>
> > @@ -530,13 +530,14 @@ asmlinkage __kprobes struct pt_regs *syn
> > dotraplinkage void __kprobes do_debug(struct pt_regs *regs, long error_code)
> > {
> > struct task_struct *tsk = current;
> > - unsigned long condition;
> > + unsigned long dr6;
> > int si_code;
> >
> > - get_debugreg(condition, 6);
> > + get_debugreg(dr6, 6);
> > + set_debugreg(0, 6); /* DR6 may or may not be cleared by the CPU */
> >
> > /* Catch kmemcheck conditions first of all! */
> > - if (condition & DR_STEP && kmemcheck_trap(regs))
> > + if (dr6 & DR_STEP && kmemcheck_trap(regs))
> > return;
>
> Are you sure this is right? Is it possible for any of the DR_TRAPn bits
> to be set as well when this happens?
>
>

I did not look at this check before. But the (dr6 & DR_STEP) condition
should make sure no HW breakpoint exceptions are set (since we don't
allow instruction breakpoints in kernel-space yet, as explained above).

> > @@ -83,6 +85,8 @@ void exit_thread(void)
> > put_cpu();
> > kfree(bp);
> > }
> > + if (unlikely(t->dr7))
> > + flush_thread_hw_breakpoint(me);
>
> Shouldn't you test the TIF_DEBUG flag instead? After all, the thread
> might very well have some hw_breakpoint structures allocated even though
> t->dr7 is 0.
>

Yes, it's a mistake and missed many eyes before. Thanks for pointing it
out!

> >
> > ds_exit_thread(current);
> > }
> > @@ -103,14 +107,9 @@ void flush_thread(void)
> > }
> > #endif
> >
> > - clear_tsk_thread_flag(tsk, TIF_DEBUG);
> > + if (unlikely(tsk->thread.dr7))
> > + flush_thread_hw_breakpoint(tsk);
>
> Same thing here.
>
> > @@ -265,7 +267,14 @@ int copy_thread(int nr, unsigned long cl
> >
> > task_user_gs(p) = get_user_gs(regs);
> >
> > + p->thread.io_bitmap_ptr = NULL;
> > +
> > tsk = current;
> > + err = -ENOMEM;
> > + if (unlikely(tsk->thread.dr7)) {
> > + if (copy_thread_hw_breakpoint(tsk, p, clone_flags))
> > + goto out;
> > + }
>
> And here.
>
> > @@ -426,6 +438,25 @@ __switch_to(struct task_struct *prev_p,
> > lazy_load_gs(next->gs);
> >
> > percpu_write(current_task, next_p);
> > + /*
> > + * There's a problem with moving the switch_to_thread_hw_breakpoint()
> > + * call before current is updated. Suppose a kernel breakpoint is
> > + * triggered in between the two. The hw-breakpoint handler will see
> > + * that current is different from the task pointer stored in the chbi
> > + * area, so it will think the task pointer is leftover from an old task
> > + * (lazy switching) and will erase it. Then until the next context
> > + * switch, no user-breakpoints will be installed.
> > + *
> > + * The real problem is that it's impossible to update both current and
> > + * chbi->bp_task at the same instant, so there will always be a window
> > + * in which they disagree and a breakpoint might get triggered. Since
> > + * we use lazy switching, we are forced to assume that a disagreement
> > + * means that current is correct and chbi->bp_task is old. But if you
> > + * move the code above then you'll create a window in which current is
> > + * old and chbi->bp_task is correct.
> > + */
>
> Don't you think this comment should be updated to match the changes
> you have made in the code? There no longer is a chbi area, for example.
>
>
> Alan Stern
>

Will read like this:
/*
* There's a problem with moving the
* switch_to_thread_hw_breakpoint()
* call before current is updated. Suppose a kernel breakpoint
* is
* triggered in between the two. The hw-breakpoint handler will
* see
* that current is different from the task pointer stored in
* last_debugged_task, so it will think the task pointer is
* leftover
* from an old task (lazy switching) and will erase it. Then
* until the
* next context switch, no user-breakpoints will be installed.
*
* The real problem is that it's impossible to update both
* current and
* last_debugged_task at the same instant, so there will always
* be a
* window in which they disagree and a breakpoint might get
* triggered.
* Since we use lazy switching, we are forced to assume that a
* disagreement means that current is correct and
* last_debugged_task is
* old. But if you move the code above then you'll create a
* window in
* which current is old and last_debugged_task is correct.
*/
if (unlikely(test_tsk_thread_flag(next_p, TIF_DEBUG)))
switch_to_thread_hw_breakpoint(next_p);


Thanks,
K.Prasad

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/