Hello Andi,
Thanks for taking the trouble to go through our code in such detail and
thinking through the race conditions in dp_vaddr_to_page, which I had sort
of shut my eyes to and postponed for while because it didn't seem very easy
to close all the loopholes in an elegant way. I need to understand the mm
locking hierachies and appropriate usage more thoroughly to do a complete
job.
I have labelled the key points you have brought up in your note below as
(a), (b), (c) for ease of reference.
For (a), your suggestion of a two pass approach is I guess feasible, but I
wish there were a simpler way to do it.
Actually I don't even really like the idea of forcing the swapped out page
back in, which we are having to do right now - it would have been nicer
if there were a swapin() routine in the vma ops that we could have used for
on-demand probe insertion, just the way we use inode address space
readpage() right now for discardable pages, but maybe that's asking for too
much :-) [Could vma type based swapin() logic be a useful abstraction in
general, aside from dprobes ?].
Anyway, let me think over this for a while ...
We don't quite understand (b), though. There is indeed a race due to our
not holding the page given to us by handle_mm_fault, while we try to access
it, and we need to fix that of course, but that doesn't sound exactly like
what you mention here. We do have handle_mm_fault being called under the mm
semaphore. Could you explain the deadlock situation that you have in mind ?
We've taken (c) as a very reasonable usability feedback. Now, we would be
displaying the opcodes as part of the dprobes query results. Hope that
would help.
It's good to hear that you've found dprobes useful and that you ported it
to 2.4 yourself. Do send us any comments, suggestions, improvements etc
that come to mind as you use it or go through the code.
Regards
Suparna
Suparna Bhattacharya
Systems Software Group, IBM Global Services, India
E-mail : bsuparna@in.ibm.com
Phone : 91-80-5267117, Extn : 2525
Richard J Moore@IBMGB
10/19/2000 01:27 AM
To: DProbes Team India
cc:
From: Richard J Moore/UK/IBM@IBMGB
Subject: Re: [ANNOUNCE] DProbes 1.1
Richard Moore - RAS Project Lead - Linux Technology Centre (PISC).
http://oss.software.ibm.com/developerworks/opensource/linux
Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK
---------------------- Forwarded by Richard J Moore/UK/IBM on 18/10/2000
20:57 ---------------------------
Andi Kleen <ak@suse.de> on 18/10/2000 18:38:13
Please respond to Andi Kleen <ak@suse.de>
To: Richard J Moore/UK/IBM@IBMGB
cc: linux-kernel@vger.kernel.org
Subject: Re: [ANNOUNCE] DProbes 1.1
Hallo Richard,
On Wed, Oct 18, 2000 at 10:44:11AM +0100, richardj_moore@uk.ibm.com wrote:
>
>
> We've release v1.1 of DProbes - deatils and code is on the DProbes web
> page.
>
> the enhancements include:
>
> - DProbes for kernel version 2.4.0-test7 is now available.
First thanks for this nice work.
I ported the older 1.0 dprobes to 2.4 a few weeks ago for my own use.
It is very useful for kernel work. Unfortunately the user space support
had still one ugly race which I didn't fix because it required too
extensive changes for my simple port (and it didn't concern me because
I only use kernel level breakpoints)
I see the problems are still in 1.1.
(a) The problem is the vma loop in process_recs_in_cow_pages over the vmas
of an
address_space. In 2.4 the only way to do that safely is to hold the
address_space spinlock. Unfortunately you cannot take the semaphore
or execute handle_mm_fault while holding the spinlock, because they could
sleep. The only way I think to do it relatively race free without adding
locks
to the core VM is to do it two pass (first collect all the mms with mmget()
and their addresses in a separate list with the spinlock and then process
it
with the spinlock released)
(b) Then dp_vaddr_to_page has another race. It cannot hold the mm semaphore
because that would deadlock with handle_mm_struct. Not holding it means
though that the page could be swapped out again after you faulted it in
before you have a change to access it. It probably can be done with an
loop that checks and locks the page atomically (e.g. using cmpexchg)
and retries the handle_mm_fault as needed.
There may be more races I missed, the 2.4 SMP MM locking hierarchy is
unfortunately not very flexible and makes things like what dprobes wants
to do relatively hard.
(c) Another change I added and which I found useful is a printk to show
the opcode of mismatched probes (this way wrong offsets in the probe
definitions are easier to fix)
-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Tue Oct 31 2000 - 21:00:13 EST