Re: [PATCH] ARM fix syscall trace return value

From: Mathieu Desnoyers
Date: Tue Feb 17 2009 - 14:30:39 EST


* Viktor Rosendahl (Viktor.Rosendahl@xxxxxxxxx) wrote:
> On Tue, 2009-02-17 at 19:18 +0100, ext Mathieu Desnoyers wrote:
> > Hi Russell,
> >
> > I am currently finding core bugs in the Linux kernel implementation of
> > the ARM architecture. :-( e.g. return value not being sent to the
> > syscall_trace function upon exit (upon which LTTng depends). (patch
> > below)
> >
> > This is _very_ silly because there is no dependency on the syscall being
> > executed, and the syscall_entry/syscall_exit events are recorded at the
> > _exact_ same time. Yes, I mean the _exact_ same time: using a clock
> > which consists of atomic_add_return monotonic increments, it seems like
> > ARM is able to return the _same_ value from atomic_add_return
> > *twice*!! I think the atomic.h primitives are broken and that they
> > allow concurrent modification of a given atomic variable by the pipeline.
> > It sounds weird, and I hope I am not crazy (just getting into the ARM
> > world...). ;) Any thoughts? I'll try adding some barriers to see if it
> > helps.
>
> Hi Mathieu,
>
> I am currently investigating very similar behavior
> (syscall_entry/syscall_exit events having the exact same time in LTTng).
>
> However, I am using the CCNT (together with trace-clock-32-to-64.c) for
> timestamping. This is, if I understand you correctly, a different clock
> from the one you are using, and it does not use atomic_add_return().
> Thus, I suspect that the reason for getting the exact same time for
> entry/exit events might be something other than the clocks being broken.
>
> I have to admit that I cannot explain how it can happen, though. Could it
> be some weird problem in the LTTng trace recording?
>

I saw the same result as you with the CCNT-based clock I am currently
developing, so I went back to a simpler clock built on
atomic_add_return, which should be more "solid". But entry/exit events
still showed up with identical timestamps, so I was really unsure about
what was happening: there is no trace corruption, and I have never seen
this kind of problem on any other architecture (x86, powerpc, mips...).
So I fixed the syscall_trace exit parameter, which now ensures there is
a dependency on the return value. But I still want to find out why
atomic_add_return failed to be atomic in that particular condition. I
suspect there is a missing memory barrier in atomic.h.
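
To make the expectation concrete, here is a minimal userspace sketch of
the kind of clock described above. It is only an illustration under
stated assumptions: C11 atomic_fetch_add() stands in for the kernel's
atomic_add_return(), and the names trace_clock_counter,
trace_clock_read and record_syscall_pair are hypothetical, not taken
from LTTng or from the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared counter backing the logical trace clock. */
static atomic_long trace_clock_counter;

/*
 * Userspace stand-in for a clock built on atomic_add_return(1, v):
 * atomic_fetch_add() returns the old value, so add 1 to get the
 * value after the increment.
 */
static long trace_clock_read(void)
{
	return atomic_fetch_add(&trace_clock_counter, 1) + 1;
}

/*
 * Timestamp a hypothetical syscall entry/exit pair and return the
 * difference between the two readings. If the increment is really
 * atomic, the two readings can never be equal.
 */
static long record_syscall_pair(void)
{
	long entry_ts = trace_clock_read();
	long exit_ts = trace_clock_read();

	assert(exit_ts > entry_ts);
	return exit_ts - entry_ts;
}
```

With a single caller, record_syscall_pair() returns 1; with concurrent
callers it can be larger, but it should never be 0. Identical
entry/exit timestamps from such a clock are exactly the symptom
reported above.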

Mathieu

> best regards,
>
> Viktor
>
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68