Re: [this_cpu_xx V8 11/16] Generic support for this_cpu_cmpxchg

From: Christoph Lameter
Date: Mon Jan 04 2010 - 12:22:34 EST


On Tue, 22 Dec 2009, Mathieu Desnoyers wrote:

> > > I am a bit concerned about the "generic" version of this_cpu_cmpxchg.
> > > Given that what LTTng needs is basically an atomic, nmi-safe version of
> > > the primitive (on all architectures that have something close to a NMI),
> > > this means that it could not switch over to your primitives until we add
> > > the equivalent support we currently have with local_t to all
> > > architectures. The transition would be faster if we create an
> > > atomic_cpu_*() variant which would map to local_t operations in the
> > > initial version.
> > >
> > > Or maybe have I missed something in your patchset that address this ?
> >
> > NMI safeness is not covered by this_cpu operations.
> >
> > We could add nmi_safe_.... ops?
> >
> > The atomic_cpu reference make me think that you want full (LOCK)
> > semantics? Then use the regular atomic ops?
>
> nmi_safe would probably make sense here.

I am not sure how to implement fallback for nmi_safe operations though
since there is no way of disabling NMIs.

> But given that we have to disable preemption to add precision in terms
> of trace clock timestamp, I wonder if we would really gain something
> considerable performance-wise.

Not sure what exactly you attempt to do there.

> I also thought about the design change this requires for the per-cpu
> buffer commit count pointer which would have to become a per-cpu pointer
> independent of the buffer structure, and I foresee a problem with
> Steven's irq off tracing which need to perform buffer exchanges while
> tracing is active. Basically, having only one top-level pointer for the
> buffer makes it possible to exchange it atomically, but if we have to
> have two separate pointers (one for per-cpu buffer, one for per-cpu
> commit count array), then we are stucked.

You just need to keep percpu pointers that are offsets into the percpu
area. They can be relocated as needed to the processor specific addresses
using the cpu ops.

> So given that per-cpu ops limits us in terms of data structure layout, I
> am less and less sure it's the best fit for ring buffers, especially if
> we don't gain much performance-wise.

I dont understand how exactly the ring buffer logic works and what you are
trying to do here.

The ringbuffers are per cpu structures right and you do not change cpus
while performing operations on them? If not then the per cpu ops are not
useful to you.

If you dont: How can you safely use the local_t operations for the
ringbuffer logic?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/