Re: Random scheduler/unaligned accesses crashes with perf lockevents on sparc 64

From: Steven Rostedt
Date: Tue Apr 06 2010 - 09:41:35 EST


It's best to send to my rostedt@xxxxxxxxxxx account, just like it is
best to send to your davem@xxxxxxxxxxxxx ;-)

On Mon, 2010-04-05 at 19:15 -0700, David Miller wrote:
> From: Frederic Weisbecker <fweisbec@xxxxxxxxx>
> Date: Mon, 5 Apr 2010 21:40:58 +0200
> > It happens without CONFIG_FUNCTION_TRACER as well (but it happens
> > when the function tracer runs). And I hadn't your
> > perf_arch_save_caller_regs() when I triggered this.
> I think there's still something wrong with the ring buffer stuff on
> architectures like sparc64.
> Stephen, I'm looking at the 8-byte alignment fix that was made a few
> weeks ago, commit:
> commit 2271048d1b3b0aabf83d25b29c20646dcabedc05
> Author: Steven Rostedt <srostedt@xxxxxxxxxx>
> Date: Thu Mar 18 17:54:19 2010 -0400
> ring-buffer: Do 8 byte alignment for 64 bit that can not handle 4 byte align
> and I'm not so sure it's completely correct.
> Originally, the ring buffer entries determine where the entry data
> resides (either &event->array[0] or &event->array[1]) based upon the
> length.
> Beforehand, in all cases:
> 1) If length could be encoded into event->type_len (ie. <=
> RB_MAX_SMALL_DATA) then event->type_len holds the length
> and the event data starts at &event->array[0]
> 2) Otherwise (length > RB_MAX_SMALL_DATA) the length is
> encoded into event->array[0] and the event data starts at
> &event->array[1]
> But now, there is a new semantic when CONFIG_64BIT is true and
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is false (which isn't the right
> test btw, f.e. sparc 32-bit needs this handling just like sparc 64-bit
> does since it uses full 64-bit loads and stores to access u64 objects
> and thus will crash without proper alignment, the correct test should

OK, so the a 64 bit word still needs 64 bit alignment when storing to a
data pointer.

I wonder if we should just have a special copy in this case for the
events and remove this patch in the ring buffer. That is:

__assign_word(__entry->word, value);


#define __assgin_word(dest, src) \
memcpy(&(dest), &(src), sizeof(src));

This would fix it for all.

> This new semantic is:
> 1) Entries always encode the length in ->array[0] and ->type_len
> is set to zero.
> And then there are special cases like events of type
> RINGBUF_TYPE_PADDING which, even though ->type_len is non-zero, encode
> a length field in ->array[0] which is used by the ring buffer
> iterators such as rb_event_length(), but this only applies only if
> event->time_delta is non-zero. (Phew!)

The RINGBUF_TYPE_PADDING is used to either fill the rest of the sub
buffer, where alignment does not matter, or to replace a deleted event
that had another event after, it, which should already be aligned.

> The commit adjusts the code in rb_calculate_event_length() to force 8
> byte chunks when RB_FORCE_8BYTE_ALIGNMENT is set. It also adjusted
> the rb_update_event() logic so that it unconditionally uses
> event->array[0] for the length on such platforms.
> However I don't see any logic added to ring_buffer_event_length()
> to handle this forcing. That alone can't explain the crashes
> Frederic and I are seeing, since only oprofile seems to use that
> helper function, but I can just imagine there might be other
> subtle bugs linering after the above commit.
> Anyways, that's just the inital potential problem I've discovered.
> I'll start auditing the rest of this code.
> I wonder if there's a simpler way to implement this alignment fix such
> that we don't have to constantly make sure scores of locations in
> ring_buffer.c get this magic exception case correct.
> We should probably also BUILD_BUG_ON() if BUF_PAGE_HDR_SIZE is not
> a multiple of the necessary alignment, since the ring buffer
> entries start at the end of that.
> Also I noticed (painfully :-) that 2.6.33 needs a backport of this
> alignment fix too, so we should submit it to -stable (once we sift
> out all the bugs of course).

What about removing the logic from the ring buffer and moving it to the
TRACE_EVENT() macros as I suggested above?

We would probably need a way to read the buffers too.

I also know that Mathieu has some strange tricks to force alignment but
I'm still not convinced it would make things any less fragile than what
is already there.

-- Steve

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at