Re: [RFC -v2] kfifo writer side lock-less support

From: Huang Ying
Date: Tue Aug 24 2010 - 04:43:57 EST


On Tue, 2010-08-24 at 15:55 +0800, Stefani Seibold wrote:
> Hi Huang,
>
> Am Dienstag, den 24.08.2010, 09:42 +0800 schrieb Huang Ying:
> > Hi, Stefani,
> >
> > The sample code is attached with the mail too. If it is necessary, I
> > can put the sample code into samples/kfifo directory if the basic
> > idea of the patch is accepted.
> >
>
> The samples use a own implementation of a record handling. Please use
> the one provided by the kfifo API.
>
> > This patch add lock-less support for kfifo writer side amongst
> > different contexts on one CPU, such as NMI, IRQ, soft_irq, process,
> > etc. This makes kfifo can be used to implement per-CPU lock-less data
> > structure.
> >
> > The different contexts on one CPU have some kind of preemption
> > priority as follow:
> >
> > process -> soft_irq -> IRQ -> NMI
> >
> > Where preemption priority increases from left to right. That is, the
> > right side context can preempt left side context, but not in the
> > reverse direction, that means the right side context will run to the
> > end before return to the left side context. The lock-less algorithm
> > used in the patch take advantage of this.
> >
> > The algorithm works in reserve/commit style, where "reserve" only
> > allocate the space, while "commit" really makes the data into buffer,
> > data is prepared in between. cmpxchg is used in "reserve", this
> > guarantees different spaces will be allocated for different
> > contexts. Only the "commit" in lowest context will commit all
> > allocated spaces. Because all contexts preempting lowest context
> > between "reserve" and "commit" will run to the end, all data put into
> > buffer are valid.
> >
>
> I don't like it, because it handles a very rare use case. The batch is
> bloating the code of the fifo API and the macros, so user of this get
> also an increase of the code size. most and increase the size of the
> kfifo structure. Most of the user, i think more than 99 percent will not
> need it.

The patch adds only 1 field (unsigned int) to struct __kfifo. I think
that should be acceptable. Because sizeof(struct __kfifo) should be much
smaller that __kfifo->mask + 1 in most cases.

For macros, only INIT_KFIFO, kfifo_reset and kfifo_put is enlarged a
little (one instruction?).

And yes, I added some new functions and macros. But if I implement
another ring buffer instead of making kfifo lock-less (for multiple
writers), I need to implement more functions and macros, the code size
increment will be larger, isn't it?

> And there a lot of bugs on a first small review.
>
> Your kfifo_skip_len does not handle record fifos. User of kfifo_put,
> kfifo_avail will get an increased code size. Your kfifo_is_full isn't
> longer working correctly. You did not fixed kfifo_in(), so this function
> will not work. And so on....

After we reach the consensus on the general idea, we can look at these
issues one by one.

> > Some idea of the lock-less algorithm in the patch comes from Steven
> > Rostedt's trace ring buffer implementation.
> >
> > The new xxx_ptr interface and kfifo_iter makes it possible to
> > write/read content of kfifo in-place in addition to copying out/in.
> >
>
> The regular use case of a fifo is that there is one write and one
> reader. In this case, the current implementation of the kfifo structure
> is lock less.
>
> So i you need this, than it would be better to use the ring buffer.

I really need multiple-writers and one reader in APEI GHES support, and
I need lock-less in writer side (because the buffer need to be written
in NMI handler). So I can not use the original kfifo implementation.

> > Signed-off-by: Huang Ying <ying.huang@xxxxxxxxx>
> > Signed-off-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
>
> I was never CC by a mail where Andi has signed off this patch. It would
> be great if i will get some infos why he like it.
>
> But i think you will never get an ACK for this by me, because it bloats
> the most-used functions of the hand optimized kfifo API.
>
> Rule number one: do not increase the code size if the functionality is
> not needed and used!

There is at least one user. I will post the corresponding patches later.

Best Regards,
Huang Ying


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/