Re: [PATCH v4 01/34] lib/printbuf: New data structure for printing strings

From: Kent Overstreet
Date: Mon Jun 20 2022 - 11:31:36 EST


On Mon, Jun 20, 2022 at 04:44:10AM +0000, David Laight wrote:
> From: Kent Overstreet
> > Sent: 20 June 2022 01:42
> >
> > This adds printbufs: a printbuf points to a char * buffer and knows the
> > size of the output buffer as well as the current output position.
> >
> > Future patches will be adding more features to printbuf, but initially
> > printbufs are targeted at refactoring and improving our existing code in
> > lib/vsprintf.c - so this initial printbuf patch has the features
> > required for that.
> >
> > Signed-off-by: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> > Reviewed-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> > ---
> > include/linux/printbuf.h | 122 +++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 122 insertions(+)
> > create mode 100644 include/linux/printbuf.h
> >
> > diff --git a/include/linux/printbuf.h b/include/linux/printbuf.h
> > new file mode 100644
> > index 0000000000..8186c447ca
> > --- /dev/null
> > +++ b/include/linux/printbuf.h
> > @@ -0,0 +1,122 @@
> > +/* SPDX-License-Identifier: LGPL-2.1+ */
> > +/* Copyright (C) 2022 Kent Overstreet */
> > +
> > +#ifndef _LINUX_PRINTBUF_H
> > +#define _LINUX_PRINTBUF_H
> > +
> > +#include <linux/kernel.h>
> > +#include <linux/string.h>
> > +
> > +/*
> > + * Printbufs: String buffer for outputting (printing) to, for vsnprintf
> > + */
> > +
> > +struct printbuf {
> > +	char *buf;
> > +	unsigned size;
> > +	unsigned pos;
>
> No naked unsigneds.

This is the way I've _always_ written kernel code - single word type names.

>
> > +};
> > +
> > +/*
> > + * Returns size remaining of output buffer:
> > + */
> > +static inline unsigned printbuf_remaining_size(struct printbuf *out)
> > +{
> > +	return out->pos < out->size ? out->size - out->pos : 0;
> > +}
> > +
> > +/*
> > + * Returns number of characters we can print to the output buffer - i.e.
> > + * excluding the terminating nul:
> > + */
> > +static inline unsigned printbuf_remaining(struct printbuf *out)
> > +{
> > +	return out->pos < out->size ? out->size - out->pos - 1 : 0;
> > +}
>
> Those two are so similar that mistakes will be made.

If you've got ideas for better names I'd be happy to hear them - we discussed
this and this was what we came up with.

> You can also just return negatives when the buffer has overflowed
> and get the callers to test < or <= as required.

Yeesh, no.

> I also wonder whether it is necessary to count the total length
> when the buffer isn't long enough?
> Unless there is a real pressing need for it I'd not bother.
> Setting pos == size (after writing the '\0') allows
> overflow to be detected without most of the dangers.

Because that's what snprintf() needs.
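
To spell that out: snprintf() returns the length the output would have had
without truncation, so pos has to keep counting after the buffer is full.
Below is a minimal userspace sketch of that behaviour. The helpers are
copied from the patch (with min() open-coded); demo_snprint_str() and demo()
are made-up names for illustration only and are not part of the series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for the struct from the patch. */
struct printbuf {
	char		*buf;
	unsigned	size;
	unsigned	pos;
};

/* Characters that still fit, excluding the terminating nul. */
static unsigned printbuf_remaining(struct printbuf *out)
{
	return out->pos < out->size ? out->size - out->pos - 1 : 0;
}

static void printbuf_nul_terminate(struct printbuf *out)
{
	if (out->pos < out->size)
		out->buf[out->pos] = 0;
	else if (out->size)
		out->buf[out->size - 1] = 0;
}

static void prt_bytes(struct printbuf *out, const void *b, unsigned n)
{
	unsigned i, can_print = printbuf_remaining(out);

	if (can_print > n)
		can_print = n;

	for (i = 0; i < can_print; i++)
		out->buf[out->pos++] = ((const char *) b)[i];
	out->pos += n - can_print;	/* keep counting past the end */

	printbuf_nul_terminate(out);
}

/* Hypothetical wrapper, for illustration only. */
static int demo_snprint_str(char *buf, unsigned size, const char *str)
{
	struct printbuf out = { .buf = buf, .size = size, .pos = 0 };

	prt_bytes(&out, str, strlen(str));
	return out.pos;		/* length the full output would have had */
}

static void demo(void)
{
	char a[8], b[8];
	int ra = demo_snprint_str(a, sizeof(a), "hello, world");
	int rb = snprintf(b, sizeof(b), "%s", "hello, world");

	assert(ra == rb);		/* same untruncated length */
	assert(strcmp(a, b) == 0);	/* same truncated contents */
}
```

If pos were clamped to size on overflow, the wrapper could not report the
full length and the first assertion in demo() would not hold.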

> > +
> > +static inline unsigned printbuf_written(struct printbuf *out)
> > +{
> > +	return min(out->pos, out->size);
>
> That excludes the '\0' for short buffers but includes
> it for overlong ones.

It actually doesn't.

> > +}
> > +
> > +/*
> > + * Returns true if output was truncated:
> > + */
> > +static inline bool printbuf_overflowed(struct printbuf *out)
> > +{
> > +	return out->pos >= out->size;
> > +}
> > +
> > +static inline void printbuf_nul_terminate(struct printbuf *out)
> > +{
> > +	if (out->pos < out->size)
> > +		out->buf[out->pos] = 0;
> > +	else if (out->size)
> > +		out->buf[out->size - 1] = 0;
> > +}
> > +
> > +static inline void __prt_char(struct printbuf *out, char c)
> > +{
> > +	if (printbuf_remaining(out))
> > +		out->buf[out->pos] = c;
>
> At this point it is (should be) always safe to add the '\0'.
> Doing so would save the extra conditionals later on.

True, but at the cost of making the code less straightforward. I may have a look
at it.
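
For reference, a minimal userspace sketch of what that suggestion could look
like. prt_char_terminating() and demo() are made-up names, not code from the
series, and the sketch assumes the buffer is already nul terminated before
the first write, since nothing terminates it when there is no room at all.

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-in for the struct from the patch. */
struct printbuf {
	char		*buf;
	unsigned	size;
	unsigned	pos;
};

/* Characters that still fit, excluding the terminating nul. */
static unsigned printbuf_remaining(struct printbuf *out)
{
	return out->pos < out->size ? out->size - out->pos - 1 : 0;
}

/*
 * Hypothetical variant: store the nul together with the character.
 * printbuf_remaining() > 0 implies pos + 1 <= size - 1, so buf[pos + 1]
 * is still inside the buffer.
 */
static void prt_char_terminating(struct printbuf *out, char c)
{
	if (printbuf_remaining(out)) {
		out->buf[out->pos] = c;
		out->buf[out->pos + 1] = 0;
	}
	out->pos++;
}

static void demo(void)
{
	char buf[4] = "";	/* assumption: starts out nul terminated */
	struct printbuf out = { .buf = buf, .size = sizeof(buf), .pos = 0 };
	const char *s = "abcde";

	while (*s)
		prt_char_terminating(&out, *s++);

	assert(strcmp(buf, "abc") == 0);	/* truncated, still terminated */
	assert(out.pos == 5);			/* pos keeps counting */
}
```

The separate printbuf_nul_terminate() call goes away, but every caller now
depends on the buffer having been terminated up front.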

>
> > +	out->pos++;
> > +}
> > +
> > +static inline void prt_char(struct printbuf *out, char c)
> > +{
> > +	__prt_char(out, c);
> > +	printbuf_nul_terminate(out);
> > +}
> > +
> > +static inline void __prt_chars(struct printbuf *out, char c, unsigned n)
> > +{
> > +	unsigned i, can_print = min(n, printbuf_remaining(out));
> > +
> > +	for (i = 0; i < can_print; i++)
> > +		out->buf[out->pos++] = c;
> > +	out->pos += n - can_print;
> > +}
> > +
> > +static inline void prt_chars(struct printbuf *out, char c, unsigned n)
> > +{
> > +	__prt_chars(out, c, n);
> > +	printbuf_nul_terminate(out);
> > +}
> > +
> > +static inline void prt_bytes(struct printbuf *out, const void *b, unsigned n)
> > +{
> > +	unsigned i, can_print = min(n, printbuf_remaining(out));
> > +
> > +	for (i = 0; i < can_print; i++)
> > +		out->buf[out->pos++] = ((char *) b)[i];
> > +	out->pos += n - can_print;
> > +
> > +	printbuf_nul_terminate(out);
>
> jeepers - that can be written so much better.
> Something like:
> 	unsigned int i, pos = out->pos;
> 	int space = out->size - pos - 1;
> 	char *tgt = out->buf + pos;
> 	const char *src = b;
> 	out->pos = pos + n;
>
> 	if (space <= 0)
> 		return;
> 	if (n > space)
> 		n = space;
>
> 	for (i = 0; i < n; i++)
> 		tgt[i] = src[i];
> 	tgt[n] = 0;
>

I find your version considerably harder to read, and I've stared at enough
assembly that I trust the compiler to generate pretty equivalent code.

> > +}
> > +
> > +static inline void prt_str(struct printbuf *out, const char *str)
> > +{
> > +	prt_bytes(out, str, strlen(str));
>
> Do you really need to call strlen() and then process
> the buffer byte by byte?

Versus introducing a branch to check for nul into the inner loop of prt_bytes()?
You're not serious, are you?
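
To make the comparison concrete, here is a rough userspace sketch of both
shapes. prt_str() matches the patch; prt_str_onepass() is a made-up name for
the single-pass alternative, which has to test for nul on every byte inside
the copy loop. The helpers are copied from the patch with min() open-coded,
and demo() is illustrative only.

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-in for the struct from the patch. */
struct printbuf {
	char		*buf;
	unsigned	size;
	unsigned	pos;
};

static unsigned printbuf_remaining(struct printbuf *out)
{
	return out->pos < out->size ? out->size - out->pos - 1 : 0;
}

static void printbuf_nul_terminate(struct printbuf *out)
{
	if (out->pos < out->size)
		out->buf[out->pos] = 0;
	else if (out->size)
		out->buf[out->size - 1] = 0;
}

static void prt_bytes(struct printbuf *out, const void *b, unsigned n)
{
	unsigned i, can_print = printbuf_remaining(out);

	if (can_print > n)
		can_print = n;

	/* The copy loop only tests the counter, never the data. */
	for (i = 0; i < can_print; i++)
		out->buf[out->pos++] = ((const char *) b)[i];
	out->pos += n - can_print;

	printbuf_nul_terminate(out);
}

/* As in the patch: find the length first, then copy. */
static void prt_str(struct printbuf *out, const char *str)
{
	prt_bytes(out, str, strlen(str));
}

/* Hypothetical single-pass variant: nul test inside the loop. */
static void prt_str_onepass(struct printbuf *out, const char *str)
{
	for (; *str; str++) {
		if (printbuf_remaining(out))
			out->buf[out->pos] = *str;
		out->pos++;
	}
	printbuf_nul_terminate(out);
}

static void demo(void)
{
	char a[6], b[6];
	struct printbuf pa = { .buf = a, .size = sizeof(a), .pos = 0 };
	struct printbuf pb = { .buf = b, .size = sizeof(b), .pos = 0 };

	prt_str(&pa, "printbuf");
	prt_str_onepass(&pb, "printbuf");

	assert(strcmp(a, b) == 0);	/* identical truncated output */
	assert(pa.pos == pb.pos);	/* identical untruncated length */
}
```

Both produce the same result; the difference is only in where the per-byte
work happens.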