Re: rfc: treewide scripted patch mechanism? (was: Re: [PATCH] Makefile: Convert -Wimplicit-fallthrough=3 to just -Wimplicit-fallthrough for clang)QUILT

From: Willy Tarreau
Date: Wed Aug 21 2019 - 00:01:55 EST


On Tue, Aug 20, 2019 at 05:43:27PM -0700, Linus Torvalds wrote:
> I would seriously suggest doing something like
>
> copy_string( dst, dstsize, src, srcsize, FLAGS );
>
> where FLAGS migth be "pad" or whatever. Make it return the size of the
> resulting string, because while it can be convenient to pass 'dst" on,
> it's not useful.

I actually like this a lot. FLAGS could also indicate whether or not a
zero before srcsize ends the copy or not, allowing to copy substrings
of known length or known valid strings of unknown length by passing ~0
in srcsize. And it could also indicate whether the returned value should
indicate how much was copied or how much would have been needed for the
copy to work (so that testing (result <= dstsize) indicates truncation).

> And then maybe just add the helper macro that turns an array into a
> "pointer, size" combination, rather than yet another letter jumble.

I did exactly this in some of my projects including haproxy, I called
the lib "ist" for "indirect string", and found it extremely convenient
to use because many functions now return an ist or take an ist as an
argument. Passing a structure of only two elements results in passing
only two registers, and that's the same for the return value. Moreover,
the compiler is smart enough to eliminate a *lot* of manipulations, and
to replace pointer dereferences with direct register manipulations. I
do have a lot of ist("foo") spread everywhere in the code, which makes
a struct ist from the string and its length, and when you look at the
code, the compiler used immediate values for both the string and its
length. It's also extremely convenient for string comparisons and
lookups because you start by checking the length and can eliminate
lookups and dereferences, making keyword parsers very efficient. It
also allows us to have an istcat() function doing like strncat() but
with the output size always known so that there's no risk of appending
past the end when the starting point doesn't match the beginning of a
string.

I must confess that I became quite addict to using this because it's
so much convenient not to have to care about string length nor zero
termination anymore, without the overhead of calling strlen() on
resulting values!

For illustration of the simplicity the code is here :
http://git.haproxy.org/?p=haproxy.git;a=blob_plain;f=include/common/ist.h

And here are a few examples of usage:
- declaration in arrays:
http://git.haproxy.org/?p=haproxy.git;a=blob;f=contrib/prometheus-exporter/service-prometheus.c;h=9b9ef2ea8e2e8ee0cc63364500d39fc08009fb8d;hb=HEAD#l644
- appending to a string:
http://git.haproxy.org/?p=haproxy.git;a=blob;f=contrib/prometheus-exporter/service-prometheus.c;h=9b9ef2ea8e2e8ee0cc63364500d39fc08009fb8d;hb=HEAD#l1112
- passing as function arguments:
http://git.haproxy.org/?p=haproxy.git;a=blob;f=src/http_ana.c;h=b2069e3ead59e7bcde45ac76a1c6b0b6b5fb3882;hb=HEAD#l2468
http://git.haproxy.org/?p=haproxy.git;a=blob;f=src/http_ana.c;h=b2069e3ead59e7bcde45ac76a1c6b0b6b5fb3882;hb=HEAD#l2602
- checking for known values:
http://git.haproxy.org/?p=haproxy.git;a=blob;f=src/h2.c;h=c41da8e5ee116e75e4719709527511c299a3657c;hb=HEAD#l295

I'm personally totally convinced by this approach and am slowly improving
this interface to progressively use it everywhere, and quite frankly I
can only strongly recommend going into the same direction for ease of
use, safety, and efficiency.

Willy