Re: [PATCH v4 1/2] tools api: add a lightweight buffered reading api

From: Ian Rogers
Date: Tue Apr 14 2020 - 22:27:06 EST


On Tue, Apr 14, 2020 at 7:20 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>
> On Tue, Apr 14, 2020 at 9:48 AM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >
> > On Mon, Apr 13, 2020 at 5:16 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Apr 14, 2020 at 1:22 AM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Apr 13, 2020 at 12:29 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > > > >
> > > > > Hi Ian,
> > > > >
> > > > > On Sat, Apr 11, 2020 at 3:42 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > The synthesize benchmark shows the majority of execution time going to
> > > > > > fgets and sscanf, necessary to parse /proc/pid/maps. Add a new buffered
> > > > > > reading library that will be used to replace these calls in a follow-up
> > > > > > CL. Add tests for the library to perf test.
> > > > > >
> > > > > > v4 adds the test file missed in v3.
> > > > > >
> > > > > > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> > > > > > ---
> > > > > > +/* Read a hexadecimal value with no 0x prefix into the out argument hex. If the
> > > > > > + * first character isn't hexadecimal returns -2, io->eof returns -1, otherwise
> > > > > > + * returns the character after the hexadecimal value which may be -1 for eof.
> > > > >
> > > > > I'm not sure returning -1 is good when it actually reads something and
> > > > > meets EOF.
> > > > > Although it would have a valid value, users might consider it an error IMHO.
> > > > > Why not returning 0 instead? (I'm ok with -1 for the later use of the API).
> > > >
> > > > Thanks for the feedback! In the code for /proc/pid/maps this is a
> > > > hypothetical, but I think having the API right is important. I didn't
> > > > go with 0 as you mention 0 'could' encode a character, for example,
> > > > 7fffabcd\0 wouldn't be distinguishable from 7fffabcd<EOF>.
> > >
> > > Practically I don't think it matters in this case as long as we can
> > > distinguish them in the next call (if the user wants to do).
> > > What users want to do (I think) is whether the returned value
> > > (in *hex) is ok to use or not. By returning -1 on EOF, it might
> > > be confusing for users..
> >
> > In the /proc/pid/maps case the code for reading an address like
> > "00400000-00452000 " the code is:
> >
> > if (io__get_hex(io, start) != '-')
> > return false;
> > if (io__get_hex(io, end) != ' ')
> > return false;
> >
> > If io__get_hex doesn't return the next character it becomes:
> >
> > if (io__get_hex(io, start))
> > return false;
> > if (io__get_char(io) != '-')
> > return false;
> > if (io__get_hex(io, end))
> > return false;
> > if (io__get_char(io) != ' ')
> > return false;
> >
> > Which is twice as verbose and requires that io have a rewind operation
> > to go backward when io__get_hex and io__get_dec have gone 1 character
> > too far.
>
> Yeah, I'm not against returning the next character - it's good.
> The only concern was whether it should return -1 or 0 when
> it meets EOF after parsing some digits.
>
> But I think we can go with this version as there's no such case
> when parsing /proc/pid/maps.
>
> >
> > > > The updated
> > > > code distinguishes the cases as 0 meaning character \0, -1 meaning EOF
> > > > and -2 meaning bad encoding. Your worry is that a hex number that's
> > > > next to EOF will get a result of -1 showing the EOF came next. and
> > > > code that does 'if ( .. < 0)' would trigger. While clunky, it'd be
> > > > possible in those cases to change the code to 'if ( .. < -1)'.
> > >
> > > Yes, but it's not conventional IMHO.
> > >
> > >
> > > > So my thoughts are:
> > > > 1) being able to tell apart the 3 cases could be important - this is
> > > > all hypothetical;
> > > > 2) keeping EOF and error as negative numbers has a degree of consistency;
> > > > 3) using -1 for EOF comes from get_char, it'd be nice to have one
> > > > value mean EOF.
> > > > Perhaps the issue is the name of the function? It isn't a standard API
> > > > to return the next character, but it simplified things for me as I
> > > > didn't need to add a 'rewind' operation. The function names could be
> > > > something like io__get_hex_then_char and io__get_dec_then_char, EOF
> > > > for the 'then_char' part would be more consistent. I'd tried to keep
> > > > the names short and have a load bearing comment, which isn't ideal but
> > > > generally I believe the style is that function names are kept short.
> > > > Let me know what you think.
> > >
> > > I'm ok with the function name and understand your concerns.
> > > And I don't want to insist it strongly but just sharing my thoughts.
> > >
> > > Thanks
> > > Namhyung
> >
> > Thanks, feedback appreciated! It is useful to discuss and it is
> > straightforward to change the API but I'm in two minds as to whether
> > it would be better.
> >
> > I'd still like to land this and the next patch, as getting rid of
> > fgets/sscanf saves 50us from event synthesis. Breaking out the io part
> > of that change wasn't done so much with a view to replacing stdio, but
> > just something minimal that serves the /proc/pid/maps case.
>
> The performance gain looks nice! Thanks for working on this.

Thanks for the feedback! I think we can keep an eye on the API, the
idea to change to 0 isn't an unreasonable one. I have some other
multithreaded synthesis improving patches to send from Stephane, and
I've updated the benchmark to measure these. I'll re-send these
patches with the new single and multi-threaded benchmark data.

Thanks,
Ian

> Thanks
> Namhyung