Re: [PATCH v4 1/2] tools api: add a lightweight buffered reading api

From: Ian Rogers
Date: Mon Apr 13 2020 - 20:50:15 EST


On Mon, Apr 13, 2020 at 5:16 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>
> On Tue, Apr 14, 2020 at 1:22 AM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> >
> > On Mon, Apr 13, 2020 at 12:29 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > >
> > > Hi Ian,
> > >
> > > On Sat, Apr 11, 2020 at 3:42 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> > > >
> > > > The synthesize benchmark shows the majority of execution time going to
> > > > fgets and sscanf, necessary to parse /proc/pid/maps. Add a new buffered
> > > > reading library that will be used to replace these calls in a follow-up
> > > > CL. Add tests for the library to perf test.
> > > >
> > > > v4 adds the test file missed in v3.
> > > >
> > > > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> > > > ---
> > > > +/* Read a hexadecimal value with no 0x prefix into the out argument hex. If the
> > > > + * first character isn't hexadecimal returns -2, io->eof returns -1, otherwise
> > > > + * returns the character after the hexadecimal value which may be -1 for eof.
> > >
> > > I'm not sure returning -1 is good when it actually reads something and
> > > meets EOF.
> > > Although it would have a valid value, users might consider it an error IMHO.
> > > Why not returning 0 instead? (I'm ok with -1 for the later use of the API).
> >
> > Thanks for the feedback! In the code for /proc/pid/maps this is a
> > hypothetical, but I think having the API right is important. I didn't
> > go with 0 as you mention 0 'could' encode a character, for example,
> > 7fffabcd\0 wouldn't be distinguishable from 7fffabcd<EOF>.
>
> Practically I don't think it matters in this case as long as we can
> distinguish them in the next call (if the user wants to do).
> What users want to do (I think) is whether the returned value
> (in *hex) is ok to use or not. By returning -1 on EOF, it might
> be confusing for users..

In the /proc/pid/maps case the code for reading an address like
"00400000-00452000 " the code is:

if (io__get_hex(io, start) != '-')
return false;
if (io__get_hex(io, end) != ' ')
return false;

If io__get_hex doesn't return the next character it becomes:

if (io__get_hex(io, start))
return false;
if (io__get_char(io) != '-')
return false;
if (io__get_hex(io, end))
return false;
if (io__get_char(io) != ' ')
return false;

Which is twice as verbose and requires that io have a rewind operation
to go backward when io__get_hex and io__get_dec have gone 1 character
too far.

> > The updated
> > code distinguishes the cases as 0 meaning character \0, -1 meaning EOF
> > and -2 meaning bad encoding. Your worry is that a hex number that's
> > next to EOF will get a result of -1 showing the EOF came next. and
> > code that does 'if ( .. < 0)' would trigger. While clunky, it'd be
> > possible in those cases to change the code to 'if ( .. < -1)'.
>
> Yes, but it's not conventional IMHO.
>
>
> > So my thoughts are:
> > 1) being able to tell apart the 3 cases could be important - this is
> > all hypothetical;
> > 2) keeping EOF and error as negative numbers has a degree of consistency;
> > 3) using -1 for EOF comes from get_char, it'd be nice to have one
> > value mean EOF.
> > Perhaps the issue is the name of the function? It isn't a standard API
> > to return the next character, but it simplified things for me as I
> > didn't need to add a 'rewind' operation. The function names could be
> > something like io__get_hex_then_char and io__get_dec_then_char, EOF
> > for the 'then_char' part would be more consistent. I'd tried to keep
> > the names short and have a load bearing comment, which isn't ideal but
> > generally I believe the style is that function names are kept short.
> > Let me know what you think.
>
> I'm ok with the function name and understand your concerns.
> And I don't want to insist it strongly but just sharing my thoughts.
>
> Thanks
> Namhyung

Thanks, feedback appreciated! It is useful to discuss and it is
straightforward to change the API but I'm in two minds as to whether
it would be better.

I'd still like to land this and the next patch, as getting rid of
fgets/sscanf saves 50us from event synthesis. Breaking out the io part
of that change wasn't done so much with a view to replacing stdio, but
just something minimal that serves the /proc/pid/maps case.

Thanks,
Ian