Re: [PATCH v2] statx: optimize copy of struct statx to userspace

From: Eric Biggers
Date: Sat Mar 11 2017 - 21:17:19 EST

Hi Al,

On Sun, Mar 12, 2017 at 01:24:15AM +0000, Al Viro wrote:
> On Sat, Mar 11, 2017 at 01:45:55PM -0800, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@xxxxxxxxxx>
> >
> > I found that statx() was significantly slower than stat(). As a
> > microbenchmark, I compared 10,000,000 invocations of fstat() on a tmpfs
> > file to the same with statx() passed a NULL path:
> Umm...

Well, it's a silly benchmark, but stat performance is important, and usually
things are cached already so most of the time is just overhead --- which this
measures. And since nothing actually uses statx() yet, you can't do a benchmark
just by running some command like 'git status' or whatever.

> > + tmp.stx_dev_major = MAJOR(stat->dev);
> > + tmp.stx_dev_minor = MINOR(stat->dev);
> > + memset(tmp.__spare2, 0, sizeof(tmp.__spare2));
> > +
> > + return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : 0;
> That relies upon there being no padding in the damn structure.
> It's true and probably will be true on any target, but
> a) it's bloody well worth stating explicitly
> and
> b) struct statx tmp = {.stx_mask = stat->result_mask};
> will get rid of those memset() you've got there by implicit
> zeroing of fields missing in partial structure initializer.
> Padding is *not* included into that, but you are relying upon
> having no padding anyway...

If padding is a concern at all (AFAICS it's not actually an issue now with
struct statx, but people tend to have different opinions on how careful they
want to be with padding), then I think we'll just have to start by memsetting
the whole struct to 0.

stat() actually does that, except that it's overridden on some architectures,
like x86, to only memset the actual reserved fields. I had kind of assumed that
as long as we're defining a new struct that has the same layout on all
architectures, with no padding, that we'd want to take the opportunity to only
memset the actual reserved fields, to make it a bit faster. And yes, a
designated initializer would make this look a bit nicer, though it might have to
be changed later if any future statx() extensions involve fields that are
unioned or conditionally written.

Note that another approach would be to copy the fields individually, but with
unsafe_put_user() instead of __put_user(). That would avoid the overhead of
user_access_begin()/user_access_end() which was the main performance problem.
However, the infrastructure around this doesn't seem to be ready quite yet:
there aren't good implementations of unsafe_put_user() yet, nor is there an
unsafe_clear_user() yet.

- Eric