Re: [PATCH 01/29] Revert "userfaultfd: don't fail on unrecognized features"

From: Axel Rasmussen
Date: Fri Mar 31 2023 - 12:54:17 EST


On Thu, Mar 30, 2023 at 3:27 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Thu, Mar 30, 2023 at 12:04:09PM -0700, Axel Rasmussen wrote:
> > On Thu, Mar 30, 2023 at 8:57 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
> > >
> > > This is a proposal to revert commit 914eedcb9ba0ff53c33808.
> > >
> > > I found this when writting a simple UFFDIO_API test to be the first unit
> > > test in this set. Two things breaks with the commit:
> > >
> > > - UFFDIO_API check was lost and missing. According to man page, the
> > > kernel should reject ioctl(UFFDIO_API) if uffdio_api.api != 0xaa. This
> > > check is needed if the api version will be extended in the future, or
> > > user app won't be able to identify which is a new kernel.
> >
> > 100% agreed, this was a mistake.
> >
> > >
> > > - Feature flags checks were removed, which means UFFDIO_API with a
> > > feature that does not exist will also succeed. According to the man
> > > page, we should (and it makes sense) to reject ioctl(UFFDIO_API) if
> > > unknown features passed in.
> >
> > I still strongly disagree with reverting this part, my feeling is
> > still that doing so makes things more complicated for no reason.
> >
> > Re: David's point, it's clearly wrong to change semantics so a thing
> > that used to work now fails. But this instead makes it more permissive
> > - existing userspace programs continue to work as-is, but *also* one
> > can achieve the same thing more simply (combine probing +
> > configuration into one step). I don't see any problem with that,
> > generally.
> >
> > But, if David and others don't find my argument convincing, it isn't
> > the end of the world. It just means I have to go update my userspace
> > code to be a bit more complicated. :)
>
> I'd say it's fine if it's your own program that you can in full control
> easily. :) Sorry again for not noticing that earlier.
>
> There's one reason that we may consider keeping the behavior. IMHO it is
> when there're major softwares that uses the "wrong" ABI (let's say so;
> because it's not following the man pages). If you're aware any such major
> softwares (especially open sourced) will break due to this revert patch,
> please shoot.

Well, I did find one example, criu:
https://github.com/checkpoint-restore/criu/blob/criu-dev/criu/uffd.c#L266
It doesn't do the two-step probing process, it looks to me like it
does what my patch was intending users to do:

It just asks for the requested features up-front, without any probing.
And then it returns the subset of features it *actually* got,
ostensibly so the caller can compare that vs. what it requested.

Then again, it looks like the caller doesn't *actually* compare the
features it got vs. what it asked for. I don't know enough about criu
to know if this is a bug, or if they actually just don't care.
https://github.com/checkpoint-restore/criu/blob/criu-dev/criu/uffd.c#L312

>
> >
> > I also still think the man page is incorrect or at least incomplete no
> > matter what we do here; we should be sure to update it as a follow-up.
>
> So far it looks still fine if with this revert. Maybe I overlooked
> somewhere?
>
> I'll add this into my todo, but with low priority. If you have suggestion
> already on how to improve the man page please do so before me!

My thinking is that it describes the bits and pieces, but doesn't
explicitly describe end-to-end how to configure a userfaultfd using
the two-step probing approach. (Or state that this is actually
*required*, unless you only want to set features=0 in any case.)

Maybe it will be easiest if I just send a patch myself with what I'm
thinking, and we can see what folks think. Always easier to just look
at a patch vs. talking about it in the abstract. :)

>
> --
> Peter Xu
>