Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
From: Peter Xu
Date: Fri Mar 15 2019 - 04:27:07 EST
On Wed, Mar 13, 2019 at 10:50:48AM -0700, Mike Kravetz wrote:
> On 3/12/19 11:00 PM, Peter Xu wrote:
> > On Tue, Mar 12, 2019 at 12:59:34PM -0700, Mike Kravetz wrote:
> >> On 3/11/19 2:36 AM, Peter Xu wrote:
> >>>
> >>> The "kvm" entry is a bit special here only to make sure that existing
> >>> users like QEMU/KVM won't break by this newly introduced flag. What
> >>> we need to do is simply set the "unprivileged_userfaultfd" flag to
> >>> "kvm" here to automatically grant userfaultfd permission for processes
> >>> like QEMU/KVM without extra code to tweak these flags in the admin
> >>> code.
> >>
> >> Another user is Oracle DB, specifically with hugetlbfs. For them, we would
> >> like to add a special case like kvm described above. The admin controls
> >> who can have access to hugetlbfs, so I think adding code to the open
> >> routine as in patch 2 of this series would seem to work.
> >
> > Yes, I think if there's an explicit and safe place we can hook for
> > hugetlbfs then we can do a similar trick to the KVM case. Though I
> > noticed that hugetlbfs files can be created not only under the
> > mountpoint (which the admin can control), but also in some other
> > ways. My question (sorry if it's a silly one!) is whether all the
> > other ways to use hugetlbfs are still under the admin's control.
> > One I know of is memfd_create(), which seems to be doable even by
> > unprivileged users. If so, should we limit the uffd privilege to only
> > those hugetlbfs users who use the mountpoint directly?
>
> Wow! I did not realize that apps which specify mmap(MAP_HUGETLB) do not
> need any special privilege to use huge pages. Honestly, I am not sure if
> that was by design or a bug. The memfd_create code is based on the MAP_HUGETLB
> code and also does not need any special privilege. Not to sidetrack this
> discussion, but people on Cc may know if this is a bug or by design. My
> opinion is that huge pages are a limited resource and should be under control.
> One needs to be a member of a special group (or root) to access via System V
> interfaces.
Yeah I completely agree that huge pages should need some special
care...
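
For reference, a minimal sketch of the unprivileged path mentioned
above: it only asks the kernel for a hugetlbfs-backed memfd and reports
what happened. The helper name probe_hugetlb_memfd() is made up for
illustration, and whether the call succeeds depends on the kernel
version and configuration (MFD_HUGETLB needs hugetlbfs support).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MFD_HUGETLB
#define MFD_HUGETLB 0x0004U	/* value from <linux/memfd.h>, for older libc headers */
#endif

/*
 * Try to create a hugetlbfs-backed memfd without touching any
 * admin-controlled mountpoint.
 * Returns 1 if the kernel handed out an fd, 0 otherwise.
 * (Hypothetical helper, not part of the patch series.)
 */
int probe_hugetlb_memfd(void)
{
	int fd = memfd_create("hugetlb-probe", MFD_HUGETLB);

	if (fd < 0) {
		printf("uid %ld: memfd_create(MFD_HUGETLB) failed: %s\n",
		       (long)getuid(), strerror(errno));
		return 0;
	}

	printf("uid %ld: got a hugetlbfs-backed fd (%d) without a mountpoint\n",
	       (long)getuid(), fd);
	close(fd);
	return 1;
}
```

Calling probe_hugetlb_memfd() from main() as a non-root user should
show which case applies on a given system.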
>
> The DB use case only does mmap of files in an explicitly mounted filesystem.
> So, limiting it in that manner would work for them.
>
> > Another question is about fork() of privileged processes - for KVM we
> > only grant privilege to the exact process that opened the /dev/kvm
> > node, and the privilege will be lost for any forked children. Is
> > it the same for OracleDB/hugetlbfs?
>
> I need to confirm with the DB people, but it is my understanding that the
> exact process which does the open/mmap will be the one using userfaultfd.
It'd be nice if these could be confirmed, and if the above proposal
could still be an alternative for us (grant privilege to processes that
do mknod() upon the hugetlbfs mountpoint; drop privilege on fork as
usual), since IMHO it is still the simplest approach compared to what
we've discussed in the other threads...
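
To make the intended semantics concrete, here is a toy userspace model
(not kernel code, and not taken from the patches): the grant is attached
to the task that opens the privileged node and is not inherited across
fork. All names (toy_task, toy_open_privileged, toy_fork,
toy_may_use_uffd) are hypothetical.

```c
#include <stdbool.h>

/* Toy stand-in for a task; NOT a kernel structure. */
struct toy_task {
	bool uffd_allowed;
};

/*
 * Model of opening the privileged node (e.g. /dev/kvm, or a file under
 * an admin-controlled hugetlbfs mount): the opener gets the grant.
 */
void toy_open_privileged(struct toy_task *t)
{
	t->uffd_allowed = true;
}

/* Model of fork(): the child starts without the grant. */
struct toy_task toy_fork(const struct toy_task *parent)
{
	struct toy_task child = { .uffd_allowed = false };

	(void)parent;	/* nothing about the grant is inherited */
	return child;
}

/* Model of the check made when a task asks for a userfaultfd. */
bool toy_may_use_uffd(const struct toy_task *t)
{
	return t->uffd_allowed;
}
```

In this model a child that needs userfaultfd has to open the node
itself, which is the behaviour described for the KVM case.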
Thanks,
--
Peter Xu