Re: Possible deny of service with memfd_create()
From: Michal Hocko
Date: Fri Feb 05 2021 - 05:53:54 EST
On Fri 05-02-21 08:54:31, Christian König wrote:
> Am 05.02.21 um 01:32 schrieb Hugh Dickins:
> > On Thu, 4 Feb 2021, Michal Hocko wrote:
> > > On Thu 04-02-21 17:32:20, Christian Koenig wrote:
> > > > Hi Michal,
> > > >
> > > > as requested in the other mail thread the following sample code gets my test
> > > > system down within seconds.
> > > >
> > > > The issue is that the memory allocated for the file descriptor is not
> > > > accounted to the process allocating it, so the OOM killer pics whatever
> > > > process it things is good but never my small test program.
> > > >
> > > > Since memfd_create() doesn't need any special permission this is a rather
> > > > nice deny of service and as far as I can see also works with a standard
> > > > Ubuntu 5.4.0-65-generic kernel.
> > > Thanks for following up. This is really nasty but now that I am looking
> > > at it more closely, this is not really different from tmpfs in general.
> > > You are free to create files and eat the memory without being accounted
> > > for that memory because that is not seen as your memory from the sysstem
> > > POV. You would have to map that memory to be part of your rss.
>
> I mostly agree. The big difference is that tmpfs is only available when
> mounted.
>
> And tmpfs can be restricted in size per mount point as well as per user
> quotas IIRC. Looking at my desktop system those restrictions are actually
> exactly what I see there.
I cannot find anything about per user quotas for tmpfs in the tmpfs man
page. Or maybe I am looking at a wrong layer and there is a generic
handling somewhere in the vfs core?
> But memfd_create() is just free for all, you don't have any size limit nor
> access restriction as far as I can see.
Yes, this is unfortunate and a design decision that should have been
considered when the syscall has been introduced. But this boat has
sailed looong ago to change that without risking a userspace breakage.
> > > The only existing protection right now is to use memoery cgroup
> > > controller because the tmpfs memory is accounted to the process which
> > > faults the memory in (or write to the file).
>
> Agreed, but having to rely on cgroup is not really satisfying when you have
> to maintain a hardened server.
Yes I do recognize the pain. The only other way to mitigate the risk is
to disallow the syscall to untrusted users in a hardened environment.
You should be very strict in tmpfs usage there already.
--
Michal Hocko
SUSE Labs