Re: [PATCH] dm ioctl: Restore __GFP_HIGH in copy_params()

From: Mikulas Patocka
Date: Mon May 22 2017 - 10:53:04 EST

On Mon, 22 May 2017, Michal Hocko wrote:

> On Mon 22-05-17 08:00:11, Mikulas Patocka wrote:
> >
> > On Mon, 22 May 2017, Michal Hocko wrote:
> >
> > > > Sometimes, I/O to a device mapper device is blocked until the userspace
> > > > daemon dmeventd takes some action (for example, when a dm-mirror leg
> > > > fails, dmeventd needs to mark the leg as failed in the lvm metadata and
> > > > then reload the device).
> > > >
> > > > The dmeventd daemon mlocks itself in memory so that it doesn't generate
> > > > any I/O. But it must be able to call ioctls. __GFP_HIGH is there so that
> > > > the ioctls issued by dmeventd have a higher chance of succeeding if some
> > > > I/O is blocked, waiting for dmeventd action. It reduces the possibility
> > > > of a low-memory deadlock, though it doesn't eliminate it entirely.
> > >
> > > So what happens if the memory reserves are depleted? Do we deadlock?
> >
> > Yes, it will deadlock.
>
> That would be more than unfortunate and begs for a different solution.
> The thing is that __GFP_HIGH is not propagated to all allocations in the
> vmalloc proper. E.g. page table allocations are hardcoded GFP_KERNEL.

In typical device mapper use, the ioctl buffer is smaller than 4k, so the
vmalloc fallback is never taken.
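
To make the size argument concrete, here is a rough userspace model of the
check, not the kernel code itself. It assumes 4k pages and a kvmalloc-style
allocator that tries kmalloc first and only considers the vmalloc fallback
for requests larger than one page. The names MODEL_PAGE_SIZE and
model_may_use_vmalloc() are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4k pages, matching the "smaller than 4k" remark above. */
#define MODEL_PAGE_SIZE 4096u

/*
 * Hypothetical helper: reports whether a request of this size could ever
 * reach the vmalloc fallback in a kvmalloc-style allocator. Requests that
 * fit in one page are served by the kmalloc path only, so they never hit
 * the page table allocations inside vmalloc.
 */
bool model_may_use_vmalloc(size_t size)
{
	return size > MODEL_PAGE_SIZE;
}
```

With this model, a 512-byte ioctl buffer stays on the kmalloc path, and only
a buffer larger than 4096 bytes could reach vmalloc.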

> > > Why is OOM killer insufficient to allow the further progress?
> >
> > I don't know if the OOM killer will or won't be triggered in this
> > situation; it depends on the people who wrote the OOM killer.
>
> I am not sure I understand. OOM killer is invoked for _all_ allocations
> <= PAGE_ALLOC_COSTLY_ORDER that do not have __GFP_NORETRY as long as the
> OOM killer is not disabled (oom_killer_disable) and that only happens
> from the PM suspend path which makes sure that no userspace is active at
> the time. AFAIU this is a userspace triggered path and so the latter
> shouldn't apply to it and GFP_KERNEL should be therefore sufficient.
> Relying on a portion of memory reserves to prevent a deadlock seems
> fundamentally broken to me.
>
> --
> Michal Hocko
> SUSE Labs

lvm2 was designed this way. It is broken, but there is not much that can
be done about it: fixing it would mean a major rewrite. The only thing we
can do is lower the deadlock probability with __GFP_HIGH (or PF_MEMALLOC,
which was used some time ago).
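
As a simplified illustration of why __GFP_HIGH only lowers the probability,
here is a userspace model of an order-0 watermark check. It assumes that a
high-priority request may dip halfway into the min watermark reserve, which
is roughly how the page allocator treated __GFP_HIGH at the time. The
function name and the numbers in the usage note are hypothetical, and the
real check also accounts for allocation order and lowmem reserves.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the watermark test for a single-page allocation.
 * free_pages: pages currently free in the zone
 * min_wmark:  the zone's min watermark
 * gfp_high:   true if the request carries __GFP_HIGH
 */
bool model_alloc_allowed(unsigned long free_pages,
			 unsigned long min_wmark, bool gfp_high)
{
	unsigned long min = min_wmark;

	/* Assumption: __GFP_HIGH may use half of the min reserve. */
	if (gfp_high)
		min -= min / 2;

	return free_pages > min;
}
```

With a min watermark of 1000 pages and 800 pages free, a plain request fails
the check while a __GFP_HIGH request passes. With only 400 pages free, both
fail, which is the case where the deadlock can still happen.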

Mikulas