Re: [PATCH 0/2] scop GFP_NOFS api

From: NeilBrown
Date: Tue May 03 2016 - 19:26:55 EST


On Wed, May 04 2016, Michal Hocko wrote:

> Hi,
>
> On Sun 01-05-16 07:55:31, NeilBrown wrote:
> [...]
>> One particular problem with your process-context idea is that it isn't
>> inherited across threads.
>> Steve Whitehouse's example in gfs shows how allocation dependencies can
>> even cross into user space.
>
> Hmm, I am still not sure I understand that example completely but making
> a dependency between direct reclaim and userspace can hardly work.

No it can't. Specifically: if direct reclaim blocks on user-space that
must be a problem.
I think the point of this example is that some filesystem things can
block on user-space in ways that are very hard to encode in with flags
as they are multi-level indirect.
So the conclusion (my conclusion) is that direct reclaim mustn't block.

When I was working on deadlock avoidance in loop-back NFS I went down
the path of adding GFP flags and extended the PF_FSTRANS flag and got it
working (think) but no-one liked it. It was way too intrusive.

Some how I landed on the idea of making nfs_release_page non blocking
and everything suddenly became much much simpler. Problems evaporated.

NFS has a distinct advantage here. The "Close-to-open" cache semantic
means that all dirty pages must be flushed on last close. So when
->evict_inode is finally called there is nothing much to do - just free
everything up. So I could fix NFS without worrying about (or even
noticing) evict_inode.


> Especially when the direct reclaim might be sitting on top of hard to
> guess pile of locks. So unless I've missed anything what Steve has
> described is a clear NOFS context.
>
>> A more localized one that I have seen is that NFSv4 sometimes needs to
>> start up a state-management thread (particularly if the server
>> restarted).
>> It uses kthread_run(), which doesn't actually create the thread but asks
>> kthreadd to do it. If NFS writeout is waiting for state management it
>> would need to make sure that kthreadd runs in allocation context to
>> avoid deadlock.
>> I feel that I've forgotten some important detail here and this might
>> have been fixed somehow, but the point still stands that the allocation
>> context can cross from thread to thread and can effectively become
>> anything and everything.
>
> Not sure I understand your point here but relying on kthread_run
> from GFP_NOFS context has always been deadlock prone with or without
> scope GFP_NOFS semantic so I am not really sure I see your point
> here. Similarly relying on a work item which doesn't have a dedicated
> WQ_MEM_RECLAIM WQ is deadlock prone. You simply shouldn't do that.

The point is really that saying "You shouldn't do that" isn't much good
when "that" is exactly what the fs developer wants to do and it seems to
work and never triggers a warning.

You can create lots of rules about what is or is not allowed, or
you can make everything that it not explicit forbidden (ideally at
compile time but possibly at runtime with might_sleep or lockdep),
permitted.

I prefer the latter.

>
>> It is OK to wait for memory to be freed. It is not OK to wait for any
>> particular piece of memory to be freed because you don't always know who
>> is waiting for you, or who you really are waiting on to free that
>> memory.
>>
>> Whenever trying to free memory I think you need to do best-effort
>> without blocking.
>
> I agree with that. Or at least you have to wait on something that is
> _guaranteed_ to make a forward progress. I am not really that sure this
> is easy to achieve with the current code base.

I accept that it isn't "easy". But I claim that it isn't particularly
difficult either.

NeilBrown

Attachment: signature.asc
Description: PGP signature