Unfortunately, even now that I know the location of the issue so well, I
don't see the connection to the commits e084b+5f8dcc21.
Still a mystery.
I couldn't find anything, but did they change the accounting somewhere,
or e.g. change the timing/order of watermark updates and allocations?
Not that I can think of.
Eventually it might come down to a discussion of allocation priorities, and
we might even keep them as is and accept this issue. I would still prefer
a good second-chance implementation, other page cache allocation flags, or
something else that explicitly solves this issue.
Along those lines, the patch that replaced congestion_wait() with a
waitqueue makes some sense.
I'll need to do a number of tests before I can move that upstream, but I
don't think it's a merge candidate. Unfortunately, I'll be offline for a
week starting tomorrow, so I won't be able to do the testing.
When I get back, I'll revisit those patches with a view to pushing
them upstream. I hate to treat symptoms here without knowing the
underlying problem, but this has been spinning in circles for ages with
little forward progress :(