Re: [RFC PATCH 00/79] Generic page write protection and a solution to page waitqueue

From: Tim Chen
Date: Fri Apr 20 2018 - 15:57:48 EST


On 04/04/2018 12:17 PM, jglisse@xxxxxxxxxx wrote:
> From: Jérôme Glisse <jglisse@xxxxxxxxxx>
>
> https://cgit.freedesktop.org/~glisse/linux/log/?h=generic-write-protection-rfc
>
> This is an RFC for LSF/MM discussions. It impacts the file subsystem,
> the block subsystem and the mm subsystem. Hence it would benefit from
> a cross sub-system discussion.
>
> The patchset is not fully baked, so take it with a grain of salt. I use
> it to illustrate the fact that it is doable, and now that I did it once
> I believe I have a better and cleaner plan in my head on how to do
> this. I intend to share and discuss it at LSF/MM (I still need to write
> it down). That plan leads to quite different individual steps than this
> patchset takes and is also easier to split up into more manageable
> pieces.
>
> I also want to apologize for the size and number of patches (and I am
> not even sending them all).
>
> ----------------------------------------------------------------------
> The Why ?
>
> I have two objectives: duplicate memory read-only across nodes and/or
> devices, and work around PCIe atomic limitations. More on each of those
> objectives below. I also want to put forward that it can solve the page
> wait list issue, i.e. having each page with its own wait list and thus
> avoiding the long wait list traversal latency recently reported [1].
>
> It does allow KSM for file-backed pages (truly generic KSM, even between
> anonymous and file-backed pages). I am not sure how useful this can
> be; this was not an objective I pursued, it is just a for-free
> feature (see below).
>
> [1] https://groups.google.com/forum/#!topic/linux.kernel/Iit1P5BNyX8
>
> ----------------------------------------------------------------------
> Per page wait list, so long page_waitqueue() !
>
> Not implemented in this RFC, but the logic is described below, with
> pseudocode at the bottom of this email.
>
> When there is contention on the struct page lock bit, the caller which
> is trying to lock the page will add itself to a waitqueue. The issue
> here is that multiple pages share the same wait queue, and on large
> systems with a lot of RAM this means we can quickly get to a long list
> of waiters for different pages (or for the same page) on the same
> list [1].

Your approach seems useful if there are lots of locked pages sharing
the same wait queue.

That said, in the original workload from our customer with the long wait
queue problem, there was a single super hot page getting migrated, and it
was being accessed by all threads, which caused a big logjam while they
waited for the migration to complete.
With your approach, we will still likely end up with a long queue
in that workload, even with a per-page wait queue.

Thanks.

Tim