Re: [PATCH v1 0/5] s390: page_mapcount(), page_has_private() and PG_arch_1

From: Matthew Wilcox
Date: Thu Apr 04 2024 - 23:43:03 EST


On Thu, Apr 04, 2024 at 06:36:37PM +0200, David Hildenbrand wrote:
> On my journey to remove page_mapcount(), I got hooked up on other folio
> cleanups that Willy most certainly will enjoy.
>
> This series removes the s390x usage of:
> * page_mapcount() [patches WIP]
> * page_has_private() [have patches to remove it]
>
> ... and makes PG_arch_1 only be set on folio->flags (i.e., never on tail
> pages of large folios).
>
> Further, one "easy" fix upfront.

Looks like you didn't see:

https://lore.kernel.org/linux-s390/20240322161149.2327518-1-willy@xxxxxxxxxxxxx/

> ... unfortunately there is one other issue I spotted that I am not
> tackling in this series, because I am not 100% sure what we want to
> do: the usage of page_ref_freeze()/folio_ref_freeze() in
> make_folio_secure() is unsafe. :(
>
> In make_folio_secure(), we're holding the folio lock, the mmap lock and
> the PT lock. So we are protected against concurrent fork(), zap, GUP,
> swapin, migration ... The page_ref_freeze()/ folio_ref_freeze() should
> also block concurrent GUP-fast very reliably.
>
> But if the folio is mapped into multiple page tables, we could see
> concurrent zapping of the folio, a pagecache folios could get mapped/
> accessed concurrent, we could see fork() sharing the page in another
> process, GUP ... trying to adjust the folio refcount while we froze it.
> Very bad.

Hmmm. Why is that not then a problem for, eg, splitting or migrating?
Is it because they unmap first and then try to freeze?

> For anonymous folios, it would likely be sufficient to check that
> folio_mapcount() == 1. For pagecache folios, that's insufficient, likely
> we would have to lock the pagecache. To handle folios mapped into
> multiple page tables, we would have to do what
> split_huge_page_to_list_to_order() does (temporary migration entries).
>
> So it's a bit more involved, and I'll have to leave that to s390x folks to
> figure out. There are othe reasonable cleanups I think, but I'll have to
> focus on other stuff.
>
> Compile tested, but not runtime tested, I'll appreiate some testing help
> from people with UV access and experience.
>
> Cc: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> Cc: Heiko Carstens <hca@xxxxxxxxxxxxx>
> Cc: Vasily Gorbik <gor@xxxxxxxxxxxxx>
> Cc: Alexander Gordeev <agordeev@xxxxxxxxxxxxx>
> Cc: Christian Borntraeger <borntraeger@xxxxxxxxxxxxx>
> Cc: Sven Schnelle <svens@xxxxxxxxxxxxx>
> Cc: Janosch Frank <frankja@xxxxxxxxxxxxx>
> Cc: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
> Cc: Gerald Schaefer <gerald.schaefer@xxxxxxxxxxxxx>
> Cc: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> Cc: Thomas Huth <thuth@xxxxxxxxxx>
>
> David Hildenbrand (5):
> s390/uv: don't call wait_on_page_writeback() without a reference
> s390/uv: convert gmap_make_secure() to work on folios
> s390/uv: convert PG_arch_1 users to only work on small folios
> s390/uv: update PG_arch_1 comment
> s390/hugetlb: convert PG_arch_1 code to work on folio->flags
>
> arch/s390/include/asm/page.h | 2 +
> arch/s390/kernel/uv.c | 112 ++++++++++++++++++++++-------------
> arch/s390/mm/gmap.c | 4 +-
> arch/s390/mm/hugetlbpage.c | 8 +--
> 4 files changed, 79 insertions(+), 47 deletions(-)
>
> --
> 2.44.0
>