Re: [PATCHv5 0/7] Fix compound_head() race

From: Andrea Arcangeli
Date: Fri Sep 04 2015 - 09:44:03 EST


On Thu, Sep 03, 2015 at 03:35:51PM +0300, Kirill A. Shutemov wrote:
> Kirill A. Shutemov (7):
> mm: drop page->slab_page
> slub: use page->rcu_head instead of page->lru plus cast
> zsmalloc: use page->private instead of page->first_page
> mm: pack compound_dtor and compound_order into one word in struct page
> mm: make compound_head() robust
> mm: use 'unsigned int' for page order
> mm: use 'unsigned int' for compound_dtor/compound_order on 64BIT

Reviewed-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>

The only alternative solution that doesn't require finding a field,
unused in tail pages, whose LSB is guaranteed to be zero, is to drop
both PG_head and PG_tail and reserve 4 bits from page->flags.

This means a net loss of 2 bits from page->flags (a loss of 3 bits
if !CONFIG_PAGEFLAGS_EXTENDED), but then everything becomes simple
and there's no need to find a field whose LSB is guaranteed zero at
all times.

Those 4 bits are kept clear for non-compound pages. When you create
a compound page you encode the compound_order in those 4 bits of
page->flags, the same for all head and tail pages. compound_order()
then becomes atomically available for tail pages too, and
compound_order goes away from struct page along with first_page (and
there's no need to add a compound_head field).
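
Something along these lines (untested sketch; COMPOUND_ORDER_PGSHIFT
and the helper names are made up for illustration, the real shift
would have to be carved out of page->flags next to the existing
zone/node/section fields):

	#define COMPOUND_ORDER_WIDTH	4
	/* placeholder slot: must not clash with the other flags fields */
	#define COMPOUND_ORDER_PGSHIFT	(BITS_PER_LONG - COMPOUND_ORDER_WIDTH)
	#define COMPOUND_ORDER_MASK	((1UL << COMPOUND_ORDER_WIDTH) - 1)

	static inline unsigned int flags_compound_order(struct page *page)
	{
		return (page->flags >> COMPOUND_ORDER_PGSHIFT) &
			COMPOUND_ORDER_MASK;
	}

	static inline void set_flags_compound_order(struct page *page,
						    unsigned int order)
	{
		unsigned long flags = page->flags;

		flags &= ~(COMPOUND_ORDER_MASK << COMPOUND_ORDER_PGSHIFT);
		flags |= (unsigned long)order << COMPOUND_ORDER_PGSHIFT;
		page->flags = flags;
	}

prep_compound_page() would run set_flags_compound_order() on the head
and on every tail page (non-atomic stores to page->flags are only ok
there because the pages aren't visible to anybody else yet), and the
bits would be cleared again when the compound page is destroyed.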

In PageCompound you read the 4 bits: if they're not all zero it's a
compound page, otherwise it's not.

In PageHead/PageTail, if the 4 bits are all zero it's neither head
nor tail; otherwise you do the math on page_to_pfn(page). If the pfn
is naturally aligned for the order encoded in the 4 bits, i.e.
"!(pfn & ((1<<order)-1))", it's a head, otherwise it's a tail.

If it's a tail, compound_head is then just a matter of doing
"return page - (pfn & ((1<<order)-1))" (no need of pfn_to_page).

This leverages the natural physical alignment of compound pages for
all orders. It'd cover up to CONFIG_FORCE_MAX_ZONEORDER == 16 (128MB
for order 15 with PAGE_SIZE 4KB).
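
Putting it together, the accessors would boil down to something like
this (untested; flags_compound_order() is the made-up helper from the
sketch above):

	static inline int PageCompound(struct page *page)
	{
		return flags_compound_order(page) != 0;
	}

	static inline int PageHead(struct page *page)
	{
		unsigned int order = flags_compound_order(page);

		return order && !(page_to_pfn(page) & ((1UL << order) - 1));
	}

	static inline int PageTail(struct page *page)
	{
		unsigned int order = flags_compound_order(page);

		return order && (page_to_pfn(page) & ((1UL << order) - 1));
	}

	static inline struct page *compound_head(struct page *page)
	{
		unsigned int order = flags_compound_order(page);

		/* order == 0 masks to 0, so the page is its own head */
		return page - (page_to_pfn(page) & ((1UL << order) - 1));
	}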

page_to_pfn can actually be replaced with
(page - NODE_DATA(page_to_nid(page))->node_mem_map), which is faster
as page_to_nid only needs to access page->flags, which is already in
L1. So it costs only one cacheline access into the pgdat and a sub.
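
IOW something like this (sketch; page_local_pfn() is a made-up name,
it requires the node_mem_map to exist, and it's only equivalent to
page_to_pfn for the alignment math above if node_start_pfn has the
relevant low bits clear):

	static inline unsigned long page_local_pfn(struct page *page)
	{
		/*
		 * Offset of the page in its node's mem_map, i.e.
		 * pfn - node_start_pfn: the low bits match the real
		 * pfn as long as node_start_pfn is suitably aligned.
		 */
		return page - NODE_DATA(page_to_nid(page))->node_mem_map;
	}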

Because of the two (or three) additional bits taken out of
page->flags I doubt it's viable on 32-bit, but I thought I'd mention
it just in case.

Thanks,
Andrea