Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t

From: Rik van Riel
Date: Fri May 08 2015 - 09:46:17 EST

Next message: Rafael J. Wysocki: "Re: [PATCH v3 4/6] cpufreq: powernv: Call throttle_check() on receiving OCC_THROTTLE"
Previous message: Rafael J. Wysocki: "[PATCH] cpuidle: Fix the kerneldoc comment for cpuidle_enter_state()"
In reply to: Al Viro: "Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t"
Next in thread: Ingo Molnar: "Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn_t"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 05/07/2015 03:11 PM, Ingo Molnar wrote:

> Stable, global page-struct descriptors are a given for real RAM, where
> we allocate a struct page for every page in nice, large, mostly linear
> arrays.
>
> We'd really need that for pmem too, to get the full power of struct
> page: and that means allocating them in nice, large, predictable
> places - such as on the device itself ...
>
> It might even be 'scattered' across the device, with 64 byte struct
> page size we can pack 64 descriptors into a single page, so every 65
> pages we could have a page-struct page.
>
> Finding a pmem page's struct page would thus involve rounding it
> modulo 65 and reading that page.
>
> The problem with that is fourfold:
>
> - that we now turn a very kernel internal API and data structure into
> an ABI. If struct page grows beyond 64 bytes it's a problem.
>
> - on bootup (or device discovery time) we'd have to initialize all
> the page structs. We could probably do this in a hierarchical way,
> by dividing continuous pmem ranges into power-of-two groups of
> blocks, and organizing them like the buddy allocator does.
>
> - 1.5% of storage space lost.
>
> - will wear-leveling properly migrate these 'hot' pages around?
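
A rough userspace sketch of the interleaved layout described in the quoted text, for illustration only. It assumes 4kB pages, 64-byte descriptors, and that the first page of every 65-page group is the descriptor page (the quoted text does not say which page of the group it would be). All names here are hypothetical and not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_SIZE      64u                       /* assumed struct page size */
#define PAGE_BYTES     4096u                     /* assumed device page size */
#define DESCS_PER_PAGE (PAGE_BYTES / DESC_SIZE)  /* 64 descriptors per page */
#define GROUP_PAGES    (DESCS_PER_PAGE + 1)      /* 64 data pages + 1 descriptor page */

/* Where the descriptor for a given data page lives on the device. */
struct desc_loc {
    uint64_t desc_page;  /* device page index holding the descriptor */
    uint32_t offset;     /* byte offset of the descriptor in that page */
};

/* Assumption: page 0 of each 65-page group holds the descriptors. */
static bool is_desc_page(uint64_t dev_page)
{
    return dev_page % GROUP_PAGES == 0;
}

/* Locate the descriptor for a data page; must not be called on a descriptor page. */
static struct desc_loc locate_desc(uint64_t dev_page)
{
    uint64_t group = dev_page / GROUP_PAGES;
    uint64_t slot = dev_page % GROUP_PAGES - 1;  /* 0..63 within the group */
    struct desc_loc loc = {
        .desc_page = group * GROUP_PAGES,
        .offset = (uint32_t)(slot * DESC_SIZE),
    };
    return loc;
}
```

With this layout one page in every 65 holds descriptors, i.e. 1/65 of the device, which is about 1.5% and matches the space overhead listed above.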

MST and I have been doing some thinking about how to address some of
the issues above.

One way could be to invert the PG_compound logic we have today, by
allocating one struct page for every PMD / THP sized area (2MB on
x86), and dynamically allocating struct pages for the 4kB pages
inside only if the area gets split. They can be freed again when
the area is not being accessed in 4kB chunks.

That way we would always look at the struct page for the 2MB area
first, and if the PG_split bit is set, we look at the array of
dynamically allocated struct pages for this area.
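
The two-level lookup could be sketched roughly as follows. This is a userspace illustration, not kernel code: PG_split is only the flag proposed in this mail, and `area_desc`, `small_desc`, `AREA_FLAG_SPLIT` and the helper functions are hypothetical names standing in for whatever a real implementation would use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SUBPAGES_PER_AREA 512UL       /* 2MB area / 4kB pages on x86 */
#define AREA_FLAG_SPLIT   (1UL << 0)  /* stands in for the proposed PG_split */

/* Hypothetical per-4kB descriptor, only allocated while an area is split. */
struct small_desc {
    unsigned long flags;
};

/* Hypothetical per-2MB descriptor, always present. */
struct area_desc {
    unsigned long flags;
    struct small_desc *sub;  /* NULL unless AREA_FLAG_SPLIT is set */
};

/* Allocate the 4kB descriptors for an area; returns false if allocation fails. */
static bool area_split(struct area_desc *area)
{
    if (area->flags & AREA_FLAG_SPLIT)
        return true;
    area->sub = calloc(SUBPAGES_PER_AREA, sizeof(*area->sub));
    if (!area->sub)
        return false;
    area->flags |= AREA_FLAG_SPLIT;
    return true;
}

/* Free the 4kB descriptors once the area is no longer used in 4kB chunks. */
static void area_unsplit(struct area_desc *area)
{
    free(area->sub);
    area->sub = NULL;
    area->flags &= ~AREA_FLAG_SPLIT;
}

/*
 * Always look at the 2MB descriptor first. Returns the 4kB descriptor if the
 * area is split, or NULL if the caller should use the area descriptor itself.
 */
static struct small_desc *lookup_small(struct area_desc *areas, unsigned long pfn)
{
    struct area_desc *area = &areas[pfn / SUBPAGES_PER_AREA];

    if (!(area->flags & AREA_FLAG_SPLIT))
        return NULL;
    return &area->sub[pfn % SUBPAGES_PER_AREA];
}
```

The extra branch and pointer dereference in `lookup_small()` is the indirection that a 4kB access would pay under this scheme; the sketch ignores locking and any race between splitting and lookup.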

The advantages are obvious: boot time memory overhead and
initialization time are reduced by a factor of 512. CPUs could also
take a whole 2MB area in order to do CPU-local 4kB allocations,
defragmentation policies may become a little clearer, etc...
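
To put numbers on the factor of 512, here is a small sketch of the descriptor overhead at each granularity. It assumes the 64-byte struct page size mentioned in the quoted text; `desc_bytes` is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

#define DESC_SIZE 64ULL  /* assumed struct page size in bytes */

/* Bytes of descriptors needed to cover mem_bytes at one descriptor per granule. */
static uint64_t desc_bytes(uint64_t mem_bytes, uint64_t granule)
{
    return (mem_bytes / granule) * DESC_SIZE;
}
```

For 1TB of memory, `desc_bytes(1ULL << 40, 4096)` comes to 16GB of descriptors, while `desc_bytes(1ULL << 40, 2ULL << 20)` comes to 32MB, a ratio of 512.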

The disadvantage is pretty obvious too: 4kB pages would no longer
be the fast case, since reaching their struct page would take an
extra indirection. I do not know how much of
an issue that would be, or whether it even makes sense for 4kB
pages to continue being the fast case going forward.

Memory trends point in one direction, file size trends in another.

For persistent memory, we would not need struct pages for 4kB pages unless
memory from a particular area was in small files AND those files were
being actively accessed. Large files (mapped in 2MB chunks) or inactive
small files would not need the 4kB page structs around.

--
All rights reversed