Re: [PATCH v5 00/27] Memory Folios
From: Matthew Wilcox
Date: Thu Apr 01 2021 - 13:52:17 EST
On Thu, Apr 01, 2021 at 05:05:37AM +0000, Al Viro wrote:
> On Tue, Mar 30, 2021 at 10:09:29PM +0100, Matthew Wilcox wrote:
>
> > That's a very Intel-centric way of looking at it. Other architectures
> > support a multitude of page sizes, from the insane ia64 (4k, 8k, 16k, then
> > every power of four up to 4GB) to more reasonable options like (4k, 32k,
> > 256k, 2M, 16M, 128M). But we (in software) shouldn't constrain ourselves
> > to thinking in terms of what the hardware currently supports. Google
> > have data showing that for their workloads, 32kB is the goldilocks size.
> > I'm sure for some workloads, it's much higher and for others it's lower.
> > But for almost no workload is 4kB the right choice any more, and probably
> > hasn't been since the late 90s.
>
> Out of curiosity I looked at the distribution of file sizes in the
> kernel tree:
> 71455 files total
>     0--4Kb      36702
>     4--8Kb      11820
>     8--16Kb     10066
>    16--32Kb      6984
>    32--64Kb      3804
>    64--128Kb     1498
>   128--256Kb      393
>   256--512Kb      108
>   512Kb--1Mb       35
>     1--2Mb         25
>     2--4Mb          5
>     4--6Mb          7
>     6--8Mb          4
>    12Mb             2
>    14Mb             1
>    16Mb             1
>
> ... incidentally, everything bigger than 1.2Mb lives^Wshambles under
> drivers/gpu/drm/amd/include/asic_reg/

I'm just going to edit this table to add a column indicating ratio
to previous size:

> Page size   Footprint   Ratio
>   4Kb          1128Mb
>   8Kb          1324Mb   1.17
>  16Kb          1764Mb   1.33
>  32Kb          2739Mb   1.55
>  64Kb          4832Mb   1.76
> 128Kb          9191Mb   1.90
> 256Kb         18062Mb   1.96
> 512Kb         35883Mb   1.98
>   1Mb         71570Mb   1.994
>   2Mb        142958Mb   1.997
>
> So for kernel builds (as well as grep over the tree, etc.) uniform 2Mb pages
> would be... interesting.

Yep, that's why I opted for a "start out slowly and let readahead tell me
when to increase the page size" approach.

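For anyone who wants to play with the numbers in the table above, here
is a rough sketch of the calculation. It assumes the footprint is simply
every file rounded up to a whole number of pages, which is presumably
how the table was produced but is not stated. The function names and the
sample file sizes are made up for illustration; they are not taken from
the kernel tree.

```python
def footprint(file_sizes, page_size):
    """Total bytes used if each file is rounded up to whole pages.

    Assumption: a file of N bytes occupies ceil(N / page_size) pages.
    """
    return sum(-(-size // page_size) * page_size for size in file_sizes)


def growth_rows(file_sizes, page_sizes):
    """Return (page_size, footprint, ratio_to_previous) for each page size."""
    rows = []
    previous = None
    for page_size in page_sizes:
        total = footprint(file_sizes, page_size)
        ratio = None if previous is None else total / previous
        rows.append((page_size, total, ratio))
        previous = total
    return rows


KIB = 1024
# Hypothetical file sizes in bytes, skewed towards small files.
sample_sizes = [100, 3000, 5000, 20000, 70000]

for page_size, total, ratio in growth_rows(sample_sizes, [4 * KIB, 8 * KIB, 16 * KIB]):
    ratio_text = "" if ratio is None else f"{ratio:.2f}"
    print(f"{page_size // KIB:>4}Kb {total:>8} {ratio_text}")
```

Once almost every file fits in a single page, doubling the page size
doubles the footprint, which is why the ratio column converges on 2.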
I think Johannes' real problem is that slab and page cache / anon pages
are getting intermingled. We could solve this by having slab allocate
2MB pages from the page allocator and then split them up internally
(so not all of that 2MB necessarily goes to a single slab cache, but all
of that 2MB goes to some slab cache).
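
To make the idea concrete, here is a toy model, not kernel code. It
assumes slab takes whole 2MB blocks from the page allocator and hands
out fixed-size pieces of them to whichever slab cache asks next. The
class name, the 4kB piece size and the cache names in the usage below
are all hypothetical choices for the sketch.

```python
from collections import deque

BLOCK_SIZE = 2 * 1024 * 1024   # one 2MB page taken from the page allocator
PIECE_SIZE = 4 * 1024          # hypothetical granularity for splitting a block


class SlabBlockPool:
    """Toy model: 2MB blocks belong to slab as a whole, pieces to any cache."""

    def __init__(self):
        self.blocks = []       # blocks[b][p] is the cache owning piece p, or None
        self.free = deque()    # (block_index, piece_index) pairs not yet handed out

    def _grow(self):
        # Stand-in for requesting one 2MB page from the page allocator.
        block_index = len(self.blocks)
        pieces = BLOCK_SIZE // PIECE_SIZE
        self.blocks.append([None] * pieces)
        self.free.extend((block_index, p) for p in range(pieces))

    def alloc_piece(self, cache_name):
        """Give the named slab cache the next free piece of some block."""
        if not self.free:
            self._grow()
        block_index, piece_index = self.free.popleft()
        self.blocks[block_index][piece_index] = cache_name
        return block_index, piece_index

    def owners(self, block_index):
        """Set of slab caches that currently hold pieces of this block."""
        return {name for name in self.blocks[block_index] if name is not None}


pool = SlabBlockPool()
first = pool.alloc_piece("cache_a")    # illustrative cache names
second = pool.alloc_piece("cache_b")
print(first, second, pool.owners(0))
```

Both allocations land in the same 2MB block, so the block is shared
between slab caches but never with page cache or anon pages.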