Re: [PATCH v3 0/8] Support for transparent PUD pages for DAX files

From: Matthew Wilcox
Date: Fri Jan 22 2016 - 09:12:10 EST


On Thu, Jan 21, 2016 at 02:48:58PM -0800, mingming cao wrote:
> On 01/08/2016 11:49 AM, Matthew Wilcox wrote:
> > Filesystems still need work to allocate 1GB pages. With ext4, I can
> > only get 16MB of contiguous space, although it is aligned. With XFS,
> > I can get 80MB less than 1GB, and it's not aligned. The XFS problem
> > may be due to the small amount of RAM in my test machine.
>
> I don't think ext4 can do 1G at this time due to extent length bits
> (15 for unwritten) and block group size boundary (well, with flex bg we
> may be able to relax this). I have seen about 125M of contiguous space
> allocated on my fresh new ext4 filesystem. I do remember mballoc in ext4
> used to normalize the allocation request up to 8 or 16M, but it appears
> not that small any more.

I agree that the on-disk ext4 format can't represent a single 1GB
extent (ext4_extent's ee_len is 16 bits), but the in-memory extent tree
(extent_status's es_len) uses a 32-bit block count field, which can
represent an 8TB length extent with 4kB blocks.

It seems that at the moment, something is constraining allocations to be
at most 16MB, so that we can convert one extent_status to one ext4_extent.
What I'd like to see is code to convert one extent_status into multiple
ext4_extents on disk, and recombine multiple ext4_extents into a single
extent_status when the inode is read back in later.
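
As a rough illustration of the split/recombine idea, here is a minimal
userspace sketch. It is not ext4 code: the struct and function names
(sketch_mem_extent, sketch_disk_extent, sketch_split, sketch_merge) are
hypothetical, and it assumes initialized extents only, capped at 2^15
blocks each (the limit ext4 calls EXT_INIT_MAX_LEN). Unwritten extents,
the extent tree itself and journalling are all ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cap for one initialized on-disk extent: 2^15 blocks.
 * (The 16-bit ee_len reserves the values above this for unwritten extents.) */
#define SKETCH_MAX_DISK_LEN (1U << 15)

/* Hypothetical stand-in for the in-memory extent (32-bit length). */
struct sketch_mem_extent {
	uint32_t lblk;	/* first logical block */
	uint64_t pblk;	/* first physical block */
	uint32_t len;	/* length in blocks */
};

/* Hypothetical stand-in for the on-disk extent (16-bit length). */
struct sketch_disk_extent {
	uint32_t lblk;
	uint64_t pblk;
	uint16_t len;
};

/* Split one in-memory extent into on-disk sized pieces.
 * Returns the number of pieces needed; writes at most max_out of them. */
size_t sketch_split(const struct sketch_mem_extent *in,
		    struct sketch_disk_extent *out, size_t max_out)
{
	uint32_t done = 0;
	size_t n = 0;

	assert(in != NULL && (out != NULL || max_out == 0));
	while (done < in->len) {
		uint32_t chunk = in->len - done;

		if (chunk > SKETCH_MAX_DISK_LEN)
			chunk = SKETCH_MAX_DISK_LEN;
		if (n < max_out) {
			out[n].lblk = in->lblk + done;
			out[n].pblk = in->pblk + done;
			out[n].len = (uint16_t)chunk;
		}
		n++;
		done += chunk;
	}
	return n;
}

/* Recombine on-disk extents that are contiguous both logically and
 * physically. 'out' must have room for n entries. Returns the number
 * of in-memory extents produced. */
size_t sketch_merge(const struct sketch_disk_extent *in, size_t n,
		    struct sketch_mem_extent *out)
{
	size_t m = 0;

	assert(in != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (m > 0 &&
		    out[m - 1].lblk + out[m - 1].len == in[i].lblk &&
		    out[m - 1].pblk + out[m - 1].len == in[i].pblk) {
			out[m - 1].len += in[i].len;	/* extend previous run */
		} else {
			out[m].lblk = in[i].lblk;
			out[m].pblk = in[i].pblk;
			out[m].len = in[i].len;
			m++;
		}
	}
	return m;
}
```

Under these assumptions, a 1GB extent with 4kB blocks (262144 blocks)
splits into eight pieces of 32768 blocks (128MB) each, and merging those
eight pieces gives back a single in-memory extent.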

Then we can start looking at places where ext4 puts metadata in the
middle of 1GB regions, preventing them from being used ... that'll be
a separate bag of issues, no doubt.