Re: [PATCH v8 01/10] fs: Allow fine-grained control of folio sizes

From: Ryan Roberts
Date: Fri Jul 05 2024 - 05:05:56 EST


On 05/07/2024 05:32, Dave Chinner wrote:
> On Fri, Jul 05, 2024 at 12:56:28AM +0100, Matthew Wilcox wrote:
>> On Fri, Jul 05, 2024 at 08:06:51AM +1000, Dave Chinner wrote:
>>>>> It seems strange to silently clamp these? Presumably for the bs>ps usecase,
>>>>> whatever values are passed in are a hard requirement? So wouldn't want them to
>>>>> be silently reduced. (Especially given the recent change to reduce the size of
>>>>> MAX_PAGECACHE_ORDER to less then PMD size in some cases).
>>>>
>>>> Hm, yes. We should probably make this return an errno. Including
>>>> returning an errno for !IS_ENABLED() and min > 0.
>>>
>>> What are callers supposed to do with an error? In the case of
>>> setting up a newly allocated inode in XFS, the error would be
>>> returned in the middle of a transaction and so this failure would
>>> result in a filesystem shutdown.
>>
>> I suggest you handle it better than this. If the device is asking for a
>> blocksize > PMD_SIZE, you should fail to mount it.

A detail, but MAX_PAGECACHE_ORDER may be smaller than PMD_SIZE even on systems
with CONFIG_TRANSPARENT_HUGEPAGE as of a fix that is currently in mm-unstable:

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define PREFERRED_MAX_PAGECACHE_ORDER HPAGE_PMD_ORDER
#else
#define PREFERRED_MAX_PAGECACHE_ORDER 8
#endif

/*
* xas_split_alloc() does not support arbitrary orders. This implies no
* 512MB THP on ARM64 with 64KB base page size.
*/
#define MAX_XAS_ORDER (XA_CHUNK_SHIFT * 2 - 1)
#define MAX_PAGECACHE_ORDER min(MAX_XAS_ORDER,
PREFERRED_MAX_PAGECACHE_ORDER)

But that also implies that the page cache can handle up to order-8 without
CONFIG_TRANSPARENT_HUGEPAGE so sounds like there isn't a dependcy on
CONFIG_TRANSPARENT_HUGEPAGE in this respect?



>
> That's my point: we already do that.
>
> The largest block size we support is 64kB and that's way smaller
> than PMD_SIZE on all platforms and we always check for bs > ps
> support at mount time when the filesystem bs > ps.
>
> Hence we're never going to set the min value to anything unsupported
> unless someone makes a massive programming mistake. At which point,
> we want a *hard, immediate fail* so the developer notices their
> mistake immediately. All filesystems and block devices need to
> behave this way so the limits should be encoded as asserts in the
> function to trigger such behaviour.
>
>> If the device is
>> asking for a blocksize > PAGE_SIZE and CONFIG_TRANSPARENT_HUGEPAGE is
>> not set, you should also decline to mount the filesystem.
>
> What does CONFIG_TRANSPARENT_HUGEPAGE have to do with filesystems
> being able to use large folios?
>
> If that's an actual dependency of using large folios, then we're at
> the point where the mm side of large folios needs to be divorced
> from CONFIG_TRANSPARENT_HUGEPAGE and always supported.
> Alternatively, CONFIG_TRANSPARENT_HUGEPAGE needs to selected by the
> block layer and also every filesystem that wants to support
> sector/blocks sizes larger than PAGE_SIZE. IOWs, large folio
> support needs to *always* be enabled on systems that say
> CONFIG_BLOCK=y.
>
> I'd much prefer the former occurs, because making the block layer
> and filesystems dependent on an mm feature they don't actually use
> is kinda weird...
>
> -Dave.
>