Re: [PATCH v7 11/11] xfs: enable block size larger than page size support
From: Pankaj Raghav (Samsung)
Date: Mon Jun 17 2024 - 12:55:02 EST
On Mon, Jun 17, 2024 at 08:51:04AM +0200, Christoph Hellwig wrote:
> On Mon, Jun 17, 2024 at 11:29:42AM +1000, Dave Chinner wrote:
> > > > + if (mp->m_sb.sb_blocksize > PAGE_SIZE)
> > > > + igeo->min_folio_order = mp->m_sb.sb_blocklog - PAGE_SHIFT;
> > > > + else
> > > > + igeo->min_folio_order = 0;
> > > > }
> > >
> > > The minimum folio order isn't really part of the inode (allocation)
> > > geometry, is it?
> >
> > I suggested it last time around instead of calculating the same
> > constant on every inode allocation. We're already storing in-memory
> > strunct xfs_inode allocation init values in this structure. e.g. in
> > xfs_inode_alloc() we see things like this:
>
> While new_diflags2 isn't exactly inode geometry, it at least is part
> of the inode allocation. Folio min order for file data has nothing
> to do with this at all.
>
> > The only other place we might store it is the struct xfs_mount, but
> > given all the inode allocation constants are already in the embedded
> > mp->m_ino_geo structure, it just seems like a much better idea to
> > put it will all the other inode allocation constants than dump it
> > randomly into the struct xfs_mount....
>
> Well, it is very closely elated to say the m_blockmask field in
> struct xfs_mount. The again modern CPUs tend to get a you simple
> subtraction for free in most pipelines doing other things, so I'm
> not really sure it's worth caching for use in inode allocation to
> start with, but I don't care strongly about that.
But there will also be an extra conditional apart from subtraction
right?
Initially it was something like this:
@@ -73,6 +73,7 @@ xfs_inode_alloc(
xfs_ino_t ino)
{
struct xfs_inode *ip;
+ int min_order = 0;
/*
* XXX: If this didn't occur in transactions, we could drop GFP_NOFAIL
@@ -88,7 +89,8 @@ xfs_inode_alloc(
/* VFS doesn't initialise i_mode or i_state! */
VFS_I(ip)->i_mode = 0;
VFS_I(ip)->i_state = 0;
- mapping_set_large_folios(VFS_I(ip)->i_mapping);
+ min_order = max(min_order, ilog2(mp->m_sb.sb_blocksize) - PAGE_SHIFT);
+ mapping_set_folio_orders(VFS_I(ip)->i_mapping, min_order, MAX_PAGECACHE_ORDER);
XFS_STATS_INC(mp, vn_active);
ASSERT(atomic_read(&ip->i_pincount) == 0);
@@ -313,6 +315,7 @@ xfs_reinit_inode(
dev_t dev = inode->i_rdev;
kuid_t uid = inode->i_uid;
kgid_t gid = inode->i_gid;
+ int min_order = 0;
error = inode_init_always(mp->m_super, inode);
@@ -323,7 +326,8 @@ xfs_reinit_inode(
inode->i_rdev = dev;
inode->i_uid = uid;
inode->i_gid = gid;
- mapping_set_large_folios(inode->i_mapping);
+ min_order = max(min_order, ilog2(mp->m_sb.sb_blocksize) - PAGE_SHIFT);
+ mapping_set_folio_orders(inode->i_mapping, min_order, MAX_PAGECACHE_ORDER);
return error;
}
It does introduce a conditional in the inode allocation hot path so I
went with what Chinner proposed as it is something we use when we
initialize an inode.