Re: [PATCH 1/2] lockref: speculatively spin waiting for the lock to be released

From: Mateusz Guzik
Date: Thu Jun 13 2024 - 14:48:14 EST


On Thu, Jun 13, 2024 at 8:43 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, 13 Jun 2024 at 11:13, Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:
> >
> > I would assume the rule is pretty much well known and instead an
> > indicator where is what (as added in my comments) would be welcome.
>
> Oh, the rule is well-known, but I think what is worth pointing out is
> the different classes of fields, and the name[] field in particular.
>
> This ordering was last really updated back in 2011, by commit
> 44a7d7a878c9 ("fs: cache optimise dentry and inode for rcu-walk"). And
> it was actually somewhat intentional at the time. Quoting from that
> commit:
>
> We also fit in 8 bytes of inline name in the first 64 bytes, so for short
> names, only 64 bytes needs to be touched to perform the lookup. We should
> get rid of the hash->prev pointer from the first 64 bytes, and fit 16 bytes
> of name in there, which will take care of 81% rather than 32% of the kernel
> tree.
>

Right. Things degrading by accident is why I suggested BUILD_BUG_ON.

> but what has actually really changed - and that I didn't even realize
> until I now did a 'pahole' on it, was that this was all COMPLETELY
> broken by
>
> seqcount_spinlock_t d_seq;
>
> because seqcount_spinlock_t has been entirely broken and went from
> being 4 bytes back when, to now being 64 bytes.
>
> Crazy crazy.
>
> How did that happen?
>

perhaps lockdep in your config?

this is the layout on my prod config:
struct dentry {
unsigned int d_flags; /* 0 4 */
seqcount_spinlock_t d_seq; /* 4 4 */
struct hlist_bl_node d_hash; /* 8 16 */
struct dentry * d_parent; /* 24 8 */
struct qstr d_name; /* 32 16 */
struct inode * d_inode; /* 48 8 */
unsigned char d_iname[40]; /* 56 40 */
/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
const struct dentry_operations * d_op; /* 96 8 */
struct super_block * d_sb; /* 104 8 */
long unsigned int d_time; /* 112 8 */
void * d_fsdata; /* 120 8 */
/* --- cacheline 2 boundary (128 bytes) --- */
struct lockref d_lockref
__attribute__((__aligned__(8))); /* 128 8 */
union {
struct list_head d_lru; /* 136 16 */
wait_queue_head_t * d_wait; /* 136 8 */
}; /* 136 16 */
struct hlist_node d_sib; /* 152 16 */
struct hlist_head d_children; /* 168 8 */
union {
struct hlist_node d_alias; /* 176 16 */
struct hlist_bl_node d_in_lookup_hash; /* 176 16 */
struct callback_head d_rcu
__attribute__((__aligned__(8))); /* 176 16 */
} d_u __attribute__((__aligned__(8))); /* 176 16 */

/* size: 192, cachelines: 3, members: 16 */
/* forced alignments: 2 */
} __attribute__((__aligned__(8)));


--
Mateusz Guzik <mjguzik gmail.com>