Re: [PATCH v3] sched: reorder some fields in struct rq
From: Peter Zijlstra
Date: Wed Oct 29 2025 - 05:14:43 EST
On Tue, Oct 28, 2025 at 03:15:27PM -0700, Blake Jones wrote:
> Hi Peter,
>
> Thanks for your questions. Here are my thoughts:
>
> On Tue, Oct 28, 2025 at 4:29 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > queue_mask is modified right along with nr_running -- every
> > enqueue/dequeue. It seems weird to move it away from nr_running.
> >
>
> My main goal with this change is to optimize the first line for read-mostly
> usage - specifically, for the loop in update_sg_lb_stats(), which loads
> data from many rq's without locking them. Data from some Google fleet
> profilers showed about 38x as many loads as stores to these fields.
>
> From what I could tell from your patch, queue_mask is write-mostly -
> enqueue_task() and dequeue_task() update it, and sched_balance_newidle()
> stores to it and loads it later. That's why I didn't put it in the first
> line.
>
> That said, the place where I relocated it to is somewhat arbitrary. I
> mostly put it there because it had some open space on that line, though
> its new neighbors prev_mm and next_balance are also updated somewhat
> frequently. If you have another suggestion I'm quite open to it.
Right. I missed the read-heavy part. It still feels somewhat weird to
split variables you know are modified together, but alas.
Also, I might be reworking/removing queue_mask again soon, we'll see if
that pans out.
> > Does it not make more sense to put the lock in the same cacheline as the
> > clock fields?
> >
>
> Great question. When I was first playing around with this reordering, I did
> some more substantial movement within struct rq. But for the versions I've
> sent out, I decided to keep it focused on just a few areas where I could
> tell there was clustered usage. I didn't want to over-fit the placement of
> a whole lot of fields in struct rq, since the code inevitably changes.
>
> So I'd be open to moving the lock to the same line as the clock fields if
> you think it would be a perf win, but I don't personally have any perf
> numbers that could demonstrate the merits of doing that. WDYT?
Most places that take rq->lock also update rq->clock. So I was wondering
if it made sense to put them together. I don't have numbers either.
> > nr_iowait has cross CPU updates and would be subject to false sharing.
> >
>
> Mmm, good point. I moved it back before the clock-related cache line to
> balance out moving "clock" and "clock_update_flags" to that line, since
> otherwise it was going to add an extra line to the structure. But also,
> it had been on the same line as the various clock fields, so that false
> sharing was already not doing us any favors.
>
> The line that I've moved it to has your queue_mask, so that's probably not
> the best place. How about if I move it to the end of the structure, next to
> cfsb_csd and cfsb_csd_list, since those don't seem to get used very often?
Works for me.
> > Does the thing want a __no_randomize_layout ?
> >
>
> Well, this is by far one of the hottest kernel data structures we see at
> Google, and it's not an obvious thing for a security vuln to target. So I'd
> be quite happy to add __no_randomize_layout; if you agree, I'll add it.
Lets make it so.
Thanks!