Re: Update cacheline size on X86_GENERIC
From: Nick Piggin
Date: Sat Oct 11 2008 - 09:48:53 EST
On Sunday 12 October 2008 00:11, Andi Kleen wrote:
> > > > That would be nice. It would be interesting to know what is causing
> > > > the slowdown.
> > >
> > > At least that test is extremely cache footprint sensitive. A lot of the
> > > cache misses are surprisingly in hd_struct, because it runs
> > > with hundreds of disks and each needs hd_struct references in the fast
> > > path. The recent introduction of fine grained per partition statistics
> > > caused a large slowdown. But I don't think kernel workloads
> > > are normally that extremely cache sensitive.
> >
> > That's interesting. struct device is pretty big. I wonder if fields
>
> Yes it is (it actually can be easily shrunk -- see willy's recent
> patch to remove the struct completion from knodes), but that won't help
> because it will always be larger than a cache line and it's in the
> middle, so the accesses to the first part of it and the last part of it
> will be separate.
>
> > couldn't be rearranged to minimise the fastpath cacheline footprint?
> > I guess that's already been looked at?
>
> Yes, but not very intensively. So far I was looking for more
> detailed profiling data to see the exact accesses.
>
> Of course if you have any immediate ideas that could be tried too.

No immediate ideas. Jens is probably a good person to cc. With direct IO
workloads, hd_struct should mostly only be touched in partition remapping
and IO accounting.

start_sect and nr_sects would be read for partition remapping.

*dkstats will be read to do accounting, as will partno (on UP, dkstats
itself is written, but false sharing doesn't matter on UP).

These could all go together at the top of the struct, perhaps.

struct device->parent gets read as well. This might go at the top of
struct device, which could come next.

stamp and in_flight are tricky, as they both get read and written often :(
Still, you might just be able to fit them into the same 64-byte cacheline
as all the above fields.

At this point, you would want to cacheline align hd_struct. So if you
want to do that dynamically, you would need to change the disk_part_tbl
scheme (but at least you could test with static annotations first).
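
As a rough userspace sketch of the layout described above (this is not the
real kernel definition: the *_sketch struct names, the field types, the
cold[] placeholders and hot_fields_end() are all assumptions made for
illustration), the hot fields can be grouped at the top and the struct
statically aligned to an assumed 64-byte line. In the kernel, the static
annotation would be ____cacheline_aligned rather than the raw gcc attribute
used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 64	/* assumption: 64-byte cache lines */

struct disk_stats_sketch;	/* opaque; only the pointer is read here */

/* Hypothetical stand-in for struct device, with parent moved to the top. */
struct device_sketch {
	struct device_sketch *parent;	/* read in the fast path */
	char cold[200];			/* placeholder for everything else */
};

/* Hypothetical stand-in for hd_struct with the hot fields grouped first. */
struct hd_struct_sketch {
	uint64_t start_sect;			/* read: partition remapping */
	uint64_t nr_sects;			/* read: partition remapping */
	struct disk_stats_sketch *dkstats;	/* read: IO accounting */
	int partno;				/* read: IO accounting */
	unsigned long stamp;			/* read and written often */
	int in_flight;				/* read and written often */
	struct device_sketch dev;		/* dev.parent comes next */
	char cold[64];				/* placeholder for cold fields */
} __attribute__((aligned(CACHELINE_BYTES)));

/* Byte offset just past the last hot field (dev.parent). */
size_t hot_fields_end(void)
{
	return offsetof(struct hd_struct_sketch, dev.parent) +
	       sizeof(struct device_sketch *);
}

/* Fail the build if the hot fields spill into a second cache line. */
static_assert(offsetof(struct hd_struct_sketch, dev.parent) +
	      sizeof(struct device_sketch *) <= CACHELINE_BYTES,
	      "hot fields do not fit in one cache line");
```

The static_assert only checks the layout; whether it actually helps would
still need the profiling data mentioned earlier in the thread.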

The other thing I notice is that the block layer has some functions whose
error paths declare BDEVNAME_SIZE-sized arrays on the stack, which gcc may
not handle well. Those paths should probably be moved out into noinline
functions.
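
Something along these lines, as a minimal userspace sketch: struct
bdev_sketch, check_sector() and report_bad_sector() are hypothetical names,
not existing block layer functions, and the message text is made up. The
point is only that the name buffer lives in the out-of-line helper's stack
frame instead of the caller's.

```c
#include <assert.h>
#include <stdio.h>

#define BDEVNAME_SIZE 32	/* same value as the kernel's define */

/* Hypothetical, cut-down stand-in for a block device. */
struct bdev_sketch {
	const char *name;
	unsigned long long nr_sects;
};

/*
 * Error path kept out of line: the BDEVNAME_SIZE buffer is allocated in
 * this function's frame only, so the fast path below does not carry it.
 */
__attribute__((noinline))
static void report_bad_sector(const struct bdev_sketch *bdev,
			      unsigned long long sector)
{
	char b[BDEVNAME_SIZE];

	snprintf(b, sizeof(b), "%s", bdev->name);
	fprintf(stderr, "%s: sector %llu is beyond end of device\n",
		b, sector);
}

/* Fast path: a compare and a branch, no large locals. */
int check_sector(const struct bdev_sketch *bdev, unsigned long long sector)
{
	assert(bdev != NULL);

	if (sector >= bdev->nr_sects) {
		report_bad_sector(bdev, sector);
		return -1;
	}
	return 0;
}
```

Without the noinline attribute gcc is free to inline the helper again and
put the array back into the caller's frame, which is what this is trying to
avoid.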