Re: [PATCH v2 2/2] mm/pageblock: remove false sharing in pageblock_flags

From: Alexander Duyck
Date: Wed Aug 19 2020 - 12:50:58 EST


On Wed, Aug 19, 2020 at 1:11 AM Alex Shi <alex.shi@xxxxxxxxxxxxxxxxx> wrote:
>
>
>
> 在 2020/8/19 下午3:57, Anshuman Khandual 写道:
> >
> >
> > On 08/19/2020 11:17 AM, Alex Shi wrote:
> >> Current pageblock_flags is only 4 bits, so it has to share a char size
> >> in cmpxchg when get set, the false sharing cause perf drop.
> >>
> >> If we incrase the bits up to 8, false sharing would gone in cmpxchg. and
> >> the only cost is half char per pageblock, which is half char per 128MB
> >> on x86, 4 chars in 1 GB.
> >
> > Agreed that increase in memory utilization is negligible here but does
> > this really improve performance ?
> >
>
> It's no doubt in theory. and it would had a bad impact according to
> commit e380bebe4771548 mm, compaction: keep migration source private to a single
>
> but I do have some problem in running thpscale/mmtest. I'd like to see if anyone
> could give a try.
>
> BTW, I naturally hate the false sharing even it's in theory. Anyone who doesn't? :)

You keep bringing up false sharing but you don't fix the false sharing
by doing this. You are still allowing the flags for multiple
pageblocks per cacheline so you still have false sharing even after
this.

What I believe you are attempting to address is the fact that multiple
pageblocks share a single long value and that long is being used with
a cmpxchg so you end up with multiple threads potentially all banging
on the same value and watching it change. However the field currently
consists of only 4 bits, 3 of them for migratetype and 1 for the skip
bit. In the case of the 3 bit portion a cmpxchg makes sense and is
usually protected by the zone lock so you would only have one thread
accessing it in most cases with the possible exception of a section
that spans multiple zones.

For the case such as the skip bit and MIGRATE_UNMOVABLE (0x0) where we
would be clearing or setting the entire mask maybe it would make more
sense to simply use an atomic_or or atomic_and depending on if you are
setting or clearing the flag? It would allow you to avoid the spinning
or having to read the word before performing the operation since you
would just be directly applying an AND or OR via a mask value.