Re: [PATCH 1/3] lockdep: Apply crossrelease to PG_locked locks

From: Byungchul Park
Date: Thu Nov 23 2017 - 22:03:02 EST


On Thu, Nov 16, 2017 at 02:07:46PM +0100, Michal Hocko wrote:
> On Thu 16-11-17 21:48:05, Byungchul Park wrote:
> > On 11/16/2017 9:02 PM, Michal Hocko wrote:
> > > for each struct page. So you are doubling the size. Who is going to
> > > enable this config option? You are moving this to page_ext in a later
> > > patch which is a good step but it doesn't go far enough because this
> > > still consumes those resources. Is there any problem to make this
> > > kernel command line controllable? Something we do for page_owner for
> > > example?
> >
> > Sure. I will add it.
> >
> > > Also it would be really great if you could give us some measures about
> > > the runtime overhead. I do not expect it to be very large but this is
> >
> > The major overhead would come from the amount of additional memory
> > consumption for 'lockdep_map's.
>
> yes
>
> > Do you want me to measure the overhead by the additional memory
> > consumption?
> >
> > Or do you expect another overhead?
>
> I would be also interested how much impact this has on performance. I do
> not expect it would be too large but having some numbers for cache cold
> parallel kbuild or other heavy page lock workloads.

Hello Michal,

I measured 'cache cold parallel kbuild' on my qemu machine. The result
varies much so I cannot confirm, but I think there's no meaningful
difference between before and after applying crossrelease to page locks.

Actually, I expect little overhead in lock_page() and unlock_page() even
after applying crossreleas to page locks, but only expect a bit overhead
by additional memory consumption for 'lockdep_map's per page.

I run the following instructions within "QEMU x86_64 4GB memory 4 cpus":

make clean
echo 3 > drop_caches
time make -j4

The results are:

# w/o page lock tracking

At the 1st try,
real 5m28.105s
user 17m52.716s
sys 3m8.871s

At the 2nd try,
real 5m27.023s
user 17m50.134s
sys 3m9.289s

At the 3rd try,
real 5m22.837s
user 17m34.514s
sys 3m8.097s

# w/ page lock tracking

At the 1st try,
real 5m18.158s
user 17m18.200s
sys 3m8.639s

At the 2nd try,
real 5m19.329s
user 17m19.982s
sys 3m8.345s

At the 3rd try,
real 5m19.626s
user 17m21.363s
sys 3m9.869s

I think thers's no meaningful difference on my small machine.

--
Thanks,
Byungchul