Re: [PATCH v2 0/5] Switch arm64 over to qrwlock

From: Will Deacon
Date: Mon Oct 09 2017 - 05:59:39 EST


Hi Yury,

On Mon, Oct 09, 2017 at 12:30:52AM +0300, Yury Norov wrote:
> On Fri, Oct 06, 2017 at 02:34:37PM +0100, Will Deacon wrote:
> > This is version two of the patches I posted yesterday:
> >
> > http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/534666.html
> >
> > I'd normally leave it longer before posting again, but Peter had a good
> > suggestion to rework the layout of the lock word, so I wanted to post a
> > version that follows that approach.
> >
> > I've updated my branch if you're after the full patch stack:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git qrwlock
> >
> > As before, all comments (particularly related to testing and performance)
> > welcome!
> >
> I tested your patches with locktorture and found a measurable performance
> regression. I also respun Jan Glauber's patch [1], and tried Jan's patch
> together with patch 5 from this series. The numbers differ a lot from my
> previous measurements, because I have since changed workstation and now
> run qemu with support for parallel threads.
>                     Spinlock    Read-RW lock    Write-RW lock
> Vanilla:           129804626        12340895         14716138
> This series:       113718002        10982159         13068934
> Jan patch:         117977108        11363462         13615449
> Jan patch + #5:    121483176        11696728         13618967
>
> The bottom line of discussion [1] was that queued locks are more
> effective when the SoC has many CPUs. And 4 is not many. My measurement
> was made on a 4-CPU machine, and it seems to confirm that. Does
> it make sense to make queued locks the default for many-CPU machines only?

Just to confirm, you're running this under qemu on an x86 host, using full
AArch64 system emulation? If so, I really don't think we should base the
merits of qrwlocks on arm64 around this type of configuration. Given that
you work for a silicon vendor, could you try running on real arm64 hardware
instead, please? My measurements on 6-core and 8-core systems look a lot
better with qrwlock than what we currently have in mainline, and they
also fix a real starvation issue reported by Jeremy [1].

I'd also add that lock fairness comes at a cost, so I'd expect a small drop
in total throughput for some workloads. I encourage you to try passing
different arguments to locktorture to see this in action. For example, on
an 8-core machine:

# insmod ./locktorture.ko nwriters_stress=2 nreaders_stress=8 torture_type="rw_lock_irq" stat_interval=2

-rc3:

Writes: Total: 6612 Max/Min: 0/0 Fail: 0
Reads : Total: 1265230 Max/Min: 0/0 Fail: 0
Writes: Total: 6709 Max/Min: 0/0 Fail: 0
Reads : Total: 1916418 Max/Min: 0/0 Fail: 0
Writes: Total: 6725 Max/Min: 0/0 Fail: 0
Reads : Total: 5103727 Max/Min: 0/0 Fail: 0

notice how the writers are really struggling here (you only have to tweak a
bit more and you get RCU stalls, lose interrupts etc).

With the qrwlock:

Writes: Total: 47962 Max/Min: 0/0 Fail: 0
Reads : Total: 277903 Max/Min: 0/0 Fail: 0
Writes: Total: 100151 Max/Min: 0/0 Fail: 0
Reads : Total: 525781 Max/Min: 0/0 Fail: 0
Writes: Total: 155284 Max/Min: 0/0 Fail: 0
Reads : Total: 767703 Max/Min: 0/0 Fail: 0

which is an awful lot better for maximum latency and fairness, despite the
much lower reader count.
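
For anyone wanting to compare runs like the two above, here is a minimal
sketch of how the numbers could be post-processed. It is not part of
locktorture; the helper names (parse_totals, deltas, summarise) are
hypothetical, and it assumes the "Total:" fields are cumulative counts
printed once per stat_interval.

```python
import re

# Matches the "Writes: Total: N ..." and "Reads : Total: N ..." lines
# quoted in this mail. Anything else in the log is ignored.
STAT_RE = re.compile(r"^(Writes|Reads)\s*:\s*Total:\s*(\d+)")


def parse_totals(log):
    """Collect the Total: samples for writers and readers, in log order."""
    totals = {"Writes": [], "Reads": []}
    for line in log.splitlines():
        m = STAT_RE.match(line.strip())
        if m:
            totals[m.group(1)].append(int(m.group(2)))
    return totals


def deltas(samples):
    """Per-interval progress, ASSUMING the samples are cumulative totals."""
    return [b - a for a, b in zip(samples, samples[1:])]


def summarise(log):
    """Per-interval deltas plus the reads-per-write ratio of the last sample."""
    t = parse_totals(log)
    return {
        "write_deltas": deltas(t["Writes"]),
        "read_deltas": deltas(t["Reads"]),
        "reads_per_write": t["Reads"][-1] / t["Writes"][-1],
    }


# The two runs quoted above, copied verbatim.
RC3 = """\
Writes: Total: 6612 Max/Min: 0/0 Fail: 0
Reads : Total: 1265230 Max/Min: 0/0 Fail: 0
Writes: Total: 6709 Max/Min: 0/0 Fail: 0
Reads : Total: 1916418 Max/Min: 0/0 Fail: 0
Writes: Total: 6725 Max/Min: 0/0 Fail: 0
Reads : Total: 5103727 Max/Min: 0/0 Fail: 0
"""

QRWLOCK = """\
Writes: Total: 47962 Max/Min: 0/0 Fail: 0
Reads : Total: 277903 Max/Min: 0/0 Fail: 0
Writes: Total: 100151 Max/Min: 0/0 Fail: 0
Reads : Total: 525781 Max/Min: 0/0 Fail: 0
Writes: Total: 155284 Max/Min: 0/0 Fail: 0
Reads : Total: 767703 Max/Min: 0/0 Fail: 0
"""

if __name__ == "__main__":
    for name, log in (("-rc3", RC3), ("qrwlock", QRWLOCK)):
        print(name, summarise(log))
```

If the cumulative assumption holds, the write deltas are the interesting
part: they show how much progress the writers make between samples, which
is where the starvation shows up, rather than the headline reader totals.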

> There were 2 preparing patches in the series:
> [PATCH 1/3] kernel/locking: #include <asm/spinlock.h> in qrwlock
> and
> [PATCH 2/3] asm-generic: don't #include <linux/atomic.h> in qspinlock_types.h
>
> 1st patch is not needed anymore because Babu Moger submitted similar patch that
> is already in mainline: 9ab6055f95903 ("kernel/locking: Fix compile error with
> qrwlock.c"). Could you revisit second patch?

Sorry, not sure what you're asking me to do here.

Will

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/534299.html