Re: [PATCH v15] x86/split_lock: Enable split lock detection by kernel
From: Luck, Tony
Date: Sat Jan 25 2020 - 21:53:21 EST
On Sat, Jan 25, 2020 at 06:51:31PM -0500, Arvind Sankar wrote:
> On Sat, Jan 25, 2020 at 01:50:03PM -0800, Luck, Tony wrote:
> > >
> > > I might be missing something but shouldnt this be !nextflag given the
> > > flag being unset is when the task wants sld?
> >
> > That logic is convoluted ... but Thomas showed me a much better
> > way that is also much simpler ... so this code has gone now. The
> > new version is far easier to read (argument is flags for the new task
> > that we are switching to)
> >
> > void switch_to_sld(unsigned long tifn)
> > {
> > __sld_msr_set(tifn & _TIF_SLD);
> > }
> >
> > -Tony
>
> why doesnt this have the same problem though? tifn & _TIF_SLD still
> needs to be logically negated no?
There's something very odd happening. I added this trace code:
if ((tifp ^ tifn) & _TIF_SLD) {
pr_info("switch from %d (%d) to %d (%d)\n",
task_tgid_nr(prev_p), (tifp & _TIF_SLD) != 0,
task_tgid_nr(next_p), (tifn & _TIF_SLD) != 0);
switch_to_sld(tifn);
}
Then ran:
$ taskset -cp 10 $$ # bind everything to just one CPU
pid 3205's current affinity list: 0-55
pid 3205's new affinity list: 10
$ ./spin & # infinite loop
[1] 3289
$ ./split_lock_test & # 10 * split lock with udelay(1000) between
[2] 3294
I was expecting to see transitions back & forward between the "spin"
process (which won't have TIF_SLD set) and the test program (which
will have it set after the first split executes).
But I see:
[ 83.871629] x86/split lock detection: #AC: split_lock_test/3294 took a split_lock trap at address: 0x4007fc
[ 83.871638] process: switch from 3294 (1) to 3289 (0)
[ 83.882583] process: switch from 3294 (1) to 3289 (0)
[ 83.893555] process: switch from 3294 (1) to 3289 (0)
[ 83.904528] process: switch from 3294 (1) to 3289 (0)
[ 83.915501] process: switch from 3294 (1) to 3289 (0)
[ 83.926475] process: switch from 3294 (1) to 3289 (0)
[ 83.937448] process: switch from 3294 (1) to 3289 (0)
[ 83.948421] process: switch from 3294 (1) to 3289 (0)
[ 83.959394] process: switch from 3294 (1) to 3289 (0)
[ 83.970439] process: switch from 3294 (1) to 3289 (0)
i.e. only the switches from the test process to the spinner.
So split-lock testing is disabled when we first hit the #AC
and is never re-enabled because we don't pass through this
code when switching to the spinner.
So you are right that the argument is inverted. We should be
ENABLING split lock when switching to the spin loop process
and we actually disable.
So why don't we come through __switch_to_xtra() when the spinner
runs out its time slice (or the udelay interrupt happens and
preempts the spinner)?
-Tony