Re: [PATCH v4] sysfs: fix kobject refcount to address races with kobject removal
From: Luis Chamberlain
Date: Thu Jul 22 2021 - 17:31:49 EST
On Wed, Jul 21, 2021 at 01:30:29PM +0200, Greg KH wrote:
> On Thu, Jul 01, 2021 at 03:48:16PM -0700, Luis Chamberlain wrote:
> > On Fri, Jun 25, 2021 at 02:56:03PM -0700, Luis Chamberlain wrote:
> > > On Thu, Jun 24, 2021 at 01:09:03PM +0200, Greg KH wrote:
> > > > thanks for making this change and sticking with it!
> > > >
> > > > Oh, and with this change, does your modprobe/rmmod crazy test now work?
> > >
> > > It does but I wrote a test_sysfs driver and I believe I see an issue with
> > > this. I'll debug a bit more and see what it was, and I'll then also use
> > > the driver to demo the issue more clearly, and then verification can be
> > > an easy selftest test.
> >
> > OK my conclusion based on a new selftest driver I wrote is we can drop
> > this patch safely. The selftest will cover this corner case well now.
> >
> > In short: the kernfs active reference will ensure the store operation
> > still exists. The kernfs mutex is not enough, but if the driver removes
> > the operation prior to getting the active reference, the write will just
> > fail. The dereferencing inside of the sysfs operation is abstract to
> > kernfs, and while kernfs can't do anything to prevent a driver from
> > doing something stupid, it can at least ensure that, for an open file,
> > the op is not removed until the operation completes.
>
> Ok, so all is good?
It would seem to be the case.
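To make the active reference argument quoted above concrete, here is a
minimal userspace sketch of the idea. It is a model only, not the kernfs
code: struct node, the node_*() helpers and DEACTIVATED_BIAS are
hypothetical names, and the assumption is a single counter that goes
negative once removal starts.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; names are illustrative, not the kernel's. */
#define DEACTIVATED_BIAS (INT_MIN + 1)

struct node {
	/* >= 0: live, value is the number of in-flight operations.
	 * <  0: removal has started. */
	atomic_int active;
};

/* Take an active reference. Fails once removal has started. */
bool node_get_active(struct node *n)
{
	int v = atomic_load(&n->active);

	while (v >= 0) {
		/* On failure, v is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&n->active, &v, v + 1))
			return true;
	}
	return false;
}

/* Drop an active reference. Returns true if this was the last
 * in-flight operation on a node that is being removed. */
bool node_put_active(struct node *n)
{
	int v = atomic_fetch_sub(&n->active, 1) - 1;

	/* Catch an unbalanced put in this model. */
	assert(v != -1 && v >= DEACTIVATED_BIAS);
	return v == DEACTIVATED_BIAS;
}

/* Start removal: later node_get_active() calls fail. */
void node_deactivate(struct node *n)
{
	atomic_fetch_add(&n->active, DEACTIVATED_BIAS);
}

/* Removal may only complete once nothing is in flight. */
bool node_drained(struct node *n)
{
	return atomic_load(&n->active) == DEACTIVATED_BIAS;
}
```

In this model a store that got its reference before node_deactivate()
keeps the node from draining until it calls node_put_active(), and a
store that arrives afterwards simply fails, which matches the two cases
described above.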
> Then why is your zram test code blowing up so badly?
I checked the logs for the backtrace where the crash did happen
and we did see clear evidence of the race we feared here. The *first*
bug that happened was the CPU hotplug race:
[132004.787099] Error: Removing state 61 which has instances left.
[132004.787124] WARNING: CPU: 17 PID: 9307 at ../kernel/cpu.c:1879 __cpuhp_remove_state_cpuslocked+0x1c4/0x1d0
After this the crash happened:
[132005.254022] BUG: Unable to handle kernel instruction fetch
[132005.254049] Faulting instruction address: 0xc0080000004a0c24
[132005.254059] Oops: Kernel access of bad area, sig: 11 [#1]
And that's when the backtrace showing the race comes up. Given the first
race though, I think we can be skeptical of the rest, especially since
I cannot reproduce it with a self-bombing selftest.
> Where is the reference counting going wrong?
It's not clear, as the misuse of the CPU multistate could lead
to us leaking per-CPU struct zcomp instances, leaving these
behind as there is no one to remove them. I can't think of a
relationship between this leak and the crash other than memory pressure.
Because of this and the deadlock, which is easily triggerable,
I decided to write a selftest to allow us to more cleanly
reproduce any races we can dream up.
Luis