Re: [PATCH] Avoid null-pointer access in w1/slaves/w1_therm

From: David Fries
Date: Sun Mar 08 2015 - 17:15:16 EST

On Wed, Mar 04, 2015 at 06:36:41PM +0300, ??????? ??????? wrote:
> Hi David
> 02.03.2015, 03:17, "David Fries" <david@xxxxxxxxx>:
> > You are correct, it would be a race condition if it doesn't increment
> > the refcnt before unlocking the mutex, and it should get the mutex
> > before unref.  Here's an updated version, I haven't even tried to
> > compile it.
> >
> > What do you think Evgeniy?
> Sounds correct, it should do the trick, please cook up a patch, Thorsten, can you test it?

I quickly found out that strategy wasn't going to work.
w1_unref_slave calls w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl);
which calls sysfs_remove_groups which deadlocks,

[<ffffffff81047426>] ? wait_woken+0x54/0x54
[<ffffffff810e7d6e>] ? kernfs_remove_by_name_ns+0x67/0x82
[<ffffffff810e97c8>] ? remove_files+0x30/0x58
[<ffffffff810e9b8a>] ? sysfs_remove_group+0x60/0x7b
[<ffffffff810e9bc2>] ? sysfs_remove_groups+0x1d/0x2f
[<ffffffffa01951ae>] ? w1_unref_slave+0xd9/0x13b [wire]
[<ffffffffa01a516e>] ? w1_slave_show+0x11a/0x335 [w1_therm]

Here's a different strategy, add a w1_therm family_data specific mutex
so w1_therm_remove_slave won't return while another thread is in
w1_slave_show. Thoughts?

I included three patches, the first is the proposed fix, the second
makes it more reproducible (and since my testing doesn't have external
power I had to ignore that check), the third is just some debugging
messages. For testing I'm starting a read from w1_slave then manually
remove the sensor, which seems to do the trick.

echo 28-[ds18b20_id] > /sys/devices/w1_bus_master1/w1_master_remove

Give it a try and let me know how it goes.

My test system host is running 3.16, I'm using kvm to test 3.19 giving
it access to the USB one wire dongle, and there's some refcnt problem
I've not been able to track down. Once and a while when I have the
kvm grab the USB device the host will start printing a refcount
problem, which takes a reboot to clear, which is annoying because the
reason to test in kvm is to not break the host.

w1_master_driver w1_bus_master1: Waiting for w1_bus_master1 to become
free: refcnt=1.