Re: general protection fault in find_device
From: Nikolay Borisov
Date: Mon Jun 18 2018 - 09:43:45 EST
On 18.06.2018 16:32, David Sterba wrote:
> On Mon, Jun 18, 2018 at 10:03:18AM +0300, Nikolay Borisov wrote:
>> So this suggests some inconsistency on fs_devices->devices list. On a
>> quick look indeed it doesn't seem clear what the locking rules for this
>> list are. In device_list_add in the !device case a device is added with
>> fs_devices->device_list_Mutex held and using list_add_rcu. In the same
>> function if we want to read the list ie invoke find_devices (because we
>> have found an fsid) we are using plain list_for_each_entry (ie not the
>> _rcu version and i don't see device_list_mutex being held while
>> iterating the list). Additionally in btrfs_free_extra_devids the
>> fs_devices->devices list is iterated with uuid_mutex being held and not
>> device_list_mutex. In open_fs_devices we don't get any protection
>> whatsoever while reading the list.
>
> The uuid_mutex or device_list_mutex is provided by a caller up the
> stack.
>
>> Same thing in
>> btrfs_find_next_active_device. If the list is supposed to be
>> RCU-protected then the rules are:
>>
>> 1. There needs to be an out of band (ie not RCU) mutual exclusion of
>> modifiers
>
> that's device_list_mutex for fs_devices::devices
>
>> 2. Iterating the list should use _rcu list primitives.
>>
>> Currently I don't see those 2 invariants being enforced in every code path.
>
> Where is it not enforced for example?
Admittedly I didn't check the whole call chain but for example in
find_device it's used "naked". Perhaps putting some lockdep_assert in
various places dealing with fs_devices->devices list would help ?
>
> If the device_list_mutex is held, list traversal does not use
> list_for_each_entry_rcu, otherwise it does (eg the DEV_INFO ioctl or
> btrfs_show_devname).
>
> The problem that triggers this report is IMO in device_list_add that
> uses the device list unprotected. Anand sent patches for that, but they
> were titled as 'cleanups' so I skipped them for the merge window.
>
> Candidate fixes are:
>
> https://patchwork.kernel.org/patch/10437705/
> https://patchwork.kernel.org/patch/10437713/
Yep those 2 definitely look like fixing unlocked accesses to
fs_devices->devices list
>