Re: Broken pci_block_user_cfg_access interface

From: Jan Kiszka
Date: Thu Aug 25 2011 - 09:16:25 EST


On 2011-08-25 15:12, Brian King wrote:
> On 08/25/2011 08:06 AM, Brian King wrote:
>> On 08/25/2011 04:40 AM, Michael S. Tsirkin wrote:
>>> On Thu, Aug 25, 2011 at 11:19:54AM +0200, Jan Kiszka wrote:
>>>> On 2011-08-24 17:02, Brian King wrote:
>>>>> On 08/24/2011 05:43 AM, Jan Kiszka wrote:
>>>>>> Hi,
>>>>>>
>>>>>> trying to port the generic device interrupt masking pattern of
>>>>>> uio_pci_generic to KVM's device assignment code, I stumbled over some
>>>>>> fundamental problem with the current pci_block/unblock_user_cfg_access
>>>>>> interface: it does not provide any synchronization between blocking
>>>>>> sides. This allows user space to trigger a kernel BUG, just run two
>>>>>>
>>>>>> while true; do echo 1 > /sys/bus/pci/devices/<some-device>/reset; done
>>>>>>
>>>>>> loops in parallel and watch the kernel oops.
>>>>>>
>>>>>> Instead of some funky open-coded locking mechanism, we would rather need
>>>>>> a plain mutex across both the user space access (via sysfs) and the
>>>>>> sections guarded by pci_block/unblock_user_cfg_access so far. But I'm
>>>>>> not sure which of them already allow sleeping, specifically if the IPR
>>>>>> driver would be fine with such a change. Can someone in the CC list
>>>>>> comment on this?
>>>>>
>>>>> The ipr driver calls pci_block/unblock_user_cfg_access from interrupt
>>>>> context, so a mutex won't work.
>>>>
>>>> Ugh. What precisely does it have to do with the config space while
>>>> running inside an IRQ handler (or holding a lock that synchronizes it
>>>> with such a handler)?
>>>>
>>>>> When the pci_block/unblock API was
>>>>> originally added, it did not have the checking it has today to detect
>>>>> if it is being called nested. This was added some time later. The
>>>>
>>>> For a reason...
>>>>
>>>>> API that works best for the ipr driver is to allow for many block calls,
>>>>> but a single unblock call unblocks access. It seems like what might
>>>>> work well in the case above is a block count. Each call to pci_block
>>>>> increments a count. Each pci_unblock decrements the count and only
>>>>> actually does the unblock if the count drops to zero. It should be reasonably
>>>>> simple for ipr to use that sort of an API as well.
>>>>
>>>> That will just paper over the underlying bug: multiple kernel users (!=
>>>> sysfs access) fiddle with the config space in an unsynchronized fashion.
>>>> Think of sysfs-triggered pci_reset_function while your ipr driver does
>>>> its accesses.
>>>>
>>>> So it's pointless to tweak the current pci_block semantics, we rather
>>>> need to establish a new mechanism that synchronizes *all* users of the
>>>> config space.
>>>>
>>>> Jan
>>>
>>> It does look like all of the problems are actually around reset.
>>> So maybe all we need to do is synchronize the sysfs-triggered
>>> pci_reset_function with pci_block/unblock_user_cfg_access?
>>>
>>> In other words, when reset is triggered from sysfs, it
>>> should obey pci_block/unblock_user_cfg_access
>>> restrictions?
>>>
>>> It does not look like reset needs to sleep, so fixing
>>> that should not be hard, right?
>>
>> This sounds reasonable to me. Although I think we still have the driver issue
>> I described in my previous mail. Perhaps the best way to resolve that would
>> be to allow the adapter driver to register a reset function so that the
>> driver could be the one driving the reset, allowing the driver to synchronize
>> the reset with whatever else might be going on and also then reinitialize
>> the adapter firmware, etc. If no driver was loaded or no driver specific
>> reset function registered, the current reset mechanism would be invoked.
>
> This would also allow the driver to do unique types of resets for different
> adapter types. Some of the adapters the ipr driver supports need to get
> reset via BIST, others via PCIe warm reset, etc.

Is this broken at the moment? I thought the PCI core simply tries all
methods and has a quirks section for completely funky devices.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
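
For readers following the thread, the block-count idea quoted above (each
block call increments a counter, and only the unblock call that drops it to
zero re-enables user access) can be modelled with a minimal user-space
sketch. Everything here is an assumption made for illustration: struct
cfg_block and the cfg_block_* names are hypothetical and are not part of the
PCI core, and C11 atomics stand in for whatever locking the kernel would
use. As noted in the quoted reply, counting alone does not serialize the
kernel-side config space users against each other; it only decides when
user access is re-enabled.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for per-device blocking state. */
struct cfg_block {
	atomic_int count;	/* number of outstanding block calls */
};

/* Take one block reference; true if this call started the blocking. */
static bool cfg_block_get(struct cfg_block *b)
{
	return atomic_fetch_add(&b->count, 1) == 0;
}

/* Drop one reference; true if this call dropped the count to zero,
 * i.e. user access is allowed again. Callers must keep get/put balanced. */
static bool cfg_block_put(struct cfg_block *b)
{
	return atomic_fetch_sub(&b->count, 1) == 1;
}

/* True while at least one block reference is outstanding. */
static bool cfg_user_access_blocked(struct cfg_block *b)
{
	return atomic_load(&b->count) > 0;
}
```

With two nested blockers, the first cfg_block_put() leaves user access
blocked and only the second one lifts it. The atomic operations never
sleep, which matters given the interrupt-context callers mentioned in the
thread.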