Re: Broken pci_block_user_cfg_access interface

From: Brian King
Date: Thu Aug 25 2011 - 09:02:47 EST


On 08/25/2011 04:19 AM, Jan Kiszka wrote:
> On 2011-08-24 17:02, Brian King wrote:
>> On 08/24/2011 05:43 AM, Jan Kiszka wrote:
>>> Hi,
>>>
>>> trying to port the generic device interrupt masking pattern of
>>> uio_pci_generic to KVM's device assignment code, I stumbled over some
>>> fundamental problem with the current pci_block/unblock_user_cfg_access
>>> interface: it does not provide any synchronization between blocking
>>> sides. This allows user space to trigger a kernel BUG, just run two
>>>
>>> while true; do echo 1 > /sys/bus/pci/devices/<some-device>/reset; done
>>>
>>> loops in parallel and watch the kernel oops.
>>>
>>> Instead of some funky open-coded locking mechanism, we would rather need
>>> a plain mutex across both the user space access (via sysfs) and the
>>> sections guarded by pci_block/unblock_user_cfg_access so far. But I'm
>>> not sure which of them already allow sleeping, specifically if the IPR
>>> driver would be fine with such a change. Can someone in the CC list
>>> comment on this?
>>
>> The ipr driver calls pci_block/unblock_user_cfg_access from interrupt
>> context, so a mutex won't work.
>
> Ugh. What precisely does it have to do with the config space while
> running inside an IRQ handler (or holding a lock that synchronizes it
> with such a handler)?

The ipr driver can get an error interrupt which will trigger the driver
to reset the adapter. While the adapter is going through reset we need
to ensure user config accesses are blocked, since many ipr adapters
won't respond on the PCI bus during this time.


>> API that works best for the ipr driver is to allow for many block calls,
>> but a single unblock call unblocks access. It seems like what might
>> work well in the case above is a block count. Each call to pci_block
>> increments a count. Each pci_unblock decrements the count and only
>> actually do the unblock if the count drops to zero. It should be reasonably
>> simple for ipr to use that sort of an API as well.
>
> That will just paper over the underlying bug: multiple kernel users (!=
> sysfs access) fiddle with the config space in an unsynchronized fashion.
> Think of sysfs-triggered pci_reset_function while your ipr driver does
> its accesses.

I took a look at the sysfs triggered pci reset function and don't see any way
that the controlling device driver ever gets to be involved in this reset.
If code outside the ipr driver were to reset the adapter, the adapter firmware
would be left in an uninitialized state and until scsi core starts timing
out ops and driving EH, the card would be unusable. I can't imagine the
ipr driver is unique in this.

-Brian


--
Brian King
Linux on Power Virtualization
IBM Linux Technology Center


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/