Re: [PATCH v2] uio/uio_pci_generic: Introduce refcnt on open/release
From: Greg KH
Date: Wed Apr 13 2022 - 07:21:04 EST
On Wed, Apr 13, 2022 at 07:09:57PM +0800, Yao Hongbo wrote:
>
> On 2022/4/13 5:43 PM, Greg KH wrote:
> > On Wed, Apr 13, 2022 at 05:25:40PM +0800, Yao Hongbo wrote:
> > > On 2022/4/13 4:51 PM, Michael S. Tsirkin wrote:
> > > > On Wed, Apr 13, 2022 at 09:33:17AM +0200, Greg KH wrote:
> > > > > On Wed, Apr 13, 2022 at 03:01:42PM +0800, Yao Hongbo wrote:
> > > > > > If two userspace programs both open the PCI UIO fd and one
> > > > > > of them exits uncleanly, the other will see an IO hang
> > > > > > because bus-mastering has been disabled.
> > > > > >
> > > > > > It is common for spdk/dpdk to use UIO, so introduce a refcnt
> > > > > > to avoid such problems.
> > > > > Why do you have multiple userspace programs opening the same device?
> > > > > Shouldn't they coordinate?
> > > > Or to restate, I think the question is, why not open the device
> > > > once and pass the FD around?
> > > Hmm, the result is the same whether we open the same device
> > > or pass the FD around.
> > How? You only open once, and close once. Where are the multiple closes?
> >
> > > Our expectation is that even if the primary process exits abnormally, the
> > > second process can still send or receive data.
> > Then use the same file descriptor.
>
>
> Yes, we can use the same file descriptor.
>
> But since PCIe bus-mastering has been disabled by the primary process,
> the secondary process cannot continue to operate.
Really? With the same file descriptor? Try it and see. release should
only be called when the file descriptor is closed.
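
To illustrate that rule, here is a minimal userspace sketch. It is not
part of the patch and does not touch a UIO device: a pipe's write end
stands in for the device fd, and the reader seeing EOF stands in for
->release() running. The helper names (reader_sees_eof, dup_close_demo)
are made up for this example. The point is that dup(), fork() and
SCM_RIGHTS all hand out references to the same open file, so the kernel
only releases it when the last reference goes away. Separate open()
calls create separate open files, each with its own release.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if the pipe reader sees EOF (every writer reference is gone),
 * 0 if a writer reference is still open, -1 on any other error.
 * rfd must be in non-blocking mode.
 */
static int reader_sees_eof(int rfd)
{
	char c;
	ssize_t n = read(rfd, &c, 1);

	if (n == 0)
		return 1;
	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0;
	return -1;
}

/*
 * Hypothetical demo: "open" once (pipe write end as stand-in for the UIO
 * fd), duplicate the fd, close the original, then close the copy.
 * The two out-parameters record what the reader observed after each close.
 */
static int dup_close_demo(int *eof_after_first, int *eof_after_last)
{
	int p[2], copy, flags;

	assert(eof_after_first && eof_after_last);

	if (pipe(p) < 0)
		return -1;

	flags = fcntl(p[0], F_GETFL);
	if (flags < 0 || fcntl(p[0], F_SETFL, flags | O_NONBLOCK) < 0)
		goto fail;

	copy = dup(p[1]);		/* second reference, same open file */
	if (copy < 0)
		goto fail;

	close(p[1]);			/* first close: not the last reference */
	*eof_after_first = reader_sees_eof(p[0]);

	close(copy);			/* last close: the file is released */
	*eof_after_last = reader_sees_eof(p[0]);

	close(p[0]);
	return 0;

fail:
	close(p[0]);
	close(p[1]);
	return -1;
}
```

Calling dup_close_demo() from a small main() should report no EOF after
the first close and EOF only after the last one, which is the behaviour
a shared UIO fd would rely on.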
> > > The impact of disabling PCI bus-mastering is relatively large, and we
> > > should place some restrictions on this behavior.
> > Why? UIO is "you better really really know what you are doing to use
> > this interface", right? Just duplicate the fd and pass it around if you
> > must have multiple accesses to the same device.
> >
> > And again, this will be a functional change. How can you handle your
> > userspace on older kernels if you make this change?
>
> Without this change, our userspace cannot work properly on older kernels.
What change broke your userspace?
> Our userspace only uses the "multi process mode" feature of SPDK.
>
> The SPDK link:
> https://spdk.io/doc/app_overview.html
>
> "Multi process mode
> When --shm-id is specified, the application is started in multi-process
> mode. Applications using the same shm-id share their memory and NVMe
> devices. The first app to start with a given id becomes a primary process,
> with the rest, called secondary processes, only attaching to it. When the
> primary process exits, the secondary ones continue to operate, but no new
> processes can be attached at this point. All processes within the same
> shm-id group must use the same --single-file-segments setting."
Please work with the spdk users, I know nothing about that mess, sorry.
greg k-h