Re: [PATCH] dma-buf: Move sysfs work out of DMA-BUF export/release path

From: Greg Kroah-Hartman
Date: Tue Jan 11 2022 - 06:17:00 EST


On Tue, Jan 11, 2022 at 11:58:07AM +0100, Christian König wrote:
> > > This is also not a problem due to the high number of DMA-BUF
> > > exports during launch time, as even a single export can be delayed for
> > > an unpredictable amount of time. We cannot eliminate DMA-BUF exports
> > > completely during app-launches and we are unfortunately seeing reports
> > > of the exporting process occasionally sleeping long enough to cause
> > > user-visible jankiness :(
> > >
> > > We also looked at whether any optimizations are possible from the
> > > kernfs implementation side[1] but the semaphore is used quite extensively
> > > and it looks like the best way forward would be to remove sysfs
> > > creation/teardown from the DMA-BUF export/release path altogether. We
> > > have some ideas on how we can reduce the code-complexity in the
> > > current patch. If we manage to
> > > simplify it considerably, would the approach of offloading sysfs
> > > creation and teardown into a separate thread be acceptable Christian?
>
> At bare minimum I suggest to use a work_struct instead of re-inventing that
> with kthread.
>
> And then only put the exporting of buffers into the background and not the
> teardown.
>
> > > Thank you for the guidance!
> > One worry I have here with doing this async that now userspace might
> > have a dma-buf, but the sysfs entry does not yet exist, or the dma-buf
> > is gone, but the sysfs entry still exists. That's a bit awkward wrt
> > semantics.
> >
> > Also I'm pretty sure that if we can hit this, then other subsystems
> > using kernfs have similar problems, so trying to fix this in kernfs
> > with slightly more fine-grained locking sounds like a much more solid
> > approach. The linked patch talks about how the big delays happen due
> > to direct reclaim, and that might be limited to specific code paths
> > that we need to look at? As-is this feels a bit much like papering
> > over kernfs issues in hackish ways in sysfs users, instead of tackling
> > the problem at its root.
>
> Which is exactly my feeling as well, yes.

More and more people are using sysfs/kernfs now for things that it was
never designed for (i.e. high-speed statistic gathering). That's not
the fault of kernfs, it's the fault of people thinking it can be used
for stuff like that :)

But delays like this is odd, tearing down sysfs attributes should
normally _never_ be a fast-path that matters to system throughput. So
offloading it to a workqueue makes sense as the attributes here are for
objects that are on the fast-path.

thanks,

greg k-h