Re: regression: gpiolib: switch the line state notifier to atomic unexpected impact on performance
From: David Jander
Date: Tue Mar 11 2025 - 07:04:08 EST
Dear Bartosz,
On Tue, 11 Mar 2025 11:21:10 +0100
Bartosz Golaszewski <brgl@xxxxxxxx> wrote:
> On Tue, Mar 11, 2025 at 11:01 AM David Jander <david@xxxxxxxxxxx> wrote:
> >
> >
> > Dear Bartosz,
> >
> > I noticed this because after updating the kernel from 6.11 to 6.14 a
> > user-space application that uses GPIOs heavily started getting extremely slow,
> > to the point that I will need to heavily modify this application in order
> > for it to be usable again.
> > I traced the problem down to the following patch that went into 6.13:
> >
> > fcc8b637c542 gpiolib: switch the line state notifier to atomic
> >
> > What happens here is that gpio_chrdev_release() now calls
> > atomic_notifier_chain_unregister(), which uses RCU and as such must call
> > synchronize_rcu(). synchronize_rcu() waits for an RCU grace period to
> > elapse before returning and, according to the documentation, can cause a
> > delay of up to several milliseconds. In fact it seems to take 8-10ms on
> > my system (an STM32MP153C single-core Cortex-A7).
> >
> > The effect is that calling close() on a /dev/gpiochipX device now takes
> > ~10ms each time. If I git-revert this commit, close() takes less than 1ms.
> >
>
> Thanks for the detailed report!
Thanks to you for making this patch in such a way that it is easy to revert
without breaking stuff! That was a real time-saver while diagnosing.
> > 10ms doesn't sound like much, but it is ~10x the time it took before,
> > and unfortunately libgpiod code calls this function very often in some places,
> > especially in find_line() if your board has many gpiochips (mine has 16
> > chardevs).
>
> Yeah, I imagine it can affect the speed of execution of gpiofind,
> gpiodetect and any other program that iterates over all character
> devices.
Indeed, it does. My application is written in Python and uses the Python gpiod
module. Even in such an environment the impact is crippling.
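To put rough numbers on the multiplication effect, here is a back-of-envelope
model (the chip count and per-close times are the ones from this report; it
is an illustration, not a new measurement):

```python
# Back-of-envelope model of the gpiofind slowdown, using the numbers
# reported in this thread (16 chardevs, ~1ms vs ~10ms per close()).
n_chips = 16          # chardevs on my board
close_before_ms = 1   # close() before fcc8b637c542 (< 1 ms)
close_after_ms = 10   # close() after it, dominated by synchronize_rcu()

# A line lookup like gpiofind's iterates over every gpiochip, opening
# and closing each one, so the per-close penalty scales with chip count:
total_before_ms = n_chips * close_before_ms
total_after_ms = n_chips * close_after_ms
print(total_before_ms, total_after_ms)  # 16 160
```

Which lines up roughly with the ~0.02s vs ~0.19s gpiofind timings below.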
> > The effect can easily be reproduced with the gpiofind tool:
> >
> > Running on kernel 6.12:
> >
> > $ time gpiofind LPOUT0
> > gpiochip7 9
> > real 0m 0.02s
> > user 0m 0.00s
> > sys 0m 0.01s
> >
> > Running on kernel 6.13:
> >
> > $ time gpiofind LPOUT0
> > gpiochip7 9
> > real 0m 0.19s
> > user 0m 0.00s
> > sys 0m 0.01s
> >
> > That is almost a 10x increase in execution time of the whole program!!
> >
> > On kernel 6.13, after git revert -n fcc8b637c542 time is back to what it was
> > on 6.12.
> >
> > Unfortunately I can't come up with an easy solution to this problem, that's
> > why I don't have a patch to propose. Sorry for that.
> >
> > I still think it is a bit alarming that this change has such a huge
> > impact. IMHO it really shouldn't. What can be done about this? Is it
> > maybe possible to defer
> > unregistering and freeing to a kthread and return from the release function
> > earlier?
> >
>
> This was my first idea too. Alternatively we can switch to using a raw
> notifier and provide a spinlock ourselves.
That would probably be a good alternative, although gpiod_line_state_notify()
would then no longer benefit from the lock-free RCU read side and would incur
a spinlock penalty instead. Arguably, delivering notifications is a lot more
performance-critical than closing the chardev, so maybe the atomic notifier
isn't a bad idea after all... we just need to deal with the write side so
that user-space doesn't have to wait out the RCU grace period?
Certainly, switching to the raw notifier is the easier fix. OTOH, I know from
my own experience that the cost of a spinlock does matter sometimes, so
keeping it out of the performance-critical path is probably worth having.
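To illustrate the deferral idea in plain userspace terms: the release path
hands the expensive teardown to a worker so it can return immediately. This
is only a sketch of the pattern (a sleep stands in for synchronize_rcu(), and
threading stands in for whatever kernel mechanism, e.g. a work item or
kthread, would actually be used); in the kernel the notifier entry would also
need to be unhooked before queueing so no new events reach it.

```python
# Sketch of "defer the blocking part of release": not kernel code, just
# the shape of the idea. time.sleep() models the ~10 ms RCU grace period.
import threading
import time

def deferred_teardown(resource):
    time.sleep(0.010)        # stand-in for synchronize_rcu()
    resource.clear()         # the actual freeing, off the hot path

def fast_release(resource):
    t = threading.Thread(target=deferred_teardown, args=(resource,))
    t.start()                # release returns without waiting for teardown
    return t

res = {"notifier": object()}
start = time.monotonic()
worker = fast_release(res)
elapsed = time.monotonic() - start
print(f"release returned after {elapsed * 1000:.2f} ms")
worker.join()                # demo only: wait so the sketch exits cleanly
```

The point being that the caller of fast_release() observes sub-millisecond
latency even though the grace-period wait still happens somewhere.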
Best regards,
--
David Jander