Re: [GIT PULL] gpio: fixes for v6.1-rc8 - take 2
From: Linus Torvalds
Date: Sun Dec 04 2022 - 15:18:05 EST
On Sat, Dec 3, 2022 at 7:05 AM Bartosz Golaszewski <brgl@xxxxxxxx> wrote:
>
> Here's a fixed PR from the GPIO subsystem for the next rc.
No, this cannot be right.
That last commit seems *very* dubious, and in particular all those
if (!down_read_trylock(&gdev->sem))
return EPOLLHUP | EPOLLERR;
are a sign that something is very very wrong there.
Either the lock is necessary or it isn't, and "trylock" isn't the way
to deal with it, with random failures if you cannot take the lock.
If you are using "trylock" because the data structure might go away
from under you, you have already lost, and the code is buggy.
And if the data structure cannot go away from under you, you should
do an unconditional lock, and then check "gdev->chip" for being NULL
once you have gotten the lock (the same way you did in open()).
But a "trylock and return error if it failed" just means that now you
are randomly returning errors to user space, which is entirely
undebuggable and makes no sense.
Or, alternatively, the trylock succeeds - because it hits fully
*after* gpiochip_remove() has finished, and now ->chip is NULL anyway,
which is what you claim to protect against.
End result: "trylock" can never be right in this kind of context.
That "call_locked() helper might make sense more along the lines of
ret = -ENODEV;
down_read(&gdev->sem))
// Does the device still exist?
if (gdev->chip)
ret = func(file, cmd, arg);
up_read(&gdev->sem);
return ret;
or similar. Not with that odd "try to lock, and if that fails, assume error".
And again - if the trylock is there because 'gdev' itself might go
away at any time and you can't afford to wait on the lock, then it's
broken regardless (and the above suggestion won't help either)
Anyway: the end result of this all is that I think this is a
fundamental bug in the gpio layer, and rc7 (soon to be rc8) is too
late to try these kinds of unfinished games.
Fix it properly for 6.2, and make it back-portable, because I'm not
pulling something like this right now.
Linus