On Thu, 24 Oct 2024 08:49:30 +0200, Bartosz Golaszewski <brgl@xxxxxxxx> said:
On Wed, Oct 23, 2024 at 11:05 PM Mark Brown <broonie@xxxxxxxxxx> wrote:
On Fri, Oct 18, 2024 at 11:10:16AM +0200, Bartosz Golaszewski wrote:
From: Bartosz Golaszewski <bartosz.golaszewski@xxxxxxxxxx>
We currently only notify user-space about line config changes that are
made from user-space. Any kernel config changes are not signalled.
Let's improve the situation by emitting the events closer to the source.
To that end let's call the relevant notifier chain from the functions
setting direction, gpiod_set_config(), gpiod_set_consumer_name() and
gpiod_toggle_active_low(). This covers all the options that we can
inform the user-space about. We ignore events which don't have
corresponding flags exported to user-space on purpose - otherwise the
user would see a config-changed event but the associated line-info would
remain unchanged.
Today's -next is not booting on several of my platforms, including
beaglebone-black, i.MX8MP-EVK and pine64plus. Bisects are pointing at
this commit, and i.MX8MP-EVK is actually giving some console output:
[ 2.502208] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 2.511036] Mem abort info:
...
[ 2.679934] Call trace:
[ 2.682379] gpiod_direction_output+0x34/0x5c
[ 2.686745] i2c_register_adapter+0x59c/0x670
[ 2.691111] __i2c_add_numbered_adapter+0x58/0xa8
[ 2.695822] i2c_add_adapter+0xa0/0xd0
[ 2.699578] i2c_add_numbered_adapter+0x2c/0x38
[ 2.704117] i2c_imx_probe+0x2d0/0x640
which looks plausible given the change.
Full boot log for i.MX8MP-EVK:
https://lava.sirena.org.uk/scheduler/job/887083
Bisect log for that, the others look similar (the long run of good/bad
tags at the start for random commits is my automation pulling test
results it already knows about in the affected range to try to speed up
the bisect):
Hi Mark!
Any chance you could post the output of
scripts/faddr2line drivers/gpio/gpiolib.o gpiod_direction_output+0x34/0x5c
for that build?
Bart
While I can't really reproduce it, I've looked at what could be wrong and
figured that the NULL-pointer in question can possibly be the
line_state_notifier.
I realized that for some historical reasons we add the GPIO device to the
global list before it's fully set up - including initializing the notifier.
In some corner cases (devlinks borked?) this could lead to consumers requesting
GPIOs before their provider is ready.
Mark: could you try the following diff and let me know if it works?
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index ae758ba6dc3d..4258acac0162 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -987,45 +987,6 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
gdev->ngpio = gc->ngpio;
gdev->can_sleep = gc->can_sleep;
-
- scoped_guard(mutex, &gpio_devices_lock) {
- /*
- * TODO: this allocates a Linux GPIO number base in the global
- * GPIO numberspace for this chip. In the long run we want to
- * get *rid* of this numberspace and use only descriptors, but
- * it may be a pipe dream. It will not happen before we get rid
- * of the sysfs interface anyways.
- */
- base = gc->base;
- if (base < 0) {
- base = gpiochip_find_base_unlocked(gc->ngpio);
- if (base < 0) {
- ret = base;
- base = 0;
- goto err_free_label;
- }
-
- /*
- * TODO: it should not be necessary to reflect the
- * assigned base outside of the GPIO subsystem. Go over
- * drivers and see if anyone makes use of this, else
- * drop this and assign a poison instead.
- */
- gc->base = base;
- } else {
- dev_warn(&gdev->dev,
- "Static allocation of GPIO base is deprecated, use dynamic allocation.\n");
- }
-
- gdev->base = base;
-
- ret = gpiodev_add_to_list_unlocked(gdev);
- if (ret) {
- chip_err(gc, "GPIO integer space overlap, cannot add chip\n");
- goto err_free_label;
- }
- }
-
ATOMIC_INIT_NOTIFIER_HEAD(&gdev->line_state_notifier);
BLOCKING_INIT_NOTIFIER_HEAD(&gdev->device_notifier);
@@ -1103,6 +1064,45 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
if (ret)
goto err_remove_irqchip;
}
+
+ scoped_guard(mutex, &gpio_devices_lock) {
+ /*
+ * TODO: this allocates a Linux GPIO number base in the global
+ * GPIO numberspace for this chip. In the long run we want to
+ * get *rid* of this numberspace and use only descriptors, but
+ * it may be a pipe dream. It will not happen before we get rid
+ * of the sysfs interface anyways.
+ */
+ base = gc->base;
+ if (base < 0) {
+ base = gpiochip_find_base_unlocked(gc->ngpio);
+ if (base < 0) {
+ ret = base;
+ base = 0;
+ goto err_free_label;
+ }
+
+ /*
+ * TODO: it should not be necessary to reflect the
+ * assigned base outside of the GPIO subsystem. Go over
+ * drivers and see if anyone makes use of this, else
+ * drop this and assign a poison instead.
+ */
+ gc->base = base;
+ } else {
+ dev_warn(&gdev->dev,
+ "Static allocation of GPIO base is deprecated, use dynamic allocation.\n");
+ }
+
+ gdev->base = base;
+
+ ret = gpiodev_add_to_list_unlocked(gdev);
+ if (ret) {
+ chip_err(gc, "GPIO integer space overlap, cannot add chip\n");
+ goto err_free_label;
+ }
+ }
+
return 0;
err_remove_irqchip:
Thanks
Bartosz
Attachment:
gpiod_oops_decoded.gz
Description: application/gzip
Attachment:
gpiod_oops_after_patch_decoded.gz
Description: application/gzip