Re: [RFC] power: supply: bq27xxx_battery: Fix polling interval after re-bind

From: Andrew F. Davis
Date: Tue May 26 2020 - 21:16:34 EST


On 5/25/20 7:32 AM, Krzysztof Kozlowski wrote:
> This reverts commit 8cfaaa811894a3ae2d7360a15a6cfccff3ebc7db.
>
> If device was unbound and bound, the polling interval would be set to 0.
> This is both unexpected and messes up with other bq27xxx devices (if
> more than one battery device is used).
>
> This reset of polling interval was added in commit 8cfaaa811894
> ("bq27x00_battery: Fix OOPS caused by unregistring bq27x00 driver")
> stating that power_supply_unregister() calls get_property(). However in
> Linux kernel v3.1 and newer, such call trace does not exist.
> Unregistering power supply does not call get_property() on unregistered
> power supply.
>
> Fixes: 8cfaaa811894 ("bq27x00_battery: Fix OOPS caused by unregistring bq27x00 driver")
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Krzysztof Kozlowski <krzk@xxxxxxxxxx>
>
> ---
>
> I really could not identify the issue being fixed in offending commit
> 8cfaaa811894 ("bq27x00_battery: Fix OOPS caused by unregistring bq27x00
> driver"), therefore maybe I missed here something important.
>
> Please share your thoughts on this.


I'm having a hard time finding the OOPS also. Maybe there is a window
where the poll function is running or about to run where
cancel_delayed_work_sync() is called and cancels the work, only to have
an interrupt or late get_property call in to the poll function and
re-schedule it.

What we really need is to do is look at how we are handling the polling
function. It gets called from the workqueue, from a threaded interrupt
context, and from a power supply framework callback, possibly all at the
same time. Sometimes its protected by a lock, sometimes not. Updating
the device's cached data should always be locked.

What's more is the poll function is self-arming, so if we call
cancel_delayed_work_sync() (remove it from the work queue then then wait
for it to finish if running), are we sure it wont have just re-arm itself?

We should make the only way we call the poll function be through the
work queue, (plus make sure all accesses to the cache are locked).

Andrew


> ---
> drivers/power/supply/bq27xxx_battery.c | 8 --------
> 1 file changed, 8 deletions(-)
>
> diff --git a/drivers/power/supply/bq27xxx_battery.c b/drivers/power/supply/bq27xxx_battery.c
> index 942c92127b6d..4c94ee72de95 100644
> --- a/drivers/power/supply/bq27xxx_battery.c
> +++ b/drivers/power/supply/bq27xxx_battery.c
> @@ -1905,14 +1905,6 @@ EXPORT_SYMBOL_GPL(bq27xxx_battery_setup);
>
> void bq27xxx_battery_teardown(struct bq27xxx_device_info *di)
> {
> - /*
> - * power_supply_unregister call bq27xxx_battery_get_property which
> - * call bq27xxx_battery_poll.
> - * Make sure that bq27xxx_battery_poll will not call
> - * schedule_delayed_work again after unregister (which cause OOPS).
> - */
> - poll_interval = 0;
> -
> cancel_delayed_work_sync(&di->work);
>
> power_supply_unregister(di->bat);
>