Re: [PATCH] powerpc/pseries: remove variable 'status' set but not used

From: Tyrel Datwyler
Date: Wed Nov 20 2019 - 16:35:26 EST


On 11/18/19 9:53 PM, Michael Ellerman wrote:
> Chen Wandun <chenwandun@xxxxxxxxxx> writes:
>> Fixes gcc '-Wunused-but-set-variable' warning:
>>
>> arch/powerpc/platforms/pseries/ras.c: In function ras_epow_interrupt:
>> arch/powerpc/platforms/pseries/ras.c:319:6: warning: variable status set but not used [-Wunused-but-set-variable]
>
> Thanks for the patch.
>
> But it almost certainly is wrong to not check the status.

Agreed, I started drafting a NACK response, but got sidetracked.

>
> It's calling firmware and just assuming that the call succeeded. It then
> goes on to use the result that should have been written by firmware, but
> is now potentially random junk.
>
> So I'd much rather a patch to change it to check the status.

+1

>
>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>> index 1d7f973..4a61d0f 100644
>> --- a/arch/powerpc/platforms/pseries/ras.c
>> +++ b/arch/powerpc/platforms/pseries/ras.c
>> @@ -316,12 +316,11 @@ static irqreturn_t ras_hotplug_interrupt(int irq, void *dev_id)
>> /* Handle environmental and power warning (EPOW) interrupts. */
>> static irqreturn_t ras_epow_interrupt(int irq, void *dev_id)
>> {
>> - int status;
>> int state;
>> int critical;
>>
>> - status = rtas_get_sensor_fast(EPOW_SENSOR_TOKEN, EPOW_SENSOR_INDEX,
>> - &state);
>> + rtas_get_sensor_fast(EPOW_SENSOR_TOKEN, EPOW_SENSOR_INDEX,
>> + &state);
>
> This is calling a helper which already does some translation of the
> return value, any value < 0 indicates an error.

There are three possible architected failures here: a hardware error, a
non-existent sensor, and a DR isolation error, which would be reported in the
status as -EIO, -EINVAL, and -EFAULT respectively. Further, the EPOW sensor is
required and is not a DR entity, so we can never get -EINVAL or -EFAULT
(barring broken firmware). This leaves -EIO (HARDWARE_ERROR), and as I mention
further down, this will generate its own error log in response. So, I don't
think we need to do any reporting here; we can just return.
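
To make the suggestion concrete, here is a rough userspace sketch of the early
return. It is not the actual ras.c change: rtas_get_sensor_fast_stub(),
mock_sensor_status, mock_sensor_state and epow_read_sensor() are names made up
for illustration, and the two #define values are placeholders for the sketch.

```c
/* Userspace sketch only; the stub stands in for the real RTAS helper. */
#include <errno.h>

#define EPOW_SENSOR_TOKEN 9	/* placeholder value for this sketch */
#define EPOW_SENSOR_INDEX 0	/* placeholder value for this sketch */

static int mock_sensor_status;	/* status the stub will return */
static int mock_sensor_state;	/* state the stub writes on success */

/* Hypothetical stand-in: negative return means failure, state untouched. */
static int rtas_get_sensor_fast_stub(int sensor, int index, int *state)
{
	(void)sensor;
	(void)index;

	if (mock_sensor_status < 0)
		return mock_sensor_status;

	*state = mock_sensor_state;
	return 0;
}

/*
 * Returns 1 if the sensor state was read and may be used,
 * 0 if the call failed and the caller should simply return.
 */
static int epow_read_sensor(int *state_out)
{
	int state;
	int status;

	status = rtas_get_sensor_fast_stub(EPOW_SENSOR_TOKEN,
					   EPOW_SENSOR_INDEX, &state);
	if (status < 0)
		return 0;	/* state is not valid, do not touch it */

	*state_out = state;
	return 1;
}
```

The point is only that state is never consumed unless the status check passed;
no extra reporting is done on the failure path.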

>
>> @@ -330,12 +329,12 @@ static irqreturn_t ras_epow_interrupt(int irq, void *dev_id)
>>
>> spin_lock(&ras_log_buf_lock);
>>
>> - status = rtas_call(ras_check_exception_token, 6, 1, NULL,
>> - RTAS_VECTOR_EXTERNAL_INTERRUPT,
>> - virq_to_hw(irq),
>> - RTAS_EPOW_WARNING,
>> - critical, __pa(&ras_log_buf),
>> - rtas_get_error_log_max());
>> + rtas_call(ras_check_exception_token, 6, 1, NULL,
>> + RTAS_VECTOR_EXTERNAL_INTERRUPT,
>> + virq_to_hw(irq),
>> + RTAS_EPOW_WARNING,
>> + critical, __pa(&ras_log_buf),
>> + rtas_get_error_log_max());
>
> This is directly calling firmware.
>
> As documented in LoPAPR, a negative status indicates an error, 0
> indicates a new error log was found (ie. the function should continue),
> or 1 there was no error log (ie. nothing to do).

It is highly unlikely that we will find no new error log, since we are
processing an interrupt that supposedly fired to tell us there is a new one.
However, ras_log_buf is never zeroed, so in the unlikely case that there is no
new error log we would parse stale data from the previous log. Better to be
safe than sorry and just return.

In the case of an error the only error code we supposedly can get here is -1
(HARDWARE_ERROR), and the RTAS handling will generate an error log in response
to that. So, I don't think we need to report anything here. I would suggest for
the (status != 0) case that you just return.
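
Roughly what I have in mind, again as a userspace sketch rather than a real
patch: check_exception_stub(), log_error_stub(), ras_log_buf_stub, lock_held
and epow_handle() are all hypothetical names, and the lock is modelled as a
plain flag. One thing to watch in the real handler is that the early return
happens with ras_log_buf_lock held, so it still has to go through the unlock.

```c
/* Userspace sketch only; stubs stand in for rtas_call() and log_error(). */
#include <stdio.h>

enum { LOG_MAX = 16 };

static char ras_log_buf_stub[LOG_MAX];	/* never zeroed between calls */
static int logged_count;		/* how many logs were processed */
static int lock_held;			/* models ras_log_buf_lock */

/*
 * Hypothetical stand-in for the check-exception call:
 * 0 = new log written, 1 = no new log, negative = error.
 */
static int check_exception_stub(int fw_status, const char *new_log)
{
	if (fw_status == 0)
		snprintf(ras_log_buf_stub, sizeof(ras_log_buf_stub),
			 "%s", new_log);
	return fw_status;
}

static void log_error_stub(const char *buf)
{
	(void)buf;
	logged_count++;
}

static void epow_handle(int fw_status, const char *new_log)
{
	int status;

	lock_held = 1;

	status = check_exception_stub(fw_status, new_log);
	if (status != 0)
		goto out;	/* error or no new log: buffer may be stale */

	log_error_stub(ras_log_buf_stub);
out:
	lock_held = 0;		/* unlock on every path */
}
```

With this shape the stale contents of the buffer are never parsed, and nothing
is reported on the (status != 0) path.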

-Tyrel

>
> cheers
>
>> log_error(ras_log_buf, ERR_TYPE_RTAS_LOG, 0);
>>
>> --
>> 2.7.4