Hi Tyler,
Hello James,

On 31/07/17 17:15, Baicar, Tyler wrote:
On 7/29/2017 12:53 AM, Borislav Petkov wrote:
On Fri, Jul 28, 2017 at 04:25:03PM -0600, Tyler Baicar wrote:
Currently we acknowledge errors before clearing the error status.
This could cause a new error to be populated by firmware in-between
the error acknowledgment and the error status clearing which would
cause the second error's status to be cleared without being handled.
So, clear the error status before acknowledging the errors.

Also, make sure to acknowledge the error if the error status read
fails.

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d661d45..6a6895a 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -743,17 +743,15 @@ static int ghes_proc(struct ghes *ghes)
 	}
 	ghes_do_proc(ghes, ghes->estatus);
+out:
+	ghes_clear_estatus(ghes);

If the first ghes_read_estatus() fails and we jump straight to that
label...

 	/*
 	 * GHESv2 type HEST entries introduce support for error acknowledgment,
 	 * so only acknowledge the error if this support is present.
 	 */
 	if (is_hest_type_generic_v2(ghes)) {
 		rc = ghes_ack_error(ghes->generic_v2);

... and ACK the error anyway, even if the status read failed, wouldn't that
confuse the firmware?

I think the better thing to do in this case is to still send the ack. If
ghes_read_estatus() fails, then either we are unable to read the estatus or
the estatus is empty/invalid. For the first case, there's not much that can
be done. The second case would be a FW bug with populating the estatus.

Wouldn't this mean acking on a timer for ghes_poll_func()?

What happens if:
kernel: read error-status-block
kernel: nothing here
firmware: error! write to error-status-block
kernel: write to ack register
(this is probably only a problem for polling as there is no notification)

Once FW notifies the OS of the first error, it shouldn't be touching the memory region until

If we do not send the ack, then we will be in a scenario where FW will not send
any more errors.

Because we haven't yet handled the first one...
I thought GHESv2's ack was also used to catch errors that occur while an earlier
error is being handled. But from the text in ACPI 6.2's 18.3.2.8 the 'ack' is
only described as releasing the memory region, not completion of the error handler.
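
Reading the ack purely as "the memory region is released", the ordering
problem the patch describes can be modelled in a few lines of userspace C.
This is only a rough sketch: struct model, fw_try_write(), os_ack(),
os_proc() and run() are made-up names for illustration, not kernel or
firmware interfaces, and it assumes the worst-case interleaving where
firmware writes a queued error the instant it sees the ack.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one GHESv2 error source; NOT the kernel's struct ghes. */
struct model {
	int  block;    /* error id held in the status block, 0 = empty */
	bool fw_owns;  /* true once the OS has acked: firmware may write */
	int  pending;  /* error firmware is waiting to report, 0 = none */
	int  handled;  /* number of errors the OS has handled */
};

enum order { ACK_THEN_CLEAR, CLEAR_THEN_ACK };

/* Firmware writes its queued error only while it owns the block. */
static void fw_try_write(struct model *m)
{
	if (m->fw_owns && m->pending) {
		m->block = m->pending;
		m->pending = 0;
		m->fw_owns = false;  /* block is handed back to the OS */
	}
}

/* Assumption: firmware reacts to the ack immediately (worst case). */
static void os_ack(struct model *m)
{
	m->fw_owns = true;
	fw_try_write(m);
}

/* One pass of a simplified "read, handle, clear/ack" sequence. */
static void os_proc(struct model *m, enum order o)
{
	if (m->block == 0)
		return;  /* nothing to read */

	m->handled++;

	if (o == ACK_THEN_CLEAR) {
		os_ack(m);     /* firmware may write the next error here ... */
		m->block = 0;  /* ... and this clear then wipes it unhandled */
	} else {
		m->block = 0;  /* clear while the OS still owns the block */
		os_ack(m);     /* a new error written now survives */
	}
}

/* Start with error 1 in the block and error 2 queued in firmware. */
static int run(enum order o)
{
	struct model m = { .block = 1, .fw_owns = false, .pending = 2, .handled = 0 };

	os_proc(&m, o);
	os_proc(&m, o);
	assert(m.handled <= 2);  /* only two errors exist in this model */
	return m.handled;
}
```

In this model run(ACK_THEN_CLEAR) handles only one of the two errors,
while run(CLEAR_THEN_ACK) handles both. It does not cover the polling
case raised above, where the ack would be written after an empty read.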