On Thu, 12 Sep 2019 07:50:03 +0100,Arnld, would love your inputs before publishing v3 to avoid ping-pong.
"Shenhar, Talel" <talel@xxxxxxxxxx> wrote:
Hi Marc,Independently from whether this has any material impact on performance
On 9/11/2019 5:15 PM, Marc Zyngier wrote:
[+James]You are correct. Barriers are not needed, In v1 this driver indeed
Hi Talel,
On Tue, 10 Sep 2019 20:05:09 +0100,
Talel Shenhar <talel@xxxxxxxxxx> wrote:
+ log1 = readl(pos->mmio_base + AL_POS_ERROR_LOG_1);Do you actually need the implied barriers? I'd expect that relaxed
accesses should be enough.
used _relaxed versions.
Due to request coming from Arnd in v1 patch series I removed it. As
this is not data path I had no strong objection for removing it.
(this obviously isn't a hot path, unless it can be directly generated
by userspace or a guest), I believe it is important to use the right
type of accessor, if only because code gets copied around... Others
would probably argue that this is the very reason why we should always
use the "safe" option...
Not sure what do you mean by debug tool. this is used to capture iliigle access and allow panic() based on them or simple logging.
OK, so that the moment, this is more of a debug tool than anythingIndeed this information arriving from the HW is asynchronous.+ if (!FIELD_GET(AL_POS_ERROR_LOG_1_VALID, log1))What is this information? How do we make use of it? Given that this is
+ return IRQ_NONE;
+
+ log0 = readl(pos->mmio_base + AL_POS_ERROR_LOG_0);
+ writel(0, pos->mmio_base + AL_POS_ERROR_LOG_1);
+
+ addr = FIELD_GET(AL_POS_ERROR_LOG_0_ADDR_LOW, log0);
+ addr |= (FIELD_GET(AL_POS_ERROR_LOG_1_ADDR_HIGH, log1) << 32);
+ request_id = FIELD_GET(AL_POS_ERROR_LOG_1_REQUEST_ID, log1);
+ bresp = FIELD_GET(AL_POS_ERROR_LOG_1_BRESP, log1);
+
+ dev_err(&pdev->dev, "addr=0x%llx request_id=0x%x bresp=0x%x\n",
+ addr, request_id, bresp);
asynchronous, how do we correlate it to the offending software?
There is no direct method to get the offending software.
There are all kinds of hacks we do to find the offending software once
we find this error. most of the time its a new patch introduced but
some of the time is just digging.
else, right?
I'd argue the opposite! Because you obviously don't let a read-onlyThe whole think looks to me like a poor man's EDAC handling, and I'dThis logic was not plugged into EDAC as there is no "Correctable"
expect to be plugged in that subsystem instead. Any reason why this
isn't the case? It would certainly make the handling uniform for the
user.
error here. its just error event. Not all errors are EDAC in the sense
of Error Detection And *Correction*. There are no correctable errors
for this driver.
register being written to, the error has been corrected, and you
signal the correction status.
So plugging it under EDAC seems like abusing the EDAC system.My choice would be to plug it into the EDAC subsystem, and report all
Now that I've emphasize the reason for not putting this under EDAC,
what do you think? should this "only uncorrectable event" driver
should be part of EDAC?
interrupts as "Corrected" events. Optionally, and only if you are
debugging something that requires it, report the error as
"Uncorrectable", in which case the EDAC subsystem should trigger a
panic.
At least you'd get the infrastructure, logging and tooling that the
EDAC subsystem offers (parsing the kernel log doesn't really count).
Thanks,
M.