Hi Tyler,Okay, I'll add the comment in the next patchset.
On 18/01/17 23:51, Baicar, Tyler wrote:
On 1/18/2017 7:50 AM, James Morse wrote:Ah, not instead of, (well, not initially!).
On 12/01/17 18:15, Tyler Baicar wrote:So move the nmi_enter/exit calls into do_sea of the previous patch? I can do
ARM APEI extension proposal added SEA (Synchrounous ExternalCan we move this into the arch code? Its because we got here from a
Abort) notification type for ARMv8.
Add a new GHES error source handling function for SEA. If an error
source's notification type is SEA, then this function can be registered
into the SEA exception handler. That way GHES will parse and report
SEA exceptions when they occur.
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 2acbc60..87efe26 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -767,6 +772,62 @@ static struct notifier_block ghes_notifier_sci = {
.notifier_call = ghes_notify_sci,
};
+#ifdef CONFIG_HAVE_ACPI_APEI_SEA
+static LIST_HEAD(ghes_sea);
+
+static int ghes_notify_sea(struct notifier_block *this,
+ unsigned long event, void *data)
+{
+ struct ghes *ghes;
+ int ret = NOTIFY_DONE;
+
+ nmi_enter();
synchronous-exception that makes this nmi-like, I think it only makes sense for
it be called from under /arch/.
that in the next patchset.
Where did the rcu_read_lock() go? I can see its missing from ghes_notify_nmi()Sorry, I thought we wanted nmi_enter/exit instead of the rcu_read_lock/unlock. I
too, but I don't know enough about RCU to know if that's safe!
The second paragraph in the comment above rcu_read_lock() describes it as
preventing call_rcu() during a read-side critical section that was running
concurrently. Doesn't this mean we can race with ghes_sea_remove() on another
CPU because we wait for the wrong grace period?
The same comment talks about how these read-side critical sections can nest, so
I think its quite safe to make these 'lock' calls here.
guess the rcu locks
will not cause the deadlock scenario you described in the previous patchset if
we have the
nmi_enter/exit wrapped around the rcu critical section.
The nmi_enter()/nmi_exit() thing was to fix the APEI interrupting APEI problem.
This is only a problem for notification types which can interrupt
interrupts-masked code, of which SEA is one. (and x86's NMI is the other).
I think I've found the answer to why the rcu_read_lock() isn't needed.
synchronize_sched() has:
* This means that all preempt_disable code sequences, including NMI andsynchronize_rcu() has the same innards, so I'm convinced this its safe not to
* non-threaded hardware-interrupt handlers, in progress on entry will
* have completed before this primitive returns.
have those calls in here. Could we have a comment along the lines of:
synchronize_rcu() will wait for nmi_exit(), so no need to rcu_read_lock().
(The more I learn about RCU the scarier it becomes!)Looks simple enough, should I force it to 2 in all cases, or add a check for CONFIG_HAVE_ACPI_APEI_SEA
There are two other things that need changing to make the in_nmi() code path
work on arm64.
Always reserve the virtual-address-space forcing GHES_IOREMAP_PAGES to be 2
regardless of CONFIG_HAVE_ACPI_APEI_NMI. This is almost revert of
594c7255dce7a13cac50cf2470cc56e2c3b0494e (but that did a few other things too).
We also need to fix ghes_ioremap_pfn_nmi() to use arch_apei_get_mem_attribute()So just change the call to ioremap_page_range to:
and not assume PAGE_KERNEL.