On Wed, 21 Feb 2018, Tariq Toukan wrote:
On 20/02/2018 8:18 PM, Thomas Gleixner wrote:
On Tue, 20 Feb 2018, Thomas Gleixner wrote:
On Tue, 20 Feb 2018, Tariq Toukan wrote:
Is there CPU hotplugging in play?
No.
Ok.
I'll come back to you tomorrow with a plan for how to debug that, after
staring into the code some more.
Do you have a rough idea what the test case is doing?
It appears arbitrarily in different flows, like sending traffic or interface
configuration changes.
Hmm. Looks like memory corruption, but I can't pinpoint it.
Find below a debug patch which should prevent the crash and might give us
some insight into the type of corruption.
Please enable the irq_matrix and vector allocation trace points.
echo 1 >/sys/kernel/debug/tracing/events/irq_matrix/enable
echo 1 >/sys/kernel/debug/tracing/events/irq_vectors/vector*/enable
When the problem triggers, the bogus vector is printed and the trace is
frozen. Please provide the dmesg and trace buffer output.
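
Collecting both outputs could look roughly like the sketch below. The
function name collect_debug_output and the output file names trace.txt and
dmesg.txt are made up for illustration; the tracing directory defaults to
the debugfs path used in the echo commands above, and reading it normally
requires root.

```shell
# Sketch only: save the frozen trace buffer and the kernel log to files.
# collect_debug_output, trace.txt and dmesg.txt are hypothetical names.
#   $1: tracing directory (default: /sys/kernel/debug/tracing)
#   $2: output directory  (default: current directory)
collect_debug_output() {
    cdo_tracedir="${1:-/sys/kernel/debug/tracing}"
    cdo_outdir="${2:-.}"

    # The debug patch calls tracing_off(), so the buffer no longer
    # changes and can simply be copied out.
    cat "$cdo_tracedir/trace" > "$cdo_outdir/trace.txt" || return 1

    # The kernel log holds the "Trying to clear prev_vector" message.
    dmesg > "$cdo_outdir/dmesg.txt" ||
        echo "warning: could not read dmesg (insufficient privileges?)" >&2
    return 0
}
```

Once the pr_err message has shown up, running collect_debug_output without
arguments as root should leave both files in the current directory.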
Thanks,
tglx
8<--------------
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -822,6 +822,12 @@ static void free_moved_vector(struct api
 	unsigned int cpu = apicd->prev_cpu;
 	bool managed = apicd->is_managed;
 
+	if (vector < FIRST_EXTERNAL_VECTOR || vector >= FIRST_SYSTEM_VECTOR) {
+		tracing_off();
+		pr_err("Trying to clear prev_vector: %u\n", vector);
+		goto out;
+	}
+
 	/*
 	 * This should never happen. Managed interrupts are not
 	 * migrated except on CPU down, which does not involve the
@@ -833,6 +839,7 @@ static void free_moved_vector(struct api
 	trace_vector_free_moved(apicd->irq, cpu, vector, managed);
 	irq_matrix_free(vector_matrix, cpu, vector, managed);
 	per_cpu(vector_irq, cpu)[vector] = VECTOR_UNUSED;
+out:
 	hlist_del_init(&apicd->clist);
 	apicd->prev_vector = 0;
 	apicd->move_in_progress = 0;