Re: Panic when cpu hot-remove

From: fandongdong
Date: Thu Jun 18 2015 - 03:56:39 EST




On 2015/6/18 15:27, fandongdong wrote:


On 2015/6/18 13:40, Jiang Liu wrote:
On 2015/6/17 22:36, Alex Williamson wrote:
On Wed, 2015-06-17 at 13:52 +0200, Joerg Roedel wrote:
On Wed, Jun 17, 2015 at 10:42:49AM +0000, 范冬冬 wrote:
Hi maintainer,

We found that the kernel panics when a CPU is hot-removed, and we traced the problem using the call trace information.
The function qi_check_fault() loops forever because the value of head never becomes equal to the value of tail.
The relevant code is as follows:


	do {
		if (qi->desc_status[head] == QI_IN_USE)
			qi->desc_status[head] = QI_ABORT;
		head = (head - 2 + QI_LENGTH) % QI_LENGTH;
	} while (head != tail);
Hmm, this code iterates only over every second QI descriptor, and tail
probably points to a descriptor that is not iterated over.

Jiang, can you please have a look?
I think that part is normal, the way we use the queue is to always
submit a work operation followed by a wait operation so that we can
determine the work operation is complete. That's done via
qi_submit_sync(). We have had spurious reports of the queue getting
impossibly out of sync though. I saw one that was somehow linked to the
I/O AT DMA engine. Roland Dreier saw something similar[1]. I'm not
sure if they're related to this, but maybe worth comparing. Thanks,
Thanks, Alex and Joerg!

Hi Dongdong,
Could you please help to give some instructions about how to
reproduce this issue? I will try to reproduce it if possible.
Thanks!
Gerry
Hi Gerry,

We're running kernel 4.1.0 on a 4-socket system and we want to offline socket 1.
The steps are as follows:

echo 1 > /sys/firmware/acpi/hotplug/force_remove
echo 1 > /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:01/eject

Thanks!
Dongdong
Alex

[1] http://lists.linuxfoundation.org/pipermail/iommu/2015-January/011502.html

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
