On 2015/6/18 15:54, fandongdong wrote:
Yes.
Hi Dongdong,
On 2015/6/18 15:27, fandongdong wrote:
On 2015/6/18 13:40, Jiang Liu wrote:
On 2015/6/17 22:36, Alex Williamson wrote:
Hi Gerry,
On Wed, 2015-06-17 at 13:52 +0200, Joerg Roedel wrote:
On Wed, Jun 17, 2015 at 10:42:49AM +0000, fandongdong wrote:
Hi maintainer,
We found a problem: a panic happens when a CPU is hot-removed.
We traced the problem using the call trace information.
An endless loop occurs because the value of head never becomes equal
to the value of tail in the function qi_check_fault().
The relevant code is as follows:
do {
        if (qi->desc_status[head] == QI_IN_USE)
                qi->desc_status[head] = QI_ABORT;
        head = (head - 2 + QI_LENGTH) % QI_LENGTH;
} while (head != tail);

Hmm, this code iterates only over every second QI descriptor, and tail
probably points to a descriptor that is not iterated over.
Jiang, can you please have a look?

I think that part is normal, the way we use the queue is to always
submit a work operation followed by a wait operation so that we can
determine the work operation is complete. That's done via
qi_submit_sync(). We have had spurious reports of the queue getting
impossibly out of sync though. I saw one that was somehow linked to
the I/O AT DMA engine. Roland Dreier saw something similar[1]. I'm not
sure if they're related to this, but maybe worth comparing. Thanks,

Thanks, Alex and Joerg!
Hi Dongdong,
Could you please help to give some instructions about how to
reproduce this issue? I will try to reproduce it if possible.
Thanks!
Gerry
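
To make Joerg's observation about the quoted qi_check_fault() loop
concrete, here is a minimal user-space sketch of the backward walk. It
is not kernel code: the helper name walk_reaches_tail() and the
iteration cap are made up for illustration, and QI_LENGTH is assumed to
be 256. If descriptors are always queued in work/wait pairs, as Alex
describes, head and tail should share the same parity and the walk
ends. If they ever differ in parity, head steps over tail every time.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: queue length of 256 slots, used here only for illustration. */
#define QI_LENGTH 256

/*
 * Hypothetical helper: step head backwards two slots at a time, the same
 * arithmetic as the quoted loop, and report whether it ever lands on tail.
 * The walk is capped at QI_LENGTH steps so the non-terminating case
 * returns false instead of spinning forever.
 */
static bool walk_reaches_tail(int head, int tail)
{
        assert(head >= 0 && head < QI_LENGTH);
        assert(tail >= 0 && tail < QI_LENGTH);

        for (int steps = 0; steps < QI_LENGTH; steps++) {
                head = (head - 2 + QI_LENGTH) % QI_LENGTH;
                if (head == tail)
                        return true;    /* the real loop would exit here */
        }
        return false;                   /* the real loop would never exit */
}
```

Under these assumptions, walk_reaches_tail(10, 4) is true, while
walk_reaches_tail(10, 5) is false: the walk only visits even slots and
never lands on 5, which matches the endless loop in the report.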
We're running kernel 4.1.0 on a 4-socket system and we want to
offline socket 1.
Steps as follows:
echo 1 > /sys/firmware/acpi/hotplug/force_remove
echo 1 > /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0004:01/eject
I failed to reproduce this issue on my side. Could you please help
to confirm?
1) Is this issue reproducible on your side?
2) Does this issue happen if you disable the irqbalance service on your
system?
Yes.
3) Has the corresponding PCI host bridge been removed before removing
the socket?
No, we will try to remove it before removing the socket later.
From the log message, we only noticed log messages for CPU and memory,
but not messages for PCI (IOMMU) devices. And this log message
"[ 149.976493] acpi ACPI0004:01: Still not present"
implies that the socket has been powered off during the ejection.
So the story may be that you powered off the socket while the host
bridge on the socket is still in use.
Thanks!
Gerry