Re: [Xen-devel] linux 4.4 Regression: 100% cpu usage on idle pv guest under Xen with single vcpu.

From: Sander Eikelenboom
Date: Tue Dec 01 2015 - 18:49:00 EST


On 2015-12-02 00:41, Boris Ostrovsky wrote:
On 12/01/2015 06:30 PM, Sander Eikelenboom wrote:
On 2015-12-02 00:19, Boris Ostrovsky wrote:
On 12/01/2015 06:00 PM, Sander Eikelenboom wrote:
On 2015-12-01 23:47, Boris Ostrovsky wrote:
On 11/30/2015 05:55 PM, Sander Eikelenboom wrote:
On 2015-11-30 23:54, Boris Ostrovsky wrote:
On 11/30/2015 04:46 PM, Sander Eikelenboom wrote:
On 2015-11-30 22:45, Konrad Rzeszutek Wilk wrote:
On Sat, Nov 28, 2015 at 04:47:43PM +0100, Sander Eikelenboom wrote:
Hi all,

I have just tested a 4.4-rc2 kernel (current linus tree) + the tip tree
pulled on top.

Running this kernel under Xen on PV-guests with multiple vcpus goes well (on
idle < 10% cpu usage),
but a guest with only a single vcpu doesn't idle at all, it seems a kworker
thread is stuck:
root 569 98.0 0.0 0 0 ? R 16:02 12:47
[kworker/0:1]

Running a 4.3 kernel works fine with a single vpcu, bisecting would probably
quite painful since there were some breakages this merge window with respect
to Xen pv-guests.

There are some differences in the diff's from booting a 4.3, 4.4-single,
4.4-multi cpu boot:

Boris has been tracking a bunch of them. I am attaching the latest set of
patches I've to carry on top of v4.4-rc3.

Hi Konrad,

i will test those, see if it fixes all my issues and report back

They shouldn't help you ;-( (and I just saw a message from you confirming this)

The first one fixes a 32-bit bug (on bare metal too). The second fixes
a fatal bug for 32-bit PV guests. The other two are code
improvements/cleanup.

One of these patches also fixes a bug i was having with a pci-passthrough device in
a HVM that wasn't working (depending on which dom0-kernel i was using (4.3 or 4.4)),
but didn't report yet.

Fingers crossed but i think this pv-guest single vcpu issue is the last i'm troubled by for now ;)

I could not reproduce this, including with your kernel config file.

Hmm that's unpleasant :-\

Hmm other strange thing is it doesn't seem to affect dom0 (which is also a PV guest), but only unprivileged ones
All unprivileged pv-guests seem to have the irq issue, but only with a single vcpu i see to get the stuck kworker thread that got my attention, with a 2 vcpu that doesn't seem to happen, but you still get the dmesg output and warnings about hvc)

Could it be that:

arch/x86/include/asm/i8259.h
static inline int nr_legacy_irqs(void)
{
return legacy_pic->nr_legacy_irqs;
}

returns something different in some circumstances ?

It should return 16 pre-8c058b0b9c34d8c8d7912880956543769323e2d8 and 0
after that commit.

This is the last number that you see in
NR_IRQS:4352 nr_irqs:48 0
line.

I think you should be able to safely revert both
b4ff8389ed14b849354b59ce9b360bdefcdbf99c and
8c058b0b9c34d8c8d7912880956543769323e2d8 and see if it makes any
difference.


-boris


That was already underway compiling :)

And it does reveal that reverting both fixes the issue, no stuck kworker thread .. and no:
genirq: Flags mismatch irq 8. 00000000 (hvc_console) vs. 00000000 (rtc0)
hvc_open: request_irq failed with rc -16.


Let me try it again tomorrow. Can you post your guest config file, Xen
version and host HW (Intel or AMD)? 'xl info' maybe?

-boris

Guest config file == dom0 config file == the one i send you earlier.
Host is an AMD Phenom X6.

# xl info
host : serveerstertje
release : 4.4.0-rc3-20151201-linus-doflr-boris+
version : #1 SMP Tue Dec 1 19:02:58 CET 2015
machine : x86_64
nr_cpus : 6
max_cpu_id : 5
nr_nodes : 1
cores_per_socket : 6
threads_per_core : 1
cpu_mhz : 3200
hw_caps : 178bf3ff:efd3fbff:00000000:00011300:00802001:00000000:000037ff:00000000
virt_caps : hvm hvm_directio
total_memory : 20479
free_memory : 7745
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 7
xen_extra : -unstable
xen_version : 4.7-unstable
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset : Thu Nov 26 20:58:13 2015 +0100 git:5252636-dirty
xen_commandline : dom0_mem=1536M,max:1536M loglvl=all loglvl_guest=all console_timestamps=datems vga=gfx-1280x1024x32 cpuidle cpufreq=xen com1=38400,8n1 console=vga,com1 ivrs_ioapic[6]=00:14.0 iommu=on,verbose,debug,amd-iommu-debug conring_size=128k ucode=-1
cc_compiler : gcc-4.9.real (Debian 4.9.2-10) 4.9.2
cc_compile_by : root
cc_compile_domain : dyndns.org
cc_compile_date : Thu Nov 26 21:18:41 CET 2015
xend_config_format : 4

If you need and can get more info by letting me run a debug patch for you (because you can't reproduce) don't hesitate to send one :)

Thanks so far !

--
Sander




What i did get was an conflict reverting b4ff8389ed14b849354b59ce9b360bdefcdbf99c:
arch/arm64/include/asm/irq.h, although that shouldn't matter because we are on x86 and not on arm.

-- Sander



-- Sander


-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/