Userspace woes with 5.1.5 due to TIPC

From: Mihai Moldovan
Date: Thu May 30 2019 - 15:44:50 EST


Hi


I've had a few issues lately (hopefully caused only by bad RAM, which should be
fixed now) and generally upgraded everything.

With 5.1.5, though, some programs exhibited very weird behavior: Chromium
crashed while starting up because it was unable to launch a new zygote process,
although it did start with --no-sandbox (likely because that doesn't try to
create other processes); Opera (based upon Chromium) failed to start with
SIGILL, but I guess that was only a red herring triggered by the same problem;
Firefox started up, but was unable to render any content because its
multi-process IPC didn't work (i.e., it couldn't start new rendering
processes). Interestingly, most other programs I use daily still worked, even
though they use networking and IPC (command-line browsers, MATE Terminal,
Electron-based programs), so this bug didn't make the machine completely
unusable.

Since I had been using 5.1.3 without problems before and the issue was
straightforward to test for, I did a bisection run and came to this conclusion:

================================ bisect log ================================
Bisecting: 124 revisions left to test after this (roughly 7 steps)
[ee4c3e283f8f3286bea60e9038adc70436d87d02] s390/mm: convert to the generic
get_user_pages_fast code
Bisecting: 62 revisions left to test after this (roughly 6 steps)
[f7346dc0634cbad7fca5d951b91ad2e13f497b0b] clk: mediatek: Disable tuner_en
before change PLL rate
Bisecting: 30 revisions left to test after this (roughly 5 steps)
[5ac8e698528149bb1618111d64e22bd8bb784256] parisc: Allow live-patching of
__meminit functions
Bisecting: 15 revisions left to test after this (roughly 4 steps)
[c89c9af998fef2af4e5b2b35fb723693f17e05ef] mlxsw: core: Prevent QSFP module
initialization for old hardware
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[912d8c4cf9f19c93dfdf06b822eeadec9d71494d] net: test nouarg before dereferencing
zerocopy pointers
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[92166190b8282d9925e90a66961879782c50d037] rtnetlink: always put IFLA_LINK for
links with a link-netnsid
Bisecting: 1 revision left to test after this (roughly 1 step)
[7d29c9ad0ed525c1b10e29cfca4fb1eece1e93fb] vsock/virtio: free packets during the
socket release
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[2d08f204328acaf85ac2c6fe5d5d9d4760f12e13] tipc: fix modprobe tipc failed after
switch order of device registration
2d08f204328acaf85ac2c6fe5d5d9d4760f12e13 is the first bad commit
commit 2d08f204328acaf85ac2c6fe5d5d9d4760f12e13
Author: Junwei Hu <hujunwei4@xxxxxxxxxx>
Date: Fri May 17 19:27:34 2019 +0800

tipc: fix modprobe tipc failed after switch order of device registration

[ Upstream commit 532b0f7ece4cb2ffd24dc723ddf55242d1188e5e ]

Error message printed:
modprobe: ERROR: could not insert 'tipc': Address family not
supported by protocol.
when modprobe tipc after the following patch: switch order of
device registration, commit 7e27e8d6130c
("tipc: switch order of device registration to fix a crash")

Because sock_create_kern(net, AF_TIPC, ...) is called by
tipc_topsrv_create_listener() in the initialization process
of tipc_net_ops, tipc_socket_init() must be execute before that.

I move tipc_socket_init() into function tipc_init_net().

Fixes: 7e27e8d6130c
("tipc: switch order of device registration to fix a crash")
Signed-off-by: Junwei Hu <hujunwei4@xxxxxxxxxx>
Reported-by: Wang Wang <wangwang2@xxxxxxxxxx>
Reviewed-by: Kang Zhou <zhoukang7@xxxxxxxxxx>
Reviewed-by: Suanming Mou <mousuanming@xxxxxxxxxx>
Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

:040000 040000 13d9b014338ccf6ae0c32bdb2be779870bbf97da
df8a9c2a9f1f8df212999c2904632a77adb03782 M net
=============================== / bisect log ===============================

My kernel config is tailored to my machine, so it's probably not very useful to
others, but I'm including it anyway. The most obvious point is CONFIG_TIPC=y,
i.e., TIPC is built statically into the kernel. I'm not sure why I did that in
the first place, because TIPC is not something that would be useful to me, but
I often err on the "might be useful later" side. I might rethink that decision
and just disable TIPC for good in the future.
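
For anyone who wants to check whether their own config builds TIPC in the same
way, here is a minimal Python sketch. It assumes the usual .config line format
("CONFIG_X=y", "CONFIG_X=m", or "# CONFIG_X is not set"); the function name
config_value, the sample text and the CONFIG_FOO option are made up for
illustration and are not taken from my attached config.

```python
import re


def config_value(config_text, option):
    """Return the value assigned to a kernel config option, or None.

    'y' means built in, 'm' means built as a module. None means the
    option is commented out ("is not set") or does not appear at all.
    """
    # Anchor on "OPTION=" at the start of a line, so that CONFIG_TIPC does
    # not accidentally match CONFIG_TIPC_MEDIA_UDP or a commented-out line.
    pattern = re.compile(r"^" + re.escape(option) + r"=(.*)$", re.MULTILINE)
    match = pattern.search(config_text)
    return match.group(1).strip() if match else None


# Hypothetical sample; CONFIG_FOO is not a real option.
SAMPLE = """\
CONFIG_NET=y
CONFIG_TIPC=y
# CONFIG_FOO is not set
"""

print(config_value(SAMPLE, "CONFIG_TIPC"))
```

Feeding it the decompressed contents of a real config file instead of SAMPLE
should work the same way, as long as the file follows the format assumed above.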


With this patch applied, the kernel spews out a few wonky messages that I've
never seen before. For completeness' sake, I've attached ring buffer logs from
running the last working and the first bad version.

================================ TIPC messages ================================
NET: Registered protocol family 30
Failed to register TIPC socket type
=============================== / TIPC messages ===============================
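
Telling a good boot from a bad one during testing could be done by scanning
the ring buffer text for the failure line quoted above. The following Python
sketch shows the idea; the function name tipc_registration_failed and the
"good" sample line are assumptions made for illustration (the good sample is a
stand-in, not an excerpt from the attached logs).

```python
FAILURE_MARKER = "Failed to register TIPC socket type"


def tipc_registration_failed(dmesg_text):
    """Return True if any line of the log text contains the failure message.

    A substring match is used so that lines carrying a timestamp prefix
    are recognized as well.
    """
    return any(FAILURE_MARKER in line for line in dmesg_text.splitlines())


# The two lines quoted in this mail.
BAD_SAMPLE = """\
NET: Registered protocol family 30
Failed to register TIPC socket type
"""

# Stand-in for a log without the failure line (illustrative only).
GOOD_SAMPLE = """\
NET: Registered protocol family 30
"""

print(tipc_registration_failed(BAD_SAMPLE))
print(tipc_registration_failed(GOOD_SAMPLE))
```

The same function could be fed the saved output of dmesg from each boot.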


Now, blindly reverting the patch would obviously be a bad idea, since that would
mean trading one regression for the other (original) one. I'm thus CCing the
maintainers for help.



Mihai

Attachment: config-5.1.3.xz
Description: application/xz

Attachment: dmesg-5.1.4-00013-g2d08f204328a.log.xz
Description: application/xz

Attachment: dmesg-5.1.4-00012-g7d29c9ad0ed5.log.xz
Description: application/xz

Attachment: signature.asc
Description: OpenPGP digital signature