Re: Regression in 4.6.0-git - bisected to commit dd254f5a382c

From: Larry Finger
Date: Mon May 23 2016 - 22:56:18 EST


On 05/23/2016 07:18 PM, Al Viro wrote:
On Mon, May 23, 2016 at 04:30:43PM -0500, Larry Finger wrote:
The mainline kernels past 4.6.0 hang when logging in. There are no
error messages, and the machine seems to be waiting for some event that
never happens.

The problem has been bisected to commit dd254f5a382c ("fold checks into
iterate_and_advance()"). The bisection has been verified.

The problem is the call from iov_iter_advance(). When I reinstated the old
macro with a new name and used it in that routine, the system works.
Obviously, the call that seems to be incorrect has some benefits. My
quick-and-dirty patch is attached.

I will be willing to test any patch you prepare.

Hangs where and how? A reproducer, please... This is really weird - the
only change there is in the cases when
* iov_iter_advance(i, n) is called with n greater than the remaining
amount. It's a bug, plain and simple - old variant would've been left in
seriously buggered state and at the very least we want to catch any such
places for the sake of backports
* iov_iter_advance(i, 0) - both old and new code leave *i unchanged,
but the old one dereferences i->iov[0], which may be pointing beyond the end of
array by that point. The value read from there was not used by the old code,
at that.

Could you slap WARN_ON(size > i->count) in the very beginning of
iov_iter_advance() (the mainline variant) and see what triggers on your
reproducer?
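
The two edge cases described above can be modelled in userspace. This is only a sketch under stated assumptions, not the kernel code: struct seg, struct iter and iter_advance_checked() are hypothetical stand-ins for struct iovec, struct iov_iter and iov_iter_advance(). It clamps an oversized advance to the remaining count and does not touch the segment array at all when asked to advance by 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel's iovec / iov_iter. */
struct seg {
	size_t len;		/* length of this segment in bytes */
};

struct iter {
	const struct seg *segs;	/* current segment (may point past the end) */
	size_t nr_segs;		/* segments left, including the current one */
	size_t offset;		/* bytes already consumed in current segment */
	size_t count;		/* total bytes left in the iterator */
};

/*
 * Advance the iterator by n bytes and return the number of bytes
 * actually advanced.  An n larger than the remaining count is clamped;
 * n == 0 returns without reading it->segs, which may already point
 * one past the end of the array.
 */
size_t iter_advance_checked(struct iter *it, size_t n)
{
	size_t done;

	if (n > it->count)
		n = it->count;
	done = n;

	while (n) {
		size_t avail, step;

		/* count > 0 implies at least one segment is left */
		assert(it->nr_segs > 0);

		avail = it->segs->len - it->offset;
		step = n < avail ? n : avail;

		it->offset += step;
		it->count -= step;
		n -= step;

		/* current segment exhausted: move on to the next one */
		if (it->offset == it->segs->len) {
			it->segs++;
			it->nr_segs--;
			it->offset = 0;
		}
	}
	return done;
}
```

With two segments of 4 and 6 bytes, advancing by 0 leaves the iterator unchanged, advancing by 5 lands one byte into the second segment, and a further advance by 100 is clamped to the 5 bytes that remain. A final advance by 0 on the exhausted iterator returns without dereferencing the now out-of-range segment pointer.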

The hang is when you try to log in. It asks for a password and the system never returns, and nothing is logged. The system will switch between the various CTRL-ALT-Fn screens, but that is about the most it will do.

Adding WARN_ON(size > i->count) showed nothing. I got the same result for WARN_ON(!i->count). A WARN_ON(!size) does trigger the following traceback:

[ 15.030907] ------------[ cut here ]------------
[ 15.030913] WARNING: CPU: 0 PID: 353 at lib/iov_iter.c:529 iov_iter_advance+0xf6/0x240
[ 15.030914] Modules linked in: af_packet nfs fscache arc4 rtsx_pci_sdmmc mmc_core rtsx_pci_ms memstick x86_pkg_temp_thermal kvm_intel iwlmvm kvm mac80211 snd_hda_codec_generic irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper iwlwifi ablk_helper cryptd snd_hda_intel snd_hda_codec e1000e snd_hwdep snd_hda_core snd_pcm cfg80211 pcspkr serio_raw rtsx_pci snd_timer xhci_pci snd lpc_ich ptp mfd_core pps_core xhci_hcd soundcore thermal toshiba_acpi toshiba_bluetooth sparse_keymap wmi rfkill battery acpi_cpufreq ac processor dm_mod i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm sr_mod cdrom video button sg autofs4
[ 15.030965] CPU: 0 PID: 353 Comm: systemd-journal Not tainted 4.6.0-09084-g75b5796-dirty #89
[ 15.030966] Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20 04/17/2014
[ 15.030968] 0000000000000000 ffff88021fc07d40 ffffffff813e4d1e 0000000000000000
[ 15.030972] 0000000000000000 ffff88021fc07d80 ffffffff810702b1 00000211c4105ac0
[ 15.030975] ffff88021fc07e08 0000000000000000 ffff88021fc07f08 ffffffff814bacc0
[ 15.030978] Call Trace:
[ 15.030981] [<ffffffff813e4d1e>] dump_stack+0x67/0x99
[ 15.030985] [<ffffffff810702b1>] __warn+0xd1/0xf0
[ 15.030989] [<ffffffff814bacc0>] ? tty_compat_ioctl+0xe0/0xe0
[ 15.030991] [<ffffffff8107039d>] warn_slowpath_null+0x1d/0x20
[ 15.030994] [<ffffffff813f7716>] iov_iter_advance+0xf6/0x240
[ 15.030997] [<ffffffff81223161>] do_loop_readv_writev+0x51/0xc0
[ 15.030999] [<ffffffff814bacc0>] ? tty_compat_ioctl+0xe0/0xe0
[ 15.031002] [<ffffffff812245ff>] do_readv_writev+0x1ef/0x210
[ 15.031006] [<ffffffff81238c86>] ? do_vfs_ioctl+0x96/0x6a0
[ 15.031008] [<ffffffff8122484f>] vfs_writev+0x3f/0x50
[ 15.031010] [<ffffffff812248b5>] do_writev+0x55/0xd0
[ 15.031013] [<ffffffff812259a0>] SyS_writev+0x10/0x20
[ 15.031016] [<ffffffff81794b65>] entry_SYSCALL_64_fastpath+0x18/0xa8
[ 15.031019] ---[ end trace 8c776b094504066d ]---

Two of these are logged for each boot.

If I make iov_iter_advance() look as follows, my system will boot:

void iov_iter_advance(struct iov_iter *i, size_t size)
{
	WARN_ON(!size);
	if (size)
		iterate_and_advance(i, size, v, 0, 0, 0)
	else
		iterate_and_advance_nocheck(i, size, v, 0, 0, 0)
}
EXPORT_SYMBOL(iov_iter_advance);

Larry