Re: general protection fault in kernfs_kill_sb

From: Tetsuo Handa
Date: Wed May 02 2018 - 06:38:31 EST


On 2018/04/20 11:44, Eric Biggers wrote:
> Fix for the kernfs bug is now queued in vfs/for-linus:
>
> #syz fix: kernfs: deal with early sget() failures

Well, the following patches

rpc_pipefs: deal with early sget() failures
kernfs: deal with early sget() failures
procfs: deal with early sget() failures
nfsd_umount(): deal with early sget() failures
nfs: avoid double-free on early sget() failures

are dropped from vfs.git#for-linus while this report is marked
as "#syz fix: kernfs: deal with early sget() failures". The patch which
actually went to linux.git is 8e04944f0ea8b838.

#syz fix: mm,vmscan: Allow preallocating memory for register_shrinker().



By the way, we still have NULL pointer dereference (as of f2125992e7cb25ec
on linux.git) shown below due to calling deactivate_locked_super() without
successful fill_super().

----------
[ 162.865231] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 162.873678] PGD 130487067 P4D 130487067 PUD 138750067 PMD 0
[ 162.879845] Oops: 0000 [#1] SMP
[ 162.883295] Modules linked in:
[ 162.886648] CPU: 2 PID: 15505 Comm: a.out Kdump: loaded Tainted: G T 4.17.0-rc3+ #522
[ 162.894891] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 162.899415] RIP: 0010:__list_del_entry_valid+0x29/0x90
[ 162.901609] RSP: 0018:ffffc90001e07ce0 EFLAGS: 00010207
[ 162.903834] RAX: 0000000000000000 RBX: ffff880132359580 RCX: dead000000000200
[ 162.906825] RDX: 0000000000000000 RSI: 00000000e85efffd RDI: ffff880132359598
[ 162.909863] RBP: ffffc90001e07ce0 R08: ffffffff815269c5 R09: 0000000000000004
[ 162.912923] R10: ffffc90001e07ce0 R11: ffffffff840f2060 R12: ffff880134c6e000
[ 162.915929] R13: ffff88013a014f00 R14: ffff880134c6e000 R15: ffff880132359580
[ 162.918927] FS: 00007f6525b13740(0000) GS:ffff88013a680000(0000) knlGS:0000000000000000
[ 162.922325] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 162.924751] CR2: 0000000000000000 CR3: 0000000134b2e006 CR4: 00000000000606e0
[ 162.927236] Call Trace:
[ 162.928153] kernfs_kill_sb+0x2e/0x90
[ 162.929377] sysfs_kill_sb+0x22/0x40
[ 162.930571] deactivate_locked_super+0x50/0x90
[ 162.931958] kernfs_mount_ns+0x283/0x290
[ 162.933126] sysfs_mount+0x74/0xf0
[ 162.934146] mount_fs+0x46/0x1a0
[ 162.935137] vfs_kern_mount.part.28+0x67/0x190
[ 162.936449] do_mount+0x7b0/0x11f0
[ 162.937473] ? memdup_user+0x5e/0x90
[ 162.938541] ? copy_mount_options+0x1a4/0x2d0
[ 162.939828] ksys_mount+0xab/0x120
[ 162.940954] __x64_sys_mount+0x26/0x30
[ 162.942153] do_syscall_64+0x7b/0x260
[ 162.943274] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 162.944903] RIP: 0033:0x7f6525640aaa
[ 162.946012] RSP: 002b:00007ffc4f4e6f78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 162.948291] RAX: ffffffffffffffda RBX: 000000000000000e RCX: 00007f6525640aaa
[ 162.950396] RDX: 0000000000400896 RSI: 000000000040089c RDI: 0000000000400896
[ 162.952480] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000002
[ 162.954545] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400694
[ 162.957760] R13: 00007ffc4f4e7080 R14: 0000000000000000 R15: 0000000000000000
[ 162.963839] Code: 00 00 55 48 8b 07 48 b9 00 01 00 00 00 00 ad de 48 8b 57 08 48 89 e5 48 39 c8 74 27 48 b9 00 02 00 00 00 00 ad de 48 39 ca 74 2c <48> 8b 32 48 39 fe 75 35 48 8b 50 08 48 39 f2 75 40 b8 01 00 00
[ 162.974795] RIP: __list_del_entry_valid+0x29/0x90 RSP: ffffc90001e07ce0
[ 162.976752] CR2: 0000000000000000
----------

Below patch can avoid NULL pointer dereference at kernfs_kill_sb().

----------
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 26dd9a5..498c044 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -314,6 +314,7 @@ struct dentry *kernfs_mount_ns(struct file_system_type *fs_type, int flags,
if (!info)
return ERR_PTR(-ENOMEM);

+ INIT_LIST_HEAD(&info->node);
info->root = root;
info->ns = ns;

----------

But there remains a refcount bug because deactivate_locked_super() from
kernfs_mount_ns() triggers kobj_ns_drop() from sysfs_kill_sb() via
sb->kill_sb() when kobj_ns_drop() is always called by sysfs_mount()
if kernfs_mount_ns() returned an error.

----------
static void *net_grab_current_ns(void)
{
struct net *ns = current->nsproxy->net_ns;
#ifdef CONFIG_NET_NS
if (ns)
refcount_inc(&ns->passive);
if (ns && !strcmp(current->comm, "a.out"))
printk("net_grab_current_ns: %px %d %d\n", ns,
refcount_read(&ns->passive), refcount_read(&ns->count));
#endif
return ns;
}

void net_drop_ns(void *p)
{
struct net *ns = p;
if (ns && !strcmp(current->comm, "a.out")) {
printk("net_drop_ns: %px %d %d\n", ns,
refcount_read(&ns->passive), refcount_read(&ns->count));
dump_stack();
}
if (ns && refcount_dec_and_test(&ns->passive))
net_free(ns);
}
----------

----------
Normal case
[ 79.283244] net_grab_current_ns: ffff88012e570080 2 1
[ 79.299881] net_drop_ns: ffff88012e570080 2 1
[ 79.303463] CPU: 0 PID: 15294 Comm: a.out Kdump: loaded Tainted: G T 4.17.0-rc3+ #527
[ 79.310509] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 79.316055] Call Trace:
[ 79.317367] dump_stack+0xe9/0x148
[ 79.319154] net_drop_ns+0xa1/0xb0
[ 79.320903] ? get_net_ns_by_id+0x170/0x170
[ 79.323053] kobj_ns_drop+0x61/0x70
[ 79.324856] sysfs_kill_sb+0x2f/0x40
[ 79.326750] deactivate_locked_super+0x50/0x90
[ 79.329053] deactivate_super+0x61/0x90
[ 79.331032] cleanup_mnt+0x49/0x90
[ 79.332794] __cleanup_mnt+0x16/0x20
[ 79.334641] task_work_run+0xb3/0xf0
[ 79.336485] exit_to_usermode_loop+0x152/0x160
[ 79.338785] do_syscall_64+0x237/0x260
[ 79.340710] entry_SYSCALL_64_after_hwframe+0x49/0xbe

[ 79.357961] net_grab_current_ns: ffff88012e570080 2 1
[ 79.360275] net_drop_ns: ffff88012e570080 2 1
[ 79.362469] CPU: 0 PID: 15294 Comm: a.out Kdump: loaded Tainted: G T 4.17.0-rc3+ #527
[ 79.366436] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 79.369596] Call Trace:
[ 79.370363] dump_stack+0xe9/0x148
[ 79.371444] net_drop_ns+0xa1/0xb0
[ 79.372504] ? get_net_ns_by_id+0x170/0x170
[ 79.373836] kobj_ns_drop+0x61/0x70
[ 79.374912] sysfs_mount+0xd2/0xf0
[ 79.375976] ? lockdep_init_map+0x9/0x10
[ 79.377343] mount_fs+0x46/0x1a0
[ 79.378365] vfs_kern_mount.part.28+0x67/0x190
[ 79.379850] do_mount+0x7b0/0x11f0
[ 79.381001] ? memdup_user+0x5e/0x90
[ 79.382213] ? copy_mount_options+0x1a4/0x2d0
[ 79.383514] ksys_mount+0xab/0x120
[ 79.384544] __x64_sys_mount+0x26/0x30
[ 79.385761] do_syscall_64+0x7b/0x260
[ 79.386942] entry_SYSCALL_64_after_hwframe+0x49/0xbe
----------

----------
Error case
[ 79.664326] net_grab_current_ns: ffff88012e570080 2 1
[ 79.666073] net_drop_ns: ffff88012e570080 2 1
[ 79.667504] CPU: 1 PID: 15294 Comm: a.out Kdump: loaded Tainted: G T 4.17.0-rc3+ #527
[ 79.670197] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 79.673282] Call Trace:
[ 79.674041] dump_stack+0xe9/0x148
[ 79.675064] net_drop_ns+0xa1/0xb0
[ 79.676085] ? get_net_ns_by_id+0x170/0x170
[ 79.677326] kobj_ns_drop+0x61/0x70
[ 79.678902] sysfs_kill_sb+0x2f/0x40
[ 79.680144] deactivate_locked_super+0x50/0x90
[ 79.681613] kernfs_mount_ns+0x28f/0x2a0
[ 79.682884] sysfs_mount+0x74/0xf0
[ 79.683906] mount_fs+0x46/0x1a0
[ 79.684879] vfs_kern_mount.part.28+0x67/0x190
[ 79.686193] do_mount+0x7b0/0x11f0
[ 79.687234] ? memdup_user+0x5e/0x90
[ 79.688305] ? copy_mount_options+0x1a4/0x2d0
[ 79.689592] ksys_mount+0xab/0x120
[ 79.690617] __x64_sys_mount+0x26/0x30
[ 79.691735] do_syscall_64+0x7b/0x260
[ 79.692833] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 79.694391] RIP: 0033:0x7fefaf3c4aaa
[ 79.695744] RSP: 002b:00007ffe74af7fd8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 79.698209] RAX: ffffffffffffffda RBX: 000000000000000e RCX: 00007fefaf3c4aaa
[ 79.700386] RDX: 0000000000400896 RSI: 000000000040089c RDI: 0000000000400896
[ 79.702472] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000002
[ 79.704552] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400694
[ 79.706624] R13: 00007ffe74af80e0 R14: 0000000000000000 R15: 0000000000000000
[ 79.708802] net_drop_ns: ffff88012e570080 1 1
[ 79.710317] CPU: 1 PID: 15294 Comm: a.out Kdump: loaded Tainted: G T 4.17.0-rc3+ #527
[ 79.713255] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 79.716704] Call Trace:
[ 79.717463] dump_stack+0xe9/0x148
[ 79.718487] net_drop_ns+0xa1/0xb0
[ 79.719510] ? get_net_ns_by_id+0x170/0x170
[ 79.720771] kobj_ns_drop+0x61/0x70
[ 79.721818] sysfs_mount+0xd2/0xf0
[ 79.722840] mount_fs+0x46/0x1a0
[ 79.723815] vfs_kern_mount.part.28+0x67/0x190
[ 79.725178] do_mount+0x7b0/0x11f0
[ 79.726272] ? memdup_user+0x5e/0x90
[ 79.727361] ? copy_mount_options+0x1a4/0x2d0
[ 79.728811] ksys_mount+0xab/0x120
[ 79.730002] __x64_sys_mount+0x26/0x30
[ 79.731253] do_syscall_64+0x7b/0x260
[ 79.732477] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 79.734039] RIP: 0033:0x7fefaf3c4aaa
[ 79.735136] RSP: 002b:00007ffe74af7fd8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 79.737340] RAX: ffffffffffffffda RBX: 000000000000000e RCX: 00007fefaf3c4aaa
[ 79.739427] RDX: 0000000000400896 RSI: 000000000040089c RDI: 0000000000400896
[ 79.741576] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000002
[ 79.743771] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400694
[ 79.746102] R13: 00007ffe74af80e0 R14: 0000000000000000 R15: 0000000000000000
[ 79.748604] ------------[ cut here ]------------
[ 79.750409] ODEBUG: free active (active state 0) object type: timer_list hint: can_stat_update+0x0/0x3b0
[ 79.753308] WARNING: CPU: 1 PID: 15294 at lib/debugobjects.c:329 debug_print_object+0x6a/0x90
[ 79.755786] Modules linked in:
[ 79.756812] CPU: 1 PID: 15294 Comm: a.out Kdump: loaded Tainted: G T 4.17.0-rc3+ #527
[ 79.759725] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 79.763120] RIP: 0010:debug_print_object+0x6a/0x90
[ 79.764684] RSP: 0018:ffffc900070a3ca0 EFLAGS: 00010086
[ 79.766384] RAX: 0000000000000000 RBX: ffff88012fabea00 RCX: ffffffff8126085e
[ 79.768482] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88013a6565b0
[ 79.770537] RBP: ffffc900070a3cb8 R08: 0000000000000000 R09: 0000000000000001
[ 79.772692] R10: ffffc900070a3c18 R11: ffff88012f350140 R12: ffffffff840c6e40
[ 79.774883] R13: ffffffff83ca7711 R14: 0000000000000002 R15: ffff88012e5722c0
[ 79.777012] FS: 00007fefaf897740(0000) GS:ffff88013a640000(0000) knlGS:0000000000000000
[ 79.779762] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 79.779765] CR2: 00007f9542e78090 CR3: 000000012f336002 CR4: 00000000000606e0
[ 79.779814] Call Trace:
[ 79.779825] debug_check_no_obj_freed+0x184/0x1ff
[ 79.779831] kmem_cache_free+0x228/0x280
[ 79.779838] net_drop_ns+0x74/0xb0
[ 79.779845] ? get_net_ns_by_id+0x170/0x170
[ 79.790008] kobj_ns_drop+0x61/0x70
[ 79.791177] sysfs_mount+0xd2/0xf0
[ 79.792311] mount_fs+0x46/0x1a0
[ 79.793343] vfs_kern_mount.part.28+0x67/0x190
[ 79.794653] do_mount+0x7b0/0x11f0
[ 79.795796] ? memdup_user+0x5e/0x90
[ 79.796961] ? copy_mount_options+0x1a4/0x2d0
[ 79.798365] ksys_mount+0xab/0x120
[ 79.799502] __x64_sys_mount+0x26/0x30
[ 79.800780] do_syscall_64+0x7b/0x260
[ 79.801886] entry_SYSCALL_64_after_hwframe+0x49/0xbe
----------

Since sysfs_mount() is the only user who calls kernfs_mount_ns() with ns != NULL,
is it OK to do sysfs specific hack shown below? Or, should we avoid calling
deactivate_locked_super() when kernfs_fill_super() failed?

----------
diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 26dd9a5..498c044 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -332,6 +333,9 @@ struct dentry *kernfs_mount_ns(struct file_system_type *fs_type, int flags,

error = kernfs_fill_super(sb, magic);
if (error) {
+ /* Avoid double kobj_ns_drop(KOBJ_NS_TYPE_NET, ns) */
+ if (ns)
+ kobj_ns_grab_current(KOBJ_NS_TYPE_NET);
deactivate_locked_super(sb);
return ERR_PTR(error);
}
----------