Re: oops when swapping on latest kernel git 3.15-rc5

From: Michal Hocko
Date: Mon May 19 2014 - 04:23:42 EST


On Sun 18-05-14 12:15:48, Hugh Dickins wrote:
> On Sun, 18 May 2014, Branimir Maksimovic wrote:
>
> > I discovered this accidentally when I tried to see how the OOM killer
> > works. The program is this:
> >
> > #include <unistd.h>
> > #include <cstring>
> > #include <exception>
> > #include <iostream>
> >
> > int counter=0;
> > int main()
> > try
> > {
> >     for(;;++counter)
> >     {
> >         char* p = new char[1024*1024];
> >         memset(p,1,1024*1024);
> >         std::cout<<counter<<'\n';
> >         // if(counter > 24000)sleep(100);
> >
> >     }
> > }catch(const std::exception& e)
> > {
> >     std::cout<<"exception:"<<e.what()<<" count:"<<counter<<std::endl;
> > }
> >
> > After running this program, the system froze after some time. Programs
> > could be started but they will not finish.
> > Fortunately I could paste the dmesg output:
> >
> > [ 388.522421] BUG: unable to handle kernel NULL pointer dereference at
> > 0000000000000340
> > [ 388.522427] IP: [<ffffffff81185b0b>]
> > get_mem_cgroup_from_mm.isra.42+0x2b/0x60
>
> Thank you very much for reporting. That BUG is a 3.15-rc regression.
> 3.14's try_get_mem_cgroup_from_mm() had protection against NULL mm,
> as when exiting. That was correctly removed as unnecessary by one
> 3.15 commit, but a new caller was added in a later commit, which made
> it necessary again, as you have now found.

Good timing. I had a similar report on Friday from our internal testing
and was waiting for the results of testing over the weekend. I will post
the patch in a minute.

> Easily fixable, but opinions will differ on the right way to write it
> (and I'm rather out of touch with the current flux in css_tryget and
> root_mem_cgroup), so Cc'ing Hannes and Michal for the definitive fix.

Yes, I went with the get_mem_cgroup_from_mm way. Johannes is on vacation
AFAIK, so I would rather go with this more conservative approach and
do some additional cleanup later if necessary.
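
To illustrate the kind of guard being discussed, here is a minimal
userspace sketch, not the actual patch. All names in it (struct group,
struct task, struct mm, root_group, get_group_from_mm) are hypothetical
stand-ins for the kernel structures, and the plain refcount increment
only stands in for css_tryget; there is no RCU or locking here. It only
shows the idea that a lookup called without an mm (a kernel thread, or
an exiting task) falls back to the root group instead of dereferencing
NULL.

```c
/* Userspace model only: hypothetical types, not the kernel's. */
#include <stddef.h>

struct group {
    const char *name;
    int refcount;           /* stand-in for the css reference count */
};

struct task {
    struct group *group;    /* may be NULL in this model */
};

struct mm {
    struct task *owner;     /* may be NULL in this model */
};

/* Stand-in for the root cgroup that absorbs charges with no mm context. */
static struct group root_group = { "root", 0 };

/*
 * Return the group to charge and take a reference on it.
 * A NULL mm, a missing owner, or an owner without a group
 * all fall back to root_group rather than being dereferenced.
 */
struct group *get_group_from_mm(const struct mm *mm)
{
    struct group *g = NULL;

    if (mm != NULL && mm->owner != NULL)
        g = mm->owner->group;
    if (g == NULL)
        g = &root_group;

    g->refcount++;
    return g;
}
```

In this model, calling get_group_from_mm(NULL) returns &root_group,
which corresponds to the kernel thread case in the backtrace.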

> > [ 388.522435] PGD 3f233c067 PUD 3f20f7067 PMD 0
> > [ 388.522439] Oops: 0000 [#1] SMP
> > [ 388.522441] Modules linked in: snd_hrtimer pci_stub vboxpci(OE)
> > vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) cuse rfcomm bnep bluetooth
> > binfmt_misc intel_rapl x86_pkg_temp_thermal intel_powerclamp crct10dif_pclmul
> > crc32_pclmul ghash_clmulni_intel aesni_intel snd_hda_codec_hdmi aes_x86_64
> > lrw gf128mul glue_helper snd_hda_codec_realtek snd_hda_codec_generic
> > ablk_helper cryptd gspca_spca561 gspca_main videodev mxm_wmi snd_hda_intel
> > snd_hda_controller snd_hda_codec microcode snd_hwdep joydev snd_pcm
> > snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq dm_multipath scsi_dh
> > snd_seq_device snd_timer mei_me snd mei lpc_ich wmi soundcore video mac_hid
> > serio_raw parport_pc ppdev nct6775 hwmon_vid coretemp nvidia(POE) drm lp
> > parport btrfs xor raid6_pq hid_generic usbhid hid psmouse e1000e ahci libahci
> > ptp pps_core
> > [ 388.522494] CPU: 1 PID: 160 Comm: kworker/u8:5 Tainted: P OE
> > 3.15.0-rc5-core2-custom #159
> > [ 388.522496] Hardware name: System manufacturer System Product Name/MAXIMUS
> > V GENE, BIOS 1903 08/19/2013
> > [ 388.522498] task: ffff880404e349b0 ti: ffff88040486a000 task.ti:
> > ffff88040486a000
> > [ 388.522500] RIP: 0010:[<ffffffff81185b0b>] [<ffffffff81185b0b>]
> > get_mem_cgroup_from_mm.isra.42+0x2b/0x60
> > [ 388.522504] RSP: 0000:ffff88040486bab8 EFLAGS: 00010246
> > [ 388.522506] RAX: 0000000000000000 RBX: ffffea000a416340 RCX:
> > 0000000000000a40
> > [ 388.522508] RDX: ffff88041efe8a40 RSI: ffffea000a416340 RDI:
> > 0000000000000340
> > [ 388.522509] RBP: ffff88040486bab8 R08: 000000000001cb56 R09:
> > 0000000000072d5a
> > [ 388.522511] R10: 0000000000000000 R11: 0000000000000005 R12:
> > ffff88040486bb00
> > [ 388.522512] R13: 00000000000000d0 R14: 0000000000000000 R15:
> > ffff8803f3fe82f8
> > [ 388.522515] FS: 0000000000000000(0000) GS:ffff88041ec80000(0000)
> > knlGS:0000000000000000
> > [ 388.522517] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 388.522518] CR2: 0000000000000340 CR3: 00000003ee44d000 CR4:
> > 00000000001407e0
> > [ 388.522520] Stack:
> > [ 388.522521] ffff88040486baf0 ffffffff8118abf5 ffffffff8112ce1a
> > 0000000000000000
> > [ 388.522524] ffffea000a416340 0000000000000003 00000000ffffffef
> > ffff88040486bb18
> > [ 388.522527] ffffffff8118b1cc ffff88040486baf8 000000000001cb56
> > 0000000000000000
> > [ 388.522530] Call Trace:
> > [ 388.522536] [<ffffffff8118abf5>] __mem_cgroup_try_charge_swapin+0x45/0xf0
> > [ 388.522539] [<ffffffff8112ce1a>] ? __lock_page+0x6a/0x70
> > [ 388.522543] [<ffffffff8118b1cc>] mem_cgroup_charge_file+0x9c/0xe0
> > [ 388.522548] [<ffffffff8114599c>] shmem_getpage_gfp+0x62c/0x770
> > [ 388.522552] [<ffffffff81145b18>] shmem_write_begin+0x38/0x40
> > [ 388.522555] [<ffffffff8112d1c5>] generic_perform_write+0xc5/0x1c0
> > [ 388.522559] [<ffffffff811ad53a>] ? file_update_time+0x8a/0xd0
> > [ 388.522563] [<ffffffff8112f211>] __generic_file_aio_write+0x1d1/0x3f0
> > [ 388.522567] [<ffffffff81084fc1>] ? enqueue_entity+0x291/0xb90
> > [ 388.522570] [<ffffffff8112f47f>] generic_file_aio_write+0x4f/0xc0
> > [ 388.522574] [<ffffffff81192eaa>] do_sync_write+0x5a/0x90
> > [ 388.522578] [<ffffffff810c53c1>] do_acct_process+0x4b1/0x550
> > [ 388.522582] [<ffffffff810c5acd>] acct_process+0x6d/0xa0
> > [ 388.522587] [<ffffffff810667d0>] ? manage_workers.isra.25+0x2a0/0x2a0
> > [ 388.522590] [<ffffffff8104d937>] do_exit+0x827/0xa70
> > [ 388.522594] [<ffffffff8106699e>] ? worker_thread+0x1ce/0x3a0
> > [ 388.522597] [<ffffffff810667d0>] ? manage_workers.isra.25+0x2a0/0x2a0
> > [ 388.522600] [<ffffffff8106cad3>] kthread+0xc3/0xf0
> > [ 388.522604] [<ffffffff8106ca10>] ? kthread_create_on_node+0x180/0x180
> > [ 388.522608] [<ffffffff816bfe6c>] ret_from_fork+0x7c/0xb0
> > [ 388.522611] [<ffffffff8106ca10>] ? kthread_create_on_node+0x180/0x180

Hmm, this is slightly different from what I saw. The kernel thread is
common to both, as is the swapcache mem_cgroup_charge_file path. We just
got there from a different path (shmem_file_splice_read). This looks like
accounting is done on tmpfs?

> Or does that backtrace say that it's a kernel thread that was exiting
> (and being accounted)? A kernel thread would not have had an mm in the
> first place.
>
> I know very little about accounting (acct_process etc). You said above
> "Programs could be started but they will not finish": I'll assume that
> hitting such a BUG inside acct_process() led to that.

That sounds possible.

[...]
--
Michal Hocko
SUSE Labs