Re: [GIT PULL] ext4 updates for v5.14

From: Jon Hunter
Date: Fri Jul 02 2021 - 05:57:48 EST


Hi Ted, Zhang,

On 30/06/2021 21:49, Theodore Ts'o wrote:
> The following changes since commit 614124bea77e452aa6df7a8714e8bc820b489922:
>
> Linux 5.13-rc5 (2021-06-06 15:47:27 -0700)
>
> are available in the Git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git tags/ext4_for_linus
>
> for you to fetch changes up to 16aa4c9a1fbe763c147a964cdc1f5be8ed98ed13:
>
> jbd2: export jbd2_journal_[un]register_shrinker() (2021-06-30 11:05:00 -0400)
>
> ----------------------------------------------------------------
> In addition to bug fixes and cleanups, there are two new features for
> ext4 in 5.14:
> - Allow applications to poll on changes to /sys/fs/ext4/*/errors_count
> - Add the ioctl EXT4_IOC_CHECKPOINT which allows the journal to be
> checkpointed, truncated and discarded or zero'ed.
>
> ----------------------------------------------------------------

...

> Zhang Yi (12):
> ext4: cleanup in-core orphan list if ext4_truncate() failed to get a transaction handle
> ext4: remove check for zero nr_to_scan in ext4_es_scan()
> ext4: correct the cache_nr in tracepoint ext4_es_shrink_exit
> jbd2: remove the out label in __jbd2_journal_remove_checkpoint()
> jbd2: ensure abort the journal if detect IO error when writing original buffer back
> jbd2: don't abort the journal when freeing buffers
> jbd2: remove redundant buffer io error checks
> jbd2,ext4: add a shrinker to release checkpointed buffers


I have noticed that with next-20210701 one of our eMMC tests
started failing on all our ARM and ARM64 platforms, and bisect is
pointing to commit 4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to
release checkpointed buffers"). Today I am seeing the same failure
on mainline.

Looking at the kernel logs, I see the following crash:

[ 74.430365] Unable to handle kernel paging request at virtual address ffff8001e353a000
[ 74.438304] Mem abort info:
[ 74.441110] ESR = 0x96000005
[ 74.444226] EC = 0x25: DABT (current EL), IL = 32 bits
[ 74.449548] SET = 0, FnV = 0
[ 74.452595] EA = 0, S1PTW = 0
[ 74.455740] FSC = 0x05: level 1 translation fault
[ 74.460620] Data abort info:
[ 74.463504] ISV = 0, ISS = 0x00000005
[ 74.467343] CM = 0, WnR = 0
[ 74.470314] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000081adc000
[ 74.477013] [ffff8001e353a000] pgd=10000002771ff803, p4d=10000002771ff803, pud=0000000000000000
[ 74.485718] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[ 74.491284] Modules linked in: tegra_drm snd_soc_tegra186_dspk cec snd_soc_tegra210_dmic snd_soc_tegra210_admaif snd_soc_tegra_pcm snd_soc_tegra210_i2s drm_kms_helper drm snd_soc_tegra210_ahub tegra210_adma crct10dif_ce snd_hda_codec_hdmi snd_soc_tegra_audio_graph_card snd_soc_audio_graph_card snd_hda_tegra snd_soc_simple_card_utils snd_hda_codec at24 tegra_bpmp_thermal snd_hda_core tegra_aconnect tegra_xudc ina3221 host1x ip_tables x_tables ipv6
[ 74.530804] CPU: 0 PID: 936 Comm: umount Tainted: G S 5.13.0-next-20210701-gfb0ca446157a #1
[ 74.540446] Hardware name: NVIDIA Jetson TX2 Developer Kit (DT)
[ 74.546354] pstate: a0000005 (NzCv daif -PAN -UAO -TCO BTYPE=--)
[ 74.552354] pc : percpu_counter_add_batch+0x30/0x118
[ 74.557317] lr : __jbd2_journal_remove_checkpoint+0x70/0x170
[ 74.562972] sp : ffff800013923b90
[ 74.566278] x29: ffff800013923b90 x28: ffff000080ba8d80 x27: 0000000000000000
[ 74.573408] x26: 0000000000000001 x25: 0000000000000006 x24: ffff000080ba8d80
[ 74.580536] x23: ffff00008965a450 x22: ffff800011ce9000 x21: ffff00008965a380
[ 74.587665] x20: ffffffffffffffff x19: ffff00008a9d8000 x18: 0000000000000011
[ 74.594792] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000038d
[ 74.601921] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 74.609048] x11: 0000000000000001 x10: 0000000000000960 x9 : ffff800013923b90
[ 74.616175] x8 : ffff000080ba9740 x7 : 0000000000000400 x6 : ffff00008965a0b0
[ 74.623304] x5 : ffff00008965a0b0 x4 : ffff8001e353a000 x3 : ffff000080ba8d80
[ 74.630430] x2 : 0000000000000020 x1 : 0000000000000000 x0 : ffff00008965a380
[ 74.637558] Call trace:
[ 74.640000] percpu_counter_add_batch+0x30/0x118
[ 74.644610] __jbd2_journal_remove_checkpoint+0x70/0x170
[ 74.649914] jbd2_log_do_checkpoint+0xa8/0x398
[ 74.654351] jbd2_journal_destroy+0x100/0x2a8
[ 74.658703] ext4_put_super+0x7c/0x388
[ 74.662449] generic_shutdown_super+0x70/0xf8
[ 74.666802] kill_block_super+0x1c/0x60
[ 74.670633] deactivate_locked_super+0x6c/0x98
[ 74.675071] deactivate_super+0x84/0x90
[ 74.678901] cleanup_mnt+0x8c/0x110
[ 74.682385] __cleanup_mnt+0x10/0x18
[ 74.685953] task_work_run+0x78/0x150
[ 74.689612] do_notify_resume+0x31c/0x498
[ 74.693618] work_pending+0xc/0x328
[ 74.697103] Code: 11000484 b9000864 d538d084 f9401001 (b8a46833)
[ 74.703186] ---[ end trace e18485293afc06e4 ]---
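
For anyone reading the abort info by hand, the "Mem abort info" and
"Data abort info" fields in the log are all derived from the ESR value.
Below is a minimal sketch in Python, assuming the standard ARMv8-A
ESR_ELx layout for data aborts. The function name decode_data_abort is
only for illustration and is not a kernel API.

```python
# Decode an AArch64 ESR_ELx value for a data abort.
# Assumption: field positions follow the ARMv8-A ESR_ELx data-abort ISS encoding.
ESR = 0x96000005  # value printed in the oops


def decode_data_abort(esr: int) -> dict:
    """Split an ESR value into the fields the kernel prints (illustrative helper)."""
    iss = esr & 0x1FFFFFF              # ISS: bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,      # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,       # 1 = 32-bit instruction
        "ISS": iss,
        "ISV": (iss >> 24) & 0x1,      # syndrome valid
        "SET": (iss >> 11) & 0x3,      # synchronous error type
        "FnV": (iss >> 10) & 0x1,      # FAR not valid
        "EA": (iss >> 9) & 0x1,        # external abort
        "CM": (iss >> 8) & 0x1,        # cache maintenance
        "S1PTW": (iss >> 7) & 0x1,     # stage 1 page table walk
        "WnR": (iss >> 6) & 0x1,       # 0 = read, 1 = write
        "FSC": iss & 0x3F,             # fault status code, bits [5:0]
    }


fields = decode_data_abort(ESR)
for name, value in fields.items():
    print(f"{name} = {value:#x}")
```

This gives EC = 0x25, FSC = 0x05 and WnR = 0, i.e. a read taken at the
current EL that hit a level 1 translation fault, which agrees with the
pud=0000000000000000 entry in the page table dump.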


Is this causing problems for anyone else?

Thanks
Jon

--
nvpublic