Re: riscv32 EXT4 splat, 6.8 regression?

From: Christian Brauner
Date: Mon Apr 15 2024 - 09:21:29 EST


On Sun, Apr 14, 2024 at 04:08:11PM +0200, Björn Töpel wrote:
> Andreas Dilger <adilger@xxxxxxxxx> writes:
>
> > On Apr 13, 2024, at 8:15 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> On Sat, Apr 13, 2024 at 07:46:03PM -0600, Andreas Dilger wrote:
> >>
> >>> As to whether the 0xfffff000 address itself is valid for riscv32 is
> >>> outside my realm, but given that RAM is cheap it doesn't seem unlikely
> >>> to have 4GB+ of RAM and want to use it all. The riscv32 might consider
> >>> reserving this page address from allocation to avoid similar issues in
> >>> other parts of the code, as is done with the NULL/0 page address.
> >>
> >> Not a chance. *Any* page mapped there is a serious bug on any 32bit
> >> box. Recall what ERR_PTR() is...
> >>
> >> On any architecture the virtual addresses in range (unsigned long)-512..
> >> (unsigned long)-1 must never resolve to valid kernel objects.
> >> In other words, any kind of wraparound here is asking for an oops on
> >> attempts to access the elements of buffer - kernel dereference of
> >> (char *)0xfffff000 on a 32bit box is already a bug.
> >>
> >> It might be getting an invalid pointer, but arithmetical overflows
> >> are irrelevant.
> >
> > The original bug report stated that search_buf = 0xfffff000 on entry,
> > and I'd quoted that at the start of my email:
> >
> > On Apr 12, 2024, at 8:57 AM, Björn Töpel <bjorn@xxxxxxxxxx> wrote:
> >> What I see in ext4_search_dir() is that search_buf is 0xfffff000, and at
> >> some point the address wraps to zero, and boom. I doubt that 0xfffff000
> >> is a sane address.
> >
> > Now that you mention ERR_PTR() it definitely makes sense that this last
> > page HAS to be excluded.
> >
> > So some other bug is passing the bad pointer to this code before this
> > error, or the arch is not correctly excluding this page from allocation.
>
> Yeah, something is off for sure.
>
> (FWIW, I manage to hit this for Linus' master as well.)
>
> I added a print (close to trace_mm_filemap_add_to_page_cache()), and for
> this BT:
>
> [<c01e8b34>] __filemap_add_folio+0x322/0x508
> [<c01e8d6e>] filemap_add_folio+0x54/0xce
> [<c01ea076>] __filemap_get_folio+0x156/0x2aa
> [<c02df346>] __getblk_slow+0xcc/0x302
> [<c02df5f2>] bdev_getblk+0x76/0x7a
> [<c03519da>] ext4_getblk+0xbc/0x2c4
> [<c0351cc2>] ext4_bread_batch+0x56/0x186
> [<c036bcaa>] __ext4_find_entry+0x156/0x578
> [<c036c152>] ext4_lookup+0x86/0x1f4
> [<c02a3252>] __lookup_slow+0x8e/0x142
> [<c02a6d70>] walk_component+0x104/0x174
> [<c02a793c>] path_lookupat+0x78/0x182
> [<c02a8c7c>] filename_lookup+0x96/0x158
> [<c02a8d76>] kern_path+0x38/0x56
> [<c0c1cb7a>] init_mount+0x5c/0xac
> [<c0c2ba4c>] devtmpfs_mount+0x44/0x7a
> [<c0c01cce>] prepare_namespace+0x226/0x27c
> [<c0c011c6>] kernel_init_freeable+0x286/0x2a8
> [<c0b97ab8>] kernel_init+0x2a/0x156
> [<c0ba22ca>] ret_from_fork+0xe/0x20
>
> I get a folio where folio_address(folio) == 0xfffff000 (which is
> broken).
>
> Need to go into the weeds here...

I don't see anything obvious that could explain this right away. Did you
manage to reproduce this on any other architecture and/or filesystem?

Fwiw, iirc there were a bunch of fs/buffer.c changes that came in
through the mm/ layer between v6.7 and v6.8 that might also be
interesting. But really I'm poking in the dark currently.
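
For reference, the ERR_PTR() point quoted above can be sketched in
userspace C. This is only a rough illustration, not kernel code: the
sim_* names are made up for this example, uint32_t stands in for a
32-bit unsigned long, and the bound is assumed to mirror MAX_ERRNO
(4095) from include/linux/err.h. It shows that pointers into a page
mapped at 0xfffff000 are indistinguishable from encoded errnos, and
that walking such a buffer to its end wraps the address to zero.

```c
#include <assert.h>   /* for the sanity checks in the usage note */
#include <stdbool.h>
#include <stdint.h>

/* Simulate a 32-bit unsigned long so the sketch behaves the same on any host. */
typedef uint32_t ulong32;

/* Assumption: mirrors MAX_ERRNO in include/linux/err.h. */
#define SIM_MAX_ERRNO 4095u

/* Same comparison as IS_ERR_VALUE(): is x one of the top MAX_ERRNO values? */
static bool sim_is_err_value(ulong32 x)
{
    return x >= (ulong32)-SIM_MAX_ERRNO;
}

/* ERR_PTR(-errno) stores a negative errno in a pointer-sized value. */
static ulong32 sim_err_ptr(int32_t err)
{
    return (ulong32)err;
}

/* Address of byte `off` in a buffer at `base`, with 32-bit wraparound. */
static ulong32 sim_buf_addr(ulong32 base, ulong32 off)
{
    return (ulong32)(base + off);
}
```

With these helpers, sim_err_ptr(-2) is 0xfffffffe, which is also
sim_buf_addr(0xfffff000, 0xffe); sim_is_err_value() is true for every
offset from 1 to 0xfff in that page; and
sim_buf_addr(0xfffff000, 0x1000) is 0, matching the wrap to zero seen
in ext4_search_dir().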