Re: [syzbot] [mm?] general protection fault in hpage_collapse_scan_file
From: Zach O'Keefe
Date: Wed Apr 17 2024 - 14:57:01 EST
On Tue, Apr 16, 2024 at 4:07 PM Zach O'Keefe <zokeefe@xxxxxxxxxx> wrote:
>
> On Tue, Apr 9, 2024 at 5:32 PM Zach O'Keefe <zokeefe@xxxxxxxxxx> wrote:
> >
> > On Tue, Apr 9, 2024 at 4:46 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, 09 Apr 2024 03:16:20 -0700 syzbot <syzbot+57adb2a4b9d206521bc2@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > Hello,
> > > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit: 8568bb2ccc27 Add linux-next specific files for 20240405
> > > > git tree: linux-next
> > > > console+strace: https://syzkaller.appspot.com/x/log.txt?x=152f4805180000
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=48ca5acf8d2eb3bc
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=57adb2a4b9d206521bc2
> > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1268258d180000
> > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1256598d180000
> > >
> > > Help. From a quick look this seems to be claiming that collapse_file()
> > > got to
> > >
> > > VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
> > >
> > > with folio==NULL, but the code looks solid regarding this.
> > >
> > > Given that we have a reproducer, can we expect the bot to perform a
> > > bisection for us?
> > >
> >
> > I often don't see a successful automatic bisect, even with
> > reproducers. Hit or miss. I will take a closer look tomorrow -- the
> > reproducer doesn't look to be doing anything crazy.
>
> I've only been able to reproduce this using the disk image provided by syzbot.
>
> What is happening is we are calling MADV_COLLAPSE on an empty mapping
> -- which actually reaches collapse_file() -> filemap_lock_folio()
> after a page_cache_sync_readahead() attempt. This of course fails
> correctly, and I can see right before the GPF that the returned pointer
> is 0xfffffffffffffffe, which is correctly ERR_PTR(-ENOENT). This should
> be causing us to take the if (IS_ERR(folio)) {...} path, but we
> don't, and I don't know why. I haven't yet attempted to repro this
> against other images. Will continue looking, but wanted to provide
> some type of update -- even if it is a disappointing one -- so as to
> not appear like I've disappeared.

Ugh. I was looking at the wrong source. Thanks to hughd@ for mentioning
that IS_ERR(folio) changed recently, else I'd have spent more time on
it. Fixed by https://lore.kernel.org/all/ZhIWX8K0E2tSyMSr@xxxxxxxxxxxxxxxxxxxx/

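
For anyone following along, here is a minimal userspace sketch of the
error-pointer convention discussed above. It re-creates ERR_PTR(),
PTR_ERR() and IS_ERR() in the spirit of include/linux/err.h; it is not
the kernel source, and lookup_in_empty_mapping() and try_lookup() are
hypothetical stand-ins (not filemap_lock_folio() or collapse_file()).
It only shows why a lookup that fails with -ENOENT yields a pointer value
of 0xfffffffffffffffe on a 64-bit build, and why a working IS_ERR() has
to catch it before the pointer is dereferenced.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest errno value that can be encoded in a pointer (as in the kernel). */
#define MAX_ERRNO 4095

/* Encode a negative errno as a pointer in the top 4095 addresses. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the negative errno from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr lies in the range reserved for encoded errors. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical stand-in for a page cache lookup on an empty mapping:
 * nothing is cached, so the lookup reports -ENOENT as an error pointer.
 */
static void *lookup_in_empty_mapping(void)
{
	return ERR_PTR(-ENOENT);
}

/*
 * Hypothetical caller: returns 0 on success, or the negative errno if
 * the lookup failed. The IS_ERR() check must come before any use of
 * the returned pointer.
 */
static int try_lookup(void)
{
	void *folio = lookup_in_empty_mapping();

	if (IS_ERR(folio))
		return (int)PTR_ERR(folio);

	return 0;
}
```

With this sketch, try_lookup() returns -ENOENT rather than touching the
pointer. If the IS_ERR() check is skipped or evaluates incorrectly, the
caller goes on to dereference an address near the top of the address
space, which is consistent with the GPF in the report.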
> Thanks,
> Zach
>
> > Thanks,
> > Zach