On Fri, May 31, 2024 at 11:07 AM Yang Shi <shy828301@xxxxxxxxx> wrote:
On Fri, May 31, 2024 at 10:46 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
On 31.05.24 18:50, Yang Shi wrote:
On Fri, May 31, 2024 at 1:24 AM kernel test robot <oliver.sang@xxxxxxxxx> wrote:
Hello,
kernel test robot noticed "kernel_BUG_at_include/linux/page_ref.h" on:
commit: efa7df3e3bb5da8e6abbe37727417f32a37fba47 ("mm: align larger anonymous mappings on THP boundaries")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
[test failed on linus/master e0cce98fe279b64f4a7d81b7f5c3a23d80b92fbc]
[test failed on linux-next/master 6dc544b66971c7f9909ff038b62149105272d26a]
in testcase: trinity
version: trinity-x86_64-6a17c218-1_20240527
with following parameters:
runtime: 300s
group: group-00
nr_groups: 5
compiler: gcc-13
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
we noticed the issue does not always happen. 34 times out of 50 runs as below.
the parent is clean.
1803d0c5ee1a3bbe efa7df3e3bb5da8e6abbe377274
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:50 68% 34:50 dmesg.Kernel_panic-not_syncing:Fatal_exception
:50 68% 34:50 dmesg.RIP:try_get_folio
:50 68% 34:50 dmesg.invalid_opcode:#[##]
:50 68% 34:50 dmesg.kernel_BUG_at_include/linux/page_ref.h
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
| Closes: https://lore.kernel.org/oe-lkp/202405311534.86cd4043-lkp@xxxxxxxxx
[ 275.267158][ T4335] ------------[ cut here ]------------
[ 275.267949][ T4335] kernel BUG at include/linux/page_ref.h:275!
[ 275.268526][ T4335] invalid opcode: 0000 [#1] KASAN PTI
[ 275.269001][ T4335] CPU: 0 PID: 4335 Comm: trinity-c3 Not tainted 6.7.0-rc4-00061-gefa7df3e3bb5 #1
[ 275.269787][ T4335] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 275.270679][ T4335] RIP: 0010:try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
[ 275.271159][ T4335] Code: c3 cc cc cc cc 44 89 e6 48 89 df e8 e4 54 11 00 eb ae 90 0f 0b 90 31 db eb d5 9c 58 0f 1f 40 00 f6 c4 02 0f 84 46 ff ff ff 90 <0f> 0b 48 c7 c6 a0 54 d2 87 48 89 df e8 a9 e9 ff ff 90 0f 0b be 04
If I read this BUG correctly, it is:
VM_BUG_ON(!in_atomic() && !irqs_disabled());
Yes, that seems to be the one.
try_grab_folio() actually assumes it is in an atomic context (irq
disabled or preempt disabled) for this call path. This is achieved by
disabling irq in gup fast or calling it in rcu critical section in
page cache lookup path
try_grab_folio()->try_get_folio()->folio_ref_try_add_rcu()
Is called (mm-unstable) from:
(1) gup_fast function, here IRQs are disable
(2) gup_hugepte(), possibly problematic
(3) memfd_pin_folios(), possibly problematic
(4) __get_user_pages(), likely problematic
(1) should be fine.
(2) is possibly problematic on the !fast path. If so, due to commit
a12083d721d7 ("mm/gup: handle hugepd for follow_page()") ? CCing Peter.
(3) is possibly wrong. CCing Vivek.
(4) is what we hit here
And try_grab_folio() is used when the folio is a large folio. The
We come via process_vm_rw()->pin_user_pages_remote()->__get_user_pages()->try_grab_folio()
That code was added in
commit 57edfcfd3419b4799353d8cbd6ce49da075cfdbd
Author: Peter Xu <peterx@xxxxxxxxxx>
Date: Wed Jun 28 17:53:07 2023 -0400
mm/gup: accelerate thp gup even for "pages != NULL"
The acceleration of THP was done with ctx.page_mask, however it'll be
ignored if **pages is non-NULL.
Likely the try_grab_folio() in __get_user_pages() is wrong?
As documented, we already hold a refcount. Likely we should better do a
folio_ref_add() and sanity check the refcount.
Yes, a plain folio_ref_add() seems ok for these cases.
In addition, the comment of folio_try_get_rcu() says, which is just a
wrapper of folio_ref_try_add_rcu():
You can also use this function if you're holding a lock that prevents
pages being frozen & removed; eg the i_pages lock for the page cache
or the mmap_lock or page table lock for page tables. In this case, it
will always succeed, and you could have used a plain folio_get(), but
it's sometimes more convenient to have a common function called from
both locked and RCU-protected contexts.
So IIUC we can use the plain folio_get() at least for
process_vm_readv/writev since mmap_lock is held in this path.
In essence, I think: try_grab_folio() should only be called from GUP-fast where
IRQs are disabled.
Yes, I agree. Just the fast path should need to call try_grab_folio().
try_grab_folio() also handles FOLL_PIN and FOLL_GET, so we may just
keep calling it and add a flag to try_grab_folio, just like:
if flag is true
folio_ref_add()
else
try_get_folio()