On Sun, May 7, 2023 at 9:58 AM Ruihan Li <lrh2000@xxxxxxxxxx> wrote:
> Hi all,

Hi Ruihan,

Thank you for bisecting, and great analysis of the problem.

> Recently, syzbot reported [1] ("kernel BUG in page_table_check_clear").
> After some bisection, I found out that when usbdev_mmap calls
> remap_pfn_range on kmalloc'ed memory, it causes type confusion between
> struct folio and struct slab in page_table_check. As I will explain
> below, it seems that both the USB side and the MM side need fixes, so I
> have cc'd linux-usb and linux-mm here, as well as their maintainers, to
> see whether there are any comments on the proposed way to fix this.
>
> [1] https://lore.kernel.org/all/000000000000258e5e05fae79fc1@xxxxxxxxxx/
>
> To handle mmap requests for /dev/bus/usb/XXX/YYY, usbdev_mmap first
> allocates memory by calling usb_alloc_coherent and then maps it into
> user space using remap_pfn_range:
>
> static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
> {
>         // ...
>         mem = usb_alloc_coherent(ps->dev, size, GFP_USER | __GFP_NOWARN,
>                         &dma_handle);
>         // ...
>         if (hcd->localmem_pool || !hcd_uses_dma(hcd)) {
>                 if (remap_pfn_range(vma, vma->vm_start,
>                                 virt_to_phys(usbm->mem) >> PAGE_SHIFT,
>                                 size, vma->vm_page_prot) < 0) {
>                         // ...
>                 }
>         }
>         // ...
> }
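
To make the failing path concrete, here is a minimal user-space sketch
(untested, and the usbfs path below is just a placeholder for whatever
bus/device numbers the emulated device gets) of the mmap request that
ends up in usbdev_mmap():

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
        /* Placeholder usbfs node; the real path depends on enumeration. */
        int fd = open("/dev/bus/usb/001/002", O_RDWR);

        if (fd < 0) {
                perror("open");
                return 1;
        }
        /* This mmap() reaches usbdev_mmap(); with dummy_hcd (no DMA), the
         * buffer comes from kmalloc() and is mapped via remap_pfn_range(). */
        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        if (p == MAP_FAILED) {
                perror("mmap");
                close(fd);
                return 1;
        }
        munmap(p, 4096);
        close(fd);
        return 0;
}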

> Note that in this case, we consider the DMA-unavailable scenario, which,
> to be honest, is rare in practice. However, syzbot emulates USB devices
> using a module named dummy_hcd, and since these devices are emulated,
> DMA is not available at all.
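
For context: whether the remap_pfn_range() branch above is taken hinges
on hcd_uses_dma(). If I read include/linux/usb/hcd.h correctly, it is
essentially the check below, and dummy_hcd does not set HCD_DMA in its
driver flags, so emulated devices always take that branch:

/* Paraphrased from include/linux/usb/hcd.h; check your tree. */
static inline bool hcd_uses_dma(struct usb_hcd *hcd)
{
        return IS_ENABLED(CONFIG_HAS_DMA) && (hcd->driver->flags & HCD_DMA);
}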

> Meanwhile, usb_alloc_coherent calls hcd_buffer_alloc, which uses kmalloc
> for memory allocation:
>
> void *hcd_buffer_alloc(
>         struct usb_bus  *bus,
>         size_t          size,
>         gfp_t           mem_flags,
>         dma_addr_t      *dma
> )
> {
>         // ...
>         if (!hcd_uses_dma(hcd)) {
>                 *dma = ~(dma_addr_t) 0;
>                 return kmalloc(size, mem_flags);
>         }
>         // ...
> }
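
The property that matters for what follows is that such a buffer is
slab-backed. As an illustrative, untested sketch (not code from the
tree):

#include <linux/mm.h>
#include <linux/slab.h>

static void show_kmalloc_is_slab(size_t size)
{
        void *mem = kmalloc(size, GFP_USER | __GFP_NOWARN);

        if (!mem)
                return;
        /* Holds for slab-backed (i.e. small) kmalloc allocations: the page
         * has PG_slab set, and its ->mapping word holds slab bookkeeping
         * rather than an address_space pointer or the anon flag. */
        WARN_ON(!PageSlab(virt_to_page(mem)));
        kfree(mem);
}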

> However, while updating the page table to map the kmalloc'ed page into
> user space, page_table_check_set attempts to determine whether the page
> is anonymous using PageAnon:
>
> static void page_table_check_set(struct mm_struct *mm, unsigned long addr,
>                                  unsigned long pfn, unsigned long pgcnt,
>                                  bool rw)
> {
>         // ...
>         anon = PageAnon(page);
>         for (i = 0; i < pgcnt; i++) {
>                 // ...
>                 if (anon) {
>                         BUG_ON(atomic_read(&ptc->file_map_count));
>                         BUG_ON(atomic_inc_return(&ptc->anon_map_count) > 1 && rw);
>                 } else {
>                         BUG_ON(atomic_read(&ptc->anon_map_count));
>                         BUG_ON(atomic_inc_return(&ptc->file_map_count) < 0);
>                 }
>                 // ...
>         }
>         // ...
> }
>
> This call to PageAnon is invalid for slab pages because slab reuses the
> bits in struct page/folio to store its internal state, and the anonymity
> bit is only meaningful for struct page/folio. As a result, the counters
> are updated and checked inconsistently in page_table_check_set and
> page_table_check_clear, which eventually triggers the BUG.
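
To spell out the type confusion: PageAnon() boils down to testing the
low bit of page->mapping, roughly as in the simplified sketch below (the
real helper in include/linux/page-flags.h goes through the folio). On a
slab page that word is overlaid by struct slab's own fields, so the test
reads unrelated slab bookkeeping:

/* Simplified sketch of the anon test: anonymous pages set
 * PAGE_MAPPING_ANON (bit 0) in ->mapping. */
#define PAGE_MAPPING_ANON       0x1

static inline bool page_is_anon(const struct page *page)
{
        return ((unsigned long)page->mapping & PAGE_MAPPING_ANON) != 0;
}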

We should change the anon boolean to be:

        anon = !PageSlab(page) && PageAnon(page);

This way, anon pages are determined correctly, and everything else falls
into file_map_count. File (i.e. non-anon) PTEs are OK to be double
mapped, so there won't be any problems from the page_table_check point
of view even if the page is a slab page.
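
Untested, but I would expect the change to look something like this in
mm/page_table_check.c (page_table_check_clear() computes anon the same
way, so presumably both spots want the guard):

--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ static void page_table_check_clear(...)
-	anon = PageAnon(page);
+	anon = !PageSlab(page) && PageAnon(page);
@@ static void page_table_check_set(...)
-	anon = PageAnon(page);
+	anon = !PageSlab(page) && PageAnon(page);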