Re: [PATCH v3 1/5] Xarray: Do not return sibling entries from xas_find_marked()

From: Kemeng Shi
Date: Mon Dec 16 2024 - 02:06:13 EST




on 12/13/2024 2:12 PM, Baolin Wang wrote:
>
>
> On 2024/12/13 20:25, Kemeng Shi wrote:
>> Similar to issue fixed in commit cbc02854331ed ("XArray: Do not return
>> sibling entries from xa_load()"), we may return sibling entries from
>> xas_find_marked as following:
>>      Thread A:               Thread B:
>>                              xa_store_range(xa, entry, 6, 7, gfp);
>>                 xa_set_mark(xa, 6, mark)
>>      XA_STATE(xas, xa, 6);
>>      xas_find_marked(&xas, 7, mark);
>>      offset = xas_find_chunk(xas, advance, mark);
>>      [offset is 6 which points to a valid entry]
>>                              xa_store_range(xa, entry, 4, 7, gfp);
>>      entry = xa_entry(xa, node, 6);
>>      [entry is a sibling of 4]
>>      if (!xa_is_node(entry))
>>          return entry;
>>
>> Skip sibling entry like xas_find() does to protect caller from seeing
>> sibling entry from xas_find_marked() or caller may use sibling entry
>> as a valid entry and crash the kernel.
>>
>> Besides, load_race() test is modified to catch mentioned issue and modified
>> load_race() only passes after this fix is merged.
>>
>> Here is an example of how this bug could be triggered in tmpfs which
>> enables large folio in mapping:
>> Let's take a look at involved racer:
>> 1. How pages could be created and dirtied in shmem file.
>> write
>>   ksys_write
>>    vfs_write
>>     new_sync_write
>>      shmem_file_write_iter
>>       generic_perform_write
>>        shmem_write_begin
>>         shmem_get_folio
>>          shmem_allowable_huge_orders
>>          shmem_alloc_and_add_folios
>>          shmem_alloc_folio
>>          __folio_set_locked
>>          shmem_add_to_page_cache
>>           XA_STATE_ORDER(..., index, order)
>>           xas_store()
>>        shmem_write_end
>>         folio_mark_dirty()
>>
>> 2. How dirty pages could be deleted in shmem file.
>> ioctl
>>   do_vfs_ioctl
>>    file_ioctl
>>     ioctl_preallocate
>>      vfs_fallocate
>>       shmem_fallocate
>>        shmem_truncate_range
>>         shmem_undo_range
>>          truncate_inode_folio
>>           filemap_remove_folio
>>            page_cache_delete
>>             xas_store(&xas, NULL);
>>
>> 3. How dirty pages could be lockless searched
>> sync_file_range
>>   ksys_sync_file_range
>>    __filemap_fdatawrite_range
>>     filemap_fdatawrite_wbc
>
> Seems not a good example, IIUC, tmpfs doesn't support writeback (mapping_can_writeback() will return false), right?
>
Ahhh, right. Thank you for correcting me. I would like to use nfs as the
underlying filesystem in the example instead; the potential crash can be
triggered by the same steps.

Involved racers:
1. How pages could be created and dirtied in nfs.
write
 ksys_write
  vfs_write
   new_sync_write
    nfs_file_write
     generic_perform_write
      nfs_write_begin
       fgf_set_order
       __filemap_get_folio
      nfs_write_end
       nfs_update_folio
        nfs_writepage_setup
         nfs_mark_request_dirty
          filemap_dirty_folio
           __folio_mark_dirty
            __xa_set_mark

2. How dirty pages could be deleted in nfs.
ioctl
 do_vfs_ioctl
  file_ioctl
   ioctl_preallocate
    vfs_fallocate
     nfs42_fallocate
      nfs42_proc_deallocate
       truncate_pagecache_range
        truncate_inode_pages_range
         truncate_inode_folio
          filemap_remove_folio
           page_cache_delete
            xas_store(&xas, NULL);


3. How dirty pages could be searched locklessly.
sync_file_range
 ksys_sync_file_range
  __filemap_fdatawrite_range
   filemap_fdatawrite_wbc
    do_writepages
     writeback_use_writepage
      writeback_iter
       writeback_get_folio
        filemap_get_folios_tag
         find_get_entry
          folio = xas_find_marked()
          folio_try_get(folio)
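
To make it clearer why a sibling entry coming back from xas_find_marked()
ends up in folio_try_get(), here is a small user-space sketch of the entry
encoding. The xa_* helpers are my own copies of what include/linux/xarray.h
does as far as I remember it (CONFIG_XARRAY_MULTI and XA_CHUNK_SHIFT == 6 are
assumed), and reaches_folio_try_get() is a hypothetical name for the
"!folio || xa_is_value(folio)" filter in find_get_entry(); the retry check is
left out. Please treat it as an illustration only, not as kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define XA_CHUNK_SIZE 64UL	/* assumption: XA_CHUNK_SHIFT == 6 */

/* Internal entries have the two low bits set to 10. */
static void *xa_mk_internal(unsigned long v)
{
	return (void *)(((uintptr_t)v << 2) | 2);
}

static bool xa_is_internal(const void *entry)
{
	return ((uintptr_t)entry & 3) == 2;
}

/* Value entries have bit 0 set. */
static bool xa_is_value(const void *entry)
{
	return (uintptr_t)entry & 1;
}

/* A sibling entry stores the offset of the slot holding the real entry. */
static void *xa_mk_sibling(unsigned int offset)
{
	return xa_mk_internal(offset);
}

static bool xa_is_sibling(const void *entry)
{
	return xa_is_internal(entry) &&
	       (uintptr_t)entry < (uintptr_t)xa_mk_sibling(XA_CHUNK_SIZE - 1);
}

/* Hypothetical model of the filter in front of folio_try_get(). */
static bool reaches_folio_try_get(const void *entry)
{
	return entry != NULL && !xa_is_value(entry);
}

/* Self-check: a sibling of slot 0 is the bogus "pointer" 0x2. */
static void sibling_encoding_selfcheck(void)
{
	void *sib = xa_mk_sibling(0);

	assert(sib == (void *)2);
	assert(xa_is_sibling(sib));
	assert(!xa_is_value(sib));
	assert(reaches_folio_try_get(sib));
}
```

So a sibling entry is neither NULL nor a value entry, and in this model
nothing stops it before folio_try_get() treats 0x2 as a struct folio pointer.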

Steps to crash kernel:
1.Create                     2.Search                       3.Delete
/* write page 2,3 */
write
 ...
  nfs_write_begin
   fgf_set_order
   __filemap_get_folio
    ...
     xas_store(&xas, folio)
  nfs_write_end
   ...
    __folio_mark_dirty

                             /* sync page 2 and page 3 */
                             sync_file_range
                              ...
                               find_get_entry
                                folio = xas_find_marked()
                                 /* offset will be 2 */
                                 offset = xas_find_chunk()

                                                            /* delete page 2 and page 3 */
                                                            ioctl
                                                             ...
                                                              xas_store(&xas, NULL);

/* write page 0-3 */
write
 ...
  nfs_write_begin
   fgf_set_order
   __filemap_get_folio
    ...
     xas_store(&xas, folio)
  nfs_write_end
   ...
    __folio_mark_dirty

                             /* get sibling entry from offset 2 */
                             entry = xa_entry(.., 2)
                             /* use sibling entry as folio and crash kernel */
                             folio_try_get(folio)
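
The interleaving above can also be replayed deterministically in user space.
The sketch below is a single-threaded toy model, not the patch itself: all
toy_* names are hypothetical, the node has only 8 slots, and the "dirty"
array stands in for the mark bitmap. It only shows that the slot loaded after
xas_find_chunk() may already hold a sibling entry, and that skipping siblings
(as xas_find() does) avoids returning it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 8	/* toy node size; the kernel's chunk size is larger */

struct toy_node {
	void *slot[SLOTS];
	unsigned char dirty[SLOTS];	/* stand-in for the mark bitmap */
};

/* Toy sibling entry: low bits 10, payload is the head slot's offset. */
static void *toy_mk_sibling(unsigned int head)
{
	return (void *)(((uintptr_t)head << 2) | 2);
}

static int toy_is_sibling(const void *entry)
{
	return ((uintptr_t)entry & 3) == 2;
}

/* Store a 2^order-slot entry at 'first'; only the head slot is marked. */
static void toy_store(struct toy_node *n, unsigned int first,
		      unsigned int order, void *folio)
{
	unsigned int i, nr = 1u << order;

	assert(first % nr == 0 && first + nr <= SLOTS);
	n->slot[first] = folio;
	n->dirty[first] = 1;
	for (i = 1; i < nr; i++) {
		n->slot[first + i] = toy_mk_sibling(first);
		n->dirty[first + i] = 0;
	}
}

static void toy_erase(struct toy_node *n, unsigned int first, unsigned int nr)
{
	unsigned int i;

	for (i = first; i < first + nr && i < SLOTS; i++) {
		n->slot[i] = NULL;
		n->dirty[i] = 0;
	}
}

/* Return the first marked offset at or after 'start', or SLOTS if none. */
static unsigned int toy_find_chunk(const struct toy_node *n, unsigned int start)
{
	unsigned int i;

	for (i = start; i < SLOTS; i++)
		if (n->dirty[i])
			return i;
	return SLOTS;
}

static long small_folio, large_folio;	/* aligned stand-ins for folios */

/* Replay the steps; returns what the searcher would hand to its caller. */
static void *toy_replay(int skip_siblings)
{
	struct toy_node node = { 0 };
	unsigned int offset;
	void *entry;

	toy_store(&node, 2, 1, &small_folio);	/* 1.Create: write page 2,3 */
	offset = toy_find_chunk(&node, 2);	/* 2.Search: offset will be 2 */
	toy_erase(&node, 2, 2);			/* 3.Delete: page 2 and page 3 */
	toy_store(&node, 0, 2, &large_folio);	/* 1.Create: write page 0-3 */

	while (offset < SLOTS) {		/* 2.Search resumes here */
		entry = node.slot[offset];
		if (!skip_siblings || !toy_is_sibling(entry))
			return entry;
		offset = toy_find_chunk(&node, offset + 1);
	}
	return NULL;
}
```

With skip_siblings == 0 the replay returns the sibling entry for slot 0
(the value 0x2); with skip_siblings == 1 the search moves on and finds
nothing marked in the rest of the toy node.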