On Wed, Apr 3, 2024 at 4:01 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
On 03.04.24 07:50, Zhaoyang Huang wrote:
On Tue, Apr 2, 2024 at 8:58 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
On 01.04.24 10:17, zhaoyang.huang wrote:
From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
A VM_BUG_ON in step 9 of [1] can happen because the refcount is dropped
improperly during read_pages()->readahead_folio()->folio_put().
This was introduced by commit 9fd472af84ab ("mm: improve cleanup when
->readpages doesn't process all pages").
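[For reference: a simplified sketch of the cleanup loop that the commit above
added to read_pages(), operating on its readahead_control argument (called
ractl here); paraphrased from memory rather than quoted verbatim, so details
may differ. The point is that readahead_folio() already drops one reference on
the folio it returns, so the loop takes its own reference before removing the
folio from the page cache:]

	while ((folio = readahead_folio(ractl)) != NULL) {
		folio_get(folio);		/* hold the folio across removal */
		filemap_remove_folio(folio);	/* drop the page-cache reference */
		folio_unlock(folio);
		folio_put(folio);		/* drop our temporary reference */
	}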
Key steps of [1] in brief:
2'. Thread_truncate gets the folio into its local fbatch via find_get_entries in step 2
7'. The last remaining refcount is not, as expected, from alloc_pages,
    but from Thread_truncate's local fbatch in step 7
8'. Thread_reclaim succeeds in isolating the folio because of the wrong refcnt (wrong
    in meaning, not in value) in step 8
9'. Thread_truncate hits the VM_BUG_ON in step 9
[1]
Thread_readahead:
0. folio = filemap_alloc_folio(gfp_mask, 0);
(refcount 1: alloc_pages)
1. ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
(refcount 2: alloc_pages, page_cache)
[not going into all details, just a high-level remark]
page_cache_ra_unbounded() does a filemap_invalidate_lock_shared(), which
is a down_read_trylock(&mapping->invalidate_lock).
That is, all read_pages() calls in mm/readahead.c happen under
mapping->invalidate_lock in read mode.
... and ...
Thread_truncate:
2. folio = find_get_entries(&fbatch_truncate);
(refcount 3: alloc_pages, page_cache, fbatch_truncate)
truncation, such as truncate_inode_pages(), must be called under
mapping->invalidate_lock held in write mode. So naive me would have
thought that readahead and truncate cannot race in that way.
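[For reference: the locking pattern being described, in simplified form; the
helpers are real mainline functions, but this is a sketch of the pattern, not
the exact code:]

	/* readahead side, roughly what page_cache_ra_unbounded() does: */
	filemap_invalidate_lock_shared(mapping);	/* mapping->invalidate_lock, read side */
	/* ... allocate and add folios, then call read_pages() ... */
	filemap_invalidate_unlock_shared(mapping);

	/* truncate side, what callers of truncate_inode_pages() are expected to do: */
	filemap_invalidate_lock(mapping);		/* mapping->invalidate_lock, write side */
	truncate_inode_pages(mapping, 0);
	filemap_invalidate_unlock(mapping);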
[...]
"mapping->invalidate_lock" by check the callstack of
'kill_bdev()->truncate_inode_pages()->truncate_inode_pages_range()',
or the lock is holded beyond this?
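[For reference: kill_bdev() itself, roughly as it looks in block/bdev.c
(paraphrased from memory; the exact shape differs between kernel versions).
No filemap_invalidate_lock() call is visible at this level, which is what the
question above is getting at:]

static void kill_bdev(struct block_device *bdev)
{
	struct address_space *mapping = bdev->bd_inode->i_mapping;	/* bd_mapping on newer kernels */

	if (mapping_empty(mapping))
		return;

	invalidate_bh_lrus();
	truncate_inode_pages(mapping, 0);
}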
Something that would help here is an actual reproducer that triggers this
issue.
To me, it's unclear at this point if we are talking about an actual
issue or a theoretical issue?
Sorry, it is theoretical so far, according to my understanding.
Thanks for the feedback. The call stack above is a theoretical issue so far,
which arose from an ongoing analysis of a practical livelock issue
generated by folio_try_get_rcu, which is related to an abnormal folio
refcnt state. So do you think this call stack makes sense?
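[For context on the livelock mentioned above: the page-cache lookup path
retries when it cannot take a reference, roughly following the pattern of
filemap_get_entry() in mm/filemap.c (paraphrased, assuming the usual
mapping/index pair of a lookup). If a folio were left in the xarray in a
refcount state where folio_try_get_rcu() keeps failing, a lookup like this
could spin indefinitely:]

	XA_STATE(xas, &mapping->i_pages, index);
	struct folio *folio;

	rcu_read_lock();
repeat:
	xas_reset(&xas);
	folio = xas_load(&xas);
	if (xas_retry(&xas, folio))
		goto repeat;
	if (!folio || xa_is_value(folio))
		goto out;
	if (!folio_try_get_rcu(folio))		/* fails while the refcount is zero/frozen */
		goto repeat;
	if (unlikely(folio != xas_reload(&xas))) {
		folio_put(folio);		/* lost a race with removal; retry */
		goto repeat;
	}
out:
	rcu_read_unlock();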
I'm not an expert on that code, and only spent 5 min looking into the
code. So my reasoning about invalidate_lock above might be completely wrong.
It would be a very rare race that has not been reported so far in practice.
And it certainly wouldn't be the easiest one to explain, because the
call chain above is a bit elaborate and does not explain which locks are
involved and how they fail to protect us from any such race.
For this case in particular, I think we really need a real reproducer to
convince people that the actual issue does exist and the fix actually
resolves the issue.