On Tue, Jun 11, 2024 at 4:49 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote:
@@ -515,8 +600,11 @@ void swap_read_folio(struct folio *folio, boolSorry I did not make myself clearer. I did not mean that you should
synchronous,
Yes, makes sense. Thanks for explaining this.I think the assumption is incorrect. I think we would just check ifSo yes, with this patch I tested swap out of large zero folio, but whenpsi_memstall_enter(&pflags);We don't currently support swapping in large folios, but it is a work
}
delayacct_swapin_start();
-
- if (zswap_load(folio)) {
+ if (swap_zeromap_folio_test(folio)) {
+ folio_zero_fill(folio);
+ folio_mark_uptodate(folio);
+ folio_unlock(folio);
in progress, and this will break once we have it.
swap_zeromap_folio_test() will return false even if parts of the folio
are in fact zero-filled. Then, we will go read those from disk swap,
essentially corrupting data.
swapping in it was page by page. My assumption was that swap_read_folio
(when support is added) would only pass a large folio that was earlier
swapped out as a large folio. So if a zero filled large folio was
swapped out, the zeromap will be set for all the pages in that folio,
and at large folio swap in (when it is supported), it will see that all
the bits in the zeromap for that folio are set, and will just
folio_zero_fill.
If even a single page in large folio has non-zero data then zeromap will
not store it and it will go to either zswap or disk, and at read time as
all the bits in zeromap are not set, it will still goto either zswap or
disk, so I think this works?
Is my assumption wrong that only large folios can be swapped in only if
they were swapped out as large? I think this code works in that case.
contiguous PTEs have contiguous swap entries and swapin the folio as a
large folio in this case. It is likely that the swap entries are
contiguous because it was swapped out as a large folio, but I don't
think it's guaranteed.
For example, here is a patch that implements large swapin support forI think the solution to large folio swap-in for this optimization and
the synchronous swapin case, and I think it just checks that the PTEs
have contiguous swap entries:
https://lore.kernel.org/linux-mm/20240304081348.197341-6-21cnbao@xxxxxxxxx/
This makes a lot of sense because otherwise you'd have to keep track
of how the folios were composed at the time of swapout, to be able to
swap the same folios back in.
zswap is not in swap_read_folio in this patch-series or any call further
down the stack, as its too late by the time you reach here, but in
Barrys' patch. This needs to happen much earlier when deciding the size
of the folio at folio creation in alloc_swap_folio, after Barry checks
if (is_zswap_enabled())
goto fallback;
once the order is decided, we would need to check the indexes in the
zeromap array starting from swap_offset(entry) and ending at
swap_offset(entry) + 2^order are set. If no bits are set, or all bits
are set, then we allocate large folio. Otherwise, we goto fallback.
I think its better to handle this in Barrys patch. I feel this series is
close to its final state, i.e. the only diff I have for the next
revision is below to remove start/end_writeback for zer_filled case. I
will comment on Barrys patch once the I send out the next revision of this.
handle the large folio swapin here. This needs to be handled at a
higher level because as you mentioned, a large folio may be partially
in the zeromap, zswap, swapcache, disk, etc.
What I meant is that we should probably have a debug check to make
sure this doesn't go unhandled. For zswap, I am trying to add a
warning and fail the swapin operation if a large folio slips through
to zswap. We can do something similar here if folks agree this is the
right way in the interim:
https://lore.kernel.org/lkml/20240611024516.1375191-3-yosryahmed@xxxxxxxxxx/.
Maybe I am too paranoid, but I think it's easy to mess up these things
when working on large folio swapin imo.