[PATCH 1/2] mm: optimize truncation of shadow entries

From: Shakeel Butt
Date: Wed Sep 11 2024 - 13:38:50 EST


The kernel truncates the page cache in batches of PAGEVEC_SIZE. For each
batch, it traverses the page cache tree and collects the entries (folio
and shadow entries) in the struct folio_batch. For the shadow entries
present in the folio_batch, it has to traverse the page cache tree for
each individual entry to remove them. This patch optimize this by
removing them in a single tree traversal.

On large machines in our production which run workloads manipulating
large amount of data, we have observed that a large amount of CPUs are
spent on truncation of very large files (100s of GiBs file sizes). More
specifically most of time was spent on shadow entries cleanup, so
optimizing the shadow entries cleanup, even a little bit, has good
impact.

To evaluate the changes, we created 200GiB file on a fuse fs and in a
memcg. We created the shadow entries by triggering reclaim through
memory.reclaim in that specific memcg and measure the simple truncation
operation.

# time truncate -s 0 file

time (sec)
Without 5.164 +- 0.059
With-patch 4.21 +- 0.066 (18.47% decrease)

Signed-off-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
---
mm/truncate.c | 50 +++++++++++++++++++++++---------------------------
1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index 0668cd340a46..c7c19c816c2e 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -72,50 +72,46 @@ static void clear_shadow_entries(struct address_space *mapping,
static void truncate_folio_batch_exceptionals(struct address_space *mapping,
struct folio_batch *fbatch, pgoff_t *indices)
{
+ XA_STATE(xas, &mapping->i_pages, indices[0]);
+ int nr = folio_batch_count(fbatch);
+ struct folio *folio;
int i, j;
- bool dax;

/* Handled by shmem itself */
if (shmem_mapping(mapping))
return;

- for (j = 0; j < folio_batch_count(fbatch); j++)
+ for (j = 0; j < nr; j++)
if (xa_is_value(fbatch->folios[j]))
break;

- if (j == folio_batch_count(fbatch))
+ if (j == nr)
return;

- dax = dax_mapping(mapping);
- if (!dax) {
- spin_lock(&mapping->host->i_lock);
- xa_lock_irq(&mapping->i_pages);
+ if (dax_mapping(mapping)) {
+ for (i = j; i < nr; i++) {
+ if (xa_is_value(fbatch->folios[i]))
+ dax_delete_mapping_entry(mapping, indices[i]);
+ }
+ goto out;
}

- for (i = j; i < folio_batch_count(fbatch); i++) {
- struct folio *folio = fbatch->folios[i];
- pgoff_t index = indices[i];
-
- if (!xa_is_value(folio)) {
- fbatch->folios[j++] = folio;
- continue;
- }
+ xas_set_update(&xas, workingset_update_node);

- if (unlikely(dax)) {
- dax_delete_mapping_entry(mapping, index);
- continue;
- }
+ spin_lock(&mapping->host->i_lock);
+ xas_lock_irq(&xas);

- __clear_shadow_entry(mapping, index, folio);
+ xas_for_each(&xas, folio, indices[nr-1]) {
+ if (xa_is_value(folio))
+ xas_store(&xas, NULL);
}

- if (!dax) {
- xa_unlock_irq(&mapping->i_pages);
- if (mapping_shrinkable(mapping))
- inode_add_lru(mapping->host);
- spin_unlock(&mapping->host->i_lock);
- }
- fbatch->nr = j;
+ xas_unlock_irq(&xas);
+ if (mapping_shrinkable(mapping))
+ inode_add_lru(mapping->host);
+ spin_unlock(&mapping->host->i_lock);
+out:
+ folio_batch_remove_exceptionals(fbatch);
}

/**
--
2.43.5