On Tue, Aug 27, 2019 at 03:17:39PM +0300, Kirill A. Shutemov wrote:
> On Tue, Aug 27, 2019 at 02:09:23PM +0200, Michal Hocko wrote:
> > On Tue 27-08-19 14:01:56, Vlastimil Babka wrote:
> > > On 8/27/19 1:02 PM, Kirill A. Shutemov wrote:
> > > > On Tue, Aug 27, 2019 at 08:01:39AM +0200, Michal Hocko wrote:
> > > > > On Mon 26-08-19 16:15:38, Kirill A. Shutemov wrote:
> > > > > > Completely unmapped pages will be freed with the current code. Deferred split
> > > > > > only applies to partly mapped THPs: at least one 4k page of the THP is still
> > > > > > mapped somewhere.
> > > > >
> > > > > Hmm, I am probably misreading the code, but at least the current Linus' tree
> > > > > reads page_remove_rmap -> page_remove_anon_compound_rmap ->
> > > > > deferred_split_huge_page even for a fully mapped THP.
> > > >
> > > > Well, you read correctly, but it was not intended. I screwed it up at some
> > > > point.
> > > >
> > > > See the patch below. It should make it work as intended.
> > >
> > > But that adding to the queue doesn't affect whether the page will be freed
> > > immediately if there are no more partial mappings, right? I don't see
> > > deferred_split_huge_page() pinning the page.
> > >
> > > So your patch wouldn't make THPs freed immediately in cases where they
> > > haven't been freed immediately before; it just fixes a minor
> > > inefficiency with queue manipulation?
>
> It's not a bug as such, but an inefficiency: we add the page to the queue
> where it's not needed.
>
> > Ohh, right. I can see that in free_transhuge_page now. So fully mapped
> > THPs really do not matter, and what I have considered an odd case is
> > really happening more often.
> >
> > That being said, this will not help at all with what Yang Shi is seeing,
> > and we need more proactive deferred splitting, as I've mentioned earlier.
>
> It was not intended to fix the issue; it's a fix for the current logic. I'm
> playing with the work approach now.
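For reference, the free_transhuge_page() that Michal points to above looks
roughly like this in a v5.3-era tree (paraphrased from memory, so details may
differ): a THP that still sits on the deferred-split queue is simply unhooked
from the queue when it gets freed, which is why being queued never pins the
page.

void free_transhuge_page(struct page *page)
{
        struct pglist_data *pgdata = NODE_DATA(page_to_nid(page));
        unsigned long flags;

        spin_lock_irqsave(&pgdata->split_queue_lock, flags);
        if (!list_empty(page_deferred_list(page))) {
                /* Still queued for deferred split: drop it from the queue. */
                pgdata->split_queue_len--;
                list_del(page_deferred_list(page));
        }
        spin_unlock_irqrestore(&pgdata->split_queue_lock, flags);
        free_compound_page(page);
}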
Below is what I've come up with. It appears to be functional.

Any comments?
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d77d717c620c..c576e9d772b7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -758,6 +758,8 @@ typedef struct pglist_data {
spinlock_t split_queue_lock;
struct list_head split_queue;
unsigned long split_queue_len;
+ unsigned int deferred_split_calls;
+ struct work_struct deferred_split_work;
#endif
/* Fields commonly accessed by the page reclaim scanner */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index de1f15969e27..12d109bbe8ac 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2820,22 +2820,6 @@ void free_transhuge_page(struct page *page)
free_compound_page(page);
}
-void deferred_split_huge_page(struct page *page)
-{
- struct pglist_data *pgdata = NODE_DATA(page_to_nid(page));
- unsigned long flags;
-
- VM_BUG_ON_PAGE(!PageTransHuge(page), page);
-
- spin_lock_irqsave(&pgdata->split_queue_lock, flags);
- if (list_empty(page_deferred_list(page))) {
- count_vm_event(THP_DEFERRED_SPLIT_PAGE);
- list_add_tail(page_deferred_list(page), &pgdata->split_queue);
- pgdata->split_queue_len++;
- }
- spin_unlock_irqrestore(&pgdata->split_queue_lock, flags);
-}
-
static unsigned long deferred_split_count(struct shrinker *shrink,
struct shrink_control *sc)
{
@@ -2901,6 +2885,44 @@ static struct shrinker deferred_split_shrinker = {
.flags = SHRINKER_NUMA_AWARE,
};
+void flush_deferred_split_queue(struct work_struct *work)
+{
+ struct pglist_data *pgdata;
+ struct shrink_control sc;
+
+ pgdata = container_of(work, struct pglist_data, deferred_split_work);
+ sc.nid = pgdata->node_id;
+ sc.nr_to_scan = 0; /* Unlimited */
+
+ deferred_split_scan(NULL, &sc);
+}
+
+#define NR_CALLS_TO_SPLIT 32
+#define NR_PAGES_ON_QUEUE_TO_SPLIT 16
+
+void deferred_split_huge_page(struct page *page)
+{
+ struct pglist_data *pgdata = NODE_DATA(page_to_nid(page));
+ unsigned long flags;
+
+ VM_BUG_ON_PAGE(!PageTransHuge(page), page);
+
+ spin_lock_irqsave(&pgdata->split_queue_lock, flags);
+ if (list_empty(page_deferred_list(page))) {
+ count_vm_event(THP_DEFERRED_SPLIT_PAGE);
+ list_add_tail(page_deferred_list(page), &pgdata->split_queue);
+ pgdata->split_queue_len++;
+ pgdata->deferred_split_calls++;
+ }
+
+ if (pgdata->deferred_split_calls > NR_CALLS_TO_SPLIT &&
+ pgdata->split_queue_len > NR_PAGES_ON_QUEUE_TO_SPLIT) {
+ pgdata->deferred_split_calls = 0;
+ schedule_work(&pgdata->deferred_split_work);
+ }
+ spin_unlock_irqrestore(&pgdata->split_queue_lock, flags);
+}
+
#ifdef CONFIG_DEBUG_FS
static int split_huge_pages_set(void *data, u64 val)
{
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9c9194959271..86af66d463e9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6636,11 +6636,14 @@ static unsigned long __init calc_memmap_size(unsigned long spanned_pages,
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+void flush_deferred_split_queue(struct work_struct *work);
static void pgdat_init_split_queue(struct pglist_data *pgdat)
{
spin_lock_init(&pgdat->split_queue_lock);
INIT_LIST_HEAD(&pgdat->split_queue);
pgdat->split_queue_len = 0;
+ pgdat->deferred_split_calls = 0;
+ INIT_WORK(&pgdat->deferred_split_work, flush_deferred_split_queue);
}
#else
static void pgdat_init_split_queue(struct pglist_data *pgdat) {}
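As an aside, the fix referred to in the quoted thread ("See the patch below.
It should make it work as intended.") is not included in this mail. Its idea
is the one spelled out above: queue a THP for deferred split only while it is
still partly mapped. A minimal sketch of that check, assuming the
page_remove_anon_compound_rmap() structure of the same era (the local counter
nr of still-mapped subpages is part of that assumption, not shown anywhere in
this mail):

        if (TestClearPageDoubleMap(page)) {
                /*
                 * Subpages can be mapped with PTEs too. Count how many of
                 * them are still mapped after this unmap.
                 */
                for (i = 0, nr = 0; i < HPAGE_PMD_NR; i++) {
                        if (atomic_add_negative(-1, &page[i]._mapcount))
                                nr++;
                }

                /*
                 * Queue for deferred split only if the THP is still partly
                 * mapped; a fully unmapped THP is freed as a whole and has
                 * no business on the split queue.
                 */
                if (nr && nr < HPAGE_PMD_NR)
                        deferred_split_huge_page(page);
        } else {
                nr = HPAGE_PMD_NR;
        }

With a check like that, deferred_split_huge_page() only ever sees partly
mapped THPs, so the queue (and the work item added above) only has to deal
with pages that actually need splitting.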