[PATCH RFC v1 00/11] hwpoison improvement part 1

From: Naoya Horiguchi
Date: Fri Nov 09 2018 - 01:47:38 EST


Hi everyone,

I wrote hwpoison patches that partially address the problems
recently discussed in this area [1].

The main point of this series is how to isolate faulty pages more
safely and reliably. As Michal pointed out in thread [2], we can
have better isolation functions than what we currently have.
Patch 8/11 gives the implementation. As a result, the behavior of
poisoned pages (at least from soft-offline) is more predictable,
and I think that memory hotremove should work properly with it.

The structure of this series:
- patches 1-7 are small fixes, preparation, and/or cleanup.
  I can separate these out from the main part if you like.
- patch 8 is the core part of this series, providing code
  to pick the target page out of the buddy allocator.
- patches 9-11 are changes on the caller side (hard-offline,
  hotremove and unpoison).
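
To illustrate what "picking the target page out of the buddy
allocator" means in general, here is a toy userspace model. It is
not code from patch 8; every name in it (toy_init(),
toy_isolate_free_page(), free_order[], TOY_MAX_ORDER) is made up for
illustration. It only assumes the usual buddy scheme: find the free
block that contains the target page, split it in halves repeatedly,
and give back every half that does not contain the target.

```c
#include <assert.h>

#define TOY_MAX_ORDER 4                        /* toy value, not the kernel's */
#define TOY_NPAGES (1u << (TOY_MAX_ORDER - 1)) /* 8 pages in this model */

/*
 * free_order[pfn] is the order of the free block that starts at pfn,
 * or -1 if no free block starts there.
 */
static int free_order[TOY_NPAGES];

/* Start with one free block covering all pages. */
static void toy_init(void)
{
	for (unsigned int i = 0; i < TOY_NPAGES; i++)
		free_order[i] = -1;
	free_order[0] = TOY_MAX_ORDER - 1;
}

/*
 * Remove the single page 'target' from the free blocks.
 * Returns 1 if the page was free and is now isolated,
 * 0 if no free block contains it.
 */
static int toy_isolate_free_page(unsigned int target)
{
	assert(target < TOY_NPAGES);

	for (int order = 0; order < TOY_MAX_ORDER; order++) {
		/* Start of the order-aligned block that would hold target. */
		unsigned int head = target & ~((1u << order) - 1);

		if (free_order[head] != order)
			continue;

		/* Found the containing free block: take it off the list. */
		free_order[head] = -1;

		/* Split down to order 0, freeing the half without target. */
		while (order > 0) {
			unsigned int half;

			order--;
			half = 1u << order;
			if (target >= head + half) {
				free_order[head] = order;        /* lower half stays free */
				head += half;
			} else {
				free_order[head + half] = order; /* upper half stays free */
			}
		}
		return 1;
	}
	return 0;
}
```

In this model, calling toy_init() and then toy_isolate_free_page(5)
leaves free blocks of order 2 at page 0, order 0 at page 4 and
order 1 at page 6, while page 5 belongs to no free block. A second
call for page 5 returns 0 because the page is no longer free.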

One big issue not addressed by this series is hard-offlining hugetlb,
which is still a todo unfortunately.

Another remaining task is to rework the behavior of the PG_hwpoison
flag when hard-offlining an in-use page. Even with this series,
hard-offline for in-use pages works as in the past (i.e. we still take
the racy "set PG_hwpoison at first, then do some handling" approach).
Without changing this, we can't get rid of the many "if (PageHWPoison)"
checks in mm code. So I'll think about and try more on it after this one.

Anyway, this is (I believe) the first step toward a better solution,
and any kind of help is appreciated.

Thanks,
Naoya Horiguchi

[1]: https://lwn.net/Articles/753261/
[2]: https://lkml.org/lkml/2018/7/17/60
---
Summary:

Naoya Horiguchi (11):
mm: hwpoison: cleanup unused PageHuge() check
mm: soft-offline: add missing error check of set_hwpoison_free_buddy_page()
mm: move definition of num_poisoned_pages_inc/dec to include/linux/mm.h
mm: madvise: call soft_offline_page() without MF_COUNT_INCREASED
mm: hwpoison-inject: don't pin for hwpoison_filter()
mm: hwpoison: remove MF_COUNT_INCREASED
mm: remove flag argument from soft offline functions
mm: soft-offline: isolate error pages from buddy freelist
mm: hwpoison: apply buddy page handling code to hard-offline
mm: clear PageHWPoison in memory hotremove
mm: hwpoison: introduce clear_hwpoison_free_buddy_page()

drivers/base/memory.c | 2 +-
include/linux/mm.h | 22 ++++++---
include/linux/page-flags.h | 8 +++-
include/linux/swapops.h | 16 -------
mm/hwpoison-inject.c | 18 ++------
mm/madvise.c | 25 +++++-----
mm/memory-failure.c | 112 ++++++++++++++++++++++++++-------------------
mm/migrate.c | 9 ----
mm/page_alloc.c | 95 +++++++++++++++++++++++++++++++++++---
mm/sparse.c | 2 +-
10 files changed, 193 insertions(+), 116 deletions(-)