Re: [PATCH v3 00/15] HWPOISON: soft offline rework
From: Oscar Salvador
Date: Tue Jun 30 2020 - 02:35:40 EST
On Tue, 2020-06-30 at 01:08 -0400, Qian Cai wrote:
> On Wed, Jun 24, 2020 at 03:01:22PM +0000, nao.horiguchi@xxxxxxxxx wrote:
> > I rebased the soft-offline rework patchset [1][2] onto the latest mmotm.
> > The rebasing required some non-trivial adjustments, but it was mostly
> > straightforward. I confirmed that the reported problem doesn't reproduce
> > on compaction after soft offline. For a more precise description of the
> > problem and the motivation of this patchset, please see [2].
> >
> > I think that the following two patches in v2 are better done as part of
> > the separate hard-offline rework, so they are not included in this series.
> >
> > - mm,hwpoison: Take pages off the buddy when hard-offlining
> > - mm/hwpoison-inject: Rip off duplicated checks
> >
> > These two are not directly related to the reported problem, so they do
> > not seem urgent. The first one breaks num_poisoned_pages counting in some
> > testcases, and the second patch needs more consideration about the
> > commented point.
> >
> > Any comment/suggestion/help would be appreciated.
>
> Even after applying the compile fix,
>
> https://lore.kernel.org/linux-mm/20200628065409.GA546944@u2004/
>
> madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
> would succeed without this series. Steps:
>
> # git clone https://github.com/cailca/linux-mm
> # cd linux-mm; make
> # ./random 1 (Need at least two NUMA memory nodes)
> start: migrate_huge_offline
> - use NUMA nodes 0,4.
> - mmap and free 8388608 bytes hugepages on node 0
> - mmap and free 8388608 bytes hugepages on node 4
> madvise: Input/output error

I think I know why.
It's been a while since I took a look, but I compared the posted
patchset with the newest version I had ready, and I saw that I had made
some changes with regard to hugetlb pages.

I will take a look, although it might be better to re-post the
patchset instead of adding a fix on top, since the changes are fairly
substantial.

Thanks for reporting.
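
For reference, the failing call in the report can be approximated with a
minimal sketch along the following lines. This is not the code from the
linux-mm test repository: the helper name try_soft_offline() and its
parameters are hypothetical, and the fallback value for MADV_SOFT_OFFLINE
is assumed from the asm-generic uapi header for libcs that do not define it.
It maps anonymous memory, faults it in, and asks the kernel to soft-offline
the range, returning the errno of whichever call failed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: asm-generic value, used only if the libc headers lack it. */
#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101
#endif

/*
 * Hypothetical helper (illustration only).
 *
 * Maps `len` bytes of private anonymous memory, optionally with extra
 * mmap flags such as MAP_HUGETLB, touches the mapping so that pages are
 * actually allocated, and then requests a soft offline of the range.
 *
 * Returns 0 on success, or the errno value of the call that failed.
 */
int try_soft_offline(size_t len, int extra_flags)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
    if (p == MAP_FAILED)
        return errno;

    /* Fault the pages in so there is something to migrate. */
    memset(p, 0, len);

    int err = 0;
    if (madvise(p, len, MADV_SOFT_OFFLINE) != 0)
        err = errno;

    munmap(p, len);
    return err;
}
```

Calling try_soft_offline(len, MAP_HUGETLB) with len set to a multiple of
the huge page size, and with huge pages reserved on the node, would be the
closest analogue of the hugetlb case above; a return value of EIO would
correspond to the "madvise: Input/output error" line in the report. The
call needs CAP_SYS_ADMIN and a kernel built with CONFIG_MEMORY_FAILURE,
so EPERM or EINVAL are expected elsewhere.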
--
Oscar Salvador
SUSE L3