Re: [PATCH 1/2] mm, memory_hotplug: do not fail offlining too early

From: Michal Hocko
Date: Fri Oct 13 2017 - 07:58:43 EST


On Fri 13-10-17 22:42:46, Michael Ellerman wrote:
> Vlastimil Babka <vbabka@xxxxxxx> writes:
> > On 10/11/2017 08:51 AM, Michal Hocko wrote:
> >> On Wed 11-10-17 13:37:50, Michael Ellerman wrote:
> >>> Michal Hocko <mhocko@xxxxxxxxxx> writes:
> >>>> On Tue 10-10-17 23:05:08, Michael Ellerman wrote:
> >>>>> Michal Hocko <mhocko@xxxxxxxxxx> writes:
> >>>>>> From: Michal Hocko <mhocko@xxxxxxxx>
> >>>>>>
> >>>>>> Memory offlining can fail just too eagerly under a heavy memory pressure.
> ...
> >>>>>
> >>>>> This breaks offline for me.
> >>>>>
> >>>>> Prior to this commit:
> >>>>> /sys/devices/system/memory/memory0# time echo 0 > online
> >>>>> -bash: echo: write error: Device or resource busy
> >
> > Well, that means offline didn't actually work for that block even before
> > this patch, right? Is it even a movable_node block? I guess not?
>
> Correct. It should fail.
>
> >>>>> After:
> >>>>> /sys/devices/system/memory/memory0# time echo 0 > online
> >>>>> -bash: echo: write error: Device or resource busy
> >>>>>
> >>>>> real 2m0.009s
> >>>>> user 0m0.000s
> >>>>> sys 1m25.035s
> >>>>>
> >>>>> There's no way that block can be removed, it contains the kernel text,
> >>>>> so it should instantly fail - which it used to.
> >
> > Ah, right. So your complain is really about that the failure is not
> > instant anymore for blocks that can't be offlined.
>
> Yes. Previously it failed instantly, now it doesn't fail, and loops
> infinitely (once the 2 minute limit is removed).

Yeah it failed only because the migration code retried few times and we
bailed out which is wrong as well. I will send two patches as a reply to
this email.

> >> This is really strange! As you write in other email the page is
> >> reserved. That means that some of the earlier checks
> >> if (zone_idx(zone) == ZONE_MOVABLE)
> >> return false;
> >> mt = get_pageblock_migratetype(page);
> >> if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
> >
> > The MIGRATE_MOVABLE check is indeed bogus, because that doesn't
> > guarantee there are no unmovable pages in the block (CMA block OTOH
> > should be a guarantee).
>
> OK I'll try that and get back to you.

Thanks!
--
Michal Hocko
SUSE Labs