Re: [PATCH v1 1/2] powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable()
From: piliu
Date: Thu Apr 09 2020 - 10:01:52 EST
On 04/09/2020 03:26 PM, David Hildenbrand wrote:
> On 09.04.20 04:59, piliu wrote:
>>
>>
>> On 04/08/2020 10:46 AM, Baoquan He wrote:
>>> Add Pingfan to CC since he usually handles ppc related bugs for RHEL.
>>>
>>> On 04/07/20 at 03:54pm, David Hildenbrand wrote:
>>>> In commit 53cdc1cb29e8 ("drivers/base/memory.c: indicate all memory
>>>> blocks as removable"), the user space interface to compute whether a memory
>>>> block can be offlined (exposed via
>>>> /sys/devices/system/memory/memoryX/removable) has effectively been
>>>> deprecated. We want to remove the leftovers of the kernel implementation.
>>>
>>> Pingfan, can you have a look at this change on PPC? Please feel free to
>>> give comments if any concern, or offer ack if it's OK to you.
>>>
>>>>
>>>> When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
>>>> we'll start by:
>>>> 1. Testing if it contains any holes, and reject if so
>>>> 2. Testing if pages belong to different zones, and reject if so
>>>> 3. Isolating the page range, checking if it contains any unmovable pages
>>>>
>>>> Using is_mem_section_removable() before trying to offline is not only racy,
>>>> it can easily result in false positives/negatives. Let's stop manually
>>>> checking is_mem_section_removable(), and let device_offline() handle it
>>>> completely instead. We can remove the racy is_mem_section_removable()
>>>> implementation next.
>>>>
>>>> We now take more locks (e.g., memory hotplug lock when offlining and the
>>>> zone lock when isolating), but maybe we should optimize that
>>>> implementation instead if this ever becomes a real problem (after all,
>>>> memory unplug is already an expensive operation). We started using
>>>> is_mem_section_removable() in commit 51925fb3c5c9 ("powerpc/pseries:
>>>> Implement memory hotplug remove in the kernel"), with the initial
>>>> hotremove support of lmbs.
>>>>
>>>> Cc: Nathan Fontenot <nfont@xxxxxxxxxxxxxxxxxx>
>>>> Cc: Michael Ellerman <mpe@xxxxxxxxxxxxxx>
>>>> Cc: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
>>>> Cc: Paul Mackerras <paulus@xxxxxxxxx>
>>>> Cc: Michal Hocko <mhocko@xxxxxxxx>
>>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>>> Cc: Oscar Salvador <osalvador@xxxxxxx>
>>>> Cc: Baoquan He <bhe@xxxxxxxxxx>
>>>> Cc: Wei Yang <richard.weiyang@xxxxxxxxx>
>>>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
>>>> ---
>>>> .../platforms/pseries/hotplug-memory.c | 26 +++----------------
>>>> 1 file changed, 3 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> index b2cde1732301..5ace2f9a277e 100644
>>>> --- a/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
>>>> @@ -337,39 +337,19 @@ static int pseries_remove_mem_node(struct device_node *np)
>>>>
>>>>  static bool lmb_is_removable(struct drmem_lmb *lmb)
>>>>  {
>>>> -	int i, scns_per_block;
>>>> -	bool rc = true;
>>>> -	unsigned long pfn, block_sz;
>>>> -	u64 phys_addr;
>>>> -
>>>>  	if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
>>>>  		return false;
>>>>
>>>> -	block_sz = memory_block_size_bytes();
>>>> -	scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
>>>> -	phys_addr = lmb->base_addr;
>>>> -
>>>>  #ifdef CONFIG_FA_DUMP
>>>>  	/*
>>>>  	 * Don't hot-remove memory that falls in fadump boot memory area
>>>>  	 * and memory that is reserved for capturing old kernel memory.
>>>>  	 */
>>>> -	if (is_fadump_memory_area(phys_addr, block_sz))
>>>> +	if (is_fadump_memory_area(lmb->base_addr, memory_block_size_bytes()))
>>>>  		return false;
>>>>  #endif
>>>> -
>>>> -	for (i = 0; i < scns_per_block; i++) {
>>>> -		pfn = PFN_DOWN(phys_addr);
>>>> -		if (!pfn_in_present_section(pfn)) {
>>>> -			phys_addr += MIN_MEMORY_BLOCK_SIZE;
>>>> -			continue;
>>>> -		}
>>>> -
>>>> -		rc = rc && is_mem_section_removable(pfn, PAGES_PER_SECTION);
>>>> -		phys_addr += MIN_MEMORY_BLOCK_SIZE;
>>>> -	}
>>>> -
>>>> -	return rc;
>>>> +	/* device_offline() will determine if we can actually remove this lmb */
>>>> +	return true;
>> So I think here swaps the check and do sequence. At least it breaks
>> dlpar_memory_remove_by_count(). It is doable to remove
>> is_mem_section_removable(), but here should be more effort to re-arrange
>> the code.
>>
>
> Thanks Pingfan,
>
> 1. "swaps the check and do sequence":
>
> Partially. Any caller of dlpar_remove_lmb() already has to deal with
> false positives. device_offline() can easily fail after
> dlpar_remove_lmb() == true. It's inherently racy.
>
> 2. "breaks dlpar_memory_remove_by_count()"
>
> Can you elaborate why it "breaks" it? It will simply try to
> offline+remove lmbs, detect that it wasn't able to offline+remove as
> much as it wanted (which could happen before as well easily), and re-add
> the already offlined+removed ones.
>
I overlooked the re-add logic, so I think
dlpar_memory_remove_by_count() is OK with this patch.

> 3. "more effort to re-arrange the code"
>
> What would be your suggestion?
>
I had thought about merging the two for_each_drmem_lmb() loops and doing
the check inside the merged loop, but that is no longer needed.

The only concern left is that "if (lmbs_available < lmbs_to_remove)" may
no longer reject the request early, because the check in
lmb_is_removable() is now weaker. In that case we only hit the limit
after heavy migration in offline_pages(), and then need to re-add the
lmbs that were already removed.

But I think this is a rare case, and hot-remove is not a frequent event
either, so it is worth simplifying the code with this patch.
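
To make the remaining concern concrete, here is a rough user-space model
of the count-then-remove-then-re-add flow. It is only a sketch: every
toy_* name, the struct layout and the return codes are made up for
illustration and are not the kernel's definitions. The "offlinable"
field stands in for whatever device_offline() would decide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct drmem_lmb; not the kernel type. */
struct toy_lmb {
	bool assigned;   /* models the DRCONF_MEM_ASSIGNED flag */
	bool offlinable; /* models whether offlining would succeed */
	bool reserved;   /* set on lmbs removed during this request */
};

/* Hypothetical result codes, chosen only to tell the cases apart. */
enum toy_result { TOY_OK, TOY_REJECTED_EARLY, TOY_ROLLED_BACK };

/* With the patch, only the cheap flag check is left. */
static bool toy_lmb_is_removable(const struct toy_lmb *lmb)
{
	return lmb->assigned;
}

/* Models removing one lmb: offlining can still fail at this point. */
static int toy_remove_lmb(struct toy_lmb *lmb)
{
	if (!toy_lmb_is_removable(lmb) || !lmb->offlinable)
		return -1;
	lmb->assigned = false;
	return 0;
}

static enum toy_result toy_remove_by_count(struct toy_lmb *lmbs, size_t n,
					   size_t lmbs_to_remove)
{
	size_t i, lmbs_available = 0, lmbs_removed = 0;

	assert(lmbs != NULL);

	/* First loop: optimistic count, may over-estimate. */
	for (i = 0; i < n; i++)
		if (toy_lmb_is_removable(&lmbs[i]))
			lmbs_available++;

	if (lmbs_available < lmbs_to_remove)
		return TOY_REJECTED_EARLY;

	/* Second loop: the real work happens here. */
	for (i = 0; i < n && lmbs_removed < lmbs_to_remove; i++) {
		if (toy_remove_lmb(&lmbs[i]))
			continue;
		lmbs[i].reserved = true;
		lmbs_removed++;
	}

	if (lmbs_removed != lmbs_to_remove) {
		/* Shortfall: re-add everything removed for this request. */
		for (i = 0; i < n; i++) {
			if (!lmbs[i].reserved)
				continue;
			lmbs[i].assigned = true;
			lmbs[i].reserved = false;
		}
		return TOY_ROLLED_BACK;
	}

	for (i = 0; i < n; i++)
		lmbs[i].reserved = false;
	return TOY_OK;
}
```

In this model, asking for 3 lmbs when one of three assigned lmbs cannot
be offlined passes the early check, removes two, and then re-adds both.
That wasted work is the rare case described above.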

Thanks for your clarification.

For [1/2]
Reviewed-by: Pingfan Liu <piliu@xxxxxxxxxx>