Re: [PATCH v3 3/5] mm/memory-failure: improve memory failure action_result messages

From: Jane Chu
Date: Thu May 23 2024 - 13:39:22 EST


On 5/22/2024 1:37 PM, Oscar Salvador wrote:

On Tue, May 21, 2024 at 05:54:27PM -0600, Jane Chu wrote:
Added two explicit MF_MSG messages describing failure in get_hwpoison_page.
Attempted to document the definition of various action names, and made a few
adjustments to the action_result() calls.

Signed-off-by: Jane Chu <jane.chu@xxxxxxxxxx>
This looks much better, thanks:

Reviewed-by: Oscar Salvador <osalvador@xxxxxxx>

By the way, I was checking the block in memory_failure() that handles
refcount=0 pages, concretely the piece of code that handles buddy pages.

In there, if we fail to take the page off the buddy lists, we return
MF_FAILED, but I really think we should be returning MF_IGNORED.

I guess you mean this code -
        if (has_extra_refcount(ps, p, false))
                ret = MF_FAILED;
?

It appears in the code paths below:
    hwpoison_user_mappings
      identify_page_state
        me_huge_page || me_swapcache_dirty || me_swapcache_clean
for LRU pages.

And for non-LRU
    if (!folio_test_lru(folio) && !folio_test_writeback(folio))
            goto identify_page_state;

My hunch is that the most common calling path would be: hwpoison_user_mappings() has unmapped the page, then identify_page_state() is called, but for some reason it fails to take the page off the LRU.  The m-f() handler has isolated the page to avoid further MCEs, so I think that in general returning MF_FAILED is okay.

That said, the line is not always clear. For example, in the non-LRU case the m-f() handler may have done only a little. I guess I just need to let the case rest.
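
To make the distinction concrete, here is a toy model of the rule of thumb I have in mind. It is only a sketch, not kernel code: the enum reuses the names from enum mf_result, while classify_result() and its two flags are hypothetical and stand in for "the handler did some work on the page" and "the page ended up released".

```c
#include <stdbool.h>

/* Same names as the kernel's enum mf_result; standalone toy model. */
enum mf_result {
	MF_IGNORED,	/* error: could not be handled at all */
	MF_FAILED,	/* error: handling was attempted but failed */
	MF_DELAYED,	/* will be handled later */
	MF_RECOVERED,	/* successfully recovered */
};

/*
 * Hypothetical helper, for illustration only. The flags are assumptions,
 * not real kernel state:
 *   handler_acted - the handler did meaningful work (e.g. unmapped the page)
 *   page_released - no unexpected references remained afterwards
 */
static enum mf_result classify_result(bool handler_acted, bool page_released)
{
	if (!handler_acted)
		return MF_IGNORED;	/* nothing was done to the page */
	return page_released ? MF_RECOVERED : MF_FAILED;
}
```

Under this model the LRU path above maps to classify_result(true, false), i.e. MF_FAILED, and the unclear non-LRU case is exactly the one where it is debatable whether handler_acted should be true.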

thanks,

-jane


Thoughts?