Re: [PATCH v17 17/21] mm/lru: replace pgdat lru_lock with lruvec lock

From: Alex Shi
Date: Tue Jul 28 2020 - 22:28:06 EST




On 2020/7/29 9:27 AM, Alexander Duyck wrote:
> On Tue, Jul 28, 2020 at 6:00 PM Alex Shi <alex.shi@xxxxxxxxxxxxxxxxx> wrote:
>>
>>
>>
>> On 2020/7/28 10:54 PM, Alexander Duyck wrote:
>>> On Tue, Jul 28, 2020 at 4:20 AM Alex Shi <alex.shi@xxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>>
>>>>
>>>> On 2020/7/28 7:34 AM, Alexander Duyck wrote:
>>>>>> @@ -1876,6 +1876,12 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
>>>>>>                  *     list_add(&page->lru,)
>>>>>>                  *                                        list_add(&page->lru,) //corrupt
>>>>>>                  */
>>>>>> +               new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
>>>>>> +               if (new_lruvec != lruvec) {
>>>>>> +                       if (lruvec)
>>>>>> +                               spin_unlock_irq(&lruvec->lru_lock);
>>>>>> +                       lruvec = lock_page_lruvec_irq(page);
>>>>>> +               }
>>>>>>                 SetPageLRU(page);
>>>>>>
>>>>>>                 if (unlikely(put_page_testzero(page))) {
>>>>> I was going through the code of the entire patch set and I noticed
>>>>> these changes in move_pages_to_lru. What is the reason for adding the
>>>>> new_lruvec logic? My understanding is that we are moving the pages to
>>>>> the lruvec provided, are we not? If so, why do we need to add code to get
>>>>> a new lruvec? The code itself seems to stand out from the rest of the
>>>>> patch as it is introducing new code instead of replacing existing
>>>>> locking code, and it doesn't match up with the description of what
>>>>> this function is supposed to do since it changes the lruvec.
>>>>
>>>> this new_lruvec is the replacement for the removed line, i.e. the following code:
>>>>>> - lruvec = mem_cgroup_page_lruvec(page, pgdat);
>>>> This recheck is for the case where the page has moved to the root memcg; otherwise it causes the bug:
>>>
>>> Okay, now I see where the issue is. You moved this code so now it has
>>> a different effect than it did before. You are relocking things before
>>> you needed to. Don't forget that when you came into this function you
>>> already had the lock. In addition the patch is broken as it currently
>>> stands as you aren't using similar logic in the code just above this
>>> addition if you encounter an unevictable page. As a result this is
>>> really difficult to review as there are subtle bugs here.
>>
>> Why do you think it's a bug? The relock only happens if the locked lruvec is different,
>> and then the old one is unlocked.
>
> The section I am talking about with the bug is this section here:
>        while (!list_empty(list)) {
> +              struct lruvec *new_lruvec = NULL;
> +
>                page = lru_to_page(list);
>                VM_BUG_ON_PAGE(PageLRU(page), page);
>                list_del(&page->lru);
>                if (unlikely(!page_evictable(page))) {
> -                      spin_unlock_irq(&pgdat->lru_lock);
> +                      spin_unlock_irq(&lruvec->lru_lock);
>                        putback_lru_page(page);
> -                      spin_lock_irq(&pgdat->lru_lock);
> +                      spin_lock_irq(&lruvec->lru_lock);

It would still be fine. The lruvec->lru_lock will be checked again before
we take and use it.
And this locking is further optimized in patch 19, which was done by Hugh Dickins.
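
To be concrete, here is a simplified sketch of how the two hunks fit together
in this patch (details trimmed, so treat it as an illustration rather than the
exact code). The lock retaken right after putback_lru_page() is not trusted
for the next page; the new_lruvec check below revalidates it before the page
is touched:

        while (!list_empty(list)) {
                struct lruvec *new_lruvec = NULL;
                struct page *page = lru_to_page(list);

                VM_BUG_ON_PAGE(PageLRU(page), page);
                list_del(&page->lru);
                if (unlikely(!page_evictable(page))) {
                        spin_unlock_irq(&lruvec->lru_lock);
                        putback_lru_page(page);
                        /* retake the lock we held; it may not match the next page ... */
                        spin_lock_irq(&lruvec->lru_lock);
                        continue;
                }

                /* ... but it is rechecked here before it is used for this page */
                new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
                if (new_lruvec != lruvec) {
                        if (lruvec)
                                spin_unlock_irq(&lruvec->lru_lock);
                        lruvec = lock_page_lruvec_irq(page);
                }
                SetPageLRU(page);
                /* ... */
        }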

>                        continue;
>                }
>
> Basically it probably is not advisable to be retaking the
> lruvec->lru_lock directly as the lruvec may have changed so it
> wouldn't be correct for the next page. It would make more sense to be
> using your API and calling unlock_page_lruvec_irq and
> lock_page_lruvec_irq instead of using the lock directly.
>
>>>
>>> I suppose the correct fix is to get rid of this line, but it should
>>> be placed everywhere the original function was calling
>>> spin_lock_irq().
>>>
>>> In addition I would consider changing the arguments/documentation for
>>> move_pages_to_lru. You aren't moving the pages to lruvec, so there is
>>> probably no need to pass that as an argument. Instead I would pass
>>> pgdat since that isn't going to be moving and is the only thing you
>>> actually derive based on the original lruvec.
>>
>> Yes, the comments should be changed; that line was introduced long ago. :)
>> Anyway, I am wondering if it is worth resending a v18 version?
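>>
>> If I read that suggestion right, the prototype would become something
>> like this (just a sketch on my side, not tested):
>>
>>         static unsigned noinline_for_stack move_pages_to_lru(struct pglist_data *pgdat,
>>                                                               struct list_head *list);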
>
> So I have been looking over the function itself and I wonder if it
> isn't worth looking at rewriting this to optimize the locking behavior
> to minimize the number of times we have to take the LRU lock. I have
> some code I am working on that I plan to submit as an RFC in the next
> day or so after I can get it smoke tested. The basic idea would be to
> defer returning the unevictable pages or freeing the compound pages
> until after we have processed the pages that can be moved while still
> holding the lock. I would think it should reduce the lock contention
> significantly while improving the throughput.
>

I had tried that once, but the page freeing crosses into release_pages(), which is hard to deal with.
I am very glad to wait for your patch, and hope it can be resolved. :)
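
Just to make sure we are picturing the same thing, here is a rough sketch of
the deferral idea as I understand it: collect the unevictable pages on a local
list while lru_lock is held, and only call putback_lru_page() after the lock
is dropped. The list name and structure are made up for illustration, and the
compound-page freeing part (the release_pages() interaction above) is left out:

        LIST_HEAD(pages_to_putback);    /* made-up name, unevictable pages parked here */

        while (!list_empty(list)) {
                struct page *page = lru_to_page(list);

                list_del(&page->lru);
                if (unlikely(!page_evictable(page))) {
                        /* defer putback_lru_page() instead of dropping lru_lock here */
                        list_add(&page->lru, &pages_to_putback);
                        continue;
                }
                /* relock the right lruvec and move the page, as in this patch */
        }

        if (lruvec)
                spin_unlock_irq(&lruvec->lru_lock);
        /* (re-taking the lock for the caller, if needed, is glossed over here) */

        /* now put the unevictable pages back without holding any lru_lock */
        while (!list_empty(&pages_to_putback)) {
                struct page *page = lru_to_page(&pages_to_putback);

                list_del(&page->lru);
                putback_lru_page(page);
        }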

Thanks
Alex