Re: [PATCH] mm: Drop "PFNs busy" printk in an expected path.

From: Michal Nazarewicz
Date: Fri Dec 30 2016 - 02:11:23 EST


On Thu, Dec 29 2016, Eric Anholt wrote:
> Michal Nazarewicz <mina86@xxxxxxxxxx> writes:
>
>> On Thu, Dec 29 2016, Eric Anholt wrote:
>>> Michal Hocko <mhocko@xxxxxxxxxx> writes:
>>>
>>>> This has been already brought up
>>>> http://lkml.kernel.org/r/20161130092239.GD18437@xxxxxxxxxxxxxx and there
>>>> was a proposed patch for that which ratelimited the output
>>>> http://lkml.kernel.org/r/20161130132848.GG18432@xxxxxxxxxxxxxx resp.
>>>> http://lkml.kernel.org/r/robbat2-20161130T195244-998539995Z@xxxxxxxxxxxxxxxxxx
>>>>
>>>> then the email thread just died out because the issue turned out to be a
>>>> configuration issue. Michal indicated that the message might be useful
>>>> so dropping it completely seems like a bad idea. I do agree that
>>>> something has to be done about that though. Can we reconsider the
>>>> ratelimit thing?
>>>
>>> I agree that the rate of the message has gone up during 4.9 -- it used
>>> to be a few per second.
>>
>> Sounds like a regression which should be fixed.
>>
>> This is why I don't think removing the message is a good idea. If you
>> suddenly see a lot of those messages, something changed for the worse.
>> If you remove this message, you will never know.
>>
>>> However, if this is an expected path during normal operation,
>>
>> This depends on your definition of "expected" and "normal".
>>
>> In general, I would argue that the fact those ever happen is a bug
>> somewhere in the kernel -- if memory is allocated as movable, it should
>> be movable damn it!
>
> I was taking "expected" from dae803e165a11bc88ca8dbc07a11077caf97bbcb --
> if this is actually a bug, how do we go about debugging it?

That's why I've pointed out that this depends on the definition. In my
opinion it's a design bug which is now nearly impossible to fix in an
efficient way.

The most likely issue is that some subsystem is allocating movable
memory but then either does not provide a way to actually move it
(that's an obvious bug in the code IMO) or pins the memory while some
transaction is performed and at the same time CMA tries to move it.

The latter case is really unavoidable at this point which is why this
message is "expected".

But if the rate of the messages suddenly increases dramatically, you
have yourself a performance regression.
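
To illustrate the ratelimit idea raised earlier in the thread, here is a
rough userspace sketch of interval/burst limiting. It is not the kernel's
implementation and not a proposed patch; the struct, the function name and
the fields are all made up for illustration. In the kernel itself one would
reach for the existing helpers such as printk_ratelimited(). The point of
the sketch is that a suppressed counter keeps the information that matters
here: a sudden jump in the rate is still visible even though most messages
are dropped.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of interval/burst rate limiting.
 * All names here are illustrative assumptions, not kernel API.
 */
struct ratelimit {
	uint64_t interval_ms;     /* length of one window */
	unsigned int burst;       /* messages allowed per window */
	uint64_t window_start_ms; /* start of the current window */
	unsigned int printed;     /* messages emitted in this window */
	unsigned long suppressed; /* messages dropped so far (cumulative) */
};

/* Return true if the caller may emit a message at time now_ms. */
static bool ratelimit_allow(struct ratelimit *rl, uint64_t now_ms)
{
	/* Open a new window once the current one has expired. */
	if (now_ms - rl->window_start_ms >= rl->interval_ms) {
		rl->window_start_ms = now_ms;
		rl->printed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	/* Over budget: drop the message but remember that it happened. */
	rl->suppressed++;
	return false;
}
```

A caller would wrap the "PFNs busy" style message in
if (ratelimit_allow(&rl, now)) and report rl.suppressed now and then, so
a regression in the failure rate shows up as a growing counter rather than
a flooded log.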

> I've had Raspbian carrying a patch downstream to remove the error
> message for 2 years now, and I either need to get this fixed or get this
> patch merged to Fedora and Debian as well, now that they're shipping
> some support for Raspberry Pi.

--
Best regards
ミハウ “𝓶𝓲𝓷𝓪86” ナザレヴイツ
«If at first you don’t succeed, give up skydiving»