Re: [PATCH v2 1/3] mm: Trigger bug on if a section is not found in __section_nr

From: David Hildenbrand
Date: Fri Jun 28 2019 - 07:37:19 EST


On 27.06.19 10:10, Michal Hocko wrote:
> On Thu 27-06-19 10:50:57, Alastair D'Silva wrote:
>> On Wed, 2019-06-26 at 08:57 +0200, Michal Hocko wrote:
>>> On Wed 26-06-19 16:27:30, Alastair D'Silva wrote:
>>>> On Wed, 2019-06-26 at 08:21 +0200, Michal Hocko wrote:
>>>>> On Wed 26-06-19 16:11:21, Alastair D'Silva wrote:
>>>>>> From: Alastair D'Silva <alastair@xxxxxxxxxxx>
>>>>>>
>>>>>> If a memory section comes in where the physical address is
>>>>>> greater
>>>>>> than
>>>>>> that which is managed by the kernel, this function would not
>>>>>> trigger the
>>>>>> bug and instead return a bogus section number.
>>>>>>
>>>>>> This patch tracks whether the section was actually found, and
>>>>>> triggers the
>>>>>> bug if not.
>>>>>
>>>>> Why do we want/need that? In other words the changelog should
>>>>> contina
>>>>> WHY and WHAT. This one contains only the later one.
>>>>>
>>>>
>>>> Thanks, I'll update the comment.
>>>>
>>>> During driver development, I tried adding peristent memory at a
>>>> memory
>>>> address that exceeded the maximum permissable address for the
>>>> platform.
>>>>
>>>> This caused __section_nr to silently return bogus section numbers,
>>>> rather than complaining.
>>>
>>> OK, I see, but is an additional code worth it for the non-development
>>> case? I mean why should we be testing for something that shouldn't
>>> happen normally? Is it too easy to get things wrong or what is the
>>> underlying reason to change it now?
>>>
>>
>> It took me a while to identify what the problem was - having the BUG_ON
>> would have saved me a few hours.
>>
>> I'm happy to just have the BUG_ON 'nd drop the new error return (I
>> added that in response to Mike Rapoport's comment that the original
>> patch would still return a bogus section number).
>
> Well, BUG_ON is about the worst way to handle an incorrect input. You
> really do not want to put a production environment down just because
> there is a bug in a driver, right? There are still many {VM_}BUG_ONs
> in the tree and there is a general trend to get rid of many of them
> rather than adding new ones.

VM_BUG_ON is only really active with CONFIG_DEBUG_VM. On
!CONFIG_DEBUG_VM it translated to BUILD_BUG_ON_INVALID(), which is a
compile-time only check.

Or am I missing something?

--

Thanks,

David / dhildenb