Re: [PATCH] resource: re-factor page_is_ram()
From: Vaibhav Jain
Date: Thu Jun 16 2022 - 02:46:35 EST
Hi David,
Thanks for looking into this patch,
David Hildenbrand <david@xxxxxxxxxx> writes:
> On 01.06.22 18:32, Vaibhav Jain wrote:
>> Presently page_is_ram() relies on walk_system_ram_range() that performs a walk
>> on kernel iomem resources hierarchy with a dummy callback __is_ram(). Before
>> calling find_next_iomem_res(), walk_system_ram_range() does some book-keeping
>> which can be avoided for page_is_ram() use-case.
>>
>> Hence this patch proposes to update page_is_ram() to directly call
>> find_next_iomem_res() with minimal book-keeping needed.
>
> I consider the code harder to get compared to just reusing the
> more-generic and expressive walk_system_ram_range()
>
> It somehow feels like we're duplicating the code here just to optimize
> out a handful of instructions.
The only reason for existence of dummy callback __is_ram() is for
page_is_ram() to be able to use walk_system_ram_range(). For
page_is_ram() usecase what walk_system_ram_range() essentially does is
to iterate over find_next_iomem_res() and call __is_ram() which is not
really needed to page_is_ram().
The improvement to the gcc (v12.1.1) generated code (x86_64) for
page_is_ram is quite evident.
With the patch:
0x0000000000000920 <+0>: call 0x925 <page_is_ram+5>
0x0000000000000925 <+5>: shl $0xc,%rdi
0x0000000000000929 <+9>: xor %r8d,%r8d
0x000000000000092c <+12>: xor %ecx,%ecx
0x000000000000092e <+14>: mov $0x81000200,%edx
0x0000000000000933 <+19>: lea 0x1(%rdi),%rsi
0x0000000000000937 <+23>: call 0x7e0 <find_next_iomem_res>
0x000000000000093c <+28>: test %eax,%eax
0x000000000000093e <+30>: sete %al
0x0000000000000941 <+33>: movzbl %al,%eax
0x0000000000000944 <+36>: ret
0x0000000000000945 <+37>: int3
Without the patch:
0x0000000000001000 <+0>: call 0x1005 <page_is_ram+5>
0x0000000000001005 <+5>: shl $0xc,%rdi
0x0000000000001009 <+9>: lea 0xfff(%rdi),%rsi
0x0000000000001010 <+16>: cmp %rsi,%rdi
0x0000000000001013 <+19>: jae 0x1064 <page_is_ram+100>
0x0000000000001015 <+21>: sub $0x40,%rsp
0x0000000000001019 <+25>: xor %ecx,%ecx
0x000000000000101b <+27>: mov $0x81000200,%edx
0x0000000000001020 <+32>: mov %rsp,%r8
0x0000000000001023 <+35>: call 0x7e0 <find_next_iomem_res>
0x0000000000001028 <+40>: test %eax,%eax
0x000000000000102a <+42>: jne 0x105a <page_is_ram+90>
0x000000000000102c <+44>: mov (%rsp),%rax
0x0000000000001030 <+48>: mov $0x1,%ecx
0x0000000000001035 <+53>: lea 0xfff(%rax),%rdx
0x000000000000103c <+60>: mov 0x8(%rsp),%rax
0x0000000000001041 <+65>: shr $0xc,%rdx
0x0000000000001045 <+69>: add $0x1,%rax
0x0000000000001049 <+73>: shr $0xc,%rax
0x000000000000104d <+77>: cmp %rax,%rdx
0x0000000000001050 <+80>: jae 0x105a <page_is_ram+90>
0x0000000000001052 <+82>: mov %ecx,%eax
0x0000000000001054 <+84>: add $0x40,%rsp
0x0000000000001058 <+88>: ret
0x0000000000001059 <+89>: int3
0x000000000000105a <+90>: xor %ecx,%ecx
0x000000000000105c <+92>: add $0x40,%rsp
0x0000000000001060 <+96>: mov %ecx,%eax
0x0000000000001062 <+98>: ret
0x0000000000001063 <+99>: int3
0x0000000000001064 <+100>: xor %eax,%eax
0x0000000000001066 <+102>: ret
0x0000000000001067 <+103>: int3
Looking at the disassembly above, gcc has inlined both walk_system_ram_range()
and __is_ram() in page_is_ram(). This ends up in page_is_ram() calling
find_next_iomem_res() directly anyways with bunch of book-keeping
afterwards which can be avoided.
>
> If it doesn't make the code easier to read (at least for me), why do we
> care?
IMHO, calling find_next_iomem_res() from page_is_ram() instead of
calling walk_system_ram_range() makes it easy to trace the path of
page_is_ram(). Also the dummy callback makes the code flow seems strange
initially.
--
Cheers
~ Vaibhav