Re: [PATCH mm-new v6 2/5] mm: khugepaged: refine scan progress number
From: Lance Yang
Date: Sun Feb 08 2026 - 04:32:49 EST
On 2026/2/8 17:05, Dev Jain wrote:
On 06/02/26 4:42 pm, Vernon Yang wrote:
On Fri, Feb 06, 2026 at 10:02:48AM +0100, David Hildenbrand (Arm) wrote:
On 2/5/26 15:25, Dev Jain wrote:
On 05/02/26 5:41 pm, David Hildenbrand (arm) wrote:
On 2/5/26 07:08, Vernon Yang wrote:
On Thu, Feb 5, 2026 at 5:35 AM David Hildenbrand (arm)
<david@xxxxxxxxxx> wrote:
I guess you mean "min(_pte - pte + 1, HPAGE_PMD_NR)", not max().
Yes!
Why do we even have to optimize this? :)
I'm also worried that the compiler can't optimize this since the body of
the loop is complex, as Dev also noted [1].
Premature ... ? :)
I mean .... we don't, but the alternative is a one-liner using max().
I'm fine with the max(), but it still seems like adding complexity to
optimize something that is nowhere proven to really be a problem.
Hi David, Dev,
I put "*cur_progress += 1" at the beginning of the loop, and the compiler
optimizes it. The assembly is as follows:
60c1: 4d 29 ca sub %r9,%r10 // r10 is _pte, r9 is pte, r10 = _pte - pte
60c4: b8 00 02 00 00 mov $0x200,%eax // eax = HPAGE_PMD_NR
60c9: 44 89 5c 24 10 mov %r11d,0x10(%rsp) // unrelated store to a stack slot
60ce: 49 c1 fa 03 sar $0x3,%r10 // r10 >>= 3: byte difference / sizeof(pte_t)
60d2: 49 83 c2 01 add $0x1,%r10 // r10 += 1
60d6: 49 39 c2 cmp %rax,%r10 // r10 = min(r10, eax)
60d9: 4c 0f 4f d0 cmovg %rax,%r10 //
60dd: 44 89 55 00 mov %r10d,0x0(%rbp) // *cur_progress = r10
To make the code simpler, let us use "*cur_progress += 1".
Wow! Wasn't expecting that. What's your gcc version? I checked with
gcc 11.4.0 (looks pretty old) on both x86 and arm64, and it couldn't
optimize this.
FWIW, 11.4.0 is newer than the minimum GCC version (8.1) required by
the kernel. See Documentation/process/changes.rst
The optimization might just be version-dependent :)
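For anyone who wants to compare compiler versions, here is a minimal
userspace sketch of the two forms discussed above. It is not the
khugepaged code: pte_t is a uint64_t stand-in, should_stop() is a
hypothetical placeholder for the loop's early-exit checks, and
HPAGE_PMD_NR is assumed to be 512 to match the $0x200 in the assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 512 entries per PMD, matching $0x200 in the assembly. */
#define HPAGE_PMD_NR 512

/* Stand-in for the kernel's pte_t (8 bytes, hence the "sar $0x3"). */
typedef uint64_t pte_t;

/* Hypothetical placeholder for the early-exit checks in the scan loop. */
static int should_stop(const pte_t *p)
{
	return *p == 0;
}

/*
 * Per-iteration form: bump the counter at the top of every iteration.
 * Returns the position where the scan stopped.
 */
const pte_t *scan_counted(const pte_t *pte, unsigned int *cur_progress)
{
	const pte_t *_pte;

	for (_pte = pte; _pte < pte + HPAGE_PMD_NR; _pte++) {
		*cur_progress += 1;
		if (should_stop(_pte))
			break;
	}
	return _pte;
}

/*
 * Closed form: min(_pte - pte + 1, HPAGE_PMD_NR). The min() covers the
 * case where the loop ran to completion and _pte is one past the end.
 */
unsigned int progress_closed_form(const pte_t *pte, const pte_t *_pte)
{
	ptrdiff_t n = _pte - pte + 1;

	return (unsigned int)(n < HPAGE_PMD_NR ? n : HPAGE_PMD_NR);
}
```

Both forms yield the same count when the counter starts at zero.
Building this with "gcc -O2 -S" on different GCC versions and comparing
the output for scan_counted() is one way to check whether the increment
gets folded into a sub/sar/add/cmov sequence. Which versions do so is
not established in this thread.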