Re: [PATCH mm-new v6 2/5] mm: khugepaged: refine scan progress number

From: Lance Yang

Date: Sun Feb 08 2026 - 04:32:49 EST




On 2026/2/8 17:05, Dev Jain wrote:

On 06/02/26 4:42 pm, Vernon Yang wrote:
On Fri, Feb 06, 2026 at 10:02:48AM +0100, David Hildenbrand (Arm) wrote:
On 2/5/26 15:25, Dev Jain wrote:
On 05/02/26 5:41 pm, David Hildenbrand (arm) wrote:
On 2/5/26 07:08, Vernon Yang wrote:
On Thu, Feb 5, 2026 at 5:35 AM David Hildenbrand (arm)
<david@xxxxxxxxxx> wrote:

I guess you mean "min(_pte - pte + 1, HPAGE_PMD_NR)", not max().
Yes!


I'm also worried that the compiler can't optimize this since the body of
the loop is complex, as with Dev's opinion [1].
Why do we even have to optimize this? :)

Premature ... ? :)

I mean ... we don't, but the alternative is a one-liner using max().
I'm fine with the max(), but it still seems like adding complexity to
optimize something that has not been proven to be a problem anywhere.
Hi David, Dev,

I used "*cur_progress += 1" at the beginning of the loop, and the compiler
optimizes it into a single computation. The assembly is as follows:

60c1: 4d 29 ca          sub    %r9,%r10          // r10 is _pte, r9 is pte, r10 = byte offset of _pte from pte
60c4: b8 00 02 00 00    mov    $0x200,%eax       // eax = HPAGE_PMD_NR
60c9: 44 89 5c 24 10    mov    %r11d,0x10(%rsp)  // store to the stack, not part of this computation
60ce: 49 c1 fa 03       sar    $0x3,%r10         // r10 /= 8 (sizeof(pte_t)), so r10 = _pte - pte
60d2: 49 83 c2 01       add    $0x1,%r10         // r10 += 1
60d6: 49 39 c2          cmp    %rax,%r10         // compare r10 with HPAGE_PMD_NR
60d9: 4c 0f 4f d0       cmovg  %rax,%r10         // r10 = min(r10, HPAGE_PMD_NR)
60dd: 44 89 55 00       mov    %r10d,0x0(%rbp)   // *cur_progress = r10

To make the code simpler, let us use "*cur_progress += 1".
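
As a rough illustration of why the two forms are interchangeable, here is
a userspace sketch, not the khugepaged code itself. The names
scan_incremental, progress_closed_form and check_all_exit_points are
hypothetical, pte_t is a stand-in typedef, and HPAGE_PMD_NR is assumed to
be 512 (the 0x200 seen in the assembly above).

```c
#include <assert.h>
#include <stddef.h>

#define HPAGE_PMD_NR 512          /* assumed: 0x200, as in the assembly above */

typedef unsigned long pte_t;      /* stand-in for the kernel type */

/*
 * Hypothetical scan loop: bump the counter at the top of every iteration
 * and break out at index stop_at. If stop_at >= HPAGE_PMD_NR the loop runs
 * to completion. *end receives the final value of _pte.
 */
static int scan_incremental(const pte_t *pte, int stop_at, const pte_t **end)
{
	const pte_t *_pte;
	int progress = 0;

	for (_pte = pte; _pte < pte + HPAGE_PMD_NR; _pte++) {
		progress += 1;
		if (_pte - pte == stop_at)
			break;
	}
	*end = _pte;
	return progress;
}

/* The one-liner computed after the loop: min(_pte - pte + 1, HPAGE_PMD_NR). */
static int progress_closed_form(const pte_t *pte, const pte_t *_pte)
{
	ptrdiff_t n = _pte - pte + 1;

	return n < HPAGE_PMD_NR ? (int)n : HPAGE_PMD_NR;
}

/*
 * Check every possible exit point, including the "no early exit" case,
 * and return the number of cases checked.
 */
static int check_all_exit_points(void)
{
	static pte_t table[HPAGE_PMD_NR];
	int checked = 0;

	for (int stop = 0; stop <= HPAGE_PMD_NR; stop++) {
		const pte_t *end;
		int inc = scan_incremental(table, stop, &end);

		assert(inc == progress_closed_form(table, end));
		checked++;
	}
	return checked;
}
```

Calling check_all_exit_points() from a small main() exercises all exit
points. The clamp only matters when the loop runs to completion, because
_pte then points one past the last entry and _pte - pte + 1 would be
HPAGE_PMD_NR + 1.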

Wow! Wasn't expecting that. What's your gcc version? I checked with
gcc 11.4.0 (looks pretty old) with both x86 and arm64, and it couldn't
optimize.

FWIW, 11.4.0 is newer than the minimum GCC version (8.1) required by
the kernel. See Documentation/process/changes.rst

The optimization might just be version-dependent :)