Re: [PATCH 2/3] x86: mm: Change tlb_flushall_shift for IvyBridge

From: Alex Shi
Date: Sat Dec 14 2013 - 06:02:12 EST


On 12/13/2013 09:43 PM, Ingo Molnar wrote:
>
> * Alex Shi <alex.shi@xxxxxxxxxx> wrote:
>
>> On 12/13/2013 09:02 AM, Alex Shi wrote:
>>>>> You have not replied to this concern of mine: if my concern is valid
>>>>> then that invalidates much of the current tunings.
>>> The benefit from pretend flush range is not unconditional, since invlpg
>>> also costs time. And different CPUs have different invlpg/flush_all
>>> execution times.
>>
>> TLB refill time also differs between different kinds of CPU.
>>
>> BTW,
>> A bewitching idea is still attracting me:
>> https://lkml.org/lkml/2012/5/23/148
>> even though it was sentenced to death by HPA:
>> https://lkml.org/lkml/2012/5/24/143
>
> I don't think it was sentenced to death by HPA. What do the hardware
> guys say, is this safe on current CPUs?

That discussion was entirely public; I have no information beyond it.
At that time I tested on core2, nhm, wsm, snb and ivb, every kind of
machine I could get, and found no issues.

Also, assuming the rebased patch has been under test in Fengguang's
testing system since last Friday, there has been no bad news so far.
Fengguang, it is the x86-tlb branch in my github tree.
>
> If yes then as long as we only activate this optimization for known
> models (and turn it off for unknown models) we should be pretty safe,
> even if the hw guys (obviously) don't want to promise this
> indefinitely for all Intel HT implementations in the future, right?

Agree with you.
>
>> The idea is that flushing the TLB of just one thread is enough for
>> SMT/HT, since the TLB still seems to be shared within a core on Intel
>> CPUs. This benefit is unconditional, and if I remember correctly,
>> kbuild testing improved by about 1~2% on average.
>
> Oh, a 1-2% kbuild speedup is absolutely _massive_. Don't even think
> about dropping this idea ... it needs to be explored.
>
> Alas, that for_each_cpu() loop is obviously disgusting, these values
> should be precalculated into percpu variables and such.

Yes, precalculated variables would save a lot of time.
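
To illustrate the precalculation idea, here is a minimal userspace sketch
in C. It is not the patch from the links above: NR_CPUS, core_leader,
precompute_core_leaders() and one_flush_per_core() are all hypothetical
names, and a plain array stands in for percpu variables. It assumes the
sibling topology is known up front and that flushing one thread also
flushes the TLB shared by its core, which is exactly the assumption being
discussed here.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8 /* hypothetical machine size for this sketch */

/* Stand-in for a percpu variable: lowest-numbered thread in each CPU's core. */
static int core_leader[NR_CPUS];

/*
 * Run once at topology setup, not on every flush.
 * sibling_mask[cpu] has one bit set for every hardware thread that shares
 * a core with cpu, including cpu itself.
 */
static void precompute_core_leaders(const uint32_t sibling_mask[NR_CPUS])
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		int leader = cpu;

		/* A CPU is always expected to be in its own sibling mask. */
		assert(sibling_mask[cpu] & (UINT32_C(1) << cpu));

		/* Pick the lowest-numbered sibling below cpu, if there is one. */
		for (int t = 0; t < cpu; t++) {
			if (sibling_mask[cpu] & (UINT32_C(1) << t)) {
				leader = t;
				break;
			}
		}
		core_leader[cpu] = leader;
	}
}

/*
 * Reduce a mask of flush targets to one CPU per core. Only CPUs that are
 * already in the target mask are kept; later siblings of a kept CPU are
 * dropped.
 */
static uint32_t one_flush_per_core(uint32_t targets)
{
	uint32_t seen_cores = 0;
	uint32_t reduced = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		uint32_t bit = UINT32_C(1) << cpu;
		uint32_t core = UINT32_C(1) << core_leader[cpu];

		if (!(targets & bit) || (seen_cores & core))
			continue;
		seen_cores |= core;
		reduced |= bit;
	}
	return reduced;
}
```

With siblings paired as (0,4), (1,5), (2,6) and (3,7), a target mask of
CPUs 0, 1, 4 and 5 reduces to CPUs 0 and 1, and the flush path only reads
the precalculated values instead of walking sibling masks each time.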
>
>> So would you accept some ugly quirks to do this lazy TLB flush on
>> known-working CPUs?
>
> it's not really 'lazy TLB flush' AFAICS but a genuine optimization:
> only flush the TLB on the logical CPUs that need it, right? I.e. do
> only one flush per pair of siblings.
>
>> Forgive me if it's stupid.
>
> I'd say measurable speedups that are safe are never ever stupid.

Thanks a lot!
>
> And even the range-flush TLB optimization we are talking about here
> could still be used IMO, just tone it down a bit and make it less
> model dependent.
>
> Thanks,
>
> Ingo
>


--
Thanks
Alex