Re: [QUESTION FOR ARM64 TLB] performance issue and implementation difference of TLB flush

From: Gang Li
Date: Tue May 16 2023 - 03:47:29 EST


Hi,

On 2023/5/9 22:30, Mark Rutland wrote:
> For example, early in D8.13 we have the rule:
>
> | R_SQBCS
> |
> | When address translation is enabled, a translation table entry for an
> | in-context translation regime that does not cause a Translation fault, an
> | Address size fault, or an Access flag fault is permitted to be cached in a
> | TLB or intermediate TLB caching structure as the result of an explicit or
> | speculative access.


Thanks a lot!

I looked up the x86 manual and found that the x86 TLB caching mechanism is
similar to arm64's (but the x86 guys haven't replied to me yet):

Intel® 64 and IA-32 Architectures Software Developer Manuals:
| 4.10.2.3 Details of TLB Use
|
| Subject to the limitations given in the previous paragraph, the
| processor may cache a translation for any linear address, even if that
| address is not used to access memory. For example, the processor may
| cache translations required for prefetches and for accesses that result
| from speculative execution that would never actually occur in the
| executed code path.

Both architectures have similar TLB caching policies, so why does arm64
flush the TLBs of all CPUs while x86 flushes only the local TLB in
ghes_map and ghes_unmap?

I think flushing all CPUs may be unnecessary:

1. Before accessing GHES data, each CPU needs to call ghes_map, which
creates the mapping and flushes that CPU's own TLB, so the current CPU
is guaranteed to use the latest mapping.

2. There is no need to flush all CPUs in ghes_unmap, because every other
CPU will flush its own TLB in its own ghes_map before accessing the
memory.
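
To make points 1 and 2 concrete, here is a minimal user-space sketch of the
reasoning. It is a toy model, not kernel code: it assumes a single shared
mapping slot and one cached translation per CPU, and every name in it
(toy_ghes_map, toy_ghes_unmap, toy_translate, slot_pte, tlb) is hypothetical
and does not correspond to the real ghes_map/ghes_unmap implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 2
#define NO_PAGE UINT64_MAX

/* Toy model (assumption): one shared slot, one cached translation per CPU. */
static uint64_t slot_pte = NO_PAGE;   /* the shared "page table entry"      */
static uint64_t tlb[NR_CPUS];         /* per-CPU cached copy of slot_pte    */
static bool tlb_valid[NR_CPUS];       /* does this CPU hold a cached entry? */

/* Drop the cached translation on @cpu only; other CPUs are untouched. */
static void local_flush(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	tlb_valid[cpu] = false;
}

/*
 * Translate through the slot on @cpu. A cached entry is used if present;
 * otherwise the "page table" is walked and the result is cached. Calling
 * this without a prior map on @cpu stands in for a speculative fill.
 */
static uint64_t toy_translate(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!tlb_valid[cpu]) {
		tlb[cpu] = slot_pte;
		tlb_valid[cpu] = true;
	}
	return tlb[cpu];
}

/* Point 1: install the mapping, then flush only the calling CPU's TLB. */
static void toy_ghes_map(int cpu, uint64_t paddr)
{
	slot_pte = paddr;
	local_flush(cpu);
}

/* Point 2: clear the mapping, again flushing only the calling CPU's TLB. */
static void toy_ghes_unmap(int cpu)
{
	slot_pte = NO_PAGE;
	local_flush(cpu);
}
```

In this model, if CPU 1 caches the slot while CPU 0 has it mapped, CPU 1
keeps that stale entry after CPU 0's toy_ghes_unmap, and drops it only when
it calls toy_ghes_map itself. The sketch only shows that the stale entry is
gone before CPU 1's own access through ghes_map; it does not show whether
the stale window in between is harmless on real hardware.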

What do you think?

Thanks,
Gang Li.