Re: [tech-privileged] [RFC PATCH V1] riscv-privileged: Add broadcast mode to sfence.vma

From: Guo Ren
Date: Thu Sep 19 2019 - 20:13:22 EST


Hi,

On Fri, Sep 20, 2019 at 12:10 AM Andrew Waterman <andrew@xxxxxxxxxx> wrote:
>
> This needs to be discussed and debated at length; proposing edits to the spec at this stage is putting the cart before the horse!
Agree :)

>
> We shouldn't change the definition of the existing SFENCE.VMA instruction to accomplish this. It's also not abundantly clear to me that this should be an instruction:
If you implement sfence.vma as currently defined, it still works with
the new mechanism; the two are compatible.

> TLB shootdown looks more like MMIO.
Per-CPU MMIO? In the proposal, every hart only takes care of its own requests.




>
> On Thu, Sep 19, 2019 at 5:36 AM Guo Ren <guoren@xxxxxxxxxx> wrote:
>>
>> From: Guo Ren <ren_guo@xxxxxxxxx>
>>
>> The patch is for https://github.com/riscv/riscv-isa-manual
>>
>> The proposal was presented at the LPC 2019 RISC-V MC, ref [1]. Here is the
>> formal patch.
>>
>> Introduction
>> ============
>>
>> Using a hardware TLB broadcast invalidation instruction to maintain the
>> system TLBs is a good choice: it simplifies system software design.
>> This proposal adds a broadcast mode to sfence.vma in the
>> riscv-privileged specification. To support the sfence.vma broadcast mode,
>> two modifications are introduced below:
>>
>> 1) Add PGD.PPN (the root page table's PPN) as the unique identifier of the
>> address space in addition to asid/vmid. Unlike the dynamically
>> changing asid/vmid, PGD.PPN is fixed throughout the life cycle of the
>> address space. This enables uniform address space identification
>> across different TLB systems (in practice it is difficult to unify the
>> asid/vmid between the CPU system and the IOMMU system, because their
>> mechanisms are different).
>>
>> 2) Change the definition of the sfence.vma instruction from synchronous
>> mode to asynchronous mode, which means that completion of the TLB
>> operation is not guaranteed when the sfence.vma instruction retires.
>> Software determines completion by checking a flag bit on the hart. The
>> hardware can also notify software that the sfence.vma requests have
>> finished by generating an interrupt. This alleviates the large TLB
>> invalidation delay in a PCI ATS system.
>>
>> Add S1/S2.PGD.PPN for ASID/VMID
>> ===============================
>>
>> PGD is the page global directory (a Linux term) and PPN is the physical page
>> number (defined in the RISC-V spec). PGD.PPN corresponds to the root page
>> table pointer of the address space, i.e. mm->pgd (a Linux concept).
>>
>> In CPU/IOMMU TLBs, asid/vmid distinguishes the address space of a
>> process or virtual machine. Because the id encoding is limited, it can
>> only represent a part (window) of the address spaces. S1/S2.PGD.PPN are the
>> root page tables' PPNs of the address spaces, and they serve as the
>> unique identifiers of those address spaces.
>>
>> In a CPU SMP system, software can use the context switch path to keep
>> the asid/vmid consistent on all harts (please refer to the arm64 asid
>> mechanism). In this way, the TLB broadcast invalidation instruction can
>> identify the affected address space on all harts by asid/vmid.
>>
>> Unlike the CPU SMP system, the DMA-IOMMU system has no context switch,
>> so consistency with the CPU asid/vmid cannot be guaranteed. We therefore
>> need a unique identifier for the address space to act as a bridge
>> between the TLBs of different systems.
>>
>> That identifier is PGD.PPN (for virtualization scenarios: S1/S2.PGD.PPN).
>>
>> current:
>> sfence.vma rs1 = vaddr, rs2 = asid
>> hfence.vvma rs1 = vaddr, rs2 = asid
>> hfence.gvma rs1 = gaddr, rs2 = vmid
>>
>> proposed:
>> sfence.vma rs1 = vaddr, rs2 = mode:ppn:asid
>> hfence.vvma rs1 = vaddr, rs2 = mode:ppn:asid
>> hfence.gvma rs1 = gaddr, rs2 = mode:ppn:vmid
>>
>> mode - broadcast | local
>> ppn - the PPN of the root page table of the address space
>> vmid/asid - the window identifier of the address space
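
As a concrete illustration of the proposed rs2 encoding, here is a minimal C
sketch for RV64. It assumes the field layout from the RV64 rs2 figure in the
patch further down (MODE in bits 63:60 with only bit 63 defined, PPN in bits
59:16, ASID in bits 15:0). The macro and function names are made up for this
example; they are not part of the patch or of any kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed RV64 layout: MODE = bits 63:60 (only bit 63 is defined),
 * PPN = bits 59:16, ASID = bits 15:0. All names here are hypothetical. */
#define RS2_MODE_BROADCAST (UINT64_C(1) << 63)
#define RS2_PPN_SHIFT      16
#define RS2_PPN_MASK       ((UINT64_C(1) << 44) - 1)
#define RS2_ASID_MASK      UINT64_C(0xffff)

/* Build the rs2 operand from the mode, the root page table's PPN and the ASID. */
static uint64_t sfence_rs2_pack(int broadcast, uint64_t pgd_ppn, uint64_t asid)
{
    assert(pgd_ppn <= RS2_PPN_MASK);  /* PPN must fit in 44 bits */
    assert(asid <= RS2_ASID_MASK);    /* ASID must fit in 16 bits */

    uint64_t rs2 = (pgd_ppn << RS2_PPN_SHIFT) | asid;
    if (broadcast)
        rs2 |= RS2_MODE_BROADCAST;
    return rs2;
}

/* Field extractors, e.g. for a simulator or an IOMMU model decoding rs2. */
static int sfence_rs2_is_broadcast(uint64_t rs2)
{
    return (int)((rs2 >> 63) & 1);
}

static uint64_t sfence_rs2_ppn(uint64_t rs2)
{
    return (rs2 >> RS2_PPN_SHIFT) & RS2_PPN_MASK;
}

static uint64_t sfence_rs2_asid(uint64_t rs2)
{
    return rs2 & RS2_ASID_MASK;
}
```

With these definitions, sfence_rs2_pack(1, 0x80200, 0x2a) gives
0x800000080200002a: bit 63 set for broadcast, the PPN in the middle field,
and the ASID in the low 16 bits.
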
>>
>> At the Linux Plumbers Conference 2019 RISC-V MC, ref [1], we showed two
>> IOMMU examples to explain how it works with hardware.
>>
>> 1) In a lightweight IOMMU system (up to 64 address spaces), the hardware
>> could directly convert PGD.PPN into DID (IOMMU ASID)
>>
>> 2) For the PCI ATS scenario, its IO ASID/VMID encoding space can support
>> a very large number of address spaces. We use two reverse mapping
>> tables to let the hardware translate S1/S2.PGD.PPN into IO ASID/VMID.
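
To make example 1) more concrete, the lookup can be modelled in software as a
small table keyed by PGD.PPN. This is only a sketch under assumptions: the
64-slot size comes from the text above, but the table structure, the linear
search and all names are invented for illustration and say nothing about how
real hardware would implement the conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DID_SLOTS 64   /* "up to 64 address spaces" */
#define DID_NONE  (-1) /* no DID bound to this PGD.PPN */

/* Hypothetical model: slot index == DID, each slot remembers one PGD.PPN. */
struct did_table {
    uint64_t ppn[DID_SLOTS];
    uint8_t  valid[DID_SLOTS];
};

/* Return the DID bound to pgd_ppn, or DID_NONE if there is none. */
static int did_lookup(const struct did_table *t, uint64_t pgd_ppn)
{
    assert(t != NULL);
    for (int i = 0; i < DID_SLOTS; i++)
        if (t->valid[i] && t->ppn[i] == pgd_ppn)
            return i;
    return DID_NONE;
}

/* Bind pgd_ppn to a DID, reusing an existing binding if present.
 * Returns DID_NONE when all 64 slots are in use. */
static int did_bind(struct did_table *t, uint64_t pgd_ppn)
{
    int did = did_lookup(t, pgd_ppn);
    if (did != DID_NONE)
        return did;
    for (int i = 0; i < DID_SLOTS; i++) {
        if (!t->valid[i]) {
            t->ppn[i] = pgd_ppn;
            t->valid[i] = 1;
            return i;
        }
    }
    return DID_NONE;
}
```

A broadcast invalidation carrying PGD.PPN would then resolve to a DID with
did_lookup(); a miss means the device side holds no translations for that
address space.
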
>>
>> ASYNC BROADCAST SFENCE.VMA
>> ===========================
>>
>> To support the high-latency broadcast sfence.vma operation in the PCI ATS
>> usage scenario, we change sfence.vma from synchronous mode to
>> asynchronous mode. (For a simpler implementation, hardware may implement
>> only the synchronous mode; software written for the asynchronous mode
>> still works.)
>>
>> To implement the asynchronous mode, 3 features are added:
>> 1) sstatus:TLBI
>> A status bit, TLBI, is added to the sstatus register. The TLBI status
>> bit indicates whether there are still outstanding sfence.vma requests on
>> the current hart.
>> Value:
>> 1: sfence.vma requests are not completed.
>> 0: all sfence.vma requests completed, request queue is empty.
>>
>> 2) sstatus:TLBIC
>> A control bit, TLBIC, is added to the sstatus register. The TLBIC control
>> bit is written by software.
>> Writing 1 makes the current hart check whether there are still
>> outstanding sfence.vma requests. If there are unfinished requests, an
>> interrupt is generated when they complete, notifying software that all
>> of the current sfence.vma requests have been completed.
>> Writing 0 has no effect.
>>
>> 3) supervisor interrupt registers (sip & sie): TLBI finish interrupt
>> A per-hart interrupt is added to the supervisor interrupt registers.
>> When all sfence.vma requests are completed and sstatus:TLBIC has been
>> triggered, the hart receives a TLBI finish interrupt. It is defined in
>> sip & sie just like the timer, software and external interrupts.
>>
>> Pseudocode:
>>
>> flush_tlb_page(vma, addr) {
>>     asid = cpu_asid(vma->vm_mm);
>>     ppn  = PFN_DOWN(__pa(vma->vm_mm->pgd));    // PPN of the root page table
>>
>>     sfence.vma (addr, 1|PPN_OFFSET(ppn)|asid); //1. start request (1 = broadcast)
>>
>>     while (sstatus:TLBI)                       //2. loop check, bounded by a timeout
>>         if (time_out() > 1ms) break;
>>
>>     while (sstatus:TLBI) {
>>         ...
>>         set sstatus:TLBIC;
>>         wait_TLBI_finish_interrupt();          //3. wait irq, io_schedule
>>     }
>> }
>>
>> Here we give a two-level check:
>> 1) Poll sstatus:TLBI in a loop; the CPU can still respond to interrupts.
>> 2) Set sstatus:TLBIC and wait for the irq; the CPU schedules other tasks.
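
The interaction between sstatus:TLBI, sstatus:TLBIC and the finish interrupt
can be checked with a tiny software model of one hart. This is a sketch, not
a hardware description. It follows the rule stated above that the interrupt
fires only when all requests have completed and TLBIC has been triggered, and
it assumes that writing 1 to TLBIC while nothing is outstanding has no effect
(the proposal does not say). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-hart state for the async sfence.vma proposal. */
struct hart_tlbi_model {
    unsigned outstanding; /* sfence.vma requests issued but not completed */
    bool tlbic_armed;     /* software wrote 1 to sstatus:TLBIC while busy */
    bool stlbip;          /* TLBI finish interrupt pending in sip */
};

/* sstatus:TLBI reads 1 while any request is outstanding, 0 otherwise. */
static bool model_read_tlbi(const struct hart_tlbi_model *h)
{
    return h->outstanding != 0;
}

/* Async sfence.vma: the instruction retires, the request stays queued. */
static void model_issue_sfence(struct hart_tlbi_model *h)
{
    h->outstanding++;
}

/* Write to sstatus:TLBIC. Writing 0 does nothing. Writing 1 arms the
 * finish interrupt if requests are outstanding (assumed no-op otherwise). */
static void model_write_tlbic(struct hart_tlbi_model *h, bool value)
{
    if (value && h->outstanding != 0)
        h->tlbic_armed = true;
}

/* One completion arrives, e.g. from an IOMMU. The interrupt is raised
 * only when the queue drains and TLBIC had been triggered. */
static void model_complete_one(struct hart_tlbi_model *h)
{
    assert(h->outstanding > 0);
    h->outstanding--;
    if (h->outstanding == 0 && h->tlbic_armed) {
        h->tlbic_armed = false;
        h->stlbip = true;
    }
}
```

In this model, two issued requests followed by a TLBIC write raise the
interrupt only after the second completion; if software never writes TLBIC,
TLBI still drops to 0 but no interrupt is raised, which matches the pure
polling path of level 1.
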
>>
>> ACE-DVM Example
>> ===============
>>
>> Honestly, "broadcasting addr, asid, vmid, S1/S2.PGD.PPN to the interconnect"
>> and "ASYNC SFENCE.VMA" could be implemented with the ACE-DVM protocol, ref [2].
>>
>> There are 3 types of transactions in DVM:
>>
>> - DVM operation
>> Send all information to the interconnect, including addr, asid,
>> S1.PGD.PPN, vmid, S2.PGD.PPN.
>>
>> - DVM synchronization
>> Check that all DVM operations have been completed. If not, a state
>> machine waits for the outstanding DVM complete transactions.
>>
>> - DVM complete
>> Return transaction from components, e.g. the IOMMU. If the hart has received
>> all DVM completes triggered by sfence.vma instructions and
>> "sstatus:TLBIC" has been set, a TLBI finish interrupt is triggered.
>>
>> (Actually, we do not need to implement the above functions strictly
>> according to the ACE specification :P )
>>
>> 1: https://www.linuxplumbersconf.org/event/4/contributions/307/
>> 2: AMBA AXI and ACE Protocol Specification - Distributed Virtual Memory
>> Transactions
>>
>> Signed-off-by: Guo Ren <ren_guo@xxxxxxxxx>
>> Reviewed-by: Li Feiteng <feiteng_li@xxxxxxxxx>
>> ---
>> src/hypervisor.tex | 43 ++++++++-------
>> src/supervisor.tex | 155 +++++++++++++++++++++++++++++++++++++++++------------
>> 2 files changed, 143 insertions(+), 55 deletions(-)
>>
>> diff --git a/src/hypervisor.tex b/src/hypervisor.tex
>> index 47b90b2..3718819 100644
>> --- a/src/hypervisor.tex
>> +++ b/src/hypervisor.tex
>> @@ -1094,15 +1094,15 @@ The hypervisor extension adds two new privileged fence instructions.
>> \multicolumn{1}{c|}{opcode} \\
>> \hline
>> 7 & 5 & 5 & 3 & 5 & 7 \\
>> -HFENCE.GVMA & vmid & gaddr & PRIV & 0 & SYSTEM \\
>> -HFENCE.VVMA & asid & vaddr & PRIV & 0 & SYSTEM \\
>> +HFENCE.GVMA & mode:ppn:vmid & gaddr & PRIV & 0 & SYSTEM \\
>> +HFENCE.VVMA & mode:ppn:asid & vaddr & PRIV & 0 & SYSTEM \\
>> \end{tabular}
>> \end{center}
>>
>> The hypervisor memory-management fence instructions, HFENCE.GVMA and
>> HFENCE.VVMA, are valid only in HS-mode when {\tt mstatus}.TVM=0, or in M-mode
>> (irrespective of {\tt mstatus}.TVM).
>> -These instructions perform a function similar to SFENCE.VMA
>> +These instructions perform a function similar to SFENCE.VMA (broadcast/local)
>> (Section~\ref{sec:sfence.vma}), except applying to the guest-physical
>> memory-management data structures controlled by CSR {\tt hgatp} (HFENCE.GVMA)
>> or the VS-level memory-management data structures controlled by CSR {\tt vsatp}
>> @@ -1136,11 +1136,10 @@ An HFENCE.VVMA instruction applies only to a single virtual machine, identified
>> by the setting of {\tt hgatp}.VMID when HFENCE.VVMA executes.
>> \end{commentary}
>>
>> -When {\em rs2}$\neq${\tt x0}, bits XLEN-1:ASIDMAX of the value held in {\em
>> -rs2} are reserved for future use and should be zeroed by software and ignored
>> -by current implementations.
>> -Furthermore, if ASIDLEN~$<$~ASIDMAX, the implementation shall ignore bits
>> -ASIDMAX-1:ASIDLEN of the value held in {\em rs2}.
>> +When {\em rs2}$\neq${\tt x0}, its bits encode three fields: mode, ppn, asid.
>> +1) mode controls whether HFENCE.VVMA is broadcast or not.
>> +2) ppn is the root page table's PPN of the asid address space.
>> +3) asid is the identifier of a process in the virtual machine.
>>
>> \begin{commentary}
>> Simpler implementations of HFENCE.VVMA can ignore the guest virtual address in
>> @@ -1168,11 +1167,10 @@ physical addresses in PMP address registers (Section~\ref{sec:pmp}) and in page
>> table entries (Sections \ref{sec:sv32}, \ref{sec:sv39}, and~\ref{sec:sv48}).
>> \end{commentary}
>>
>> -When {\em rs2}$\neq${\tt x0}, bits XLEN-1:VMIDMAX of the value held in {\em
>> -rs2} are reserved for future use and should be zeroed by software and ignored
>> -by current implementations.
>> -Furthermore, if VMIDLEN~$<$~VMIDMAX, the implementation shall ignore bits
>> -VMIDMAX-1:VMIDLEN of the value held in {\em rs2}.
>> +When {\em rs2}$\neq${\tt x0}, its bits encode three fields: mode, ppn, vmid.
>> +1) mode controls whether HFENCE.GVMA is broadcast or not.
>> +2) ppn is the root page table's PPN of the vmid address space.
>> +3) vmid is the identifier of the virtual machine.
>>
>> \begin{commentary}
>> Simpler implementations of HFENCE.GVMA can ignore the guest physical address in
>> @@ -1567,21 +1565,22 @@ register.
>> \subsection{Memory-Management Fences}
>>
>> The behavior of the SFENCE.VMA instruction is affected by the current
>> -virtualization mode V. When V=0, the virtual-address argument is an HS-level
>> -virtual address, and the ASID argument is an HS-level ASID.
>> +virtualization mode V. When V=0, the rs1 argument is an HS-level
>> +virtual address, and the rs2 argument is an HS-level ASID and root page table's PPN.
>> The instruction orders stores only to HS-level address-translation structures
>> with subsequent HS-level address translations.
>>
>> -When V=1, the virtual-address argument to SFENCE.VMA is a guest virtual
>> -address within the current virtual machine, and the ASID argument is a VS-level
>> -ASID within the current virtual machine.
>> +When V=1, the rs1 argument to SFENCE.VMA is a guest virtual
>> +address within the current virtual machine, and the rs2 argument is a VS-level
>> +ASID and root page table's PPN within the current virtual machine.
>> The current virtual machine is identified by the VMID field of CSR {\tt hgatp},
>> -and the effective ASID can be considered to be the combination of this VMID
>> -with the VS-level ASID.
>> +and the effective ASID and root page table's PPN can be considered to be the
>> +combination of this VMID and root page table's PPN with the VS-level ASID and
>> +root page table's PPN.
>> The SFENCE.VMA instruction orders stores only to the VS-level
>> address-translation structures with subsequent VS-level address translations
>> -for the same virtual machine, i.e., only when {\tt hgatp}.VMID is the same as
>> -when the SFENCE.VMA executed.
>> +for the same virtual machine, i.e., only when {\tt hgatp}.VMID and {\tt hgatp}.PPN are
>> +the same as when the SFENCE.VMA executed.
>>
>> Hypervisor instructions HFENCE.GVMA and HFENCE.VVMA provide additional
>> memory-management fences to complement SFENCE.VMA.
>> diff --git a/src/supervisor.tex b/src/supervisor.tex
>> index ba3ced5..2877b7a 100644
>> --- a/src/supervisor.tex
>> +++ b/src/supervisor.tex
>> @@ -47,10 +47,12 @@ register keeps track of the processor's current operating state.
>> \begin{center}
>> \setlength{\tabcolsep}{4pt}
>> \scalebox{0.95}{
>> -\begin{tabular}{cWcccccWccccWcc}
>> +\begin{tabular}{cccWcccccWccccWcc}
>> \\
>> \instbit{31} &
>> -\instbitrange{30}{20} &
>> +\instbit{30} &
>> +\instbit{29} &
>> +\instbitrange{28}{20} &
>> \instbit{19} &
>> \instbit{18} &
>> \instbit{17} &
>> @@ -66,6 +68,8 @@ register keeps track of the processor's current operating state.
>> \instbit{0} \\
>> \hline
>> \multicolumn{1}{|c|}{SD} &
>> +\multicolumn{1}{|c|}{TLBI} &
>> +\multicolumn{1}{|c|}{TLBIC} &
>> \multicolumn{1}{c|}{\wpri} &
>> \multicolumn{1}{c|}{MXR} &
>> \multicolumn{1}{c|}{SUM} &
>> @@ -82,7 +86,7 @@ register keeps track of the processor's current operating state.
>> \multicolumn{1}{c|}{\wpri}
>> \\
>> \hline
>> -1 & 11 & 1 & 1 & 1 & 2 & 2 & 4 & 1 & 1 & 1 & 1 & 3 & 1 & 1 \\
>> +1 & 1 & 1 & 10 & 1 & 1 & 1 & 2 & 2 & 4 & 1 & 1 & 1 & 1 & 3 & 1 & 1 \\
>> \end{tabular}}
>> \end{center}
>> }
>> @@ -95,10 +99,12 @@ register keeps track of the processor's current operating state.
>> {\footnotesize
>> \begin{center}
>> \setlength{\tabcolsep}{4pt}
>> -\begin{tabular}{cMFScccc}
>> +\begin{tabular}{cccMFScccc}
>> \\
>> \instbit{SXLEN-1} &
>> -\instbitrange{SXLEN-2}{34} &
>> +\instbit{SXLEN-2} &
>> +\instbit{SXLEN-3} &
>> +\instbitrange{SXLEN-4}{34} &
>> \instbitrange{33}{32} &
>> \instbitrange{31}{20} &
>> \instbit{19} &
>> @@ -107,6 +113,8 @@ register keeps track of the processor's current operating state.
>> \\
>> \hline
>> \multicolumn{1}{|c|}{SD} &
>> +\multicolumn{1}{|c|}{TLBI} &
>> +\multicolumn{1}{|c|}{TLBIC} &
>> \multicolumn{1}{c|}{\wpri} &
>> \multicolumn{1}{c|}{UXL[1:0]} &
>> \multicolumn{1}{c|}{\wpri} &
>> @@ -115,7 +123,7 @@ register keeps track of the processor's current operating state.
>> \multicolumn{1}{c|}{\wpri} &
>> \\
>> \hline
>> -1 & SXLEN-35 & 2 & 12 & 1 & 1 & 1 & \\
>> +1 & 1 & 1 & SXLEN-37 & 2 & 12 & 1 & 1 & 1 & \\
>> \end{tabular}
>> \begin{tabular}{cWWFccccWcc}
>> \\
>> @@ -152,6 +160,17 @@ register keeps track of the processor's current operating state.
>> \label{sstatusreg}
>> \end{figure*}
>>
>> +The TLBI (read-only) bit indicates whether any async sfence.vma operations are
>> +still pending on the hart. The value 0 means that there are no sfence.vma
>> +operations pending and the value 1 means that there are still sfence.vma operations
>> +pending on the hart.
>> +
>> +When the sstatus:TLBIC bit is written with 1, it triggers the hardware to check whether
>> +there are any TLB invalidate operations pending. When all operations are
>> +finished, a TLB Invalidate finish interrupt will be triggered
>> +(see Section~\ref{sipreg}). Writing 0 to the sstatus:TLBIC bit has
>> +no effect. Reading the sstatus:TLBIC bit always returns 0.
>> +
>> The SPP bit indicates the privilege level at which a hart was executing before
>> entering supervisor mode. When a trap is taken, SPP is set to 0 if the trap
>> originated from user mode, or 1 otherwise. When an SRET instruction
>> @@ -329,8 +348,10 @@ SXLEN-bit read/write register containing interrupt enable bits.
>> {\footnotesize
>> \begin{center}
>> \setlength{\tabcolsep}{4pt}
>> -\begin{tabular}{KcFcFcc}
>> -\instbitrange{SXLEN-1}{10} &
>> +\begin{tabular}{KcFcFcFcc}
>> +\instbitrange{SXLEN-1}{14} &
>> +\instbit{13} &
>> +\instbitrange{12}{10} &
>> \instbit{9} &
>> \instbitrange{8}{6} &
>> \instbit{5} &
>> @@ -339,6 +360,8 @@ SXLEN-bit read/write register containing interrupt enable bits.
>> \instbit{0} \\
>> \hline
>> \multicolumn{1}{|c|}{\wpri} &
>> +\multicolumn{1}{c|}{STLBIP} &
>> +\multicolumn{1}{|c|}{\wpri} &
>> \multicolumn{1}{c|}{SEIP} &
>> \multicolumn{1}{c|}{\wpri} &
>> \multicolumn{1}{c|}{STIP} &
>> @@ -346,7 +369,7 @@ SXLEN-bit read/write register containing interrupt enable bits.
>> \multicolumn{1}{c|}{SSIP} &
>> \multicolumn{1}{c|}{\wpri} \\
>> \hline
>> -SXLEN-10 & 1 & 3 & 1 & 3 & 1 & 1 \\
>> +SXLEN-14 & 1 & 3 & 1 & 3 & 1 & 3 & 1 & 1 \\
>> \end{tabular}
>> \end{center}
>> }
>> @@ -359,8 +382,10 @@ SXLEN-10 & 1 & 3 & 1 & 3 & 1 & 1 \\
>> {\footnotesize
>> \begin{center}
>> \setlength{\tabcolsep}{4pt}
>> -\begin{tabular}{KcFcFcc}
>> -\instbitrange{SXLEN-1}{10} &
>> +\begin{tabular}{KcFcFcFcc}
>> +\instbitrange{SXLEN-1}{14} &
>> +\instbit{13} &
>> +\instbitrange{12}{10} &
>> \instbit{9} &
>> \instbitrange{8}{6} &
>> \instbit{5} &
>> @@ -369,6 +394,8 @@ SXLEN-10 & 1 & 3 & 1 & 3 & 1 & 1 \\
>> \instbit{0} \\
>> \hline
>> \multicolumn{1}{|c|}{\wpri} &
>> +\multicolumn{1}{c|}{STLBIE} &
>> +\multicolumn{1}{|c|}{\wpri} &
>> \multicolumn{1}{c|}{SEIE} &
>> \multicolumn{1}{c|}{\wpri} &
>> \multicolumn{1}{c|}{STIE} &
>> @@ -376,7 +403,7 @@ SXLEN-10 & 1 & 3 & 1 & 3 & 1 & 1 \\
>> \multicolumn{1}{c|}{SSIE} &
>> \multicolumn{1}{c|}{\wpri} \\
>> \hline
>> -SXLEN-10 & 1 & 3 & 1 & 3 & 1 & 1 \\
>> +SXLEN-14 & 1 & 3 & 1 & 3 & 1 & 3 & 1 & 1 \\
>> \end{tabular}
>> \end{center}
>> }
>> @@ -410,6 +437,12 @@ when the SEIE bit in the {\tt sie} register is clear. The implementation
>> should provide facilities to mask, unmask, and query the cause of external
>> interrupts.
>>
>> +A supervisor-level TLB Invalidate finish interrupt is pending if the STLBIP bit
>> +in the {\tt sip} register is set. Supervisor-level TLB Invalidate finish
>> +interrupts are disabled when the STLBIE bit in the {\tt sie} register is clear.
>> +When the hart's TLB invalidate operations are finished, hardware changes the sstatus:TLBI
>> +bit from 1 to 0 and triggers the TLB Invalidate finish interrupt.
>> +
>> \begin{commentary}
>> The {\tt sip} and {\tt sie} registers are subsets of the {\tt mip} and {\tt
>> mie} registers. Reading any field, or writing any writable field, of {\tt
>> @@ -598,7 +631,9 @@ so is only guaranteed to hold supported exception codes.
>> 1 & 5 & Supervisor timer interrupt \\
>> 1 & 6--8 & {\em Reserved} \\
>> 1 & 9 & Supervisor external interrupt \\
>> - 1 & 10--15 & {\em Reserved} \\
>> + 1 & 10--11 & {\em Reserved} \\
>> + 1 & 12 & Supervisor TLBI finish interrupt \\
>> + 1 & 13--15 & {\em Reserved} \\
>> 1 & $\ge$16 & {\em Available for platform use} \\ \hline
>> 0 & 0 & Instruction address misaligned \\
>> 0 & 1 & Instruction access fault \\
>> @@ -884,7 +919,7 @@ provided.
>> \multicolumn{1}{c|}{opcode} \\
>> \hline
>> 7 & 5 & 5 & 3 & 5 & 7 \\
>> -SFENCE.VMA & asid & vaddr & PRIV & 0 & SYSTEM \\
>> +SFENCE.VMA & mode:ppn:asid & vaddr & LOCAL & 0 & SYSTEM \\
>> \end{tabular}
>> \end{center}
>>
>> @@ -899,21 +934,70 @@ from that hart to the memory-management data structures.
>> Further details on the behavior of this instruction are
>> described in Section~\ref{virt-control} and Section~\ref{pmp-vmem}.
>>
>> +SFENCE.VMA is defined as an asynchronous completion instruction, which means
>> +that the TLB operation is not guaranteed to complete when the instruction retires.
>> +Software needs to check sstatus:TLBI to determine that all TLB operations are complete.
>> +The sstatus:TLBI bit is described in Section~\ref{sstatus}. When hardware changes the
>> +sstatus:TLBI bit from 1 to 0, the TLB Invalidate finish interrupt will be
>> +triggered.
>> +
>> \begin{commentary}
>> -The SFENCE.VMA is used to flush any local hardware caches related to
>> +The SFENCE.VMA is used to flush any local/remote hardware caches related to
>> address translation. It is specified as a fence rather than a TLB
>> flush to provide cleaner semantics with respect to which instructions
>> are affected by the flush operation and to support a wider variety of
>> dynamic caching structures and memory-management schemes. SFENCE.VMA
>> is also used by higher privilege levels to synchronize page table
>> -writes and the address translation hardware.
>> +writes and the address translation hardware. A mode bit determines whether
>> +sfence.vma is broadcast on the interconnect or not.
>> \end{commentary}
>>
>> -SFENCE.VMA orders only the local hart's implicit references to the
>> -memory-management data structures.
>> +\begin{figure}[h!]
>> +{\footnotesize
>> +\begin{center}
>> +\begin{tabular}{c@{}E@{}K}
>> +\instbit{31} &
>> +\instbitrange{30}{9} &
>> +\instbitrange{8}{0} \\
>> +\hline
>> +\multicolumn{1}{|c|}{{\tt MODE}} &
>> +\multicolumn{1}{|c|}{{\tt PPN (root page table)}} &
>> +\multicolumn{1}{|c|}{{\tt ASID}} \\
>> +\hline
>> +1 & 22 & 9 \\
>> +\end{tabular}
>> +\end{center}
>> +}
>> +\vspace{-0.1in}
>> +\caption{RV32 sfence.vma rs2 format.}
>> +\label{rv32satp}
>> +\end{figure}
>> +
>> +\begin{figure}[h!]
>> +{\footnotesize
>> +\begin{center}
>> +\begin{tabular}{@{}S@{}T@{}U}
>> +\instbitrange{63}{60} &
>> +\instbitrange{59}{16} &
>> +\instbitrange{15}{0} \\
>> +\hline
>> +\multicolumn{1}{|c|}{{\tt MODE}} &
>> +\multicolumn{1}{|c|}{{\tt PPN (root page table)}} &
>> +\multicolumn{1}{|c|}{{\tt ASID}} \\
>> +\hline
>> +4 & 44 & 16 \\
>> +\end{tabular}
>> +\end{center}
>> +}
>> +\vspace{-0.1in}
>> +\caption{RV64 sfence.vma rs2 format, for MODE values, only highest bit:63 is
>> +valid and others are reserved.}
>> +\label{rv64satp}
>> +\end{figure}
>>
>> \begin{commentary}
>> -Consequently, other harts must be notified separately when the
>> +The highest bit of mode controls sfence.vma behavior: 1 for broadcast, 0 for local.
>> +With mode local only, other harts must be notified separately when the
>> memory-management data structures have been modified.
>> One approach is to use 1)
>> a local data fence to ensure local writes are visible globally, then
>> @@ -928,8 +1012,17 @@ modified for a single address mapping (i.e., one page or superpage), {\em rs1}
>> can specify a virtual address within that mapping to effect a translation
>> fence for that mapping only. Furthermore, for the common case that the
>> translation data structures have only been modified for a single address-space
>> -identifier, {\em rs2} can specify the address space. The behavior of
>> -SFENCE.VMA depends on {\em rs1} and {\em rs2} as follows:
>> +identifier, {\em rs2} can specify the address space in {\tt satp} format,
>> +which includes the asid and the root page table's PPN.
>> +
>> +\begin{commentary}
>> +We use the ASID and the root page table's PPN to identify the address space, and the format
>> +stored in rs2 is similar to {\tt satp} described in Section~\ref{sec:satp}.
>> +The ASID is used by local harts and the root page table's PPN is used by
>> +other TLB systems, e.g. the IOMMU.
>> +\end{commentary}
>> +
>> +The behavior of SFENCE.VMA depends on {\em rs1} and {\em rs2} as follows:
>>
>> \begin{itemize}
>> \item If {\em rs1}={\tt x0} and {\em rs2}={\tt x0}, the fence orders all
>> @@ -939,23 +1032,18 @@ SFENCE.VMA depends on {\em rs1} and {\em rs2} as follows:
>> all reads and writes made to any level of the page tables, but only
>> for the address space identified by integer register {\em rs2}.
>> Accesses to {\em global} mappings (see Section~\ref{sec:translation})
>> - are not ordered.
>> + are not ordered. The mode field in rs2 selects broadcast or local.
>> \item If {\em rs1}$\neq${\tt x0} and {\em rs2}={\tt x0}, the fence orders
>> only reads and writes made to the leaf page table entry corresponding
>> to the virtual address in {\em rs1}, for all address spaces.
>> \item If {\em rs1}$\neq${\tt x0} and {\em rs2}$\neq${\tt x0}, the fence
>> orders only reads and writes made to the leaf page table entry
>> corresponding to the virtual address in {\em rs1}, for the address
>> - space identified by integer register {\em rs2}.
>> + space identified by integer register {\em rs2}. The mode field in rs2
>> + selects broadcast or local.
>> Accesses to global mappings are not ordered.
>> \end{itemize}
>>
>> -When {\em rs2}$\neq${\tt x0}, bits SXLEN-1:ASIDMAX of the value held in {\em
>> -rs2} are reserved for future use and should be zeroed by software and ignored
>> -by current implementations. Furthermore, if ASIDLEN~$<$~ASIDMAX, the
>> -implementation shall ignore bits ASIDMAX-1:ASIDLEN of the value held in {\em
>> -rs2}.
>> -
>> \begin{commentary}
>> Simpler implementations can ignore the virtual address in {\em rs1} and
>> the ASID value in {\em rs2} and always perform a global fence.
>> @@ -994,7 +1082,7 @@ can execute the same SFENCE.VMA instruction while a different ASID is loaded
>> into {\tt satp}, provided the next time {\tt satp} is loaded with the recycled
>> ASID, it is simultaneously loaded with the new page table.
>>
>> -\item If the implementation does not provide ASIDs, or software chooses to
>> +\item If the implementation does not provide ASIDs and PPNs, or software chooses to
>> always use ASID 0, then after every {\tt satp} write, software should execute
>> SFENCE.VMA with {\em rs1}={\tt x0}. In the common case that no global
>> translations have been modified, {\em rs2} should be set to a register other than
>> @@ -1003,13 +1091,14 @@ not flushed.
>>
>> \item If software modifies a non-leaf PTE, it should execute SFENCE.VMA with
>> {\em rs1}={\tt x0}. If any PTE along the traversal path had its G bit set,
>> -{\em rs2} must be {\tt x0}; otherwise, {\em rs2} should be set to the ASID for
>> -which the translation is being modified.
>> +{\em rs2} must be {\tt x0}; otherwise, {\em rs2} should be set to the ASID and
>> +root page table's PPN for which the translation is being modified.
>>
>> \item If software modifies a leaf PTE, it should execute SFENCE.VMA with {\em
>> rs1} set to a virtual address within the page. If any PTE along the traversal
>> path had its G bit set, {\em rs2} must be {\tt x0}; otherwise, {\em rs2}
>> -should be set to the ASID for which the translation is being modified.
>> +should be set to the ASID and root page table's PPN for which the translation
>> +is being modified.
>>
>> \item For the special cases of increasing the permissions on a leaf PTE and
>> changing an invalid PTE to a valid leaf, software may choose to execute
>> --
>> 2.7.4
>>


--
Best Regards
Guo Ren

ML: https://lore.kernel.org/linux-csky/