Re: [PATCH v2] riscv: Make mmap() with PROT_WRITE imply PROT_READ

From: Andrew Bresticker
Date: Fri Sep 09 2022 - 11:16:57 EST


On Fri, Sep 9, 2022 at 7:42 AM Coelacanthus <coelacanthushex@xxxxxxxxx> wrote:
>
> On 2022/9/9 11:01, Celeste Liu wrote:
> > On 2022/9/9 02:50, Andrew Bresticker wrote:
> >> Commit 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is
> >> invalid") made mmap() return EINVAL if PROT_WRITE was set wihtout
> >> PROT_READ with the justification that a write-only PTE is considered a
> >> reserved PTE permission bit pattern in the privileged spec. This check
> >> is unnecessary since RISC-V defines its protection_map such that PROT_WRITE
> >> maps to the same PTE permissions as PROT_WRITE|PROT_READ, and it is
> >> inconsistent with other architectures that don't support write-only PTEs,
> >> creating a potential software portability issue. Just remove the check
> >> altogether and let PROT_WRITE imply PROT_READ as is the case on other
> >> architectures.
> >>
> >> Note that this also allows PROT_WRITE|PROT_EXEC mappings which were
> >> disallowed prior to the aforementioned commit; PROT_READ is implied in
> >> such mappings as well.
> >>
> >> Fixes: 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid")
> >> Signed-off-by: Andrew Bresticker <abrestic@xxxxxxxxxxxx>
> >> ---
> >> v1 -> v2: Update access_error() to account for write-implies-read
> >> ---
> >> arch/riscv/kernel/sys_riscv.c | 3 ---
> >> arch/riscv/mm/fault.c | 3 ++-
> >> 2 files changed, 2 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
> >> index 571556bb9261..5d3f2fbeb33c 100644
> >> --- a/arch/riscv/kernel/sys_riscv.c
> >> +++ b/arch/riscv/kernel/sys_riscv.c
> >> @@ -18,9 +18,6 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len,
> >> if (unlikely(offset & (~PAGE_MASK >> page_shift_offset)))
> >> return -EINVAL;
> >>
> >> - if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
> >> - return -EINVAL;
> >> -
> >> return ksys_mmap_pgoff(addr, len, prot, flags, fd,
> >> offset >> (PAGE_SHIFT - page_shift_offset));
> >> }
> >> diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
> >> index f2fbd1400b7c..d86f7cebd4a7 100644
> >> --- a/arch/riscv/mm/fault.c
> >> +++ b/arch/riscv/mm/fault.c
> >> @@ -184,7 +184,8 @@ static inline bool access_error(unsigned long cause, struct vm_area_struct *vma)
> >> }
> >> break;
> >> case EXC_LOAD_PAGE_FAULT:
> >> - if (!(vma->vm_flags & VM_READ)) {
> >> + /* Write implies read */
> >> + if (!(vma->vm_flags & (VM_READ | VM_WRITE))) {
> >> return true;
> >> }
> >> break;
> >
> > Hi, this did solve the problem and achieved consistency between
> > architectures, but I have a question.
> >
> > Such a change specifies behavior for a state that should not exist,
> > and if, in the future, RISC-V spec specifies a different behavior
> > for that state (I mean, RVI itself has a history of not caring about
> > downstream, like Zicsr and Zifencei), it will create inconsistencies,
> > which is bad.
> >
> > If we reject the "write but not read" state, the user gets the most direct
> > response: the state is not allowed so that they do not and cannot rely
> > on the behavior of the state. This will bring better time consistency
> > to the application if the spec specifies the behavior in the future.
> > But it lost architecture consistency.
> >
> > How do you think this situation should be handled properly?
> >
> > Yours,
> > Celeste Liu
>
> Oops!
>
> I found a mistake in my previous understanding: PTE permission!=vma permission.
> So your modification makes sense, no matter how we handle the mapping of input
> permissions to PTEs, as long as we don't use the reserved permission combinations,
> the behavior is reasonable and also independent of the architecture's definition
> of PTEs.
>
> But I think this mapping relationship should be well documented. If we have
> such a mapping behavior in all architectures, then we should change this line in
> the mmap documentation
> On some hardware architectures (e.g., i386), PROT_WRITE implies PROT_READ.
> to apply all architectures. According to my read about code, all the vm_get_page_prot
> will do the protection_map mapping to have this feature.

I think leaving the PROT_WRITE-implies-PROT_READ as being specified as
architecture-dependent is reasonable, but of course portable programs
shouldn't rely on this behavior. There are CPUs out there that support
write-only mappings -- MIPS with RI/XI comes to mind and indeed
mmap(PROT_WRITE) on such CPUs results in write-only mappings.

-Andrew

>
> Yours,
> Celeste Liu