Re: [PATCH] arm64: mmu: no write cache for O_SYNC flag

From: Wang, Li
Date: Fri Mar 27 2020 - 12:50:20 EST



On 2020/3/27 22:29, Mark Rutland wrote:
On Thu, Mar 26, 2020 at 09:36:25AM -0700, Li Wang wrote:
Steps to reproduce:
1. Disable CONFIG_STRICT_DEVMEM in the Linux kernel.
2. Process A gets the physical address of a global variable from
"/proc/self/pagemap".
3. Process B writes a value to the same physical address via mmap():
fd = open("/dev/mem", O_RDWR | O_SYNC);
vaddr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, page_aligned_paddr);
Is this just to demonstrate the behaviour, or is this meant to be
indicative of a real use-case? I'm struggling to see the latter.

Problem symptom:
after Process B writes a value to the physical address, the value of
the global variable as seen by Process A does not change, even though
both processes read/write the same physical address.
If Process A is not using the same attributes as process B, there is no
guarantee of coherency. How did process A map this memory?


About the two processes:

Process A:

The memory is not mapped with mmap(); it is just a global variable.

Its physical address is obtained only through /proc/self/pagemap.

I attached the code (wrl-cache-coh-test.c).

Process B:

It is the busybox "devmem" command, which writes a value to a physical address.

It uses open() with O_SYNC, then mmap().


Technical reason:
Process B writes to the physical address through a virtual address
obtained from "/dev/mem" and mmap(). On arm64, that virtual address
has a write cache, so the value may not actually be written to the
physical address.
I don't think that's true. I think what's happening here is:

* Process A has a Normal WBWA Cacheable mapping.
* Process B has a Normal Non-cacheable mapping.
* Process B's write does not snoop any caches, and goes straight to
memory.
* Process A reads a value from cache, which does not include process B's
write.

That's a natural result of using mismatched attributes, and is
consistent with the O_SYNC flag meaning that the write "is transferred
to the underlying hardware".


If you agree that the O_SYNC flag means "is transferred to the underlying hardware",
then arm64 does not do that: when the O_SYNC flag is used on arm64, the mapping
gets the write cache (write-combine) attribute, so there is no guarantee that the
data is "transferred to the hardware".

=====

arch/arm64/mm/mmu.c
phys_mem_access_prot()
{
	...
	else if (file->f_flags & O_SYNC)
		return pgprot_writecombine(vma_prot);
	...
}

=====


In my testing without the write cache, coherency holds even though Process A
does not use the same attributes as Process B:

when Process B changes the value, Process A sees the change too.


Thanks,

LiWang.


My email server seems to refuse to send to linux-arm-kernel@xxxxxxxxxxxxxxxxxxx,

so the following information, from another email, does not appear on linux-arm-kernel@xxxxxxxxxxxxxxxxxxx:


1. Not passing O_SYNC from user space is not a good idea.

In fact, this code comes from the busybox 'devmem' command:

=====

busybox-1.24.1/miscutils/devmem.c

fd = xopen("/dev/mem", O_SYNC);

=====

This code has been in use for a long time.


2. The open(2) man page says this about O_SYNC:

=====

http://man7.org/linux/man-pages/man2/open.2.html

the output data and associated file metadata have been transferred to the underlying hardware

=====

So I think O_SYNC means "no cache".


3. The /dev/mem driver offers two ways to access physical memory:

mmap() and read()/write().

The read()/write() path accesses memory uncached:

=====

kernel-source/drivers/char/mem.c

write_mem()
{
	...
	/* it must also be accessed uncached */
	...
}

=====


4. arm64 differs from other architectures in phys_mem_access_prot().

As the excerpts below show, no other architecture adds a cache flag in this function;

only arm and arm64 add the write cache for the O_SYNC flag.


x86/mm/pat.c

phys_mem_access_prot()
{
	return vma_prot;
}


powerpc/mm/mem.c

phys_mem_access_prot()
{
	if (ppc_md.phys_mem_access_prot)
		return ppc_md.phys_mem_access_prot(file, pfn, size, vma_prot);
	if (!page_is_ram(pfn))
		vma_prot = pgprot_noncached(vma_prot);
	return vma_prot;
}


drivers/char/mem.c

phys_mem_access_prot()
{
#ifdef pgprot_noncached
	phys_addr_t offset = pfn << PAGE_SHIFT;

	if (uncached_access(file, offset))
		return pgprot_noncached(vma_prot);
#endif
	return vma_prot;
}


Fix reason:
On arm64, the write cache flag is set in phys_mem_access_prot():
=====
arch/arm64/mm/mmu.c
phys_mem_access_prot()
{
	if (!pfn_valid(pfn))
		return pgprot_noncached(vma_prot);
	else if (file->f_flags & O_SYNC)
		return pgprot_writecombine(vma_prot);
	return vma_prot;
}
=====
Other architectures, and the shared phys_mem_access_prot() in
drivers/char/mem.c, do not add the write cache flag.
So, remove the flag to fix the issue.
This will change behaviour that other software may be relying upon, and
as above I do not believe this actually solves the problem you describe.

Thanks,
Mark.

Signed-off-by: Li Wang <li.wang@xxxxxxxxxxxxx>
Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
---
arch/arm64/mm/mmu.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 128f70852bf3..d7083965ca17 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -81,8 +81,6 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 {
 	if (!pfn_valid(pfn))
 		return pgprot_noncached(vma_prot);
-	else if (file->f_flags & O_SYNC)
-		return pgprot_writecombine(vma_prot);
 	return vma_prot;
 }
 EXPORT_SYMBOL(phys_mem_access_prot);
--
2.24.1

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Translate a virtual address of this process to a physical address via
 * /proc/self/pagemap. Returns 0 on failure. Since Linux 4.0, PFNs are only
 * reported to CAP_SYS_ADMIN, so run this as root.
 */
static uintptr_t virt_to_phys_address(uintptr_t vaddr)
{
	FILE *pagemap;
	uintptr_t paddr = 0;
	long pagesize = sysconf(_SC_PAGESIZE);
	/* one 64-bit entry per virtual page */
	off_t offset = (vaddr / pagesize) * sizeof(uint64_t);
	uint64_t e;

	/* https://www.kernel.org/doc/Documentation/vm/pagemap.txt */
	if ((pagemap = fopen("/proc/self/pagemap", "r"))) {
		if (lseek(fileno(pagemap), offset, SEEK_SET) == offset) {
			if (fread(&e, sizeof(uint64_t), 1, pagemap)) {
				if (e & (1ULL << 63)) { /* bit 63: page present */
					/* bits 0-54: page frame number */
					paddr = e & ((1ULL << 55) - 1);
					paddr = paddr * pagesize;
					/* add offset within page */
					paddr |= (vaddr & (pagesize - 1));
				} else
					printf("%s: No page present\n", __func__);
			} else
				printf("%s: fread failed\n", __func__);
		} else
			printf("%s: lseek did not find\n", __func__);

		fclose(pagemap);
	} else
		printf("%s: Pagemap open failed\n", __func__);

	return paddr;
}

volatile uint32_t var = 0;

int main(void)
{
	uintptr_t phys_addr;

	/* store to var first so its page is present and private */
	var = 0;
	phys_addr = virt_to_phys_address((uintptr_t)&var);
	printf("%p 0x%" PRIxPTR "\n", (void *)&var, phys_addr);
	fflush(stdout);

	/* poll until another agent (e.g. devmem) changes var */
	while (var == 0)
		sleep(1);
	printf("done\n");
	fflush(stdout);
	return 0;
}