Re: [PATCH] arm64: mmu: no write cache for O_SYNC flag
From: Wang, Li
Date: Fri Mar 27 2020 - 12:50:20 EST
On 2020/3/27 22:29, Mark Rutland wrote:
On Thu, Mar 26, 2020 at 09:36:25AM -0700, Li Wang wrote:
Steps to reproduce:
1. Disable CONFIG_STRICT_DEVMEM in the Linux kernel.
2. Process A gets the physical address of a global variable from
"/proc/self/pagemap".
3. Process B writes a value to the same physical address through mmap():
fd=open("/dev/mem",O_SYNC);
Virtual Address=mmap(fd);
Is this just to demonstrate the behaviour, or is this meant to be
indicative of a real use-case? I'm struggling to see the latter.
Problem symptom:
After Process B writes a value to the physical address,
the value of the global variable seen by Process A does not change,
although both processes read and write the same physical address.
If Process A is not using the same attributes as process B, there is no
guarantee of coherency. How did process A map this memory?
About the two processes:
Process A:
The memory is not mapped explicitly; it is just a global variable.
Its physical address is obtained only through /proc/self/pagemap.
I attached the code (wrl-cache-coh-test.c).
Process B:
It is the "devmem" command from busybox, which writes a value to the physical address.
It uses open(O_SYNC) and mmap().
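For readers without the busybox source at hand, Process B's access pattern can be sketched roughly as below. This is only an illustration, not the actual devmem code: the helper names split_phys() and devmem_write32() are made up here, and the sketch assumes CONFIG_STRICT_DEVMEM is disabled, the caller is root, and the address is 4-byte aligned.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: split a physical address into a page-aligned base
 * (usable as an mmap offset) and the byte offset inside that page. */
static void split_phys(uint64_t phys, uint64_t page_size,
                       uint64_t *base, uint64_t *off)
{
    /* Page sizes are powers of two, so a mask is enough. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    *off = phys & (page_size - 1);
    *base = phys - *off;
}

/* Hypothetical helper: write one 32-bit word to a physical address through
 * an O_SYNC mapping of /dev/mem. Returns 0 on success, -1 on any error. */
static int devmem_write32(uint64_t phys, uint32_t value)
{
    uint64_t base, off;
    long page_size = sysconf(_SC_PAGESIZE);
    void *map;
    int fd;

    if (page_size <= 0 || (phys & 3) != 0)
        return -1;              /* reject unaligned addresses up front */
    split_phys(phys, (uint64_t)page_size, &base, &off);

    fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;

    map = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
               MAP_SHARED, fd, (off_t)base);
    close(fd);                  /* the mapping stays valid after close() */
    if (map == MAP_FAILED)
        return -1;

    *(volatile uint32_t *)((char *)map + off) = value;
    munmap(map, (size_t)page_size);
    return 0;
}
```

The memory attributes of the resulting mapping are chosen by the kernel in phys_mem_access_prot(), which is the function this thread is about; nothing in the user-space code selects them beyond passing O_SYNC.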
Technical reason:
Process B writes to the physical address through a virtual address
that comes from "/dev/mem" and mmap().
On arm64, that virtual address mapping has a write cache.
So the value may not actually be written to the physical address.
I don't think that's true. I think what's happening here is:
* Process A has a Normal WBWA Cacheable mapping.
* Process B has a Normal Non-cacheable mapping.
* Process B's write does not snoop any caches, and goes straight to
memory.
* Process A reads a value from cache, which does not include process B's
write.
That's a natural result of using mismatched attributes, and is
consistent with the O_SYNC flag meaning that the write "is transferred
to the underlying hardware".
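The four bullet points above can be replayed with a deliberately tiny software model. It is a toy, not a description of any real cache hierarchy: the struct toy_system type and all function names are invented for illustration, there is a single word of memory and a single cache line, and eviction is not modelled (on real hardware lines are evicted and refilled, so a stale read is possible rather than guaranteed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: one word of backing memory and one cache line. */
struct toy_system {
    uint32_t memory;      /* what is actually in RAM */
    uint32_t cache_data;  /* cached copy used by the cacheable mapping */
    bool cache_valid;     /* whether the cached copy is populated */
};

/* Read through a cacheable mapping: a hit returns the cached copy. */
static uint32_t cacheable_read(struct toy_system *s)
{
    if (!s->cache_valid) {
        s->cache_data = s->memory;  /* miss: fill the line from memory */
        s->cache_valid = true;
    }
    return s->cache_data;
}

/* Write through a non-cacheable mapping: memory is updated, the cache
 * is neither looked up nor invalidated. */
static void noncacheable_write(struct toy_system *s, uint32_t value)
{
    s->memory = value;
}

/* Replay the scenario; returns the value Process A ends up observing. */
static uint32_t replay_scenario(void)
{
    struct toy_system s = { .memory = 0, .cache_data = 0, .cache_valid = false };

    (void)cacheable_read(&s);    /* A polls var: the line is now cached */
    noncacheable_write(&s, 1);   /* B writes through its /dev/mem mapping */
    assert(s.memory == 1);       /* the write did reach memory */
    return cacheable_read(&s);   /* A hits its cached copy again */
}
```

In this model replay_scenario() returns the old value even though memory already holds the new one, which matches the symptom reported for Process A.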
If you agree that the O_SYNC flag means "is transferred to the underlying
hardware",
then arm64 does not do that:
when the O_SYNC flag is used on arm64, the kernel adds the write cache feature,
so there is no guarantee that the data is "transferred to hardware".
=====
arch/arm64/mm/mmu.c
phys_mem_access_prot(){
        else if (file->f_flags & O_SYNC)
                return pgprot_writecombine(vma_prot);
}
=====
In my test without the write cache, even though Process A is not using the
same attributes as Process B,
coherency is guaranteed:
when Process B changes the value, Process A can see the change, too.
Thanks,
LiWang.
My email server seems to refuse to send to
linux-arm-kernel@xxxxxxxxxxxxxxxxxxx,
so here is the information from another email that does not show up on
linux-arm-kernel@xxxxxxxxxxxxxxxxxxx:
1. Not passing O_SYNC from user space is not a good idea.
In fact, the code comes from the 'devmem' command of busybox:
=====
busybox-1.24.1/miscutils$ vim devmem.c
fd = xopen("/dev/mem", O_SYNC);
=====
This code has been in use for a long time.
2. According to the open(2) man page on "O_SYNC":
=====
http://man7.org/linux/man-pages/man2/open.2.html
the output data and associated file metadata have been transferred to
the underlying hardware
=====
I think "O_SYNC" means no cache.
3. The /dev/mem driver offers two ways to access physical memory:
one is mmap, the other is read/write.
The read/write path operates on uncached memory:
=====
kernel-source/drivers/char/mem.c
write_mem(){
/* it must also be accessed uncached */
}
=====
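From user space, the read/write path mentioned in point 3 would look roughly like the sketch below, which uses pwrite()/pread() at a file offset instead of mmap(). The helper names write32_at() and read32_at() are invented for illustration; for /dev/mem the offset is the physical address, and whether the kernel permits the access at all depends on its configuration (an assumption here, as in the rest of the thread).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store a 32-bit word at byte offset `off` of `path`
 * with pwrite(). Returns 0 on success, -1 on any error. */
static int write32_at(const char *path, off_t off, uint32_t value)
{
    ssize_t n;
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY | O_SYNC);
    if (fd < 0)
        return -1;
    n = pwrite(fd, &value, sizeof(value), off);
    close(fd);
    return n == (ssize_t)sizeof(value) ? 0 : -1;
}

/* Hypothetical helper: load a 32-bit word from byte offset `off` of `path`
 * with pread(). Returns 0 on success, -1 on any error. */
static int read32_at(const char *path, off_t off, uint32_t *value)
{
    ssize_t n;
    int fd;

    assert(path != NULL && value != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    n = pread(fd, value, sizeof(*value), off);
    close(fd);
    return n == (ssize_t)sizeof(*value) ? 0 : -1;
}
```

A call such as write32_at("/dev/mem", phys_addr, 1) would then go through the driver's write_mem() rather than through a user-space mapping, so the attributes picked by phys_mem_access_prot() are not involved on this path.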
4. arm64 differs from other architectures in phys_mem_access_prot().
No other architecture adds a cache flag in this function;
only arm and arm64 add a write cache for the O_SYNC flag.
x86/mm/pat.c
phys_mem_access_prot(){
        return vma_prot;
}
powerpc/mm/mem.c
phys_mem_access_prot(){
        if (ppc_md.phys_mem_access_prot)
                return ppc_md.phys_mem_access_prot(file, pfn, size,
                                                   vma_prot);
        if (!page_is_ram(pfn))
                vma_prot = pgprot_noncached(vma_prot);
        return vma_prot;
}
drivers/char/mem.c
phys_mem_access_prot()
{
#ifdef pgprot_noncached
        phys_addr_t offset = pfn << PAGE_SHIFT;
        if (uncached_access(file, offset))
                return pgprot_noncached(vma_prot);
#endif
        return vma_prot;
}
Reason for the fix:
arm64 sets the write cache flag in phys_mem_access_prot():
=====
arch/arm64/mm/mmu.c
phys_mem_access_prot()
{
        if (!pfn_valid(pfn))
                return pgprot_noncached(vma_prot);
        else if (file->f_flags & O_SYNC)
                return pgprot_writecombine(vma_prot);
        return vma_prot;
}
=====
The other architectures and the shared phys_mem_access_prot() in drivers/char/mem.c
do not add the write cache flag.
So, remove the flag to fix the issue.
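To make the scope of the proposed change concrete, the selection logic before and after the patch can be modelled in plain user-space C. This is a sketch, not kernel code: enum mem_attr and the function names are invented for illustration, and ATTR_UNCHANGED simply stands for "vma_prot returned as passed in".

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the pgprot choices made by the function. */
enum mem_attr {
    ATTR_UNCHANGED,     /* vma_prot returned as-is */
    ATTR_NONCACHED,     /* pgprot_noncached(vma_prot) */
    ATTR_WRITECOMBINE   /* pgprot_writecombine(vma_prot) */
};

/* Model of the current arm64 phys_mem_access_prot() quoted above. */
static enum mem_attr arm64_prot_current(bool pfn_is_valid, bool o_sync)
{
    if (!pfn_is_valid)
        return ATTR_NONCACHED;
    else if (o_sync)
        return ATTR_WRITECOMBINE;
    return ATTR_UNCHANGED;
}

/* Model of the same function with the O_SYNC branch removed. */
static enum mem_attr arm64_prot_patched(bool pfn_is_valid, bool o_sync)
{
    (void)o_sync;  /* the patched version ignores the flag */
    if (!pfn_is_valid)
        return ATTR_NONCACHED;
    return ATTR_UNCHANGED;
}

/* Self-check: the two versions differ only for a valid pfn with O_SYNC. */
static void check_model(void)
{
    for (int valid = 0; valid <= 1; valid++) {
        for (int sync = 0; sync <= 1; sync++) {
            bool differs = arm64_prot_current(valid, sync) !=
                           arm64_prot_patched(valid, sync);
            assert(differs == (valid && sync));
        }
    }
}
```

In this model the only input whose result changes is a valid pfn mapped with O_SYNC; every other combination is unaffected.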
This will change behaviour that other software may be relying upon, and
as above I do not believe this actually solves the problem you describe.
Thanks,
Mark.
Signed-off-by: Li Wang <li.wang@xxxxxxxxxxxxx>
Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
---
arch/arm64/mm/mmu.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 128f70852bf3..d7083965ca17 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -81,8 +81,6 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 {
 	if (!pfn_valid(pfn))
 		return pgprot_noncached(vma_prot);
-	else if (file->f_flags & O_SYNC)
-		return pgprot_writecombine(vma_prot);
 	return vma_prot;
 }
 EXPORT_SYMBOL(phys_mem_access_prot);
--
2.24.1
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <sys/types.h>
#include <unistd.h>

/* Translate a virtual address of this process into a physical address. */
static uintptr_t virt_to_phys_address(uintptr_t vaddr)
{
	FILE *pagemap;
	uintptr_t paddr = 0;
	long page_size = sysconf(_SC_PAGESIZE);
	off_t offset = (vaddr / page_size) * sizeof(uint64_t);
	uint64_t e;

	/* https://www.kernel.org/doc/Documentation/vm/pagemap.txt */
	if ((pagemap = fopen("/proc/self/pagemap", "r"))) {
		if (lseek(fileno(pagemap), offset, SEEK_SET) == offset) {
			if (fread(&e, sizeof(uint64_t), 1, pagemap)) {
				if (e & (1ULL << 63)) { /* page present ? */
					/* pfn mask: bits 0-54 hold the PFN */
					paddr = e & ((1ULL << 55) - 1);
					paddr = paddr * page_size;
					/* add offset within page */
					paddr |= (vaddr & (page_size - 1));
				}
				else
					printf("%s: No page present\n", __func__);
			}
			else
				printf("%s: fread failed\n", __func__);
		}
		else
			printf("%s: lseek did not find\n", __func__);
		fclose(pagemap);
	}
	else
		printf("%s: Pagemap open failed\n", __func__);
	return paddr;
}

/* Polled by this process; written by another process through /dev/mem. */
volatile uint32_t var = 0;

int main(void)
{
	uintptr_t phys_addr = virt_to_phys_address((uintptr_t)&var);

	/* print virtual and physical address of var */
	printf("%p 0x%" PRIxPTR "\n", (void *)&var, phys_addr);
	while (var == 0)
		sleep(1);
	printf("done\n");
	fflush(stdout);
	return 0;
}