Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged

From: Jiri Kosina
Date: Thu Jan 24 2019 - 09:26:00 EST


On Thu, 24 Jan 2019, Dominique Martinet wrote:

> Jiri, you've offered resubmitting the last two patches properly, can you
> incorporate this change or should I just send this directly? (I'd take
> most of your commit message and add your name somewhere)

I've been running some basic smoke testing with the kernel from

https://git.kernel.org/pub/scm/linux/kernel/git/jikos/jikos.git/log/?h=pagecache-sidechannel-v2

(attaching the respective two patches to apply on top of latest Linus'
tree to this mail as well), and everything looks good so far.

Thanks,

--
Jiri Kosina
SUSE Labs

From 9810565f1d5f966a84900cdcb85e33aa7571afbe Mon Sep 17 00:00:00 2001
From: Jiri Kosina <jkosina@xxxxxxx>
Date: Wed, 16 Jan 2019 20:53:17 +0100
Subject: [PATCH 1/2] mm/mincore: make mincore() more conservative

The semantics of what mincore() considers to be resident are not completely
clear, but Linux has always (since 2.3.52, which is when mincore() was first
implemented) treated it as "page is available in page cache".

That's potentially a problem, as it directly or indirectly exposes
meta-information about pagecache / memory mapping state, even for memory that
does not strictly belong to the process executing the syscall, opening up
possibilities for sidechannel attacks.

Change the semantics of mincore() so that it only reveals pagecache information
for non-anonymous mappings that belong to files that the calling process either
owns or could (if it tried to) successfully open for writing.
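
For illustration, the kind of user-space probe affected by this change might
look like the sketch below. It is not part of the patch; the helper name
count_resident_pages() is hypothetical, and the code assumes Linux with glibc.
It maps a file read-only and counts the pages mincore() reports as resident.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for mincore() on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): map the first `len` bytes
 * of `path` read-only and count the pages mincore() reports as resident.
 * Returns the count, or -1 on any error.
 */
static long count_resident_pages(const char *path, size_t len)
{
	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
	size_t npages, i;
	unsigned char *vec;
	void *map;
	long resident = -1;
	int fd;

	assert(len > 0);
	npages = (len + page_size - 1) / page_size;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);	/* the mapping keeps its own reference to the file */
	if (map == MAP_FAILED)
		return -1;

	vec = malloc(npages);
	if (vec && mincore(map, len, vec) == 0) {
		resident = 0;
		for (i = 0; i < npages; i++)
			resident += vec[i] & 1;	/* bit 0 is the residency flag */
	}
	free(vec);
	munmap(map, len);
	return resident;
}
```

With this patch applied, a caller that neither owns the file nor could open it
for writing should see every page reported as resident, so the count no longer
says anything about the actual pagecache state.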

Originally-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Originally-by: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
Signed-off-by: Jiri Kosina <jkosina@xxxxxxx>
---
 mm/mincore.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index 218099b5ed31..747a4907a3ac 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -169,6 +169,14 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	return 0;
 }

+static inline bool can_do_mincore(struct vm_area_struct *vma)
+{
+	return vma_is_anonymous(vma) ||
+		(vma->vm_file &&
+			(inode_owner_or_capable(file_inode(vma->vm_file))
+			 || inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0));
+}
+
 /*
  * Do a chunk of "sys_mincore()". We've already checked
  * all the arguments, we hold the mmap semaphore: we should
@@ -189,8 +197,13 @@ static long do_mincore(unsigned long addr, unsigned long pages, unsigned char *v
 	vma = find_vma(current->mm, addr);
 	if (!vma || addr < vma->vm_start)
 		return -ENOMEM;
-	mincore_walk.mm = vma->vm_mm;
 	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
+	if (!can_do_mincore(vma)) {
+		unsigned long pages = (end - addr) >> PAGE_SHIFT;
+		memset(vec, 1, pages);
+		return pages;
+	}
+	mincore_walk.mm = vma->vm_mm;
 	err = walk_page_range(addr, end, &mincore_walk);
 	if (err < 0)
 		return err;
--
2.12.3

From f287185fc5e0ffbbb380f2d68dd19290715829a8 Mon Sep 17 00:00:00 2001
From: Jiri Kosina <jkosina@xxxxxxx>
Date: Wed, 16 Jan 2019 21:06:58 +0100
Subject: [PATCH 2/2] mm/filemap: initiate readahead even if IOCB_NOWAIT is set
 for the I/O

preadv2(RWF_NOWAIT) can be used to open a side-channel to pagecache contents, as
it reveals metadata about residency of pages in pagecache.

If preadv2(RWF_NOWAIT) fails immediately with -EAGAIN, that is a clear "page
not resident" signal; if it returns data, the page was resident.

Close that sidechannel by always initiating readahead on the cache if we
encounter a cache miss for preadv2(RWF_NOWAIT); with that in place, probing
the pagecache residency itself will actually populate the cache, making the
sidechannel useless.
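
For illustration, a user-space residency probe of the kind described above
might look like the sketch below. It is not part of the patch; the helper name
probe_cached_nowait() is hypothetical, and the code assumes a glibc recent
enough to provide the preadv2() wrapper and RWF_NOWAIT.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for preadv2() and RWF_NOWAIT on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (not part of the patch): try to read one byte at
 * `offset` without blocking.
 * Returns 1 if data came back (the page was cached), 0 if the kernel
 * answered EAGAIN (it would have had to wait for I/O), and -1 on EOF or
 * any other error.
 */
static int probe_cached_nowait(int fd, off_t offset)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	ssize_t n;

	assert(offset >= 0);
	n = preadv2(fd, &iov, 1, offset, RWF_NOWAIT);
	if (n > 0)
		return 1;
	if (n < 0 && errno == EAGAIN)
		return 0;
	return -1;
}
```

With this patch applied, a probe that misses is expected to trigger readahead,
so repeating it tells the caller little about whether someone else touched the
page.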

Originally-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Jiri Kosina <jkosina@xxxxxxx>
---
 mm/filemap.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 9f5e323e883e..7bcdd36e629d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2075,8 +2075,6 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,

 		page = find_get_page(mapping, index);
 		if (!page) {
-			if (iocb->ki_flags & IOCB_NOWAIT)
-				goto would_block;
 			page_cache_sync_readahead(mapping,
 					ra, filp,
 					index, last_index - index);
--
2.12.3