Re: [PATCH] mm: readahead: do not cap readahead() and MADV_WILLNEED
From: Linus Torvalds
Date: Tue Feb 23 2016 - 21:35:08 EST
On Tue, Feb 23, 2016 at 5:38 PM, Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> Since both readahead() and MADV_WILLNEED take an explicit length
> parameter, it seems weird to truncate that request quietly. Just do
> what the user asked for and leave the limiting to the heuristics.
Why the hell do people continue to try to push these kinds of changes?
Just read the code that you changed: the size_t is a trivially
user-supplied number that any user can set to any value. And you just
removed all sanity checking for it, so now people can basically do
unlimited IO that is not even killable.
Seriously.
[ It's slightly saved by the fact that you have to find a file that is
large and readable, which will happily limit things a lot on practice,
but it's still something we should worry about.
It is also at least partially mitigated by the fact that read-ahead
allocations use __GFP_NORETRY and hopefully thus don't do the worst
kind of damage to the memory management, but I'd hate to have to rely
on that ]
At least with regular read() system calls, we try to use killable IO
(well - some filesystems might not do it, but the core pagecache
functions support it), and we limit the maximum IO size even if that
limit happens to be pretty high.
You just completely removed that limiting for the readahead code. So
now you can pass in any arbitrary 64-bit size.
Why do you think that "Just do what the user asked for" is obviously
the right thing?
What if I as a user asked to overwrite /etc/passwd with my own data?
Would that be right?
And if that isn't right, then why do you assume it is right that you
can do infinite readahead?
Now, it is entirely possible that we should raise the limit (note that
"raise" is different from "remove"). But that requires
(a) thought
(b) data
and this patch had neither.
If we raise the limit, we need to do so intelligently. It's almost
certainly going to involve looking at how much free memory we have,
because part of "readahead" is very much also "don't disturb other
users and IO too much".
The current choice for the limit is "small enough that we don't need
to think too much about it". If we raise it to hundreds of megs, we
very definitely will want to think about it much more. We might also
want to make sure that the operation can be properly aborted,
something we haven't needed to worry about for the small limit we have
now.
Linus