Re: [PATCH -mm -v4 3/5] mm, swap: VMA based swap readahead

From: Minchan Kim
Date: Thu Sep 14 2017 - 09:15:07 EST


On Thu, Sep 14, 2017 at 08:01:30PM +0800, Huang, Ying wrote:
> Minchan Kim <minchan@xxxxxxxxxx> writes:
>
> > On Wed, Sep 13, 2017 at 02:02:29PM -0700, Andrew Morton wrote:
> >> On Wed, 13 Sep 2017 10:40:19 +0900 Minchan Kim <minchan@xxxxxxxxxx> wrote:
> >>
> >> > Every zram users like low-end android device has used 0 page-cluster
> >> > to disable swap readahead because it has no seek cost and works as
> >> > synchronous IO operation so if we do readahead multiple pages,
> >> > swap falut latency would be (4K * readahead window size). IOW,
> >> > readahead is meaningful only if it doesn't bother faulted page's
> >> > latency.
> >> >
> >> > However, this patch introduces additional knob /sys/kernel/mm/swap/
> >> > vma_ra_max_order as well as page-cluster. It means existing users
> >> > has used disabled swap readahead doesn't work until they should be
> >> > aware of new knob and modification of their script/code to disable
> >> > vma_ra_max_order as well as page-cluster.
> >> >
> >> > I say it's a *regression* and wanted to fix it but Huang's opinion
> >> > is that it's not a functional regression so userspace should be fixed
> >> > by themselves.
> >> > Please look into detail of discussion in
> >> > http://lkml.kernel.org/r/%3C1505183833-4739-4-git-send-email-minchan@xxxxxxxxxx%3E
> >>
> >> hm, tricky problem. I do agree that linking the physical and virtual
> >> readahead schemes in the proposed fashion is unfortunate. I also agree
> >> that breaking existing setups (a bit) is also unfortunate.
> >>
> >> Would it help if, when page-cluster is written to zero, we do
> >>
> >> printk_once("physical readahead disabled, virtual readahead still
> >> enabled. Disable virtual readhead via
> >> /sys/kernel/mm/swap/vma_ra_max_order").
> >>
> >> Or something like that. It's pretty lame, but it should help alert the
> >> zram-readahead-disabling people to the issue?
> >
> > It was my last resort. If we cannot find other ways after all, yes, it would
> > be a minimum we should do. But it still breaks users don't/can't read/modify
> > alert and program.
> >
> > How about this?
> >
> > Can't we make vma-based readahead config option?
> > With that, users who no interest on readahead don't enable vma-based
> > readahead. In this case, page-cluster works as expected "disable readahead
> > completely" so it doesn't break anything.
>
> Now. Users can choose between VMA based readahead and original
> readahead via a knob as follow at runtime,
>
> /sys/kernel/mm/swap/vma_ra_enabled

It's not a config option and is enabled by default. IOW, it's under the radar
so current users cannot notice it. That's why we want to emit big fat warnning.
when old user set 0 to page-cluster. However, as Andrew said, it's lame.

If we make it config option, product maker/kernel upgrade user can have
a chance to notice and read description so they could be aware of two weird
knobs and help to solve the problem in advance without printk_once warn.
If user has no interest about swap-readahead or skip the new config option
by mistake, it works physcial readahead which means no regression.

>
> Best Regards,
> Huang, Ying
>
>
> > People who want to use upcoming vma-based readahead can enable the feature
> > and we can say such unfortunate things in config/document description
> > somewhere so upcoming users will be aware of that unforunate two knobs.