Re: [PATCH 1/2] zram: remove BD_CAP_SYNCHRONOUS_IO with writeback feature

From: Sergey Senozhatsky
Date: Fri Aug 03 2018 - 00:13:10 EST


Hi Minchan,

On (08/03/18 12:00), Minchan Kim wrote:
> > "Device is so fast that asynchronous IO would be inefficient."
> >
> > Which is not the reason why BDI_CAP_SYNCHRONOUS_IO is used by ZRAM.
> > Probably, the comment needs to be updated as well.
>
> I couldn't catch your point. Could you clarify a little bit more?
> What do you want to correct for the comment?
>
> > Both SWP_SYNCHRONOUS_IO and BDI_CAP_SYNCHRONOUS_IO tend to pivot
> > "efficiency" [looking at the comments], but in ZRAM's case the whole
> > reason to use SYNC IO is a race condition and user-after-free that
> > follows.
>
> Actually, it's not whole reason. As I wrote down, without it, swap_readpage
> waits the IO completion for a long time by blk_poll so it causes system
> sluggish problem when device is slow(e.g., zram with backing device).

Sure, this is problem #1. But slow swap device probably doesn't do any
irreversible harm to the system. Unlike use-after-free, which does. Thus
use-after-free is a problem #0. BDI_CAP_SYNCHRONOUS_IO comment doesn't
mention problem #0; it talks about problem #1 only. So, nothing serious,
just wanted to point that out.

So we probably can make ZRAM always ASYNC when WB is enabled.


Or... maybe we can make swap out to be SYNC and perform WB in background.
In __zram_bvec_write() we can always write compressed object to zmalloc,
even the huge ones.
Things to note:
a) even when WB is enabled we still allocate huge classes
b) even when WB is enabled we still may use those huge classes (consider
a case when backing devices is full)

So huge classes are still there and we still use them. So let's use
them?

For a huge object, after we stored it into zsmalloc, we can schedule a WB
work, which would:
a) write that particular object (page) to the backing device
b) mark entry as WB entry
c) remove object from zsmalloc, unlock necessary locks

So swap in should either see object in zsmalloc or on backing device.
How does this sound?

And reading from a backing device can always be SYNC. Can it?
Am I missing something?

-ss