Re: [f2fs-dev] [PATCH v3] f2fs: fix race of pending_pages in decompression

From: Jaegeuk Kim
Date: Tue Dec 08 2020 - 18:56:30 EST


On 12/07, Eric Biggers wrote:
> On Tue, Dec 08, 2020 at 08:51:45AM +0900, Daeho Jeong wrote:
> > > I am trying to review this but it is very hard, as the f2fs compression code is
> > > very hard to understand.
> > >
> > > It looks like a 'struct decompress_io_ctx' represents the work to decompress a
> > > particular cluster. Since the compressed data of the cluster can be read using
> > > multiple bios, there is a reference count of how many pages are remaining to be
> > > read before all the cluster's pages have been read and decompression can start.
> > >
> > > What I don't understand is why that reference counting needs to work differently
> > > depending on whether verity is enabled or not. Shouldn't it be exactly the
> > > same?
> > >
> > > There also seems to be some confusion about the scope of STEP_VERITY. Before
> > > f2fs compression was added, it was a per-bio thing. But now in a compressed
> > > file, it's really a per-cluster thing, since all decompressed pages in a
> > > compressed cluster are verified (or not verified) at once.
> > >
> > > Wouldn't it make a lot more sense to, when a cluster needs both compression and
> > > verity, *not* set STEP_VERITY on the bios, but rather set a similar flag in the
> > > decompress_io_ctx?
> > >
> >
> > Eric,
> >
> > Decompression and verity can be executed in different thread contexts
> > in different timing, so we need separate counts for each.
> >
> > We already use STEP_VERITY for non-compression case, so I think using
> > this flag in here looks more making sense.
> >
> > Thanks,
>
> That didn't really answer my questions.
>
> I gave up trying to review this patch as the compression post-read handling is
> just way too weird and hard to understand. I wrote a patch to clean it all up
> instead, please take a look:
> https://lkml.kernel.org/r/20201208060328.2237091-1-ebiggers@xxxxxxxxxx

Eric,
I also tried to review your patch, but it's quite hard to follow quickly and
requires stress tests for a while. Given upcoming merge window and urgency of
the bug, let me apply Daeho's fix first. By any chance, may I ask revisiting
your clean-up on top of the fix in the next cycle?

Thanks,

>
> - Eric