Re: [PATCH 5/7] f2fs: enhance multithread dio write performance

From: Jaegeuk Kim
Date: Thu Sep 17 2015 - 13:48:49 EST

In reply to: Chao Yu: "RE: [PATCH 5/7] f2fs: enhance multithread dio write performance"
Next in thread: Chao Yu: "RE: [PATCH 5/7] f2fs: enhance multithread dio write performance"

Hi Chao,

On Thu, Sep 17, 2015 at 08:52:10PM +0800, Chao Yu wrote:
> Hi Jaegeuk,
>
> > -----Original Message-----
> > From: Jaegeuk Kim [mailto:jaegeuk@xxxxxxxxxx]
> > Sent: Thursday, September 17, 2015 2:13 AM
> > To: Chao Yu
> > Cc: linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH 5/7] f2fs: enhance multithread dio write performance
> >
> > Hi Chao,
> >
> > On Wed, Sep 16, 2015 at 06:15:55PM +0800, Chao Yu wrote:
> > > Hi Jaegeuk,
> > >
> > > > -----Original Message-----
> > > > From: Jaegeuk Kim [mailto:jaegeuk@xxxxxxxxxx]
> > > > Sent: Wednesday, September 16, 2015 5:21 AM
> > > > To: Chao Yu
> > > > Cc: linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> > > > Subject: Re: [PATCH 5/7] f2fs: enhance multithread dio write performance
> > > >
> > > > Hi Chao,
> > > >
> > > > On Fri, Sep 11, 2015 at 02:41:53PM +0800, Chao Yu wrote:
> > > > > > When dio writes run concurrently, performance is low because Thread A's
> > > > > > allocation of multiple contiguous blocks can be broken up by Thread B.
> > > > > > There are two cases:
> > > > > > - Thread B may switch the current segment to a new segment for LFS
> > > > > > allocation if it issues a dio write at the beginning of the file.
> > > > > > - Thread B may allocate blocks in the middle of Thread A's allocation,
> > > > > > which makes the blocks allocated by Thread A discontinuous.
> > > > > >
> > > > > > This patch takes the writepages mutex to make block allocation in dio
> > > > > > writes atomic, avoiding the issues above.
> > > > >
> > > > > Test environment:
> > > > > ubuntu os with linux kernel 4.2+, intel i7-3770, 16g memory,
> > > > > 32g kingston sd card.
> > > > >
> > > > > > fio --name seqw --ioengine=sync --invalidate=1 --rw=write --directory=/mnt/f2fs \
> > > > > >     --filesize=256m --size=16m --bs=2m --direct=1 --numjobs=10
> > > > >
> > > > > before:
> > > > > > WRITE: io=163840KB, aggrb=3145KB/s, minb=314KB/s, maxb=411KB/s, mint=39836msec, maxt=52083msec
> > > > > >
> > > > > > patched:
> > > > > > WRITE: io=163840KB, aggrb=10033KB/s, minb=1003KB/s, maxb=1124KB/s, mint=14565msec, maxt=16329msec
> > > > >
> > > > > Signed-off-by: Chao Yu <chao2.yu@xxxxxxxxxxx>
> > > > > ---
> > > > > fs/f2fs/data.c | 13 ++++++++++---
> > > > > 1 file changed, 10 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > > > > index a737ca5..a0a5849 100644
> > > > > --- a/fs/f2fs/data.c
> > > > > +++ b/fs/f2fs/data.c
> > > > > @@ -1536,7 +1536,9 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter
> > *iter,
> > > > > struct file *file = iocb->ki_filp;
> > > > > struct address_space *mapping = file->f_mapping;
> > > > > struct inode *inode = mapping->host;
> > > > > + struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > > > > size_t count = iov_iter_count(iter);
> > > > > + int rw = iov_iter_rw(iter);
> > > > > int err;
> > > > >
> > > > > /* we don't need to use inline_data strictly */
> > > > > @@ -1555,12 +1557,17 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter
> > > > *iter,
> > > > >
> > > > > trace_f2fs_direct_IO_enter(inode, offset, count, iov_iter_rw(iter));
> > > > >
> > > > > - if (iov_iter_rw(iter) == WRITE)
> > > > > + if (rw == WRITE) {
> > > > > + mutex_lock(&sbi->writepages);
> > > >
> > > > Why do we have to share sbi->writepages?
> > >
> > > The root cause of this issue is that f2fs has no suitable dispatcher
> > > which can do the following things as one atomic operation:
> > > a) allocate position(s) on the flash device for the current block(s);
> > > b) submit user data to the allocated position(s) in the block layer.
> > >
> > > Without such a dispatcher, we suffer a performance issue in the following
> > > scenario:
> > > Thread A              Thread B              Thread C
> > > allocate pos+1
> > >                       allocate pos+2
> > >                                             allocate pos+3
> > > submit pos+1
> > >                                             submit pos+3
> > >                       submit pos+2
> > >
> > > Our final submission order will be pos+1, pos+3, pos+2, which pushes f2fs
> > > into non-LFS mode and therefore results in bad performance.
> > >
> > > The writepages mutex gives us a good solution for the above issue.
> > > It not only makes the allocate-and-submit pair execute atomically,
> > > but also reduces fragmentation within one file, since we submit blocks
> > > belonging to a single inode as contiguously as possible.
> > >
> > > So here I chose to use the writepages mutex to fix the performance
> > > issue caused by both dio write vs. dio write and dio write vs. buffered
> > > write.
> >
> > Understood, but the concern was the multi-thread performance you mentioned.
> > If one thread issues a big dio request, can nobody else write at all?
>
> Buffered writes will not be blocked, but my approach does completely stop
> concurrency among threads doing dio writes. For improving concurrency,
> moving mutex_unlock to just after __allocate_data_blocks is a good solution
> so far.
>
> > How about adding some limit, as f2fs_write_data_pages does with, for example,
> > nr_pages_to_write?
>
> Could you share more details about your idea?
>
> As Yunlei reported, there is a performance regression, so how about
> holding this patch while I do some investigation?

It seems the mutex adds overhead when handling a bunch of small 4KB dios.
What I meant was: how about serializing only fairly large dios?
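
A rough user-space sketch of that idea, not f2fs code: take the lock only
when the request size reaches a cutoff, and let small requests go through
unlocked. The 512KB cutoff and the names DIO_SERIALIZE_MIN_BYTES, alloc_lock,
dio_should_serialize() and dio_write() are all made up for illustration; a
pthread mutex stands in for sbi->writepages, and the callback stands in for
the allocate-and-submit step.

```c
/* Illustrative user-space sketch only; none of these names exist in f2fs. */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed cutoff: requests at least this large are serialized. */
#define DIO_SERIALIZE_MIN_BYTES ((size_t)512 * 1024)

/* Stands in for sbi->writepages. */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Decide from the request size alone whether to take the lock. */
static bool dio_should_serialize(size_t count)
{
	return count >= DIO_SERIALIZE_MIN_BYTES;
}

/* Hypothetical callback doing block allocation plus submission. */
typedef long (*dio_fn)(size_t offset, size_t count);

/*
 * Large requests hold the lock across allocation and submission so their
 * blocks stay contiguous; small requests skip the lock and its overhead.
 */
static long dio_write(size_t offset, size_t count, dio_fn do_io)
{
	bool locked = dio_should_serialize(count);
	long ret;

	if (locked)
		pthread_mutex_lock(&alloc_lock);
	ret = do_io(offset, count);
	if (locked)
		pthread_mutex_unlock(&alloc_lock);
	return ret;
}
```

With this cutoff, the 2m-block fio job quoted below would still be
serialized, while 4KB and 64k requests would not take the lock at all. Where
the cutoff should really sit would have to come from measurements.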

>
> Thanks,
>
> >
> > Thanks,
> >
> > >
> > > If I'm missing something, please correct me.
> > >
> > > >
> > > > > __allocate_data_blocks(inode, offset, count);
> > > >
> > > > If the problem lies in the misaligned blocks, how about calling mutex_unlock
> > > > here?
> > >
> > > When I changed it to unlock here, I got a regression when testing with the following command:
> > > fio --name seqw --ioengine=sync --invalidate=1 --rw=write --directory=/mnt/f2fs \
> > >     --filesize=256m --size=4m --bs=64k --direct=1 --numjobs=20
> > >
> > > unlock here:
> > > WRITE: io=81920KB, aggrb=5802KB/s, minb=290KB/s, maxb=292KB/s, mint=14010msec, maxt=14119msec
> > > unlock after dio finished:
> > > WRITE: io=81920KB, aggrb=6088KB/s, minb=304KB/s, maxb=1081KB/s, mint=3786msec, maxt=13454msec
> > >
> > > So how about keeping it in its original place in this patch?
> > >
> > > Thanks,
> > > >
> > > > Thanks,
> > > >
> > > > > + }
> > > > >
> > > > > err = blockdev_direct_IO(iocb, inode, iter, offset, get_data_block_dio);
> > > > > - if (err < 0 && iov_iter_rw(iter) == WRITE)
> > > > > - f2fs_write_failed(mapping, offset + count);
> > > > > + if (rw == WRITE) {
> > > > > + mutex_unlock(&sbi->writepages);
> > > > > + if (err)
> > > > > + f2fs_write_failed(mapping, offset + count);
> > > > > + }
> > > > >
> > > > > trace_f2fs_direct_IO_exit(inode, offset, count, iov_iter_rw(iter), err);
> > > > >
> > > > > --
> > > > > 2.4.2
