Re: [PATCH -next v2 7/7] md/raid1-10: limit the number of plugged bio

From: Xiao Ni
Date: Mon May 29 2023 - 22:26:02 EST


On Tue, May 30, 2023 at 9:20 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2023/05/30 8:58, Xiao Ni wrote:
> > On Mon, May 29, 2023 at 4:50 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/05/29 15:57, Xiao Ni wrote:
> >>> On Mon, May 29, 2023 at 11:18 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 2023/05/29 11:10, Xiao Ni wrote:
> >>>>> On Mon, May 29, 2023 at 10:20 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> On 2023/05/29 10:08, Xiao Ni wrote:
> >>>>>>> Hi Kuai
> >>>>>>>
> >>>>>>> There is a limitation of the memory in your test. But for most
> >>>>>>> situations, customers should not set this. Can this change introduce a
> >>>>>>> performance regression against other situations?
> >>>>>>
> >>>>>> Note that this limitation is only there to trigger writeback as soon
> >>>>>> as possible in the test. Real situations can certainly trigger dirty
> >>>>>> page writeback asynchronously while continuing to produce new dirty
> >>>>>> pages.
> >>>>>
> >>>>> Hi
> >>>>>
> >>>>> I'm confused here. If we want to trigger write back quickly, we need
> >>>>> to set these two values to a smaller number, rather than 0 and 60.
> >>>>> Right?
> >>>>
> >>>> 60 is not required, I'll remove this setting.
> >>>>
> >>>> 0 just means write back if there are any dirty pages.
> >>>
> >>> Hi Kuai
> >>>
> >>> Does 0 mean disabling write back? I tried to find the doc that
> >>> describes the meaning when setting dirty_background_ratio to 0, but I
> >>> didn't find it.
> >>> In https://www.kernel.org/doc/html/next/admin-guide/sysctl/vm.html it
> >>> doesn't describe this. But it says something like this
> >>>
> >>> Note:
> >>> dirty_background_bytes is the counterpart of dirty_background_ratio. Only
> >>> one of them may be specified at a time. When one sysctl is written it is
> >>> immediately taken into account to evaluate the dirty memory limits and the
> >>> other appears as 0 when read.
> >>>
> >>> Maybe you can specify dirty_background_ratio to 1 if you want to
> >>> trigger write back ASAP.
> >>
> >> The purpose here is to trigger write back ASAP. I'm not an expert here,
> >> but based on the test results, 0 obviously doesn't mean write back is
> >> disabled.
> >>
> >> Setting dirty_background_bytes to a value sets dirty_background_ratio
> >> to 0 at the same time, which means dirty_background_ratio is disabled.
> >> However, changing dirty_background_ratio from its default value to 0
> >> leaves both dirty_background_ratio and dirty_background_bytes at 0, and
> >> based on the following related code, I think 0 just means write back if
> >> there are any dirty pages.
> >>
> >> domain_dirty_limits:
> >>   bg_bytes = dirty_background_bytes -> 0
> >>   bg_ratio = (dirty_background_ratio * PAGE_SIZE) / 100 -> 0
> >>
> >>   if (bg_bytes)
> >>     bg_thresh = DIV_ROUND_UP(bg_bytes, PAGE_SIZE);
> >>   else
> >>     bg_thresh = (bg_ratio * available_memory) / PAGE_SIZE; -> 0
> >>
> >>   dtc->bg_thresh = bg_thresh; -> 0
> >>
> >> balance_dirty_pages:
> >>   nr_reclaimable = global_node_page_state(NR_FILE_DIRTY);
> >>   if (!laptop_mode && nr_reclaimable > gdtc->bg_thresh &&
> >>       !writeback_in_progress(wb))
> >>     wb_start_background_writeback(wb); -> writeback ASAP
> >>
> >> Thanks,
> >> Kuai
> >
> > Hi Kuai
> >
> > I'm not an expert on this either. Thanks for all your patches; I can
> > learn more from them too. But I still have some questions.
> >
> > I did a test in my environment something like this:
> > modprobe brd rd_nr=4 rd_size=10485760
> > mdadm -CR /dev/md0 -l10 -n4 /dev/ram[0123] --assume-clean
> > echo 0 > /proc/sys/vm/dirty_background_ratio
> > fio -filename=/dev/md0 -ioengine=libaio -rw=write -thread -bs=1k-8k
> > -numjobs=1 -iodepth=128 --runtime=10 -name=xxx
> > It will cause OOM and the system hangs
>
> OOM means you triggered this problem... The plug holds lots of bios and
> costs lots of memory; it's not that write back is disabled. You can verify
> this by monitoring md inflight. Note that the test shouldn't use too much
> memory for the ramdisk (rd_nr * rd_size), so that OOM won't be triggered.
>
> Have you tried to test with this patchset?

Yes, I know I have reproduced this problem. I'll try the v3 patchset.
>
> >
> > modprobe brd rd_nr=4 rd_size=10485760
> > mdadm -CR /dev/md0 -l10 -n4 /dev/ram[0123] --assume-clean
> > echo 1 > /proc/sys/vm/dirty_background_ratio (THIS is the only difference)
> > fio -filename=/dev/md0 -ioengine=libaio -rw=write -thread -bs=1k-8k
> > -numjobs=1 -iodepth=128 --runtime=10 -name=xxx
> > It can finish successfully. Setting dirty_background_ratio to 1 here
> > means it flushes ASAP
>
> This really doesn't mean it flushes ASAP; our tests reported this problem
> in a real test that doesn't modify dirty_background_ratio. I guess
> something triggers io_scheduler() somewhere, probably because the
> background thread thinks the dirty pages don't reach the threshold, but
> I'm not sure for now.

Thanks for notifying me of this.

Regards
Xiao
>
> Thanks,
> Kuai
> >
> > So your method seems to work the opposite way from what you designed.
> > The dirty memory can't be flushed in time, so all memory is used up
> > very soon and the system hangs. The reason I'm looking at the test is
> > to ask whether we really need this change, because in the real world
> > most customers don't disable write back. Anyway, it depends on Song's
> > decision, and thanks for your patches again. I'll review V3 and try to
> > do some performance tests.
> >
> > Best Regards
> > Xiao
>
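
For reference, the threshold arithmetic quoted above from
domain_dirty_limits and balance_dirty_pages can be sketched in userspace.
This is only an illustration of the quoted lines, not the kernel code
itself: the sketch_* names and the 4096-byte page size are assumptions,
and the real functions do considerably more than this.

```c
#include <assert.h>

/* Assumed page size for the sketch; the kernel uses its own PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/* Same rounding helper as the kernel macro of this name. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Background writeback threshold in pages, following the quoted lines.
 *   bg_bytes        - value of dirty_background_bytes (0 if unset)
 *   bg_ratio_pct    - value of dirty_background_ratio (percent)
 *   available_pages - dirtyable memory, in pages
 */
static unsigned long sketch_bg_thresh(unsigned long bg_bytes,
                                      unsigned long bg_ratio_pct,
                                      unsigned long available_pages)
{
    /* Ratio scaled per page for precision, as in the quoted code. */
    unsigned long bg_ratio = (bg_ratio_pct * SKETCH_PAGE_SIZE) / 100;

    assert(bg_ratio_pct <= 100);

    if (bg_bytes)
        return DIV_ROUND_UP(bg_bytes, SKETCH_PAGE_SIZE);
    return (bg_ratio * available_pages) / SKETCH_PAGE_SIZE;
}

/*
 * Mirrors the quoted condition: start background writeback when the
 * number of dirty pages exceeds the threshold and none is running.
 */
static int sketch_should_start_writeback(int laptop_mode,
                                         unsigned long nr_dirty,
                                         unsigned long bg_thresh,
                                         int writeback_running)
{
    return !laptop_mode && nr_dirty > bg_thresh && !writeback_running;
}
```

With both sysctls at 0 the threshold comes out as 0 pages, so a single
dirty page already satisfies the condition. With dirty_background_ratio
set to 1 the threshold is roughly 1% of the available pages.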