Re: [PATCH] mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again

From: Zach O'Keefe
Date: Wed Apr 17 2024 - 15:34:30 EST


On Wed, Apr 17, 2024 at 4:10 AM Jan Kara <jack@xxxxxxx> wrote:
>
> On Thu 18-01-24 10:19:53, Zach O'Keefe wrote:
> > (struct dirty_throttle_control *)->thresh is an unsigned long, but is
> > passed as the u32 divisor argument to div_u64(). On architectures where
> > unsigned long is 64 bits, the argument will be implicitly truncated.
> >
> > Use div64_u64() instead of div_u64() so that the value used in the "is
> > this a safe division" check is the same as the divisor.
> >
> > Also, remove redundant cast of the numerator to u64, as that should
> > happen implicitly.
> >
> > This would be difficult to exploit in the memcg domain, given the
> > ratio-based arithmetic domain_dirty_limits() uses, but is much easier in
> > the global writeback domain with a BDI_CAP_STRICTLIMIT backing device, using
> > e.g. vm.dirty_bytes=(1<<32)*PAGE_SIZE so that dtc->thresh == (1<<32).
> >
> > Fixes: f6789593d5ce ("mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()")
> > Cc: Maxim Patlasov <MPatlasov@xxxxxxxxxxxxx>
> > Cc: <stable@xxxxxxxxxxxxxxx>
> > Signed-off-by: Zach O'Keefe <zokeefe@xxxxxxxxxx>
>
> I've come across this change today and it is broken in several ways:

Thanks for picking up on this, Jan.

> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index cd4e4ae77c40a..02147b61712bc 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -1638,7 +1638,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
> > */
> > dtc->wb_thresh = __wb_calc_thresh(dtc);
> > dtc->wb_bg_thresh = dtc->thresh ?
> > - div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
> > + div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
>
> Firstly, the removed (u64) cast from the multiplication will introduce a
> multiplication overflow on 32-bit archs if wb_thresh * bg_thresh >= 1<<32
> (which is actually common - the default settings with 4GB of RAM will
> trigger this). [..]

True, and embarrassing given I was looking at this code with a 32-bit
focus. Well spotted.
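To make the first point concrete, here is a small userspace sketch that models a 32-bit arch by using uint32_t for unsigned long. The numbers are assumptions for illustration only: about 4 GiB of dirtyable memory in 4 KiB pages (1048576 pages), the default vm.dirty_ratio=20 and vm.dirty_background_ratio=10, and a single device whose wb_thresh equals the global thresh. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model a 32-bit arch: unsigned long is 32 bits wide there. */
typedef uint32_t ulong32;

/* As in the patched line: a 32 x 32 multiply that wraps modulo 2^32. */
static uint64_t bg_product_no_cast(ulong32 wb_thresh, ulong32 bg_thresh)
{
	return wb_thresh * bg_thresh;
}

/* As in the original line: widen first, so the product is exact. */
static uint64_t bg_product_u64_cast(ulong32 wb_thresh, ulong32 bg_thresh)
{
	return (uint64_t)wb_thresh * bg_thresh;
}

/* wb_bg_thresh = product / thresh, for a nonzero thresh. */
static uint64_t wb_bg_thresh_sketch(uint64_t product, ulong32 thresh)
{
	assert(thresh != 0);
	return product / thresh;
}
```

With thresh = wb_thresh = 209715 and bg_thresh = 104857 pages, the exact product is 21990085755, which exceeds 2^32. The 32-bit multiply therefore wraps to 515249275, and wb_bg_thresh drops from 104857 pages to 2456 pages.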

> [..] Secondly, the div64_u64() is unnecessarily expensive on
> 32-bit archs. We have div64_ul() in case we want to be safe & cheap.

That was a last-minute change versus just casting the initial "dtc->thresh ?"
check. It did look expensive, but I figured its existence implied it
should be used. I must have missed div64_ul().
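For reference, include/linux/math64.h defines div64_ul() as div64_u64() on 64-bit builds and as div_u64() on 32-bit builds. An unsigned long divisor is therefore never truncated, and 32-bit builds keep the cheaper 64-by-32 division. The sketch below models that selection in userspace. The *_model functions are hypothetical stand-ins; the real helpers are kernel-internal, and div64_ul() is a macro rather than a function.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-in for div_u64(): 64-bit dividend, 32-bit divisor. */
static uint64_t div_u64_model(uint64_t dividend, uint32_t divisor)
{
	assert(divisor != 0);	/* the kernel would take a divide trap */
	return dividend / divisor;
}

/* Stand-in for div64_u64(): full 64-bit divisor, costlier on 32-bit. */
static uint64_t div64_u64_model(uint64_t dividend, uint64_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* Stand-in for div64_ul(): pick the helper matching unsigned long. */
static uint64_t div64_ul_model(uint64_t dividend, unsigned long divisor)
{
#if ULONG_MAX == UINT32_MAX
	return div_u64_model(dividend, divisor);	/* 32-bit: no truncation */
#else
	return div64_u64_model(dividend, divisor);	/* 64-bit: full width */
#endif
}
```

For example, dividing 1ULL << 33 by (1ULL << 32) + 2 gives 1 through the 64-bit path. If the divisor is first truncated to u32 (leaving 2), the result is 1ULL << 32 instead.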

> Thirdly, if thresholds are larger than 1<<32 pages, then dirty balancing is
> going to blow up in many other spectacular ways - consider only the
> multiplication on this line - it will not necessarily fit into u64 anymore.
> The whole dirty limiting code is interspersed with assumptions that limits
> are actually within u32 and we do our calculations in unsigned longs to
> avoid worrying about overflows (with occasional casting to u64 to make it
> more interesting because people expected those entities to overflow 32 bits
> even on 32-bit archs). Which is lame, I agree, but so far people don't seem
> to be setting limits to 16TB or more. And I'm not really worried about
> security here since this is global-root-only tunable and that has much
> better ways to DoS the system.
>
> So overall I'm all for cleaning up this code but in a sensible way please.
> E.g. for these overflow issues at least do it one function at a time so
> that we can sensibly review it.
>
> Andrew, can you please revert this patch until we have a better fix? So far
> it does more harm than good... Thanks!
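The arithmetic behind the third point above, as a userspace sketch. It assumes 4 KiB pages. The helper names are hypothetical, and __builtin_mul_overflow() is a GCC/Clang builtin, not something the code under discussion uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, i.e. PAGE_SHIFT == 12. */
#define SKETCH_PAGE_SHIFT 12

/* Size in bytes of a limit expressed in pages. */
static uint64_t limit_pages_to_bytes(uint64_t pages)
{
	return pages << SKETCH_PAGE_SHIFT;
}

/* True when wb_thresh * bg_thresh cannot be represented in a u64. */
static bool bg_product_overflows_u64(uint64_t wb_thresh, uint64_t bg_thresh)
{
	uint64_t product;

	return __builtin_mul_overflow(wb_thresh, bg_thresh, &product);
}
```

A limit of 1 << 32 pages is 1 << 44 bytes, i.e. 16 TiB. Below that, both operands fit in 32 bits and the product always fits in a u64 (even UINT32_MAX * UINT32_MAX does). With, say, wb_thresh = 1 << 33 and bg_thresh = 1 << 32 pages, the product is 2^65 and no longer fits.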

Shall we just roll forward with a suitable fix? I think all the
original code actually "needed" was to cast the ternary predicate,
like:

---8<---
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index fba324e1a010..ca1bfc0c9bdd 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1637,8 +1637,8 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
* at some rate <= (write_bw / 2) for bringing down wb_dirty.
*/
dtc->wb_thresh = __wb_calc_thresh(dtc);
- dtc->wb_bg_thresh = dtc->thresh ?
- div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
+ dtc->wb_bg_thresh = (u32)dtc->thresh ?
+ div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;

/*
* In order to avoid the stacked BDI deadlock we need
---8<---
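A userspace sketch of why the (u32) cast in the predicate is sufficient. div_u64() only ever sees the low 32 bits of dtc->thresh, so the predicate now tests exactly the value used as the divisor. It models a 64-bit arch (unsigned long as uint64_t). div_u64_model() and wb_bg_thresh_fixed() are hypothetical stand-ins, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for div_u64(): the divisor parameter is only 32 bits wide. */
static uint64_t div_u64_model(uint64_t dividend, uint32_t divisor)
{
	assert(divisor != 0);	/* the kernel would take a divide trap */
	return dividend / divisor;
}

/* Mirrors the proposed line on a 64-bit arch (unsigned long == u64). */
static uint64_t wb_bg_thresh_fixed(uint64_t wb_thresh, uint64_t bg_thresh,
				   uint64_t thresh)
{
	/* Test the same truncated value that div_u64() will divide by;
	 * the truncation is implicit in the kernel, explicit here. */
	return (uint32_t)thresh ?
		div_u64_model((uint64_t)wb_thresh * bg_thresh, (uint32_t)thresh) : 0;
}
```

With thresh == 1ULL << 32, the old "dtc->thresh ?" check passes but the divisor truncates to 0. The cast predicate returns 0 instead of dividing. For thresh == (1ULL << 32) + 1 the divisor becomes 1, so the result is not meaningful, but per Jan's third point such limits are unsupported anyway; the change only removes the trap.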

Thanks, and apologies for the inconvenience.

Zach

> Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR