Re: [PATCH v5 5/7] can: netlink: add interface for CAN-FD Transmitter Delay Compensation (TDC)

From: Vincent MAILHOL
Date: Wed Aug 18 2021 - 04:37:35 EST


On Wed. 18 Aug 2021 at 17:19, Marc Kleine-Budde <mkl@xxxxxxxxxxxxxx> wrote:
> On 18.08.2021 17:08:51, Vincent MAILHOL wrote:
> > On Wed 18 Aug 2021 at 04:55, Marc Kleine-Budde <mkl@xxxxxxxxxxxxxx> wrote:
> > > On 15.08.2021 12:32:46, Vincent Mailhol wrote:
> > > > +static int can_tdc_changelink(struct net_device *dev, const struct nlattr *nla,
> > > > + struct netlink_ext_ack *extack)
> > > > +{
> > > > + struct nlattr *tb_tdc[IFLA_CAN_TDC_MAX + 1];
> > > > + struct can_priv *priv = netdev_priv(dev);
> > > > + struct can_tdc *tdc = &priv->tdc;
> > > > + const struct can_tdc_const *tdc_const = priv->tdc_const;
> > > > + int err;
> > > > +
> > > > + if (!tdc_const || !can_tdc_is_enabled(priv))
> > > > + return -EOPNOTSUPP;
> > > > +
> > > > + if (dev->flags & IFF_UP)
> > > > + return -EBUSY;
> > > > +
> > > > + err = nla_parse_nested(tb_tdc, IFLA_CAN_TDC_MAX, nla,
> > > > + can_tdc_policy, extack);
> > > > + if (err)
> > > > + return err;
> > > > +
> > > > + if (tb_tdc[IFLA_CAN_TDC_TDCV]) {
> > > > + u32 tdcv = nla_get_u32(tb_tdc[IFLA_CAN_TDC_TDCV]);
> > > > +
> > > > + if (tdcv < tdc_const->tdcv_min || tdcv > tdc_const->tdcv_max)
> > > > + return -EINVAL;
> > > > +
> > > > + tdc->tdcv = tdcv;
> > >
> > > You have to assign to a temporary struct first, and set the priv->tdc
> > > after complete validation, otherwise you end up with inconsistent
> > > values.
> >
> > Actually, copying the temporary structure to priv->tdc is not an
> > atomic operation. Here, you are only reducing the window, not
> > closing it.
>
> It's not a race I'm fixing.
>
> >
> > > > + }
> > > > +
> > > > + if (tb_tdc[IFLA_CAN_TDC_TDCO]) {
> > > > + u32 tdco = nla_get_u32(tb_tdc[IFLA_CAN_TDC_TDCO]);
> > > > +
> > > > + if (tdco < tdc_const->tdco_min || tdco > tdc_const->tdco_max)
> > > > + return -EINVAL;
> > > > +
> > > > + tdc->tdco = tdco;
> > > > + }
> > > > +
> > > > + if (tb_tdc[IFLA_CAN_TDC_TDCF]) {
> > > > + u32 tdcf = nla_get_u32(tb_tdc[IFLA_CAN_TDC_TDCF]);
> > > > +
> > > > + if (tdcf < tdc_const->tdcf_min || tdcf > tdc_const->tdcf_max)
> > > > + return -EINVAL;
> > > > +
> > > > + tdc->tdcf = tdcf;
> > > > + }
> > > > +
> > > > + return 0;
> > > > +}
> > >
> > > To reproduce (ip pseudo-code only :D ):
> > >
> > > ip down
> > > ip up tdc-mode manual tdco 111 tdcv 33 # 111 is out of range, 33 is valid
> > > ip down
> > > ip up # results in tdco=0 tdcv=33 mode=manual
> >
> > I do not think that this PoC would work because, thankfully, the
> > netlink interface uses a mutex to prevent this issue from
> > occurring.
>
> It works, I've tested it :)
>
> > That mutex is defined in:
> > https://elixir.bootlin.com/linux/latest/source/net/core/rtnetlink.c#L68
> >
> > Each time a netlink message is sent to the kernel, it would be
> > dispatched by rtnetlink_rcv_msg() which will make sure to lock
> > the mutex before doing so:
> > https://elixir.bootlin.com/linux/latest/source/net/core/rtnetlink.c#L5551
> >
> > A funny note is that because the mutex is global, if you run two
> > ip commands in a row:
> >
> > | ip link set can0 type can bitrate 500000
> > | ip link set can1 up
> >
> > the second one will wait for the first one to finish even if it
> > is on a different network device.
> >
> > To conclude, I do not think this needs to be fixed.
>
> It's not a race. Consider this command:
>
> | ip up tdc-mode manual tdco 111 tdcv 33 # 111 is out of range, 33 is valid
>
> tdcv is checked first and valid, then it's assigned to the priv->tdc.
> tdco is checked second and invalid, then can_tdc_changelink() returns -EINVAL.
>
> tdc ends up being half set :(
>
> So the setting of tdc is inconsistent and when you do a "ip down" "ip
> up" then it results in a tdco=0 tdcv=33 mode=manual.

My bad. Now I understand the issue.
I was confused because tdco=111 is in the valid range of my driver...
I will squash your patch.
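
To make the "validate everything, then commit" idea concrete, here is a
minimal userspace sketch. It is not the kernel code: struct tdc,
struct tdc_const, struct tdc_request and tdc_apply() are simplified,
hypothetical stand-ins for struct can_tdc, struct can_tdc_const, the parsed
netlink attributes and can_tdc_changelink(). Starting from a zeroed temporary
is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's struct can_tdc / can_tdc_const. */
struct tdc {
	uint32_t tdcv, tdco, tdcf;
};

struct tdc_const {
	uint32_t tdcv_min, tdcv_max;
	uint32_t tdco_min, tdco_max;
	uint32_t tdcf_min, tdcf_max;
};

/* Hypothetical request: has_* models "attribute present in the message". */
struct tdc_request {
	bool has_tdcv, has_tdco, has_tdcf;
	uint32_t tdcv, tdco, tdcf;
};

static bool in_range(uint32_t v, uint32_t min, uint32_t max)
{
	return v >= min && v <= max;
}

/* Validate every attribute into a local copy; touch *cur only on success. */
static int tdc_apply(struct tdc *cur, const struct tdc_const *c,
		     const struct tdc_request *req)
{
	struct tdc tmp = { 0 };	/* assumption: start from a clean state */

	if (req->has_tdcv) {
		if (!in_range(req->tdcv, c->tdcv_min, c->tdcv_max))
			return -EINVAL;
		tmp.tdcv = req->tdcv;
	}

	if (req->has_tdco) {
		if (!in_range(req->tdco, c->tdco_min, c->tdco_max))
			return -EINVAL;
		tmp.tdco = req->tdco;
	}

	if (req->has_tdcf) {
		if (!in_range(req->tdcf, c->tdcf_min, c->tdcf_max))
			return -EINVAL;
		tmp.tdcf = req->tdcf;
	}

	/* All checks passed: commit in one place. */
	*cur = tmp;
	return 0;
}
```

With a tdco_max of 100 in this model, a request carrying tdcv=33 and tdco=111
returns -EINVAL and leaves the stored values untouched, instead of keeping
tdcv=33 as in the reproducer above.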

Actually, I think there is one more thing that needs to be fixed: if
can_tdc_changelink() fails (e.g. value out of range), the
CAN_CTRLMODE_TDC_AUTO or CAN_CTRLMODE_TDC_MANUAL flag would still
be set, meaning that can_tdc_is_enabled() would still return true.
So I will add a "fail" branch to clear the flags.
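
Roughly, the fail branch I have in mind looks like the userspace model
below. Everything in it is illustrative: the MODE_TDC_* bit values,
struct priv_model, set_tdco() and changelink_model() are hypothetical names
standing in for the CAN_CTRLMODE_TDC_* flags, struct can_priv,
can_tdc_changelink() and its caller. The actual patch may end up looking
different.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative bit values, not the real CAN_CTRLMODE_TDC_* constants. */
#define MODE_TDC_AUTO	(1u << 0)
#define MODE_TDC_MANUAL	(1u << 1)
#define MODE_TDC_MASK	(MODE_TDC_AUTO | MODE_TDC_MANUAL)

/* Hypothetical, stripped-down model of the driver private data. */
struct priv_model {
	uint32_t ctrlmode;
	uint32_t tdco;
};

static int tdc_is_enabled(const struct priv_model *p)
{
	return !!(p->ctrlmode & MODE_TDC_MASK);
}

/* Stand-in for can_tdc_changelink(): rejects an out-of-range tdco. */
static int set_tdco(struct priv_model *p, uint32_t tdco, uint32_t tdco_max)
{
	if (tdco > tdco_max)
		return -EINVAL;

	p->tdco = tdco;
	return 0;
}

/* Set the requested TDC mode, then the values; undo the mode on error. */
static int changelink_model(struct priv_model *p, uint32_t mode,
			    uint32_t tdco, uint32_t tdco_max)
{
	int err;

	p->ctrlmode = (p->ctrlmode & ~MODE_TDC_MASK) | (mode & MODE_TDC_MASK);

	err = set_tdco(p, tdco, tdco_max);
	if (err)
		goto fail;

	return 0;

fail:
	/* Without this, tdc_is_enabled() would keep returning true. */
	p->ctrlmode &= ~MODE_TDC_MASK;
	return err;
}
```

In this model, a failed request leaves both mode flags cleared, so a later
"ip up" does not start with TDC enabled and stale values.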


Yours sincerely,
Vincent