Re: [patch net-next v3] net: ethtool: fix unheld rtnl lock

From: Maxime Chevallier
Date: Wed Aug 28 2024 - 02:37:53 EST


Hi Jakub,

On Tue, 27 Aug 2024 12:46:53 -0700
Jakub Kicinski <kuba@xxxxxxxxxx> wrote:

> On Tue, 27 Aug 2024 09:23:36 +0200 Maxime Chevallier wrote:
> > On Mon, 26 Aug 2024 14:38:53 -0300
> > Diogo Jahchan Koike <djahchankoike@xxxxxxxxx> wrote:
> >
> > > ethnl_req_get_phydev should be called with rtnl lock held.
> > >
> > > Reported-by: syzbot+ec369e6d58e210135f71@xxxxxxxxxxxxxxxxxxxxxxxxx
> > > Closes: https://syzkaller.appspot.com/bug?extid=ec369e6d58e210135f71
> > > Fixes: 31748765bed3 ("net: ethtool: pse-pd: Target the command to the requested PHY")
> > > Signed-off-by: Diogo Jahchan Koike <djahchankoike@xxxxxxxxx>
> >
> > This looks good to me.
> >
> > Even though RTNL is released between the .validate() and .set()
> > calls, should the PHY disappear, the .set() callback handles that.
> >
> > Reviewed-by: Maxime Chevallier <maxime.chevallier@xxxxxxxxxxx>
>
> I know this isn't very well documented, but the point of .set_validate
> is to perform checks before taking rtnl_lock (which may be quite
> heavily contended), and potentially skip .set completely.
> See 99132b6eb792 ("ethtool: netlink: handle SET intro/outro in the
> common code"). Since we take rtnl lock and always return 1, this starts
> to feel a bit cart before the horse.

That explanation makes a lot of sense; I didn't have in mind that this
is what .set_validate is for.
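
For anyone following along, here is a rough userspace sketch of that
contract as I understand it. All fake_* and demo_* names are made up for
illustration, the integer flag is only a stand-in for rtnl_lock(), and
the return-value convention (<0 error, 0 nothing to do, >0 go on to
.set) is my reading of the common SET path rather than a copy of it:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the rtnl lock: a flag plus a counter, NOT the kernel lock. */
static int fake_rtnl_held;
static int fake_rtnl_acquisitions;

static void fake_rtnl_lock(void)   { fake_rtnl_held = 1; fake_rtnl_acquisitions++; }
static void fake_rtnl_unlock(void) { fake_rtnl_held = 0; }

/* Hypothetical request: only what the sketch needs. */
struct fake_req {
	int has_phy;	/* state that may only be inspected under the lock */
	int value;	/* setting requested by user space */
};

struct fake_request_ops {
	/* Runs WITHOUT the lock. <0: error, 0: nothing to do, >0: call .set */
	int (*set_validate)(const struct fake_req *req);
	/* Runs WITH the lock held. */
	int (*set)(const struct fake_req *req);
};

/* Assumed shape of the common SET handler. */
static int fake_set_doit(const struct fake_request_ops *ops,
			 const struct fake_req *req)
{
	int ret;

	assert(ops->set != NULL);

	if (ops->set_validate) {
		ret = ops->set_validate(req);
		if (ret <= 0)	/* error or no-op: the lock is never taken */
			return ret;
	}

	fake_rtnl_lock();
	ret = ops->set(req);
	fake_rtnl_unlock();
	return ret;
}

/* Cheap, lock-free checks on the request itself. */
static int demo_validate(const struct fake_req *req)
{
	if (req->value < 0)
		return -EINVAL;
	return req->value == 0 ? 0 : 1;
}

/* Checks that depend on lock-protected state belong here. */
static int demo_set(const struct fake_req *req)
{
	assert(fake_rtnl_held);
	if (!req->has_phy)
		return -ENODEV;
	return 0;
}

static const struct fake_request_ops demo_ops = {
	.set_validate	= demo_validate,
	.set		= demo_set,
};
```

In this model a validate callback that needs the lock and always
returns 1 buys nothing, which is the "cart before the horse" case: the
lock-dependent check may as well live in the set callback.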

> How about we move the validation into set? (following code for
> illustration only, please modify/test/review carefully and submit
> as v4 if agreed on):

That would work for me; it makes more sense than the current
approach.

>
> diff --git a/net/ethtool/pse-pd.c b/net/ethtool/pse-pd.c
> index ff81aa749784..18759d8f85a5 100644
> --- a/net/ethtool/pse-pd.c
> +++ b/net/ethtool/pse-pd.c
> @@ -217,13 +217,10 @@ const struct nla_policy ethnl_pse_set_policy[ETHTOOL_A_PSE_MAX + 1] = {
> };
>
> static int
> -ethnl_set_pse_validate(struct ethnl_req_info *req_info, struct genl_info *info)
> +ethnl_set_pse_validate(struct phy_device *phydev, struct genl_info *info)
> {
> - struct net_device *dev = req_info->dev;
> struct nlattr **tb = info->attrs;
> - struct phy_device *phydev;
>
> - phydev = dev->phydev;
> if (!phydev) {
> NL_SET_ERR_MSG(info->extack, "No PHY is attached");
> return -EOPNOTSUPP;
> @@ -249,7 +246,7 @@ ethnl_set_pse_validate(struct ethnl_req_info *req_info, struct genl_info *info)
> return -EOPNOTSUPP;
> }
>
> - return 1;
> + return 0;
> }
>
> static int
> @@ -258,10 +255,14 @@ ethnl_set_pse(struct ethnl_req_info *req_info, struct genl_info *info)
> struct net_device *dev = req_info->dev;
> struct nlattr **tb = info->attrs;
> struct phy_device *phydev;
> - int ret = 0;
> + int ret;
>
> phydev = dev->phydev;

With the updated PHY code, the above context would look like this:

	phydev = ethnl_req_get_phydev(req_info, tb[ETHTOOL_A_PSE_HEADER],
				      info->extack);
	if (IS_ERR_OR_NULL(phydev))
		return -ENODEV;

>
> + ret = ethnl_set_pse_validate(phydev, info);
> + if (ret)
> + return ret;
> +
> if (tb[ETHTOOL_A_C33_PSE_AVAIL_PW_LIMIT]) {
> unsigned int pw_limit;
>
> @@ -307,7 +308,6 @@ const struct ethnl_request_ops ethnl_pse_request_ops = {
> .fill_reply = pse_fill_reply,
> .cleanup_data = pse_cleanup_data,
>
> - .set_validate = ethnl_set_pse_validate,
> .set = ethnl_set_pse,
> /* PSE has no notification */
> };

This is OK for me. Diogo, as you started addressing this, is it OK for
you to send a v4 with Jakub's proposed changes?

Thanks,

Maxime