Re: [PATCH RESEND V2 net 1/7] net: hns3: fixed reset failure issues caused by the incorrect reset type

From: Jijie Shao
Date: Thu Dec 19 2024 - 07:19:21 EST



on 2024/12/19 17:41, Paolo Abeni wrote:
On 12/18/24 10:02, Michal Swiatkowski wrote:
On Tue, Dec 17, 2024 at 09:08:33AM +0800, Jijie Shao wrote:
From: Hao Lan <lanhao@xxxxxxxxxx>

When a reset type that is not supported by the driver is input, a reset
pending flag bit of the HNAE3_NONE_RESET type is generated in
reset_pending. The driver does not have a mechanism to clear this type
of error. As a result, the driver considers that the reset is not
complete. This patch provides a mechanism to clear the
HNAE3_NONE_RESET flag and the parameter of
hnae3_ae_ops.set_default_reset_request is verified.

The error message:
hns3 0000:39:01.0: cmd failed -16
hns3 0000:39:01.0: hclge device re-init failed, VF is disabled!
hns3 0000:39:01.0: failed to reset VF stack
hns3 0000:39:01.0: failed to reset VF(4)
hns3 0000:39:01.0: prepare reset(2) wait done
hns3 0000:39:01.0 eth4: already uninitialized

Use the crash tool to view struct hclgevf_dev:
struct hclgevf_dev {
...
default_reset_request = 0x20,
reset_level = HNAE3_NONE_RESET,
reset_pending = 0x100,
reset_type = HNAE3_NONE_RESET,
...
};

Fixes: 720bd5837e37 ("net: hns3: add set_default_reset_request in the hnae3_ae_ops")
Signed-off-by: Hao Lan <lanhao@xxxxxxxxxx>
Signed-off-by: Jijie Shao <shaojijie@xxxxxxxxxx>
Signed-off-by: Paolo Abeni <pabeni@xxxxxxxxxx>
I haven't signed-off this patch.

Still no need to repost (yet) for this if the following points are
solved rapidly (as I may end-up merging the series and really adding my
SoB), but please avoid this kind of issue in the future.

Sorry, the patch is fotmated from the patch that has been accpected.
So SOB is added automatically. I will delete the SOB in next version.


@@ -4227,7 +4240,7 @@ static bool hclge_reset_err_handle(struct hclge_dev *hdev)
return false;
} else if (hdev->rst_stats.reset_fail_cnt < MAX_RESET_FAIL_CNT) {
hdev->rst_stats.reset_fail_cnt++;
- set_bit(hdev->reset_type, &hdev->reset_pending);
+ hclge_set_reset_pending(hdev, hdev->reset_type);
Sth is unclear for me here. Doesn't HNAE3_NONE_RESET mean that there is
no reset? If yes, why in this case reset_fail_cnt++ is increasing?

Maybe the check for NONE_RESET should be done in this else if check to
prevent reset_fail_cnt from increasing (and also solve the problem with
pending bit set)
@Michal: I don't understand your comment above. hclge_reset_err_handle()
handles attempted reset failures. I don't see it triggered when
reset_type == HNAE3_NONE_RESET.

@@ -4470,8 +4483,20 @@ static void hclge_reset_event(struct pci_dev *pdev, struct hnae3_handle *handle)
static void hclge_set_def_reset_request(struct hnae3_ae_dev *ae_dev,
enum hnae3_reset_type rst_type)
{
+#define HCLGE_SUPPORT_RESET_TYPE \
+ (BIT(HNAE3_FLR_RESET) | BIT(HNAE3_FUNC_RESET) | \
+ BIT(HNAE3_GLOBAL_RESET) | BIT(HNAE3_IMP_RESET))
+
struct hclge_dev *hdev = ae_dev->priv;
+ if (!(BIT(rst_type) & HCLGE_SUPPORT_RESET_TYPE)) {
+ /* To prevent reset triggered by hclge_reset_event */
+ set_bit(HNAE3_NONE_RESET, &hdev->default_reset_request);
+ dev_warn(&hdev->pdev->dev, "unsupported reset type %d\n",
+ rst_type);
+ return;
+ }
Maybe (nit):
if (...) {
rst_type =
dev_warn();
}

set_bit(rst_type, );
It is a little hard to follow with return in the if.
@Michal: I personally find the patch code quite readable, do you have
strong opinions here?

set_bit(rst_type, &hdev->default_reset_request);
}
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index 2f6ffb88e700..fd0abe37fdd7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -1393,6 +1393,17 @@ static int hclgevf_notify_roce_client(struct hclgevf_dev *hdev,
return ret;
}
+static void hclgevf_set_reset_pending(struct hclgevf_dev *hdev,
+ enum hnae3_reset_type reset_type)
+{
+ /* When an incorrect reset type is executed, the get_reset_level
+ * function generates the HNAE3_NONE_RESET flag. As a result, this
+ * type do not need to pending.
+ */
+ if (reset_type != HNAE3_NONE_RESET)
+ set_bit(reset_type, &hdev->reset_pending);
+}
You already have a way to share the code between PF and VF, so please
move the same functions to common file in one direction up.
AFAICS this can't be shared short of a large refactor not suitable for
net as the functions eligible for sharing operate on different structs
with different layout (hclgevf_dev vs hclge_dev). Currently all the
shared code operates on shared structs.

Cheers,

Yes, Paolo is right, this function cannot be shared.

Thanks,
Jijie Shao


Paolo