Re: Fwd: [PATCH] NTB: Fix an error in get link status

From: Jiasen Lin
Date: Thu Nov 21 2019 - 08:34:08 EST




On 2019/11/20 22:13, Sanjay R Mehta wrote:
From: *Jiasen Lin* <linjiasen@xxxxxxxx <mailto:linjiasen@xxxxxxxx>>
Date: Wed, Nov 20, 2019 at 3:25 PM
Subject: Re: [PATCH] NTB: Fix an error in get link status
To: Jon Mason <jdmason@xxxxxxxx <mailto:jdmason@xxxxxxxx>>
Cc: S-k, Shyam-sundar <Shyam-sundar.S-k@xxxxxxx <mailto:Shyam-sundar.S-k@xxxxxxx>>, Dave Jiang <dave.jiang@xxxxxxxxx <mailto:dave.jiang@xxxxxxxxx>>, Allen Hubbe <allenbh@xxxxxxxxx
<mailto:allenbh@xxxxxxxxx>>, linux-kernel <linux-kernel@xxxxxxxxxxxxxxx <mailto:linux-kernel@xxxxxxxxxxxxxxx>>, linux-ntb <linux-ntb@xxxxxxxxxxxxxxxx <mailto:linux-ntb@xxxxxxxxxxxxxxxx>>,
<linjiasen007@xxxxxxxxx <mailto:linjiasen007@xxxxxxxxx>>, Jiasen Lin <linjiasen@xxxxxxxx <mailto:linjiasen@xxxxxxxx>>




On 2019/11/18 18:17, Jiasen Lin wrote:


On 2019/11/18 7:00, Jon Mason wrote:
On Thu, Nov 7, 2019 at 4:37 AM Jiasen Lin <linjiasen@xxxxxxxx <mailto:linjiasen@xxxxxxxx>> wrote:

The offset of PCIe Capability Header for AMD and HYGON NTB is 0x64,
but the macro which named "AMD_LINK_STATUS_OFFSET" is defined as 0x68.
It is offset of Device Capabilities Reg rather than Link Control Reg.

This code trigger an error in get link statsus:

ÂÂÂÂÂÂÂÂ cat /sys/kernel/debug/ntb_hw_amd/0000:43:00.1/info
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ LNK STA -ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 0x8fa1
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ Link Status -ÂÂÂÂÂÂÂÂÂÂ Up
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ Link Speed -ÂÂÂÂÂÂÂÂÂÂÂ PCI-E Gen 0
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ Link Width -ÂÂÂÂÂÂÂÂÂÂÂ x0

This patch use pcie_capability_read_dword to get link status.
After fix this issue, we can get link status accurately:

ÂÂÂÂÂÂÂÂ cat /sys/kernel/debug/ntb_hw_amd/0000:43:00.1/info
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ LNK STA -ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ 0x11030042
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ Link Status -ÂÂÂÂÂÂÂÂÂÂ Up
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ Link Speed -ÂÂÂÂÂÂÂÂÂÂÂ PCI-E Gen 3
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ Link Width -ÂÂÂÂÂÂÂÂÂÂÂ x16

No response from AMD maintainers, but it looks like you are correct.

This needs a "Fixes:" line here. I took the liberty of adding one to
this patch.


Thank you for your suggestions. Yes, this patch fix the commit id:
a1b3695 ("NTB: Add support for AMD PCI-Express Non-Transparent Bridge").

Signed-off-by: Jiasen Lin <linjiasen@xxxxxxxx <mailto:linjiasen@xxxxxxxx>>
---
 drivers/ntb/hw/amd/ntb_hw_amd.c | 5 +++--
 drivers/ntb/hw/amd/ntb_hw_amd.h | 1 -
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/ntb/hw/amd/ntb_hw_amd.c
b/drivers/ntb/hw/amd/ntb_hw_amd.c
index 156c2a1..ae91105 100644
--- a/drivers/ntb/hw/amd/ntb_hw_amd.c
+++ b/drivers/ntb/hw/amd/ntb_hw_amd.c
@@ -855,8 +855,8 @@ static int amd_poll_link(struct amd_ntb_dev *ndev)

ÂÂÂÂÂÂÂÂ ndev->cntl_sta = reg;

-ÂÂÂÂÂÂ rc = pci_read_config_dword(ndev->ntb.pdev,
-ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ AMD_LINK_STATUS_OFFSET, &stat);
+ÂÂÂÂÂÂ rc = pcie_capability_read_dword(ndev->ntb.pdev,
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ PCI_EXP_LNKCTL, &stat);
ÂÂÂÂÂÂÂÂ if (rc)
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ return 0;
ÂÂÂÂÂÂÂÂ ndev->lnk_sta = stat;
@@ -1139,6 +1139,7 @@ static const struct ntb_dev_data dev_data[] = {
 static const struct pci_device_id amd_ntb_pci_tbl[] = {
ÂÂÂÂÂÂÂÂ { PCI_VDEVICE(AMD, 0x145b), (kernel_ulong_t)&dev_data[0] },
ÂÂÂÂÂÂÂÂ { PCI_VDEVICE(AMD, 0x148b), (kernel_ulong_t)&dev_data[1] },
+ÂÂÂÂÂÂ { PCI_VDEVICE(HYGON, 0x145b), (kernel_ulong_t)&dev_data[0] },

This should be a separate patch. I took the liberty of splitting it
off into a unique patch and attributing it to you. I've pushed them
to the ntb-next branch on
https://github.com/jonmason/ntb

Thank you for your comment. We appreciate the time and effort you have
spent to split it off, I will test it ASAP.

Please verify everything looks acceptable to you (given the changes I
did above that are attributed to you). Also, testing of the latest
code is always appreciated.

Thanks,
Jon


I have tested these patches that are pushed to ntb-next branch, they
work well on HYGON platforms.

Thanks,
Jiasen Lin

Hi Jiasen Lin,

Apologies for my delayed response. I was on vacation.

Your patch is a correct fix, but that would work only for primary system.

The design of AMD NTB implementation is such that NTB primary acts as an endpoint device and NTB secondary is a PCIe Root Port (RP). Considering that,
the link status and control register needs to be accessed differently based on the NTB topology.

So in the case of NTB secondary, we read the link status and control register of the PCIe RP capabilities structure and in the case of NTB primary, we read the
link status and control register from NTB device capabilities structure.

The code below is the proper fix for AMD platform. I will be sending incremental change above your patch.

would appreciate if you could test my patch and let me know whether that works for you.


Dhyana CPU dones not support data transfer while both sides of PCIe link are configured as NTB, in other word, Dhyana only support NTB that is connected to RP rather than NT to NT.

As illustrated in the following topology, NTB consists of two PCIe endpoints, a Primary NTB, and a Secondary NTB. Primary CPU can find Priamry NTB, while Secondary NTB, Secondary internal SW.ds and Secondary internal SW.ds are enumerated by secondary CPU.

In this topology, to remove any ambiguity, your suggestion is more accurate method to get link status of NTB.

In primary PCI domain:
Primary RP--Primary NTB----------------------------------------
40:04.1-------41:00.1(Pri NTB) |
| In secondary PCI domain: |
Secondary RP--Secondary SW.us--Secondary SW.ds--Secondary NTB--
40:03.1---------41:00.0---------42:00.0---------43:00.1(Sec NTB)

I have modified the code according to your suggestion and tested it
on Dhyana platform, it works well, expect to receice your patch.

Before modify the code, read the Link Status and control register of the secondary NTB device to get link status.

cat /sys/kernel/debug/ntb_hw_amd/0000\:43\:00.1/info
NTB Device Information:
Connection Topology - NTB_TOPO_SEC
LNK STA - 0x11030042
Link Status - Up
Link Speed - PCI-E Gen 3
Link Width - x16

After modify the code, read the Link Status and control register of the secondary RP to get link status.

cat /sys/kernel/debug/ntb_hw_amd/0000\:43\:00.1/info
NTB Device Information:
Connection Topology - NTB_TOPO_SEC
LNK STA - 0x70830042
Link Status - Up
Link Speed - PCI-E Gen 3
Link Width - x8

Thanks,
Jiasen Lin

---
Âdrivers/ntb/hw/amd/ntb_hw_amd.c | 27 +++++++++++++++++++++++----
Âdrivers/ntb/hw/amd/ntb_hw_amd.h |Â 1 -
Â2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/ntb/hw/amd/ntb_hw_amd.c b/drivers/ntb/hw/amd/ntb_hw_amd.c
index 14ad69c..91e1966 100644
--- a/drivers/ntb/hw/amd/ntb_hw_amd.c
+++ b/drivers/ntb/hw/amd/ntb_hw_amd.c
@@ -842,6 +842,8 @@ static inline void ndev_init_struct(struct amd_ntb_dev *ndev,
Âstatic int amd_poll_link(struct amd_ntb_dev *ndev)
Â{
ÂÂÂÂ void __iomem *mmio = ndev->peer_mmio;
+ÂÂÂ struct pci_dev *pci_rp = NULL;
+ÂÂÂ struct pci_dev *pdev = NULL;
ÂÂÂÂ u32 reg, stat;
ÂÂÂÂ int rc;
@@ -855,10 +857,27 @@ static int amd_poll_link(struct amd_ntb_dev *ndev)
ÂÂÂÂ ndev->cntl_sta = reg;
-ÂÂÂ rc = pci_read_config_dword(ndev->ntb.pdev,
-ÂÂÂ ÂÂÂ ÂÂÂ ÂÂÂ ÂÂ AMD_LINK_STATUS_OFFSET, &stat);
-ÂÂÂ if (rc)
-ÂÂÂ ÂÂÂ return 0;
+ÂÂÂ if (ndev->ntb.topo == NTB_TOPO_SEC) {
+ÂÂÂ ÂÂÂ /* Locate the pointer to PCIe Root Port for this device */
+ÂÂÂ ÂÂÂ pci_rp = pci_find_pcie_root_port(ndev->ntb.pdev);
+ÂÂÂ ÂÂÂ /* Read the PCIe Link Control and Status register */
+ÂÂÂ ÂÂÂ if (pci_rp) {
+ÂÂÂ ÂÂÂ ÂÂÂ rc = pcie_capability_read_dword(pci_rp, PCI_EXP_LNKCTL,
+ÂÂÂ ÂÂÂ ÂÂÂ ÂÂÂ ÂÂÂ ÂÂÂ ÂÂÂ &stat);
+ÂÂÂ ÂÂÂ ÂÂÂ if (rc)
+ÂÂÂ ÂÂÂ ÂÂÂ ÂÂÂ return 0;
+ÂÂÂ ÂÂÂ }
+ÂÂÂ } else if (ndev->ntb.topo == NTB_TOPO_PRI) {
+ÂÂÂ ÂÂÂ /*
+ÂÂÂ ÂÂÂ Â* For NTB primary, we simply read the Link Status and control
+ÂÂÂ ÂÂÂ Â* register of the NTB device itself.
+ÂÂÂ ÂÂÂ Â*/
+ÂÂÂ ÂÂÂ pdev = ndev->ntb.pdev;
+ÂÂÂ ÂÂÂ rc = pcie_capability_read_dword(pdev, PCI_EXP_LNKCTL, &stat);
+ÂÂÂ ÂÂÂ if (rc)
+ÂÂÂ ÂÂÂ ÂÂÂ return 0;
+ÂÂÂ }
+
ÂÂÂÂ ndev->lnk_sta = stat;
ÂÂÂÂ return 1;
diff --git a/drivers/ntb/hw/amd/ntb_hw_amd.h b/drivers/ntb/hw/amd/ntb_hw_amd.h
index 139a307..39e5d18 100644
--- a/drivers/ntb/hw/amd/ntb_hw_amd.h
+++ b/drivers/ntb/hw/amd/ntb_hw_amd.h
@@ -53,7 +53,6 @@
Â#include <linux/pci.h>
Â#define AMD_LINK_HB_TIMEOUTÂÂÂ msecs_to_jiffies(1000)
-#define AMD_LINK_STATUS_OFFSETÂÂÂ 0x68
Â#define NTB_LIN_STA_ACTIVE_BITÂÂÂ 0x00000002
Â#define NTB_LNK_STA_SPEED_MASKÂÂÂ 0x000F0000
Â#define NTB_LNK_STA_WIDTH_MASKÂÂÂ 0x03F00000