[PATCH RESEND 3/4] docs: Add HiSilicon PTT device driver documentation

From: Yicong Yang
Date: Sat Apr 17 2021 - 06:20:30 EST


Document the introduction and usage of HiSilicon PTT device driver.

Signed-off-by: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
---
Documentation/trace/hisi-ptt.rst | 326 +++++++++++++++++++++++++++++++++++++++
1 file changed, 326 insertions(+)
create mode 100644 Documentation/trace/hisi-ptt.rst

diff --git a/Documentation/trace/hisi-ptt.rst b/Documentation/trace/hisi-ptt.rst
new file mode 100644
index 0000000..f093846
--- /dev/null
+++ b/Documentation/trace/hisi-ptt.rst
@@ -0,0 +1,326 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+HiSilicon PCIe Tune and Trace device
+======================================
+
+Introduction
+============
+
+HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex
+integrated Endpoint (RCiEP) device, providing the capability
+to dynamically monitor and tune the PCIe link's events (tune),
+and trace the TLP headers (trace). The two functions are independent,
+but is recommended to use them together to analyze and enhance the
+PCIe link's performance.
+
+On Kunpeng 930 SoC, the PCIe Root Complex is composed of several
+PCIe cores. Each PCIe core includes several Root Ports and a PTT
+RCiEP, like below. The PTT device is capable of tuning and
+tracing the link of the PCIe core.
+::
+ +--------------Core 0-------+
+ | | [ PTT ] |
+ | | [Root Port]---[Endpoint]
+ | | [Root Port]---[Endpoint]
+ | | [Root Port]---[Endpoint]
+ Root Complex |------Core 1-------+
+ | | [ PTT ] |
+ | | [Root Port]---[ Switch ]---[Endpoint]
+ | | [Root Port]---[Endpoint] `-[Endpoint]
+ | | [Root Port]---[Endpoint]
+ +---------------------------+
+
+The PTT device driver cannot be loaded if debugfs is not mounted.
+Each PTT device will be presented under /sys/kernel/debugfs/hisi_ptt
+as its root directory, with name of its BDF number.
+::
+
+ /sys/kernel/debug/hisi_ptt/<domain>:<bus>:<device>.<function>
+
+Tune
+====
+
+PTT tune is designed for monitoring and adjusting PCIe link parameters (events).
+Currently we support events in 4 classes. The scope of the events
+covers the PCIe core to which the PTT device belongs.
+
+Each event is presented as a file under $(PTT root dir)/$(BDF)/tune, and
+mostly a simple open/read/write/close cycle will be used to tune
+the event.
+::
+ $ cd /sys/kernel/debug/hisi_ptt/$(BDF)/tune
+ $ ls
+ qos_tx_cpl qos_tx_np qos_tx_p
+ tx_path_rx_req_alloc_buf_level
+ tx_path_tx_req_alloc_buf_level
+ $ cat qos_tx_dp
+ 1
+ $ echo 2 > qos_tx_dp
+ $ cat qos_tx_dp
+ 2
+
+Current value (numerical value) of the event can be simply read
+from the file, and the desired value written to the file to tune.
+
+1. Tx path QoS control
+------------------------
+
+The following files are provided to tune the QoS of the tx path of
+the PCIe core.
+
+- qos_tx_cpl: weight of Tx completion TLPs
+- qos_tx_np: weight of Tx non-posted TLPs
+- qos_tx_p: weight of Tx posted TLPs
+
+The weight influences the proportion of certain packets on the PCIe link.
+For example, for the storage scenario, increase the proportion
+of the completion packets on the link to enhance the performance as
+more completions are consumed.
+
+The available tune data of these events is [0, 1, 2].
+Writing a negative value will return an error, and out of range
+values will be converted to 2. Note that the event value just
+indicates a probable level, but is not precise.
+
+2. Tx path buffer control
+-------------------------
+
+Following files are provided to tune the buffer of tx path of the PCIe core.
+
+- tx_path_rx_req_alloc_buf_level: watermark of Rx requested
+- tx_path_tx_req_alloc_buf_level: watermark of Tx requested
+
+These events influence the watermark of the buffer allocated for each
+type. Rx means the inbound while Tx means outbound. The packets will
+be stored in the buffer first and then posted either when the watermark
+reached or when timed out. For a busy direction, you should increase
+the related buffer watermark to avoid frequently posting and thus
+enhance the performance. In most cases just keep the default value.
+
+The available tune data of above events is [0, 1, 2].
+Writing a negative value will return an error, and out of range
+values will be converted to 2. Note that the event value just
+indicates a probable level, but is not precise.
+
+Trace
+=====
+
+PTT trace is designed for dumping the TLP headers to the memory, which
+can be used to analyze the transactions and usage condition of the PCIe
+Link. You can choose to filter the traced headers by either requester ID,
+or those downstream of a set of Root Ports on the same core of the PTT
+device. It's also supported to trace the headers of certain type and of
+certain direction.
+
+In order to start trace, you need to configure the parameters first.
+The parameters files are provided under $(PTT root dir)/$(BDF)/trace.
+::
+ $ cd /sys/kernel/debug/hisi_ptt/$(BDF)/trace
+ $ ls
+ free_buffer filter buflet_nums buflet_size
+ direction type data trace_on
+ data_format
+
+1. filter
+---------
+
+You can configure the filter of TLP headers through the file. The filter
+is provided as BDF numbers of either Root Port or subordinates, which
+belong to the same PCIe core. You can get the filters available and
+currently configured by reading the file, and write the desired BDF to
+the file to set the filters. There is no default filter, which means you
+must specify at least one filter before starting tracing. Writing an
+invalid BDF (not in the available list) will return a failure.
+::
+ $ echo 0000:80:04.0 > filter
+ $ cat filter
+ #### Root Ports ####
+ 0000:80:00.0
+ [0000:80:04.0]
+ #### Functions ####
+ 0000:81:00.0
+ 0000:81:00.1
+ 0000:82:00.0
+
+Note that multiple Root Ports can be specified at one time, but only
+one Endpoint function can be specified in one trace. Specifying both
+Root Port and function at the same time is not supported.
+
+If no filter is available, reading the filter will get the hint.
+::
+ $ cat filter
+ #### No available filter ####
+
+The filter can be dynamically updated, which means you can always
+get correct filter information when hotplug events happen, or
+when you manually remove/rescan the devices.
+
+2. type
+-------
+
+You can trace the TLP headers of certain types by configuring the file.
+Reading the file will get available types and current setting, and write
+the desired type to the file to configure. The default type is
+`posted_request`. Write the types not in the available list will return
+a failure.
+::
+ $ echo completion > type
+ $ cat type
+ all posted_request non-posted_request [completion]
+
+3. direction
+------------
+
+You can trace the TLP headers from certain direction, which is relative
+to the Root Port or the PCIe core. Read the file to get available
+directions and current configurition, and write the desired direction
+to configure. The default value is `rx` and any invalid direction will
+return a failure. Note `rxtx_no_dma_p2p` means the headers of both
+directions, but not include P2P DMA access.
+::
+ $ echo rxtx > direction
+ $ cat direction
+ rx tx [rxtx] rxtx_no_dma_p2p
+
+4. buflet_size
+--------------
+
+The traced TLP headers will be written to the memory allocated
+by the driver. The hardware accepts 4 DMA address with same size,
+and writes the buflet sequentially like below. If DMA addr 3 is
+finished and the trace is still on, it will return to addr 0.
+Driver will allocate each DMA buffer (we call it buflet).
+The finished buflet will be replaced with a new one, so
+a long time trace can be achieved.
+::
+ +->[DMA addr 0]->[DMA addr 1]->[DMA addr 2]->[DMA addr 3]-+
+ +---------------------------------------------------------+
+
+You should both configure the buflet_size and buflet_nums to
+configure the `trace buffer` to receive the TLP headers. The
+total trace buffer size is buflet_size * buflet_nums. Note
+that the trace buffer will not be allocated immediately after you
+configure the parameters, but will be allocated right before
+the trace starts.
+
+This file configures the buflet size. Read the file will get
+available buflet size and size set currently; write the desired
+size to the file to configure. The default size is 2 MiB and any
+invalid size written will return a failure.
+::
+ $ cat buflet_size
+ [2 MiB] 4 MiB
+ $ echo 4 > buflet_size
+ $ cat buflet_size
+ 2 MiB [4 MiB]
+
+5. buflet_nums
+--------------
+
+You can write the desired buflet count to the file to configure,
+and read the file to get current buflet count. The default
+value is 64. Any positive value is valid. Note that big value
+may lead to DMA memory allocation failure, and you will not be
+able to start tracing. If it happens, you should consider adjusting
+buflet_nums or buflet_size.
+::
+ $ cat buflet_nums
+ 64
+ $ echo 128 > buflet_nums
+ $ cat buflet_nums
+ 128
+
+6. data
+-------
+
+The file to access the traced data. You can read the file to get the
+binary blob of traced TLP headers. The format of the headers is
+4 Dword length and is just as defined by the PCIe Spec r4.0,
+Sec 2.2.4.1, or 8 Dword length with additional 4 Dword extra
+information.
+
+echo "" > data will free all the trace buffers allocated as well as
+the traced datas.
+
+7. trace_on
+-----------
+
+Start or end the trace by writing to the file, and monitor the trace
+status by reading the file.
+::
+ $ echo 1 > trace_on # start trace
+ $ cat trace_on # get the trace status
+ 1
+ $ echo 0 > trace_on # manually end trace
+
+The read value of the trace_on will be auto cleared if the buffer
+allocated is full. 1 indicates the trace is running and 0 for
+stopped. Writing any non-zero value to the file starts tracing.
+
+8. free_buffer
+--------------
+
+File to indicate the trace buffer status and to manually free the
+trace buffer. The read value of 1 indicates the trace buffer has
+been allocated and exists in the memory, while 0 indicates there
+is no buffer allocated. Write 1 to the file to free the trace
+buffer as well as the traced data.
+::
+ $ cat free_buffer
+ 1 # indicate the buffer exists
+ $ echo 1 > free_buffer # free the trace buffer
+ $ cat free_buffer
+ 0
+
+9. data_format
+--------------
+
+File to indicate the format of the traced TLP headers. User can
+specify the desired format of traced TLP headers. Available formats
+are 4DW, 8DW which indicates the length of each TLP headers traced.
+::
+ $ cat data_format
+ [4DW] 8DW
+ $ echo 8 > data_format
+ $ cat data_format
+ 4DW [8DW]
+
+The traced TLP header format is different from the PCIe standard.
+
+When using the 8DW data format, the entire TLP header is logged
+(Header DW0-3 shown below). For example, the TLP header for Memory
+Reads with 64-bit addresses is shown in PCIe r5.0, Figure 2-17;
+the header for Configuration Requests is shown in Figure 2.20, etc.
+
+In addition, 8DW trace buffer entries contain a timestamp and
+possibly a prefix for a PASID TLP prefix (see Figure 6-20).
+Otherwise this field will be all 0.
+
+The bit[31:11] of DW0 is always 0x1fffff, which can be
+used to distinguish the data format. 8DW format is like
+::
+ bits [ 31:11 ][ 10:0 ]
+ |---------------------------------------|-------------------|
+ DW0 [ 0x1fffff ][ Reserved (0x7ff) ]
+ DW1 [ Prefix ]
+ DW2 [ Header DW0 ]
+ DW3 [ Header DW1 ]
+ DW4 [ Header DW2 ]
+ DW5 [ Header DW3 ]
+ DW6 [ Reserved (0x0) ]
+ DW7 [ Time ]
+
+When using the 4DW data format, DW0 of the trace buffer entry
+contains selected fields of DW0 of the TLP, together with a
+timestamp. DW1-DW3 of the trace buffer entry contain DW1-DW3
+directly from the TLP header.
+
+4DW format is like
+::
+ bits [31:30] [ 29:25 ][24][23][22][21][ 20:11 ][ 10:0 ]
+ |-----|---------|---|---|---|---|-------------|-------------|
+ DW0 [ Fmt ][ Type ][T9][T8][TH][SO][ Length ][ Time ]
+ DW1 [ Header DW1 ]
+ DW2 [ Header DW2 ]
+ DW3 [ Header DW3 ]
--
2.8.1