Re: [PATCH v3 1/8] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device

From: Yicong Yang
Date: Mon Feb 14 2022 - 07:51:24 EST


On 2022/2/8 19:07, Yicong Yang wrote:
> On 2022/2/7 19:42, Jonathan Cameron wrote:
>> On Mon, 24 Jan 2022 21:11:11 +0800
>> Yicong Yang <yangyicong@xxxxxxxxxxxxx> wrote:
>>
>>> HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex
>>> integrated Endpoint(RCiEP) device, providing the capability
>>> to dynamically monitor and tune the PCIe traffic, and trace
>>> the TLP headers.
>>>
>>> Add the driver for the device to enable the trace function.
>>> This patch adds basic function of trace, including the device's
>>> probe and initialization, functions for trace buffer allocation
>>> and trace enable/disable, register an interrupt handler to
>>> simply response to the DMA events. The user interface of trace
>>> will be added in the following patch.
>>>
>>> Signed-off-by: Yicong Yang <yangyicong@xxxxxxxxxxxxx>
>> Hi Yicong,
>>
>> I've not been following all the earlier discussion on this driver closely
>> so I may well raise something that has already been addressed. If so
>> just ignore the comment.
>
> Thanks for the comments. It's ok for me to clarify it :).
> Part replies inline and I need to do some test on the others.
>
>>
>> Thanks,
>>
>> Jonathan
>>
>>> ---
>>> drivers/Makefile | 1 +
>>> drivers/hwtracing/Kconfig | 2 +
>>> drivers/hwtracing/ptt/Kconfig | 11 +
>>> drivers/hwtracing/ptt/Makefile | 2 +
>>> drivers/hwtracing/ptt/hisi_ptt.c | 398 +++++++++++++++++++++++++++++++
>>> drivers/hwtracing/ptt/hisi_ptt.h | 159 ++++++++++++
>>> 6 files changed, 573 insertions(+)
>>> create mode 100644 drivers/hwtracing/ptt/Kconfig
>>> create mode 100644 drivers/hwtracing/ptt/Makefile
>>> create mode 100644 drivers/hwtracing/ptt/hisi_ptt.c
>>> create mode 100644 drivers/hwtracing/ptt/hisi_ptt.h
>>>
> [...]
>>> +
>>> +static int hisi_ptt_alloc_trace_buf(struct hisi_ptt *hisi_ptt)
>>> +{
>>> + struct hisi_ptt_trace_ctrl *ctrl = &hisi_ptt->trace_ctrl;
>>> + struct device *dev = &hisi_ptt->pdev->dev;
>>> + struct hisi_ptt_dma_buffer *buffer;
>>> + int i, ret;
>>> +
>>> + hisi_ptt->trace_ctrl.buf_index = 0;
>>> +
>>> + /* Make sure the trace buffer is empty before allocating */
>>
>> This comment is misleading as it suggests it not being empty is
>> a bad thing but the code handles it as an acceptable path.
>> Perhaps:
>> /*
>> * If the trace buffer has already been allocated, zero the
>> * memory.
>> */
>>
>
> will make it less misleading. thanks.
>
>>> + if (!list_empty(&ctrl->trace_buf)) {
>>> + list_for_each_entry(buffer, &ctrl->trace_buf, list)
>>> + memset(buffer->addr, 0, buffer->size);
>>> + return 0;
>>> + }
>>> +
>>> + for (i = 0; i < HISI_PTT_TRACE_BUF_CNT; ++i) {
>>> + buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
>>> + if (!buffer) {
>>> + ret = -ENOMEM;
>>> + goto err;
>>> + }
>>> +
>>> + buffer->addr = dma_alloc_coherent(dev, ctrl->buffer_size,
>>> + &buffer->dma, GFP_KERNEL);
>>> + if (!buffer->addr) {
>>> + kfree(buffer);
>>> + ret = -ENOMEM;
>>> + goto err;
>>> + }
>>> +
>>> + memset(buffer->addr, 0, buffer->size);
>> See:
>> https://lore.kernel.org/lkml/20190108130701.14161-4-hch@xxxxxx/
>> dma_alloc_coherent() always zeros the memory for us hence there
>> is no longer a dma_kzalloc_coherent()
>>
>
> thanks for the information. Then the memset here is redundant and will drop it.
>
>>> +
>>> + buffer->index = i;
>>
>> Carrying an index inside a list which corresponds directly
>> to the position in the list is not particularly nice.
>> Why can't we compute this index on the fly where the list
>> is walked? Or am I misunderstanding and the order of the buffers
>> is changed in a later patch?
>>
>
> The index is fixed once allocated and I stored it to avoid later
> computing. But seems it's highly recommended to compute these sort
> of things on the fly when necessary. John recommends the same things
> on some other places so I think I can get these addressed.
>
>> As a side note, is a list actually appropriate when we always
>> have 4 of these buffers? Feels like an array of buffer
>> structures might be cheaper.
>>

As suggested here and below, I tried to maintianed the buffers with
an array instead of a list and it looks more straightforward and some
fields of buffer structure can also be dropped. So I think I can change
to use an array.

Thanks for the suggestion!

Yicong

>>> + buffer->size = ctrl->buffer_size;
>>> + list_add_tail(&buffer->list, &ctrl->trace_buf);
>>> + }
>>> +
>>> + return 0;
>>> +err:
>>> + hisi_ptt_free_trace_buf(hisi_ptt);
>>> + return ret;
>>> +}
>>> +
>>> +static void hisi_ptt_trace_end(struct hisi_ptt *hisi_ptt)
>>> +{
>>> + writel(0, hisi_ptt->iobase + HISI_PTT_TRACE_CTRL);
>>> + hisi_ptt->trace_ctrl.status = HISI_PTT_TRACE_STATUS_OFF;
>>> +}
>>> +
>>> +static int hisi_ptt_trace_start(struct hisi_ptt *hisi_ptt)
>>> +{
>>> + struct hisi_ptt_trace_ctrl *ctrl = &hisi_ptt->trace_ctrl;
>>> + struct hisi_ptt_dma_buffer *cur;
>>> + u32 val;
>>> +
>>> + /* Check device idle before start trace */
>>> + if (hisi_ptt_wait_trace_hw_idle(hisi_ptt)) {
>>> + pci_err(hisi_ptt->pdev, "Failed to start trace, the device is still busy.\n");
>>> + return -EBUSY;
>>> + }
>>> +
>>> + /* Reset the DMA before start tracing */
>>> + val = readl(hisi_ptt->iobase + HISI_PTT_TRACE_CTRL);
>>> + val |= HISI_PTT_TRACE_CTRL_RST;
>>> + writel(val, hisi_ptt->iobase + HISI_PTT_TRACE_CTRL);
>>> +
>>> + /*
>>> + * We'll be in the perf context where preemption is disabled,
>>> + * so use busy loop here.
>>> + */
>>> + mdelay(HISI_PTT_RESET_WAIT_MS);
>>
>> Busy look for 1 second? Ouch. If we can reduce this in any way
>> that would be great or if there is a means to do it before
>> we disable preemption.
>>
>
> It's inherited from the previous version that was using msleep() and it's
> somehow unacceptable in an atomic context I think. The reset here is
> going to reset the write pointer of the hardware DMA so we can check the
> whether the pointer before dereset it. I confirmed with our hardware
> teams that it can be reduced to 10us. So I'll poll the write pointer register
> for about 10us before continue here.
>
> thanks for catching this!
>
>>> +
>>> + val = readl(hisi_ptt->iobase + HISI_PTT_TRACE_CTRL);
>>> + val &= ~HISI_PTT_TRACE_CTRL_RST;
>>> + writel(val, hisi_ptt->iobase + HISI_PTT_TRACE_CTRL);
>>> +
>>> + /* Clear the interrupt status */
>>> + writel(HISI_PTT_TRACE_INT_STAT_MASK, hisi_ptt->iobase + HISI_PTT_TRACE_INT_STAT);
>>> + writel(0, hisi_ptt->iobase + HISI_PTT_TRACE_INT_MASK);
>>> +
>>> + /* Configure the trace DMA buffer */
>>> + list_for_each_entry(cur, &ctrl->trace_buf, list) {
>>
>> I comment on the use of cur->index above. Here it would be easy to compute
>> the index as we go for example assuming we never end up with holes
>> in the list.
>>
>
> ok.
>
>>> + writel(lower_32_bits(cur->dma),
>>> + hisi_ptt->iobase + HISI_PTT_TRACE_ADDR_BASE_LO_0 +
>>> + cur->index * HISI_PTT_TRACE_ADDR_STRIDE);
>>> + writel(upper_32_bits(cur->dma),
>>> + hisi_ptt->iobase + HISI_PTT_TRACE_ADDR_BASE_HI_0 +
>>> + cur->index * HISI_PTT_TRACE_ADDR_STRIDE);
>>> + }
>>> + writel(ctrl->buffer_size, hisi_ptt->iobase + HISI_PTT_TRACE_ADDR_SIZE);
>>> +
>>> + /* Set the trace control register */
>>> + val = FIELD_PREP(HISI_PTT_TRACE_CTRL_TYPE_SEL, ctrl->type);
>>> + val |= FIELD_PREP(HISI_PTT_TRACE_CTRL_RXTX_SEL, ctrl->direction);
>>> + val |= FIELD_PREP(HISI_PTT_TRACE_CTRL_DATA_FORMAT, ctrl->format);
>>> + val |= FIELD_PREP(HISI_PTT_TRACE_CTRL_TARGET_SEL, hisi_ptt->trace_ctrl.filter);
>>> + if (!hisi_ptt->trace_ctrl.is_port)
>>> + val |= HISI_PTT_TRACE_CTRL_FILTER_MODE;
>>> +
>>> + /* Start the Trace */
>>> + val |= HISI_PTT_TRACE_CTRL_EN;
>>> + writel(val, hisi_ptt->iobase + HISI_PTT_TRACE_CTRL);
>>> +
>>> + ctrl->status = HISI_PTT_TRACE_STATUS_ON;
>>> +
>>> + return 0;
>>> +}
>>> +
>>
>> ...
>>
>>> +
>>> +static void hisi_ptt_init_ctrls(struct hisi_ptt *hisi_ptt)
>>> +{
>>> + struct pci_dev *pdev = hisi_ptt->pdev;
>>> + struct pci_bus *bus;
>>> + u32 reg;
>>> +
>>> + INIT_LIST_HEAD(&hisi_ptt->port_filters);
>>> + INIT_LIST_HEAD(&hisi_ptt->req_filters);
>>> +
>>> + /*
>>> + * The device range register provides the information about the
>>> + * root ports which the RCiEP can control and trace. The RCiEP
>>> + * and the root ports it support are on the same PCIe core, with
>>> + * same domain number but maybe different bus number. The device
>>> + * range register will tell us which root ports we can support,
>>> + * Bit[31:16] indicates the upper BDF numbers of the root port,
>>> + * while Bit[15:0] indicates the lower.
>>> + */
>>> + reg = readl(hisi_ptt->iobase + HISI_PTT_DEVICE_RANGE);
>>> + hisi_ptt->upper = reg >> 16;
>>> + hisi_ptt->lower = reg & 0xffff;
>> Trivial:
>> Perhaps worthing define HISI_PTT_DEVICE_RANGE_UPPER_MASK etc adn using
>> FIELD_GET?
>>
>
> sure.
>
>>> +
>>> + reg = readl(hisi_ptt->iobase + HISI_PTT_LOCATION);
>>> + hisi_ptt->core_id = FIELD_GET(HISI_PTT_CORE_ID, reg);
>>> + hisi_ptt->sicl_id = FIELD_GET(HISI_PTT_SICL_ID, reg);
>>> +
>>> + bus = pci_find_bus(pci_domain_nr(pdev->bus), PCI_BUS_NUM(hisi_ptt->upper));
>>> + if (bus)
>>> + pci_walk_bus(bus, hisi_ptt_init_filters, hisi_ptt);
>>> +
>>> + /* Initialize trace controls */
>>> + INIT_LIST_HEAD(&hisi_ptt->trace_ctrl.trace_buf);
>>> + hisi_ptt->trace_ctrl.buffer_size = HISI_PTT_TRACE_BUF_SIZE;
>>> + hisi_ptt->trace_ctrl.default_cpu = cpumask_first(cpumask_of_node(dev_to_node(&pdev->dev)));
>>> +}
>>> +
> [...]
>>> +
>>> +#define HISI_PCIE_CORE_PORT_ID(devfn) (PCI_FUNC(devfn) << 1)
>>> +
>>> +enum hisi_ptt_trace_status {
>>> + HISI_PTT_TRACE_STATUS_OFF = 0,
>>> + HISI_PTT_TRACE_STATUS_ON,
>>> +};
>>
>> Why not just use a boolean given we only have off and on states?
>>
>
> An enum may make the code more readable I think.
>
> Thanks,
> Yicong
>
> .
>