Re: [RFC PATCH V2 1/4] pci: APM X-Gene PCIe controller driver
From: Tanmay Inamdar
Date: Fri Jan 24 2014 - 16:28:31 EST
On Thu, Jan 16, 2014 at 5:10 PM, Tanmay Inamdar <tinamdar@xxxxxxx> wrote:
> On Wed, Jan 15, 2014 at 4:39 AM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
>> On Wednesday 15 January 2014, Tanmay Inamdar wrote:
>>> This patch adds the AppliedMicro X-Gene SOC PCIe controller driver.
>>> X-Gene PCIe controller supports maxmum upto 8 lanes and GEN3 speed.
>>> X-Gene has maximum 5 PCIe ports supported.
>>>
>>> Signed-off-by: Tanmay Inamdar <tinamdar@xxxxxxx>
>>
>> This already looks much better than the first version, but I have a more
>> comments. Most importantly, it would help to know how the root ports
>> are structured. Is this a standard root complex and multiple ports,
>> multiple root complexes with one port each, or a nonstandard organization
>> that is a mix of those two models?
>
> This is multiple root complexes with one port each.
>
>>
>>> +
>>> +/* When the address bit [17:16] is 2'b01, the Configuration access will be
>>> + * treated as Type 1 and it will be forwarded to external PCIe device.
>>> + */
>>> +static void __iomem *xgene_pcie_get_cfg_base(struct pci_bus *bus)
>>> +{
>>> + struct xgene_pcie_port *port = xgene_pcie_bus_to_port(bus);
>>> + u64 addr = (u64)port->cfg_base;
>>> +
>>> + if (bus->number >= (port->first_busno + 1))
>>> + addr |= AXI_EP_CFG_ACCESS;
>>> +
>>> + return (void *)addr;
>>> +}
>>
>> Wrong type, it should be 'void __iomem *'. Also you can't assume that
>> bit operations work on virtual __iomem addresses, so it should be better
>> to just add a constant integer to the pointer, which is a valid
>> operation.
>
> ok.
>
>>
>> I also wonder why you need to do this at all. If there isn't a global
>> config space for all ports, but rather a separate type0/type1 config
>> cycle based on the bus number, I see that as an indication that the
>> ports are in fact separate domains and should each start with bus 0.
>
> It is not a standard ECAM layout. We also have a separate RTDID
> register as well to program bus, device, function. While accessing EP
> config space, we have to set the bit 17:16 as 2b'01. The same config
> space address is utilized for enabling a customized nonstandard PCIe
> DMA feature. The bits are defined to differentiate the access purpose.
> The feature is not supported in this driver yet.
>
> Secondly I don't think it will matter if each port starts with bus 0.
> As long as we set the correct BDF in RTDID and set correct bits in
> config address, the config reads and writes would work. Right?
>
>>
>>> +static void xgene_pcie_setup_lanes(struct xgene_pcie_port *port)
>>> +{
>>> + void *csr_base = port->csr_base;
>>> + u32 val;
>>> +
>>> + val = readl(csr_base + BRIDGE_8G_CFG_8);
>>> + val = eq_pre_cursor_lane0_set(val, 0x7);
>>> + val = eq_pre_cursor_lane1_set(val, 0x7);
>>> + writel(val, csr_base + BRIDGE_8G_CFG_8);
>>> +
>>> + val = readl(csr_base + BRIDGE_8G_CFG_9);
>>> + val = eq_pre_cursor_lane0_set(val, 0x7);
>>> + val = eq_pre_cursor_lane1_set(val, 0x7);
>>> + writel(val, csr_base + BRIDGE_8G_CFG_9);
>>> +
>>> + val = readl(csr_base + BRIDGE_8G_CFG_10);
>>> + val = eq_pre_cursor_lane0_set(val, 0x7);
>>> + val = eq_pre_cursor_lane1_set(val, 0x7);
>>> + writel(val, csr_base + BRIDGE_8G_CFG_10);
>>> +
>>> + val = readl(csr_base + BRIDGE_8G_CFG_11);
>>> + val = eq_pre_cursor_lane0_set(val, 0x7);
>>> + val = eq_pre_cursor_lane1_set(val, 0x7);
>>> + writel(val, csr_base + BRIDGE_8G_CFG_11);
>>> +
>>> + val = readl(csr_base + BRIDGE_8G_CFG_4);
>>> + val = (val & ~0x30) | (1 << 4);
>>> + writel(val, csr_base + BRIDGE_8G_CFG_4);
>>> +}
>>
>> Please document what you are actually setting here. If the configuration
>> of the lanes is always the same, why do you have to set it here. If not,
>> why do you set constant values?
>
> Good point. Let me check if these values should be constant or tune-able.
>
>>
>>> +static void xgene_pcie_setup_link(struct xgene_pcie_port *port)
>>> +{
>>> + void *csr_base = port->csr_base;
>>> + u32 val;
>>> +
>>> + val = readl(csr_base + BRIDGE_CFG_14);
>>> + val |= DIRECT_TO_8GTS_MASK;
>>> + val |= SUPPORT_5GTS_MASK;
>>> + val |= SUPPORT_8GTS_MASK;
>>> + val |= DIRECT_TO_5GTS_MASK;
>>> + writel(val, csr_base + BRIDGE_CFG_14);
>>> +
>>> + val = readl(csr_base + BRIDGE_CFG_14);
>>> + val &= ~ADVT_INFINITE_CREDITS;
>>> + writel(val, csr_base + BRIDGE_CFG_14);
>>> +
>>> + val = readl(csr_base + BRIDGE_8G_CFG_0);
>>> + val |= (val & ~0xf) | 7;
>>> + val |= (val & ~0xf00) | ((7 << 8) & 0xf00);
>>> + writel(val, csr_base + BRIDGE_8G_CFG_0);
>>> +
>>> + val = readl(csr_base + BRIDGE_8G_CFG_0);
>>> + val |= DWNSTRM_EQ_SKP_PHS_2_3;
>>> + writel(val, csr_base + BRIDGE_8G_CFG_0);
>>> +}
>>
>> Same here.
>>
>>> +static void xgene_pcie_program_core(void *csr_base)
>>> +{
>>> + u32 val;
>>> +
>>> + val = readl(csr_base + BRIDGE_CFG_0);
>>> + val |= AER_OPTIONAL_ERROR_EN;
>>> + writel(val, csr_base + BRIDGE_CFG_0);
>>> + writel(0x0, csr_base + INTXSTATUSMASK);
>>> + val = readl(csr_base + BRIDGE_CTRL_1);
>>> + val = (val & ~0xffff) | XGENE_PCIE_DEV_CTRL;
>>> + writel(val, csr_base + BRIDGE_CTRL_1);
>>> +}
>>
>> 'program_core'?
>
> Some of the PCIe core related misc configurations.
>
>>
>>> +static void xgene_pcie_poll_linkup(struct xgene_pcie_port *port, u32 *lanes)
>>> +{
>>> + void *csr_base = port->csr_base;
>>> + u32 val32;
>>> + u64 start_time, time;
>>> +
>>> + /*
>>> + * A component enters the LTSSM Detect state within
>>> + * 20ms of the end of fundamental core reset.
>>> + */
>>> + msleep(XGENE_LTSSM_DETECT_WAIT);
>>> + port->link_up = 0;
>>> + start_time = jiffies;
>>> + do {
>>> + val32 = readl(csr_base + PCIECORE_CTLANDSTATUS);
>>> + if (val32 & LINK_UP_MASK) {
>>> + port->link_up = 1;
>>> + port->link_speed = PIPE_PHY_RATE_RD(val32);
>>> + val32 = readl(csr_base + BRIDGE_STATUS_0);
>>> + *lanes = val32 >> 26;
>>> + }
>>> + time = jiffies_to_msecs(jiffies - start_time);
>>> + } while ((!port->link_up) || (time <= XGENE_LTSSM_L0_WAIT));
>>> +}
>>
>> Maybe another msleep() in the loop? It seems weird to first do an
>> unconditional sleep but then busy-wait for the result.
>
> ok.
This loop can execute for maximum 4 msec. So putting msleep(1) won't
get us much.
>
>>
>>> +static void xgene_pcie_setup_primary_bus(struct xgene_pcie_port *port,
>>> + u32 first_busno, u32 last_busno)
>>> +{
>>> + u32 val;
>>> + void *cfg_addr = port->cfg_base;
>>> +
>>> + val = readl(cfg_addr + PCI_PRIMARY_BUS);
>>> + val &= ~PCI_PRIMARY_BUS_MASK;
>>> + val |= (last_busno << 16) | ((first_busno + 1) << 8) | (first_busno);
>>> + writel(val, cfg_addr + PCI_PRIMARY_BUS);
>>> +}
>>
>> Please explain what you are doing here. As mentioned above, I would expect
>> that each domain has visibility of all 255 buses. You shouldn't need any hacks
>> where you try to artificially squeeze the ports into a single domain when
>> they are separate in hardware.
>
> ok. I will check and get back.
You are right. I have removed this hack. It will be fixed in next version.
>
>>
>>> +/*
>>> + * read configuration values from DTS
>>> + */
>>> +static int xgene_pcie_read_dts_config(struct xgene_pcie_port *port)
>>
>> The comment and function name don't seem to match what the function
>> does. The main purpose of this function seems to be to ioremap
>> the resources, which have nothing to with configuration.
>
> ok.
>
>>
>>> +{
>>> + struct device_node *np = port->node;
>>> + struct resource csr_res;
>>> + struct resource cfg_res;
>>> +
>>> + /* Get CSR space registers address */
>>> + if (of_address_to_resource(np, 0, &csr_res))
>>> + return -EINVAL;
>>> +
>>> + port->csr_base = devm_ioremap_nocache(port->dev, csr_res.start,
>>> + resource_size(&csr_res));
>>
>> You can also use platform_get_resource() to access the resource
>> that is already there, rather than creating another one.
>
> ok.
>
>>
>>> +static void xgene_pcie_setup_ob_reg(struct xgene_pcie_port *port,
>>> + u32 addr, u32 restype)
>>> +{
>>> + struct resource *res = NULL;
>>> + void *base = port->csr_base + addr;
>>> + resource_size_t size;
>>> + u64 cpu_addr = 0;
>>> + u64 pci_addr = 0;
>>> + u64 mask = 0;
>>> + u32 min_size = 0;
>>
>> A general note: don't initialize local variables to a bogus valus (e.g. 0)
>> in their declaration. It prevents the compiler from warning about
>> incorrect uses.
>
> ok.
>
>>
>>> + u32 flag = EN_REG;
>>
>> This one on the other hand is ok, because you are actually going to
>> use that value.
>>
>>> + switch (restype) {
>>> + case IORESOURCE_MEM:
>>> + res = &port->mem.res;
>>> + pci_addr = port->mem.pci_addr;
>>> + min_size = SZ_128M;
>>> + break;
>>> + case IORESOURCE_IO:
>>> + res = &port->io.res;
>>> + pci_addr = port->io.pci_addr;
>>> + min_size = 128;
>>> + flag |= OB_LO_IO;
>>> + break;
>>> + }
>>
>> I assume this works ok, but seems wrong in one detail: If the resource
>> is marked IORESOURCE_IO, res->start is supposed to be in I/O space, not
>> in memory space, which would make it the wrong number to program
>> into the hardware registers.
>
> Yes for using ioport resource. However we have decided to defer using
> it since 'pci_ioremap_io' is not yet supported from arm64 side.
>
> From HW point of view, for memory mapped IO space, it is nothing but a
> piece taken out of the ranges in address map for outbound accesses. So
> while configuring registers from SOC side, it should take the CPU
> address which is address from SOC address map. Right?
>
> Later on we can have a separate io resource like 'realio' similar to
> what pci-mvebu.c does.
>
>>
>>> +static int xgene_pcie_parse_map_ranges(struct xgene_pcie_port *port)
>>> +{
>>> + struct device_node *np = port->node;
>>> + struct of_pci_range range;
>>> + struct of_pci_range_parser parser;
>>> + struct device *dev = port->dev;
>>> +
>>> + if (of_pci_range_parser_init(&parser, np)) {
>>> + dev_err(dev, "missing ranges property\n");
>>> + return -EINVAL;
>>> + }
>>> +
>>> + /* Get the I/O, memory, config ranges from DT */
>>
>> The comment needs updating now that you don't read config space here any more.
>
> ok.
>
>>
>>> +/* X-Gene PCIe support maximum 3 inbound memory regions
>>> + * This function helps to select a region based on size of region
>>> + */
>>> +static int xgene_pcie_select_ib_reg(u64 size)
>>> +{
>>> + static u8 ib_reg_mask;
>>> +
>>> + if ((size > 4) && (size < SZ_16M) && !(ib_reg_mask & (1 << 1))) {
>>> + ib_reg_mask |= (1 << 1);
>>> + return 1;
>>> + }
>>> +
>>> + if ((size > SZ_1K) && (size < SZ_1T) && !(ib_reg_mask & (1 << 0))) {
>>> + ib_reg_mask |= (1 << 0);
>>> + return 0;
>>> + }
>>> +
>>> + if ((size > SZ_1M) && (size < SZ_1T) && !(ib_reg_mask & (1 << 2))) {
>>> + ib_reg_mask |= (1 << 2);
>>> + return 2;
>>> + }
>>> + return -EINVAL;
>>> +}
>>
>> Shouldn't the ib_reg_mask variable be per host bridge? Static variables
>> are dangerous if you ever get multiple instances of the hardware in one
>> system.
>
> Yes. You are right. Thanks.
>
>>
>>> +static int xgene_pcie_parse_map_dma_ranges(struct xgene_pcie_port *port)
>>> +{
>>> + struct device_node *np = port->node;
>>> + struct of_pci_range range;
>>> + struct of_pci_range_parser parser;
>>> + struct device *dev = port->dev;
>>> + int region;
>>> +
>>> + if (pci_dma_range_parser_init(&parser, np)) {
>>> + dev_err(dev, "missing dma-ranges property\n");
>>> + return -EINVAL;
>>> + }
>>> +
>>> + /* Get the dma-ranges from DT */
>>> + for_each_of_pci_range(&parser, &range) {
>>> + u64 restype = range.flags & IORESOURCE_TYPE_BITS;
>>> + u64 end = range.cpu_addr + range.size - 1;
>>> + dev_dbg(port->dev, "0x%08x 0x%016llx..0x%016llx -> 0x%016llx\n",
>>> + range.flags, range.cpu_addr, end, range.pci_addr);
>>> + region = xgene_pcie_select_ib_reg(range.size);
>>> + if (region == -EINVAL) {
>>> + dev_warn(port->dev, "invalid pcie dma-range config\n");
>>> + continue;
>>> + }
>>> + xgene_pcie_setup_ib_reg(port, &range, restype, region);
>>> + }
>>> + return 0;
>>> +}
>>
>> I guess is could even be a local variable in this function, which you pass
>> by reference.
>>
>>> +
>>> +static int xgene_pcie_setup(int nr, struct pci_sys_data *sys)
>>> +{
>>> + struct xgene_pcie_port *pp = xgene_pcie_sys_to_port(sys);
>>> +
>>> + if (pp == NULL)
>>> + return 0;
>>> +
>>> + sys->mem_offset = pp->mem.res.start - pp->mem.pci_addr;
>>> + pci_add_resource_offset(&sys->resources, &pp->mem.res,
>>> + sys->mem_offset);
>>> + return 1;
>>> +}
>>
>> Please follow the regular error handling conventions, which are to
>> pass a negative errno value on error and zero on success.
>
> ok.
>
>>
>> Also, what would be a reason for the port to be zero here? If
>> it's something that can't happen in practice, don't try to handle
>> it gracefully. You can use BUG_ON() for fatal conditions that
>> are supposed to be impossible to reach.
>
> This function is a hook to upper layer. We register nr_controllers in
> hw_pci structure as MAX_PORTS we support. It can happen that number of
> ports actually enabled from device tree are less than the number in
> nr_controllers.
>
>>
>>> +static struct pci_bus __init *xgene_pcie_scan_bus(int nr,
>>> + struct pci_sys_data *sys)
>>> +{
>>> + struct xgene_pcie_port *pp = xgene_pcie_sys_to_port(sys);
>>> +
>>> + pp->first_busno = sys->busnr;
>>> + xgene_pcie_setup_primary_bus(pp, sys->busnr, 0xff);
>>> + return pci_scan_root_bus(NULL, sys->busnr, &xgene_pcie_ops,
>>> + sys, &sys->resources);
>>> +}
>>
>> You have a number of helper functions that don't seem to gain
>> much at all. Just move the call to pci_scan_root_bus() into
>> xgene_pcie_setup_primary_bus() here, and use that as the .scan
>> callback.
>>
>
> I can do that if I can get rid of setup_primary_bus api. Let me check.
>
>>
>>> + if (!port->link_up)
>>> + dev_info(port->dev, "(rc) link down\n");
>>> + else
>>> + dev_info(port->dev, "(rc) x%d gen-%d link up\n",
>>> + lanes, port->link_speed + 1);
>>> +#ifdef CONFIG_PCI_DOMAINS
>>> + xgene_pcie_hw.domain++;
>>> +#endif
>>> + xgene_pcie_hw.private_data[index++] = port;
>>> + platform_set_drvdata(pdev, port);
>>> + return 0;
>>> +}
>>
>> Do you have multiple domains or not? I don't see how it can work if you
>> make the domain setup conditional on a kernel configuration option.
>> If you in fact have multiple domains, make sure in Kconfig that
>> CONFIG_PCI_DOMAINS is enabled. Otherwise don't mess with the domain
>> number...
>
> It is enabled in Kconfig.
>
>>
>>> +static const struct of_device_id xgene_pcie_match_table[] __initconst = {
>>> + {.compatible = "apm,xgene-pcie",},
>>> + {},
>>> +};
>>
>> Another general note: Your "compatible" strings are rather unspecific.
>> Do you have a version number for this IP block? I suppose that it's related
>> to one that has been used in other chips before, or will be used in future
>> chips, if it's not actually licensed from some other company.
>
> I will have to check this.
>
We have decided to stick with current compatible string for now.
>>
>>> +static int __init xgene_pcie_init(void)
>>> +{
>>> + void *private;
>>> + int ret;
>>> +
>>> + pr_info("X-Gene: PCIe driver\n");
>>> +
>>> + /* allocate private data to keep xgene_pcie_port information */
>>> + private = kzalloc((XGENE_PCIE_MAX_PORTS * sizeof(void *)), GFP_KERNEL);
>>
>> This should not be done unconditionally: There is no point in printing
>> a message or allocating memory if you don't actually run on a system
>> with this device.
>
> I am doing this here because I have one instance of hw_pci structure
> with multiple pcie controllers. I can't do it from probe since it will
> be called once per instance in device tree.
>
>>
>>> + if (private == NULL)
>>> + return -ENOMEM;
>>
>> Style: if you are testing for an object, just write 'if (private)' or
>> 'if (!private)', but don't compare against NULL.
>
> ok.
>
>>
>>> + xgene_pcie_hw.private_data = private;
>>> + ret = platform_driver_probe(&xgene_pcie_driver,
>>> + xgene_pcie_probe_bridge);
>>> + if (ret)
>>> + return ret;
>>> + pci_common_init(&xgene_pcie_hw);
>>> + return 0;
>>
>> This seems wrong: You should not use platform_driver_probe() because
>> that has issues with deferred probing.
>
> I think 'platform_driver_probe' prevents the deferred probing.
> 'pci_common_init' needs to be called only once with current driver
> structure. The probes for all pcie ports should be finished (ports
> initialized) before 'pci_common_init' gets called.
>
>>
>> Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/