Re: pc300too on a modern kernel?

From: Krzysztof Halasa
Date: Fri Nov 19 2010 - 17:05:42 EST


Bernie Innocenti <bernie@xxxxxxxxxxx> writes:

>> Also... it's rather improbable, but I'd look at the SCA-II chip. There
>> were certain chips with a hardware bug which could cause such problems.
>> Chips with Hitachi logo and "R" letter after the lot code were ok, and
>> all later chips made by Renesas (either missing any logo or with
>> Renesas' - no "R" letter there) were ok.
>>
>> The faulty chips were marked with Hitachi logo and were missing the "R"
>> letter after the lot code. I think Hitachi fixed it in 1999 or so.
>> I'm not sure if this bug could manifest itself when only one SCA channel
>> was in use. The app note doesn't say a word about it, but I think I only
>> experienced the problem (with an older card, not PC300) when both
>> channels were simultaneously in use.
>
> Looks like we've hit this bug! Here's a photo of the board to confirm
> it's the bogus chip:
>
> http://people.sugarlabs.org/bernie/pc300too-photo.jpg

It's weird. Hitachi's app note TN-PSC-337B/E dated Dec 10, 1998 shows
example lot codes for unfixed chips - "8A3" and fixed - "9A3 R". I don't
really remember the details, but I think the first digit is year (+1990)
and the last digit is quarter#. 0M1 would mean Q1 2000. I personally
have (different) cards with chips marked "9H1 R" and "0C1 R". I remember
a prototype card with something like 7** lot code (faulty, without the
"R") though I can't look up the code anymore. I'd never expect a faulty
chip dated 2000.

BTW their (now Renesas) errata is at
http://www.renesas.eu/products/assp/for_information_and_communication_equipment/com_control/Technical_Update.jsp
(I have the datasheet / prog manual as well). TN-PSC-337B/E seems to
indicate that the bug is present in chips made till March 31, 1999.

Your card has "SFL33" chip while my cards are "FL33". I have a card with
"SFL33" but it's dated 2005 and it's a newer chip, missing the "R" and
Hitachi logo because of Hitachi -> Renesas transition. I don't know what
"S" means. The datasheet (1998) only lists "FL33" = 25 Mb/s max transfer
rate and "AFL33" is 30 Mb/s.

> [ 59.175900] bernie: stat=0x80, desc_address=ffffc900111003a8, port->chan=0
> [ 59.176639] bernie: cp=3b4, bp=1ef18, len=56, unused=12
> [ 67.159314] bernie: stat=0x80, desc_address=ffffc90011100390, port->chan=0
> [ 67.163214] bernie: cp=39c, bp=1e298, len=56, unused=12
> [ 68.425601] bernie: stat=0x80, desc_address=ffffc90011100390, port->chan=0
> [ 68.426123] bernie: cp=39c, bp=1e298, len=77, unused=12
> [ 70.312068] bernie: stat=0x80, desc_address=ffffc900111003b4, port->chan=0
> [ 70.314393] bernie: cp=3c0, bp=1f558, len=1504, unused=12
>
> So it seems that sometimes the controller doesn't always clear the EOM
> (0x80) status bit after transmitting a frame. Size and contents of the
> packet doesn't seem to matter We're using a single T1 channel.

Actually, the SCA-II never clears EOM. sca_tx_done() does, after it sees
the "ownership" bit set by SCA-II. Then it does netif_wake_queue().

It seems it happens this way:
- sca_xmit() fills the whole ring (leaving one descriptor empty as
designed - for EDA to work)
- the chip transmits something and signals IRQ->sca_tx_done()
- sca_tx_done can't see any descriptor processed and only wakes the
queue. Perhaps we should only wake the queue if at least one
descriptor has been processed - though sca_tx_done() should never be
called otherwise.
- sca_xmit is called again with full ring, thus BUG().

I wonder if the following helps (untested):

--- a/drivers/net/wan/hd64572.c
+++ b/drivers/net/wan/hd64572.c
@@ -293,6 +293,7 @@ static inline void sca_tx_done(port_t *port)
struct net_device *dev = port->netdev;
card_t* card = port->card;
u8 stat;
+ int wake = 0;

spin_lock(&port->lock);

@@ -316,10 +317,12 @@ static inline void sca_tx_done(port_t *port)
dev->stats.tx_bytes += readw(&desc->len);
}
writeb(0, &desc->stat); /* Free descriptor */
+ wake = 1;
port->txlast = (port->txlast + 1) % card->tx_ring_buffers;
}

- netif_wake_queue(dev);
+ if (wake)
+ netif_wake_queue(dev);
spin_unlock(&port->lock);
}


Perhaps the chip sets the bit in ISR0 register before ST_TX_OWNRSHP is
written to device RAM. With this patch sca_tx_done() should be called
again shortly, in the worst case after the next packed is transmitted.

> +++ linux-2.6.36/drivers/net/wan/hd64572.c 2010-11-12 20:48:03.000000000 -0500
> @@ -567,11 +567,20 @@ static netdev_tx_t sca_xmit(struct sk_bu
> card_t *card = port->card;
> pkt_desc __iomem *desc;
> u32 buff, len;
> + uint8_t stat;
>
> spin_lock_irq(&port->lock);
>
> desc = desc_address(port, port->txin + 1, 1);
> - BUG_ON(readb(&desc->stat)); /* previous xmit should stop queue */
> +
> + //BUG_ON(readb(&desc->stat)); /* previous xmit should stop queue */
> + stat = readb(&desc->stat); /* previous xmit should stop queue */
> + if (stat) {
> + printk(KERN_EMERG "bernie: stat=0x%02x, desc_address=%p, port->chan=%d\n", stat, desc, port->chan);
> + printk(KERN_EMERG "bernie: cp=%x, bp=%x, len=%d, unused=%x\n", readw(&desc->cp), readl(&desc->bp), readw(&desc->len), readb(&desc->unused));
> + printk(KERN_EMERG "bernie: %s TX(%i):", dev->name, skb->len);
> + debug_frame(skb);
> + }
>
> #ifdef DEBUG_PKT
> printk(KERN_DEBUG "%s TX(%i):", dev->name, skb->len);

This could send corrupted data, we don't want to overwrite buffers being
transmitted (or queued for TX).

Anyway, I think it has nothing to do with the "non-R" bug. That one
corrupts CDA register rendering any ring operation impossible and
probably corrupting system RAM (my experience is a single channel with
up to 2 Mb/s doesn't trigger it, two channels trigger it several times
a day). IOW, trying to use two channels with buggy chip is pointless.
OTOH I'm not sure your chip is buggy, perhaps SFL33 were always fixed
and thus not marked with "R"?
--
Krzysztof Halasa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/