Re: [PATCH net-next v3 00/12] net: dsa: microchip: PTP support for KSZ956x
From: Vladimir Oltean
Date: Fri Nov 20 2020 - 20:26:18 EST
On Thu, Nov 19, 2020 at 06:51:15PM +0000, Tristram.Ha@xxxxxxxxxxxxx wrote:
> The initial proposal in tag_ksz.c is for the switch driver to provide callback functions
> to handle receiving and transmitting. Then each switch driver can be added to
> process the tail tag in its own driver and leave tag_ksz.c unchanged.
>
> It was rejected because of wanting to keep tag_ksz.c code and switch driver code
> separate and concern about performance.
>
> Now tag_ksz.c is filled with PTP code that is not relevant for other switches and will
> need to be changed again when another switch driver with PTP function is added.
>
> Can we implement that callback mechanism?
I, too, lack the context here. But it sounds like feedback that Andrew
would give.
If you don't like the #ifdef's, I am not in love with them either. But
maybe Christian is just optimizing too aggressively, and doesn't actually
need to put those #ifdef's there and provide stub implementations, but
could actually just leave the ksz9477_rcv_timestamp and ksz9477_xmit_timestamp
always compiled-in, and "dead at runtime" in the case there is no PTP.
If there is something else you don't like, what is it? If you know that
other KSZ switches don't implement timestamping in the same way, well,
we don't know that. I thought that it's generally up to the second
implementer to recognize which parts of the code are common and should
be reused, not for the first one to guess. I would not add function
pointers for a single implementation if they don't have a clear
justification.
> One issue with transmission with PTP enabled is that the tail tag needs to contain 4
> additional bytes. When the PTP function is off the bytes are not added. This should
> be monitored all the time.
>
> The extra 4 bytes are only used for 1-step Pdelay_Resp. It should contain the receive
> timestamp of previous Pdelay_Req with latency adjusted. The correction field in
> Pdelay_Resp should be zero. It may be a hardware bug to have wrong UDP checksum
> when the message is sent.
It "may" be a hardware bug? Are you unsure or polite?
As for the phrase "the correction field in Pdelay_Resp should be zero".
Consider the case where there is an E2E TC switch attached to that port.
It will update the correctionField of the Pdelay_Req message. Then the
application stack running on this ksz9477 switch is forced by the
standard to copy the correctionField as-is from the Pdelay_Req into the
Pdelay_Resp message. So that correctionField is never guaranteed to be
zero, even if Christian doesn't fiddle with it within the driver. Are
you saying that for proper UDP checksum calculation, the driver should
be forcing the correctionField to zero and moving that value into the
tail tag?
> I think the right implementation is for the driver to remember this receive timestamp
> of Pdelay_Req and puts it in the tail tag when it sees a 1-step Pdelay_Resp is sent.
I have mixed feelings about this. IIUC, you're saying "let's implement a
fixed-size FIFO of RX timestamps of Pdelay_Req messages, and let's match
them on TX to Pdelay_Resp messages, by {sequenceId, domainNumber}."
But how deep should we make that FIFO? I.e. how many Pdelay_Req messages
should we expect before the user space will inject back a Pdelay_Resp
for transmission?
Again, consider the case of an E2E TC attached to a ksz9477 port. Even
if we run peer delay, it's not guaranteed that we only have one peer.
That E2E TC might connect us to a plethora of other peers. And the more
peers we are connected to, the higher the chance that the size of this
Pdelay_Req RX timestamp FIFO will not be adequately chosen.
> There is one more requirement that is a little difficult to do. The calculated peer delay
> needs to be programmed in hardware register, but the regular PTP stack has no way to
> send that command. I think the driver has to do its own calculation by snooping on the
> Pdelay_Req/Pdelay_Resp/Pdelay_Resp_Follow_Up messages.
What register, and what does the switch do with this peer delay information?
> The receive and transmit latencies are different for different connected speed. So the
> driver needs to change them when the link changes. For that reason the PTP stack
> should not use its own latency values as generally the application does not care about
> the linked speed.
The thing is, ptp4l already has ingressLatency and egressLatency
settings, and I would not be surprised if those config options would get
extended to cover values at multiple link speeds.
In the general case, the ksz9477 MAC could be attached to any external
PHY, having its own propagation delay characteristics, or any number of
other things that cause clock domain crossings. I'm not sure how feasible
it is for the kernel to abstract this away completely, and adjust
timestamps automatically based on any and all combinations of MAC and
PHY. Maybe this is just wishful thinking.
Oh, and by the way, Christian, I'm not even sure if you aren't in fact
just beating around the bush with these tstamp_rx_latency_ns and
tstamp_tx_latency_ns values? I mean, the switch adds the latency value
to the timestamps. And you, from the driver, read the value of the
register, so you can subtract the value from the timestamp, to
compensate for its correction. So, all in all, there is no net latency
compensation seen by the outside world?! If that is the case, can't you
just set the latency registers to zero, do your compensation from the
application stack and call it a day?