Re: [PATCH v1 tty] 8250: microchip: pci1xxxx: Refactor TX Burst code to use pre-existing APIs
From: Rengarajan.S
Date: Mon Mar 04 2024 - 23:16:37 EST
Hi Jiri,
On Mon, 2024-03-04 at 07:19 +0100, Jiri Slaby wrote:
> On 04. 03. 24, 5:37, Rengarajan.S@xxxxxxxxxxxxx wrote:
> > Hi Jiri,
> >
> > On Fri, 2024-02-23 at 10:26 +0100, Jiri Slaby wrote:
> > > On 23. 02. 24, 10:21, Rengarajan.S@xxxxxxxxxxxxx wrote:
> > > > On Fri, 2024-02-23 at 07:08 +0100, Jiri Slaby wrote:
> > > > > On 22. 02. 24, 14:49, Rengarajan S wrote:
> > > > > > Updated the TX Burst implementation by changing the circular buffer
> > > > > > processing with the pre-existing APIs in kernel. Also updated
> > > > > > conditional statements and alignment issues for better readability.
> > > > >
> > > > > Hi,
> > > > >
> > > > > so why are you keeping the nested double loop?
> > > > >
> > > >
> > > > Hi, in order to differentiate burst mode handling from byte mode,
> > > > we had separate loops for both. Since a single while loop also does
> > > > not align with the RX implementation (where we have separate
> > > > handling for burst and byte), we have retained the double loop.
> > >
> > > So obviously, align RX to a single loop if possible. The current TX
> > > code is very hard to follow and sort of unmaintainable (and buggy).
> > > And IMO it's unnecessary as I proposed [1]. And even if RX cannot be
> > > one loop, you still can make TX easy to read as the two need not be
> > > the same.
> > >
> > > [1]
> > > https://lore.kernel.org/all/b8325c3f-bf5b-4c55-8dce-ef395edce251@xxxxxxxxxx/
> >
> >
> > while (data_empty_count) {
> >         cnt = CIRC_CNT_TO_END();
> >         if (!cnt)
> >                 break;
> >         if (cnt < UART_BURST_SIZE || (tail & 3)) { // is_unaligned()
> >                 writeb();
> >                 cnt = 1;
> >         } else {
> >                 writel();
> >                 cnt = UART_BURST_SIZE;
> >         }
> >         uart_xmit_advance(cnt);
> >         data_empty_count -= cnt;
> > }
> >
> > With the above implementation we are observing a performance drop of
> > 2 Mbps at a baud rate of 4 Mbps. The reason for this is that on each
> > iteration we check whether the data needs to be processed via DWORDs
> > or bytes. The condition check on each iteration is causing the drop
> > in performance.
>
> Hi,
>
> the check is by several orders of magnitude faster than the I/O proper.
> So I don't think that's the root cause.
>
> > With the previous implementation (with nested loops) the performance
> > is found to be around 4 Mbps at a baud rate of 4 Mbps. In that
> > implementation we handle sending DWORDs continuously until the
> > transfer size < 4. Can you let us know of any alternatives that avoid
> > the above performance drop?
>
> Could you attach the patch you are testing?

Please find the updated pci1xxxx_process_write_data:

	u32 xfer_cnt;

	while (*valid_byte_count) {
		xfer_cnt = CIRC_CNT_TO_END(xmit->head, xmit->tail,
					   UART_XMIT_SIZE);
		if (!xfer_cnt)
			break;

		if (xfer_cnt < UART_BURST_SIZE || (xmit->tail & 3)) {
			writeb(xmit->buf[xmit->tail],
			       port->membase + UART_TX_BYTE_FIFO);
			xfer_cnt = UART_BYTE_SIZE;
		} else {
			writel(*(u32 *)&xmit->buf[xmit->tail],
			       port->membase + UART_TX_BURST_FIFO);
			xfer_cnt = UART_BURST_SIZE;
		}

		uart_xmit_advance(port, xfer_cnt);
		*data_empty_count -= xfer_cnt;
		*valid_byte_count -= xfer_cnt;
	}

Testing is done via minicom by transferring a 10 MB file at 4 Mbps.
Results after the minicom transfer with a single instance:

Previous implementation (nested while loops):
	Transferred 10 MB at 3900000 CPS
Current implementation:
	Transferred 10 MB at 2459999 CPS
Thanks,
Rengarajan S
>
> thanks,
> --
> js
> suse labs
>