Re: [PATCH] block/ps3: Fix slow VRAM IO

From: Akira Tsukamoto
Date: Thu Nov 12 2009 - 21:03:43 EST


Hello Andrew Morton,

Ping?

This patch is pretty important to improve the performance of PS3.
I really appreciate for your reply.

Thanks,

Akira

On Mon, 09 Nov 2009 15:40:42 +0900,
Akira Tsukamoto <akirat@xxxxxxxxxxxxxxxxxx> mentioned:
> Thank you for the review!
>
> > > The current PS3 VRAM driver uses msleep() to wait for completion
> > > of RSX DMA transfers between system memory and VRAM. Depending
> > > on the system timing, the processing delay and overhead of this
> > > msleep() call can significantly impact VRAM driver IO.
> > >
> > > To avoid the condition, add a short duration (200 usec max)
> > > udelay() polling loop before entering the msleep() polling
> > > loop.
> > >
> >
> > When raising a performance-based patch, please always try to include
> > before-and-after performance measurements in the changelog. People
> > want to know the magnitude of the improvement.
>
> No problem we will add the difference of improvement in the changelog.
> This is the results. Pretty impressive.
> Before
> Reading: 33MB/s
> Writing: 16MB/s
> After
> Reading: 370MB/s
> Writing: 238MB/s
>
> > > + if (!notify[3])
> > > + return 0;
> > > + udelay(10);
> > > + }
> >
> > You might as well do a udelay(1) here. The additional cost will be
> > negligible, and it will reduce latency.
>
> Are you mentioning adding udelay(1) in the between udelay polling
> and msleep polling? Or are you mentioning to change udelay(10) to udelay(1)
> inside the udelay polling?
>
> The former is no problem, but the later has impact on performance of PS3
> system.
> Because Cell/B.E.(consists of PPE and SPEs cores) and GPU are connected with
> ring bus called EIB and every issuing notify[3] to check VRAM-DMA results
> will generate data transfer to the bus.
> There are only one EIB bus in PS3 and other devices connected on the bus
> such as SPEs will be affected if the bus is occupied by many notify[3] and
> as a result it will decrease the over all system performance.
>
> The udelay(10) was the most reasonable distance not to overcrowd the bus
> and not to wait too long for checking DMA on VRAM.
> We have tried udelay(5) but did not improve the VRAM IO speed.
>
> > > + timeout = jiffies + msecs_to_jiffies(timeout_ms);
> >
> > The maximum latency is now timout_ms + 200usec.
> >
> > That's OK with the current constants, but if someone later changes a
> > constant, the error could become significant.
>
> Yes, I think so too. Probably reconstructing the design entirely based on
> usec instead of msec might be ideal but adding 200usec loops fixes the
> current slow VRAM driver, so I thought it is acceptable work around.
>
> > Perhaps that isn't worth bothering about though.
> >
> > > do {
> > > if (!notify[3])
>
> --
> Akira Tsukamoto
> Sony Computer Entertainment Inc.
> Architecture Lab.
> Japan
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Akira Tsukamoto
Sony Computer Entertainment Inc.
Architecture Lab.
Japan

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/