Low latency patches made no difference. Tried it with 2.6.30 and it now works. There are a couple of commits contributing to the fix, including one introduced between 2.6.29-rc8 and 2.6.29 proper in powerpc/kernel/head_32.S (couple of commits with the name "Fix Respect _PAGE_COHERENT on classic ppc32 SW TLB load machines"). I've tried backporting this to 2.6.29-rc8 and it then worked. Backporting to 2.6.26 made no difference however, so I suspect there are other things fixed which are also contributing.You could enable CONFIG_NOT_COHERENT_CACHE.I've just tried this (I had to edit Kconfig in power/platforms to make the build system accept it), and interestingly it's making no difference. I'm using streaming mappings, and are using the pci_map_sg functions to ensure the memory is mapped/flushed correctly. I've also explicitly put in a pci_dma_sync_sg_for_device, however that's also not made any difference. Turning the cpu cache snoop off has the same affect as it did without CONFIG_NOT_COHERENT_CACHE; it gets much worse. Any other ideas?
Will back off the low latency patches next, and give 2.6.30 a try - see if that makes any difference.