Re: NFS corruption, fixed by echo 1 > /proc/sys/vm/drop_caches -- next debugging steps?

From: Ralf Baechle
Date: Wed Mar 15 2017 - 05:52:32 EST


On Mon, Mar 13, 2017 at 09:47:57AM +0000, James Hogan wrote:

> >
> > Note that the corruption is different across reboots, both in the size
> > of the corruption and the location. I saw 1900~ and 1400~ byte
> > sequences corrupted on separate occasions, which don't correspond to
> > the system's 16kB page size.
> >
> > I've tested kernels from v3.19 to 4.11-rc1+ (master branch from
> > today). All exhibit this behavior with differing frequencies. Earlier
> > kernels seem to reproduce the issue less often, while more recent
> > kernels reliably exhibit the problem every boot.
> >
> > How can I further debug this?
>
> It smells a bit like a DMA / caching issue.
>
> Can you provide a full kernel log. That might provide some information
> about caching that might be relevant (e.g. does dcache have aliases?).

The BCM1250 SOC used on the BCM91250 boards is fully coherent; the
S-cache and D-cache are physically indexed and tagged.  Only the VIVT
I-cache (plus the usual ASID tagging) leaves room for software to screw
up cache management, but that shouldn't matter in this case, so I suggest
starting the investigation on the NFS side.
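
To narrow things down on the NFS side, it may help to record exactly
which byte ranges differ between a read that shows the corruption and a
read of the same file taken after echo 1 > /proc/sys/vm/drop_caches.
Below is a minimal sketch of such a comparison, not a tested tool.  The
function names are made up for illustration, and PAGE_SIZE is only
assumed from the 16kB page size mentioned above.

```python
PAGE_SIZE = 16 * 1024  # assumption: 16kB pages, as reported in this thread


def corrupted_runs(good: bytes, bad: bytes):
    """Return (offset, length) for each contiguous run of differing bytes.

    Only the common prefix of the two buffers is compared; a difference
    in total length has to be checked separately.
    """
    runs = []
    start = None
    n = min(len(good), len(bad))
    for i in range(n):
        if good[i] != bad[i]:
            if start is None:
                start = i          # a new differing run begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run extends to the end of the buffer
        runs.append((start, n - start))
    return runs


def describe(runs, page_size=PAGE_SIZE):
    """Annotate each run with its offset within a page and whether it
    crosses a page boundary."""
    out = []
    for off, length in runs:
        first_page = off // page_size
        last_page = (off + length - 1) // page_size
        out.append((off, length, off % page_size, first_page != last_page))
    return out
```

One would read the file once while it is corrupted, drop the caches,
read it again, and pass both buffers to corrupted_runs().  Offsets and
lengths that line up with RPC or network fragment sizes rather than with
page boundaries would point further away from the caches.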

Ralf