Re: [RFC] Buggy e100.c on ARM / DMA sync interfaces

From: James Bottomley
Date: Wed Jun 30 2004 - 08:18:50 EST


> This can't work - and I suspect anyone using the *dma_sync* functions
> will be in for the same problem. Why?
>
> Cache lines. They have a defined size and are not merely a single byte
> or a single word. If you modify even one single bit, you stand the
> chance of writing back the whole cache line, possibly overwriting data
> which the device has updated since the cache line was read.

Hang on, the PCI (and DMA) API explicitly states that to call sync on a
dma mapped area, you must do it for the *whole* of the area. The API's
also transfer ownership of the area (i.e. only the device *or* the
driver may touch it depending on who has the ownership). This is
designedly to get us out of cache line interference problems.

So the particular problem here is an incorrect *abuse* of the API.
There is a section in the black magic part of the DMA API that allows
you partially to sync a mapped area (dma_sync_single_range). However,
it's in the experts only section and it explains that if you do this,
you're responsible for preventing cache line interference and provides
another API (dma_get_cache_alignment) explicitly so the driver knows
what the cache line boundaries are.

> Therefore, if you're going to use the dma_sync functions to modify data
> owned by the remote device, you _must_ stop the remote device accessing
> the surrounding data _before_ touching it.

Precisely, hence the ownership concept.

> With the above code on ARM, it effectively means that we will read the
> whole struct rfd and some other data into cache, modify the command
> field, and then write _at least_ the whole struct rfd back out, all
> with the chip's DMA still running.
>
> _That_ can't be good.
>
> Note - I'm not saying that this is the cause of the above problem, but
> that this is something I have spotted while reading through the driver
> to ascertain why it possibly could not be working.
>
> Comments?

Clearly the e100 needs fixing. Either by migrating it to the expert
API. Or, in this case, could we not just pad the structure with
appropriate L1_CACHE_ALIGNMENT tags?

James


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/