... or maybe we could just sync all buffers unconditionally and let the
DMA API abstract this away. My concern is that on coherent architectures
we would still loop over every page for nothing, since I don't think the
compiler can optimize away the loop (see e.g. nouveau_bo_sync_for_cpu in
nouveau_bo.c).
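
To make the concern concrete, here is a rough userspace sketch of the
two options. It is not the nouveau code: struct mock_bo, its coherent
flag, and all the mock_* functions are hypothetical stand-ins, and the
call counter exists only to show how often the per-page sync would be
invoked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object; not the real nouveau struct. */
struct mock_bo {
	size_t num_pages;
	bool coherent;     /* hypothetical: true if no cache maintenance is needed */
	size_t sync_calls; /* illustration only: counts per-page sync calls */
};

/*
 * Stand-in for a per-page DMA sync call. On a coherent platform the real
 * call may do no work, but it is still invoked once per page.
 */
static void mock_sync_page_for_cpu(struct mock_bo *bo, size_t page)
{
	(void)page;
	bo->sync_calls++;
}

/* Option 1: always sync, relying on the sync call itself to be cheap. */
static void mock_bo_sync_unconditional(struct mock_bo *bo)
{
	assert(bo != NULL);
	for (size_t i = 0; i < bo->num_pages; i++)
		mock_sync_page_for_cpu(bo, i);
}

/* Option 2: check once up front and skip the whole loop when coherent. */
static void mock_bo_sync_guarded(struct mock_bo *bo)
{
	assert(bo != NULL);
	if (bo->coherent)
		return;
	for (size_t i = 0; i < bo->num_pages; i++)
		mock_sync_page_for_cpu(bo, i);
}
```

With a coherent buffer of N pages, the unconditional variant still makes
N calls while the guarded one makes none. Whether that overhead matters
in practice would have to be measured.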