I think that FUA was designed for a different use case than the one Linux
currently uses barriers for. FUA pays off when you have "before barrier",
"after barrier" and "don't care" sets, where only the requests whose ordering
you actually care about are in the before/after barrier sets. Then you can
do this:
  1. Issue all before-barrier requests with the FUA bit set.
  2. Wait for all of them to complete.
  3. Issue all after-barrier requests with the FUA bit set.
  4. Wait for all of them to complete.
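To make those four steps concrete, here is a toy Python model. All names are hypothetical; this is not a kernel or ATA interface. It assumes queued commands may complete in any order (as with NCQ) and that a FUA write completes only once its data is on media. The only ordering guarantee comes from waiting between the two phases, and the "don't care" writes are never forced out.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyDrive:
    """Hypothetical model of a drive with a command queue and a volatile cache.

    Queued commands may complete in any order (as with NCQ). Not a real
    kernel or ATA interface; it only illustrates the ordering argument.
    """
    rng: random.Random
    queue: list = field(default_factory=list)          # issued, not yet completed
    cache: dict = field(default_factory=dict)          # volatile write cache
    media: dict = field(default_factory=dict)          # durable storage
    durable_order: list = field(default_factory=list)  # order data reached media

    def issue(self, lba, data, fua=False):
        self.queue.append((lba, data, fua))

    def wait_for(self, lbas):
        """Complete, in arbitrary order, every queued command for `lbas`."""
        mine = [cmd for cmd in self.queue if cmd[0] in lbas]
        self.queue = [cmd for cmd in self.queue if cmd[0] not in lbas]
        self.rng.shuffle(mine)
        for lba, data, fua in mine:
            if fua:
                # A FUA write completes only once its data is on media.
                self.media[lba] = data
                self.durable_order.append(lba)
            else:
                # A normal write may complete while still only in the cache.
                self.cache[lba] = data


def ordered_write(drive, before, after, dont_care):
    """The two-phase FUA scheme; each argument maps LBA -> data."""
    for lba, data in dont_care.items():
        drive.issue(lba, data)            # no ordering constraint, no FUA
    for lba, data in before.items():
        drive.issue(lba, data, fua=True)  # issue before-barrier set with FUA
    drive.wait_for(set(before))           # wait for all of them to complete
    for lba, data in after.items():
        drive.issue(lba, data, fua=True)  # issue after-barrier set with FUA
    drive.wait_for(set(after))            # wait for all of them to complete
```

With any seed, the before-barrier blocks reach media before any after-barrier block, while the "don't care" write simply stays queued without paying for FUA.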
A couple of issues come up with this. The first is how to support our current fsync() semantics. Today, the flush behavior of the barrier/fsync combination gives applications a hard promise that a file's data is on platter once fsync() returns successfully.
If I understand correctly, getting similar semantics from a pure FUA implementation would require us to tag all file I/O as FUA.
I suspect that this would actually be less efficient, since it would not let the drives reorder I/Os up to the point where we actually care (fsync time).
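A rough way to see the trade-off is a toy model of a drive with a volatile write-back cache (all names hypothetical). Both paths leave the data on media when fsync() returns; the difference is that with flush-at-fsync the drive holds writes until fsync time. The model shows that freedom only as the cache absorbing repeated writes to the same block, which is an assumption of the model, not a claim about any particular drive.

```python
from dataclasses import dataclass, field


@dataclass
class CachedDisk:
    """Hypothetical toy disk: a volatile write-back cache in front of media."""
    cache: dict = field(default_factory=dict)
    media: dict = field(default_factory=dict)
    media_writes: int = 0  # how many times data was forced to media

    def write(self, lba, data, fua=False):
        if fua:
            # FUA: the data is on media before the write completes.
            self.cache.pop(lba, None)
            self.media[lba] = data
            self.media_writes += 1
        else:
            # Normal write: completes once cached; in this model the cache
            # keeps only the latest data per block until it is flushed.
            self.cache[lba] = data

    def flush(self):
        # Cache flush: everything cached so far reaches media.
        for lba, data in self.cache.items():
            self.media[lba] = data
            self.media_writes += 1
        self.cache.clear()

    def power_fail(self):
        self.cache.clear()  # volatile cache contents are lost


def run(writes, all_fua):
    disk = CachedDisk()
    for lba, data in writes:
        disk.write(lba, data, fua=all_fua)
    if not all_fua:
        disk.flush()     # the flush that fsync() triggers today
    disk.power_fail()    # crash right after fsync() returns
    return disk
```

For `writes = [(1, "a"), (2, "b"), (1, "a2"), (3, "c"), (1, "a3")]`, both runs end with the same media contents, but the all-FUA run forces five media writes where the flush run forces three.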
The other big user of barriers is the internal transaction machinery of journaled file systems. It would seem that we would need to issue every journal write as a FUA I/O as well. Again, we might actually go more slowly in some cases, as you mention below.
The limited queue depth of NCQ (at most 32 outstanding commands) would seem to make it much harder to come out ahead in this case...
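For the journal case, the two ways of ordering a commit record behind its journal blocks can be written out as toy command sequences. "WRITE", "FUA_WRITE", "FLUSH" and "WAIT" are illustrative labels, not real kernel request flags, and the checker assumes that a WAIT completes everything in flight and that a FLUSH makes only already-completed writes durable. It checks ordering only; whether the FUA sequence is faster is the open question above.

```python
# Hypothetical command sequences for one journal commit. The op names are
# illustrative labels, not real kernel request flags.

def commit_with_flush(journal_blocks, commit_block):
    """Barrier implemented with cache flushes around the commit record."""
    cmds = [("WRITE", b) for b in journal_blocks]
    cmds += [("WAIT",), ("FLUSH",), ("WRITE", commit_block), ("WAIT",), ("FLUSH",)]
    return cmds


def commit_with_fua(journal_blocks, commit_block):
    """Every journal write, and the commit record, tagged FUA instead."""
    cmds = [("FUA_WRITE", b) for b in journal_blocks]
    cmds += [("WAIT",), ("FUA_WRITE", commit_block), ("WAIT",)]
    return cmds


def journal_safe(cmds, commit_block):
    """Replay `cmds` under the toy model. True if every journal block is
    durable before the commit record is issued and the commit ends durable."""
    in_flight, cached, durable = [], set(), set()
    for cmd in cmds:
        op = cmd[0]
        if op in ("WRITE", "FUA_WRITE"):
            block = cmd[1]
            if block == commit_block:
                written = cached | durable | {b for _, b in in_flight}
                if not (written - {commit_block}) <= durable:
                    return False  # commit could land before its journal blocks
            in_flight.append((op, block))
        elif op == "WAIT":
            # Completed FUA writes are on media; normal ones may only be cached.
            for kind, block in in_flight:
                (durable if kind == "FUA_WRITE" else cached).add(block)
            in_flight.clear()
        elif op == "FLUSH":
            durable |= cached
            cached.clear()
    return commit_block in durable
```

Both sequences pass the check; dropping the first FLUSH from the flush-based sequence makes it fail, because the journal blocks may still be sitting only in the cache when the commit record is issued.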