Paul,
I think you figured this out while I was sleeping, but just to confirm:
1. The MIPS64 ISA doc [1] talks about SYNC in a way that applies only
to memory accesses appearing in *program-order* before the SYNC.
2. We need WRC+sync+addr to work, which means that the SYNC in P1 must
also capture the store in P0 as being "before" the barrier. Leonid
reckons it works, but his explanation [2] attributes that to the address
dependency in P2. If that is the case (i.e. the address dependency
provides global transitivity), then WRC+addr+addr should also work
(even though it's not required).
3. It seems that WRC+addr+addr doesn't work, so I'm still suspicious
about WRC+sync+addr, because neither the architecture document nor
Leonid's explanation tells me that it should be forbidden.
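
For reference, here is a minimal sketch of the WRC shape discussed in the points above, so it is clear which outcome is meant. It only enumerates sequentially consistent interleavings, so it says nothing about what MIPS SYNC or an address dependency actually guarantees; it merely confirms that r1 == 1 && r2 == 1 && r3 == 0 is the outcome that must be forbidden. The names (struct state, step, explore, wrc_sc_forbidden_count, wrc_sc_total) are made up for illustration, and both the SYNC and the address dependency are modelled as no-ops because the model is already SC.

```c
#include <assert.h>

/* Shared locations x, y and the registers read by P1 and P2. */
struct state { int x, y, r1, r2, r3; };

/* Number of memory accesses per thread: P0 has 1, P1 has 2, P2 has 2. */
static const int nsteps[3] = { 1, 2, 2 };
static int total, forbidden;

/* Execute access number pc of thread t. */
static void step(struct state *s, int t, int pc)
{
    assert(t >= 0 && t < 3);
    switch (t) {
    case 0:                 /* P0: x = 1 */
        s->x = 1;
        break;
    case 1:                 /* P1: r1 = x; SYNC; y = 1 */
        if (pc == 0)
            s->r1 = s->x;
        else
            s->y = 1;       /* SYNC is a no-op in an SC model */
        break;
    case 2:                 /* P2: r2 = y; r3 = x (address depends on r2) */
        if (pc == 0)
            s->r2 = s->y;
        else
            s->r3 = s->x;   /* dependency is a no-op in an SC model */
        break;
    }
}

/* Depth-first walk over every interleaving that respects program order. */
static void explore(struct state s, int pc0, int pc1, int pc2)
{
    int pc[3] = { pc0, pc1, pc2 };
    int done = 1;

    for (int t = 0; t < 3; t++) {
        if (pc[t] < nsteps[t]) {
            struct state next = s;
            int npc[3] = { pc[0], pc[1], pc[2] };

            done = 0;
            step(&next, t, pc[t]);
            npc[t]++;
            explore(next, npc[0], npc[1], npc[2]);
        }
    }
    if (done) {
        total++;
        if (s.r1 == 1 && s.r2 == 1 && s.r3 == 0)
            forbidden++;
    }
}

/* Returns how many SC interleavings show the WRC outcome. */
int wrc_sc_forbidden_count(void)
{
    struct state init = { 0, 0, 0, 0, 0 };

    total = 0;
    forbidden = 0;
    explore(init, 0, 0, 0);
    return forbidden;
}

/* Returns how many interleavings the last run explored. */
int wrc_sc_total(void)
{
    return total;
}
```

Calling wrc_sc_forbidden_count() from a main() and then wrc_sc_total() gives the count of offending interleavings and the number explored. The open question in this thread is whether the hardware preserves the same result once the accesses are no longer globally ordered.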
Will
[1] https://imgtec.com/?do-download=4302
[2] http://lkml.kernel.org/r/569565DA.2010903@xxxxxxxxxx (scroll to the end)