I think you figured this out while I was sleeping, but just to confirm:
1. The MIPS64 ISA doc talks about SYNC in a way that applies only
to memory accesses appearing in *program-order* before the SYNC.
2. We need WRC+sync+addr to work (i.e. its relaxed outcome must be
forbidden), which means that the SYNC in P1 must also capture the
store in P0 as being "before" the barrier. Leonid reckons it works,
but his explanation focussed on the address dependency in P2 as the
reason why. If that is the case (i.e. the address dependency provides
global transitivity), then WRC+addr+addr should also work (even though
it's not required).
3. It seems that WRC+addr+addr doesn't work, so I'm still suspicious
about WRC+sync+addr, because neither the architecture document nor
Leonid's explanation tells me that it should be forbidden.
http://lkml.kernel.org/r/569565DA.2010903@xxxxxxxxxx (scroll to the end)
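For reference, here is a minimal sketch of the WRC shape discussed above. It assumes the usual three-thread form: P0 stores x = 1; P1 reads x into r1, then (SYNC or address dependency) stores y = 1; P2 reads y into r2, then (address dependency) reads x into r3. The outcome in question is r1 == 1, r2 == 1, r3 == 0. The encoding and the helpers (THREADS, interleavings, run, sc_outcomes) are hypothetical and are not herd7's litmus syntax. The enumerator only models sequential consistency, so it treats SYNC and an address dependency identically and says nothing about MIPS itself. It only shows that the outcome is impossible under SC, which is what WRC+sync+addr needs the hardware to preserve.

```python
# Hypothetical encoding of the WRC litmus test. It models sequential
# consistency only, so SYNC and address dependencies are not represented.
#   P0: x = 1
#   P1: r1 = x; SYNC (or addr dep); y = 1
#   P2: r2 = y; addr dep;           r3 = x
THREADS = {
    "P0": [("st", "x", 1)],
    "P1": [("ld", "x", "r1"), ("st", "y", 1)],
    "P2": [("ld", "y", "r2"), ("ld", "x", "r3")],
}


def interleavings(threads):
    """Yield every total order of the ops that keeps each thread's program order."""
    names = list(threads)

    def go(pos, acc):
        progressed = False
        for k, name in enumerate(names):
            i = pos[k]
            if i < len(threads[name]):
                progressed = True
                nxt = pos[:k] + (i + 1,) + pos[k + 1:]
                yield from go(nxt, acc + (threads[name][i],))
        if not progressed:
            yield acc

    yield from go((0,) * len(names), ())


def run(schedule):
    """Execute one interleaving against a single shared memory; return (r1, r2, r3)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, var, arg in schedule:
        if op == "st":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    return regs["r1"], regs["r2"], regs["r3"]


def sc_outcomes(threads):
    """Set of (r1, r2, r3) results reachable under sequential consistency."""
    return {run(s) for s in interleavings(threads)}


if __name__ == "__main__":
    outcomes = sc_outcomes(THREADS)
    print(sorted(outcomes))
    print((1, 1, 0) in outcomes)  # False: the WRC outcome is forbidden under SC
```

Under SC, seven of the eight (r1, r2, r3) triples are reachable and only (1, 1, 0) is missing. Whether MIPS also rules it out, given that SYNC is described only in terms of program-order-earlier accesses, is the open question above.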