your check can happen 1 nanosecond before it sets s_active; after that the code goes into prune_dentry(), while deactivate_super() successfully sets s_active and starts the main umount job. Nothing prevents the race... :(
<<<< and here, when you drop sb_lock and dentry->d_lock/dcache_lock in prune_dentry(), it looks to me that we have exactly the same situation as without your patch:
<<<< another CPU can start umount in parallel.
<<<< maybe sb_lock barrier helps this somehow, but I can't see how yet...
From the unmount path, __mntput() is called. It sets s_active to 0 in deactivate_super(), hence our check would prevent us from pruning a dentry
that is a part of a super block that is going to go away soon. The idea
is to let the unmount do all the work here, the allocator can concentrate
on other dentries.
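
To make the window under discussion concrete, here is a small user-space sketch. It is not the patch and not kernel code: struct sb_model, sb_check_only(), sb_pin() and sb_unpin() are hypothetical names, and C11 atomics stand in for the kernel's atomic_t. It only contrasts a plain read of s_active (which can go stale right after it returns) with taking a reference while the count is still non-zero, similar in spirit to the kernel's atomic_inc_not_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct super_block. */
struct sb_model {
    atomic_int s_active;    /* 0 means the unmount path has taken over */
};

/*
 * Check only: the answer can be stale as soon as it is returned,
 * because another CPU may drop s_active to 0 right afterwards.
 */
static bool sb_check_only(struct sb_model *sb)
{
    return atomic_load(&sb->s_active) > 0;
}

/*
 * Check and pin in one step: take a reference only if the count is
 * still non-zero, so the count cannot reach 0 while we hold it.
 */
static bool sb_pin(struct sb_model *sb)
{
    int old = atomic_load(&sb->s_active);

    while (old > 0) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(&sb->s_active, &old, old + 1))
            return true;
    }
    return false;   /* already 0: leave this super block to unmount */
}

/* Drop a reference; returns true if this was the last one. */
static bool sb_unpin(struct sb_model *sb)
{
    int old = atomic_fetch_sub(&sb->s_active, 1);

    assert(old > 0);
    return old == 1;
}
```

In this model a pruner would call sb_pin() before touching a dentry and sb_unpin() afterwards, skipping the dentry whenever sb_pin() fails. Whether the real prune path can afford to hold such a reference is a separate question that the sketch does not answer.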