Re: NFS/d_splice_alias breakage

From: Oleg Drokin
Date: Thu Jun 02 2016 - 20:54:32 EST



On Jun 2, 2016, at 8:44 PM, Trond Myklebust wrote:

> That would have to be a really tight race, since the code in _nfs4_open_and_get_state() currently reads:
>
> d_drop(dentry);
> alias = d_exact_alias(dentry, state->inode);
> if (!alias)
> alias = d_splice_alias(igrab(state->inode), dentry);

We live in reality where there are no more tight races left.
1 instruction-race is now huge due to commonplace cpu-overcommitted VMs (and hyperthreading), nobody knows
when is your VM cpu going to be preempted by the host (while the other
cpus of the same VM are running ahead full steam).
Stuff that took ages to trigger again to better understand is instant now.

> IOW: something would have to be acting between the d_drop() and d_splice_alias() above…

The other CPU ;)
I checked the readdirplus theory and that's not it, there's some sort of another lookup going, this dentry we crashed on never came through nfs_prime_dcache.

> Al, I’ve been distracted by personal matters in the past 6 months. What is there to guarantee exclusion of the readdirplus dentry instantiation and the NFSv4 atomic open in the brave new world of VFS, June 2016 vintage?

Note this is not a new 2016 vintage bug. I hit this in 3.10 as well (baseline kernel from one well known enterprise vendor, only there we hit it in d_realise_unique).