Re: [RFC PATCH 3/3] fs: detect that the i_rwsem has already been taken exclusively
From: Linus Torvalds
Date: Thu Sep 28 2017 - 20:33:54 EST
On Thu, Sep 28, 2017 at 5:12 PM, Mimi Zohar <zohar@xxxxxxxxxxxxxxxxxx> wrote:
>
> Originally IMA did define it's own lock, prior to IMA-appraisal. IMA-
> appraisal introduced writing the file hash as an xattr, which required
> taking the i_mutex. process_measurement() and ima_file_free() took
> the iint->mutex first and then the i_mutex, while setxattr, chmod and
> chown took the locks in reverse order. To resolve the potential
> deadlock, the iint->mutex was eliminated.
Umm. You already have an explicit invalidation model, where you
invalidate after a write has occurred.
But the locking of the generation count (or "invalidation status" or
whatever) can - and should be - entirely independent of the locking of
the actual appraisal.
So make the appraisal itself use a semaphore ("only one appraisal at a time").
But use a separate lock for the generation count.
So then appraisal is:
- get appraisal semaphore
- get generation count lock
read generation count
- drop generation count lock
- do the actual appraisal
- drop appraisal semaphore
Note that you now have a tuple of "generation count, appraisal" that
you have *not* saved off yet, but it's your stable thing.
Now you can write the xattr:
- get exclusive inode lock (for xattr)
- get generation count lock
- if the appraisal generation does not match, do NOT write
the appraisal you just calculated, since it's pointless: it's already
stale.
- otherwise write the appraisal and generation count to the xattr
- drop generation count lock
- release exclusive inode lock
and then for anything that does setxattr or chmod or whatever, just
use that generation count lock to invalidate the appraisal. You don't
need to actual appraisal lock for that.
So now the appraisal lock is always the outermost one, and the
generation count lock is always the innermost.
Anyway, I haven't looked at the details of what IMA does, but
something like the above really sounds like it should work and seems
pretty straightforward.
No?
Linus