Re: [tpmdd-devel] [RFC] tpm2-space: add handling for global session exhaustion

From: Jarkko Sakkinen
Date: Mon Jan 30 2017 - 16:58:58 EST


On Mon, Jan 30, 2017 at 08:04:55AM -0800, James Bottomley wrote:
> On Sun, 2017-01-29 at 19:52 -0500, Ken Goldman wrote:
> > On 1/27/2017 5:04 PM, James Bottomley wrote:
> >
> > > > Beware the nasty corner case:
> > > >
> > > > - Application asks for a session and gets 02000000
> > > >
> > > > - Time elapses and 02000000 gets forcibly flushed
> > > >
> > > > - Later, app comes back, asks for a second session and again gets
> > > > 02000000.
> > > >
> > > > - App gets very confused.
> > > >
> > > > May it be better to close the connection completely, which the
> > > > application can detect, than flush a session and give this corner
> > > > case?
> > >
> > > if I look at the code I've written, I don't know what the session
> > > number is, I just save sessionHandle in a variable for later use
> > > (lets say to v1). If I got the same session number returned at a
> > > later time and placed it in v2, all I'd notice is that an
> > > authorization using v1 would fail. I'm not averse to killing the
> > > entire connection but, assuming you have fallback, it might be
> > > kinder simply to ensure that the operations with the reclaimed
> > > session fail (which is what the code currently does).
> >
> > My worry is that this session failure cannot be detected by the
> > application. An HMAC failure could cause the app to tell a user that
> > they entered the wrong password. Misleading. On the TPM, it could
> > trigger the dictionary attack lockout. For a PIN index, it could
> > consume a failure count. Killing a policy session that has e.g., a
> > policy signed term could cause the application to go back to some
> > external entity for another authorization signature.
> >
> > Let's go up to the stack. What's the attack?
> >
> > If we're worried about many simultaneous applications (wouldn't that
> > be wonderful), why not just let startauthsession fail? The
> > application can just retry periodically.
>
> How in that scenario do we ensure that a session becomes available?
> Once that's established, there's no real difference between retrying
> the startauthsession in the kernel when we know the session is
> available and forcing userspace to do the retry except that the former
> has a far greater chance of success (and it's only about 6 lines of
> code).
>
> > Just allocate them in triples so there's no deadlock.
>
> Is this the application or the kernel? If it's the kernel, that adds a
> lot of complexity.
>
> > If we're worried about a DoS attack, killing a session just helps the
> > attacker. The attacker can create a few connections and spin on
> > startauthsession, locking everyone out anyway.
>
> There are two considerations here: firstly we'd need to introduce a
> mechanism to "kill" the connection. Probably we'd simply error every
> command on the space until it was closed. The second is which scenario
> is more reasonable: Say the application simply forgot to flush the
> session and will never use it again. Simply reclaiming the session
> would produce no effect at all on the application in this scenario.
> However, I have no data to say what's likely.
>
> > ~~
> >
> > Also, let's remember that this is a rare application. Sessions are
> > only needed for remote access (requiring encryption, HMAC or salt),
> > or policy sessions.
>
> This depends what your threat model is. For ssh keys, you worry that
> someone might be watching, so you use HMAC authority even for a local
> TPM. In the cloud, you don't quite know where the TPM is, so again
> you'd use HMAC sessions ... however, in both use cases the sessions
> should be very short lived.
>
> > ~~
> >
> > Should the code also reserve a session for the kernel? Mark it not
> > kill'able?
>
> At the moment, the kernel doesn't use sessions, so let's worry about
> that problem at the point it arises (if it ever arises).
>
> James

It does. My trusted keys implementation actually uses sessions.

I'm kind dilating to an opinion that we would leave this commit out from
the first kernel release that will contain the resource manager with
similar rationale as Jason gave me for whitelisting: get the basic stuff
in and once it is used with some workloads whitelisting and exhaustion
will take eventually the right form.

How would you feel about this?

/Jarkko