Re: [PATCH v8 0/8] Fork brute force attack mitigation

From: John Wood
Date: Fri Jun 11 2021 - 11:43:11 EST


On Wed, Jun 09, 2021 at 09:52:29AM -0700, Kees Cook wrote:
> On Tue, Jun 08, 2021 at 04:38:15PM -0700, Andi Kleen wrote:
> >
> > On 6/8/2021 4:19 PM, Kees Cook wrote:
> > > On Sat, Jun 05, 2021 at 05:03:57PM +0200, John Wood wrote:
> > > > [...]
> > > > the kselftest to avoid the detection ;) ). So, in this version, to track
> > > > all the statistical data (info related with application crashes), the
> > > > extended attributes feature for the executable files are used. The xattr is
> > > > also used to mark the executables as "not allowed" when an attack is
> > > > detected. Then, the execve system call rely on this flag to avoid following
> > > > executions of this file.
> > >
> > > I have some concerns about this being actually usable and not creating
> > > DoS situations. For example, let's say an attacker had found a hard-to-hit
> > > bug in "sudo", and starts brute forcing it. When the brute LSM notices,
> > > it'll make "sudo" unusable for the entire system, yes?
> > >
> > > And a reboot won't fix it, either, IIUC.
> > >
> > The whole point of the mitigation is to trade potential attacks against DOS.
> >
> > If you're worried about DOS the whole thing is not for you.
>
> Right, but there's no need to make a system unusable for everyone else.
> There's nothing here that relaxes the defense (i.e. stop spawning apache
> for 10 minutes). Writing it to disk with nothing that undoes it seems a
> bit too much. :)

Here I have merged in the first reply.

> It seems like there is a need to track "user" running "prog", and have
> that be timed out. Are there use-cases here where that wouldn't be
> sufficient?

Ok, what do you think of the following proposal:

Add a uid_t field to the structure saved in the xattr, so that the struct
now contains:

faults: Number of crashes.
nsecs:  Last crash timestamp, as the number of nanoseconds in the
        International Atomic Time (TAI) reference.
period: Crash period's moving average.
flags:  Statistics flags.
uid:    User id not allowed to run the executable.
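
A minimal user-space sketch of how such a record could look. The struct
name, the flag name and all field widths are assumptions made for
illustration; only the field names and their meaning come from the list
above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bit; the real flag values are not given here. */
#define BRUTE_NOT_ALLOWED 0x01u

/* Field widths are assumptions chosen for illustration only. */
struct brute_xattr_stats {
	uint64_t nsecs;   /* last crash timestamp, ns in the TAI reference */
	uint64_t period;  /* crash period's moving average */
	uint32_t uid;     /* user id not allowed to run the executable */
	uint8_t  faults;  /* number of crashes */
	uint8_t  flags;   /* statistics flags */
};

/* One record per file, regardless of how many users crash it. */
static_assert(sizeof(struct brute_xattr_stats) <= 24,
	      "xattr payload must stay fixed-size");
```

Keeping a single uid slot instead of a per-user table is what keeps the
payload size constant.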

The logic would be the following:

1. faults, nsecs and period are updated on every crash and are common to
   all users.
2. If the maximum number of faults is reached, no user is allowed to run
   the executable. This condition blocks the file until root clears the
   xattr. There is no timeout.
3. When an attack is detected, the uid of the user running the app is
   saved in the xattr and the executable is marked as "not allowed" to
   run by this user. The "not allowed" state has a timeout (more below).
4. When someone tries to run the executable and their uid is different
   from the uid saved in the xattr, the operation is "allowed".
5. When someone tries to run the executable and their uid is equal to the
   uid saved in the xattr, the operation is "not allowed". This user is
   banned until the timeout expires.
6. When someone tries to run the executable and the timeout has expired,
   the operation is "allowed" and the saved uid is removed.
7. If the executable crashes again when it is run by a user different from
   the one saved in the xattr (and the timeout has not expired), the file
   is marked as "not allowed" to run by any user. All users are banned
   until the timeout expires.
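
The rules above can be sketched in user-space C as follows. This is only
an illustration, not the patch code: the struct, the three flag names and
the function names are hypothetical, and it assumes the timeout is
measured from the last crash timestamp, which is not stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits, one per "not allowed" state. */
#define FLAG_MAX_FAULTS 0x01u  /* rule 2: blocked until root clears the xattr */
#define FLAG_UID_BANNED 0x02u  /* rules 3 and 5: one uid banned, with timeout */
#define FLAG_ALL_BANNED 0x04u  /* rule 7: every uid banned, with timeout */

struct ban_state {
	uint64_t nsecs;  /* last crash timestamp */
	uint32_t uid;    /* banned uid, meaningful only with FLAG_UID_BANNED */
	uint8_t  flags;
};

/* Rule 3: an attack was detected while `uid` was running the file. */
void attack_detected(struct ban_state *st, uint32_t uid, uint64_t now)
{
	st->uid = uid;
	st->flags |= FLAG_UID_BANNED;
	st->nsecs = now;
}

/* Rules 1 and 7: the file crashed while run by `uid`. */
void record_crash(struct ban_state *st, uint32_t uid, uint64_t now,
		  uint64_t timeout)
{
	if ((st->flags & FLAG_UID_BANNED) && uid != st->uid &&
	    now - st->nsecs < timeout)
		st->flags |= FLAG_ALL_BANNED;
	st->nsecs = now;
}

/* Rules 2, 4, 5 and 6: may `uid` execute the file at time `now`? */
bool exec_allowed(struct ban_state *st, uint32_t uid, uint64_t now,
		  uint64_t timeout)
{
	const uint8_t timed = FLAG_UID_BANNED | FLAG_ALL_BANNED;

	assert(st != NULL);
	if (st->flags & FLAG_MAX_FAULTS)
		return false;                     /* rule 2: no timeout */
	if (!(st->flags & timed))
		return true;                      /* nobody is banned */
	if (now - st->nsecs >= timeout) {         /* rule 6: ban expired */
		st->flags &= (uint8_t)~timed;     /* clearing the flag drops the uid */
		return true;
	}
	if (st->flags & FLAG_ALL_BANNED)
		return false;                     /* rule 7 */
	return uid != st->uid;                    /* rules 4 and 5 */
}
```

Note that the saved uid is "removed" here by clearing the flag rather than
by writing a sentinel value, since 0 is a valid uid.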

The timeout: I think there are two options here.

1. A fixed timeout set by a sysctl attribute.
2. A dynamic timeout calculated from the info stored in the xattr. The
   timeout would be the period needed to guarantee that, when the app is
   run again and crashes, the attack detection is not triggered. To make
   this clearer, here are the formulas:

Mathematically, the exponential moving average (EMA) of the application
crash period can be expressed as follows:

    period_ema[i] = period[i] * weight + period_ema[i - 1] * (1 - weight)

Solving for period[i]:

    period[i] = (period_ema[i] - period_ema[i - 1] * (1 - weight)) / weight

where period_ema[i] is set to the "crash_period_threshold", period_ema[i - 1]
is the last period EMA saved in the xattr, and period[i] is the dynamic
timeout.
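
As a sanity check of the two formulas, here is a small user-space sketch.
The function names are hypothetical, and it uses double for readability;
kernel code would need integer arithmetic instead. Clamping a negative
result to zero is my assumption: it covers the case where the saved EMA is
already high enough that no wait is needed.

```c
#include <assert.h>

/* period_ema[i] = period[i] * weight + period_ema[i - 1] * (1 - weight) */
double ema_update(double prev_ema, double period, double weight)
{
	return period * weight + prev_ema * (1.0 - weight);
}

/*
 * Smallest period[i] for which the next EMA reaches `threshold`,
 * given the last EMA saved in the xattr.
 */
double dynamic_timeout(double threshold, double prev_ema, double weight)
{
	double t;

	assert(weight > 0.0 && weight <= 1.0);
	t = (threshold - prev_ema * (1.0 - weight)) / weight;
	return t > 0.0 ? t : 0.0;  /* assumption: never wait a negative time */
}
```

For example, with a weight of 0.7, a threshold of 30 and a saved EMA of 2
(all in the same time unit), the timeout comes out at about 42, and
feeding that period back into ema_update() gives about 30 again.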

As a final point: there may be more cases, but the logic would be the one
explained above. I think it is not necessary to save the uid of every
user that crashes the app, nor the crash info for every user. If more
than one user crashes the application, something "bad" is happening, so
all users are banned for a timeout. This way the info saved in the xattr
has a fixed size, and we prevent an attacker from abusing this size.

I hope this proposal is sufficient. What do you think?

John Wood.