Re: Compat 32-bit syscall entry from 64-bit task!? [was: Re:[RFC,PATCH 1/2] seccomp_filters: system call filtering using BPF]
From: Indan Zupancic
Date: Thu Jan 19 2012 - 06:35:07 EST
On Thu, January 19, 2012 09:16, Chris Evans wrote:
> On Wed, Jan 18, 2012 at 4:14 PM, Indan Zupancic <indan@xxxxxx> wrote:
>> On Wed, January 18, 2012 22:13, Chris Evans wrote:
>>> On Wed, Jan 18, 2012 at 4:12 AM, Indan Zupancic <indan@xxxxxx> wrote:
>>>> On Wed, January 18, 2012 06:43, Chris Evans wrote:
>>>>> 2) Tracee traps
>>>>> 2b) Tracee could take a SIGKILL here
>>>>> 3) Tracer looks at registers; bad syscall
>>>>> 3b) Or tracee could take a SIGKILL here
>>>>> 4) The only way to stop the bad syscall from executing is to rewrite
>>>>> orig_eax (PTRACE_CONT + SIGKILL only kills the process after the
>>>>> syscall has finished)
>>>>
>>>> Yes, we rewrite it to -1.
>>>>
>>>>> 5) Disaster: the tracee took a SIGKILL so any attempt to address it by
>>>>> pid (such as PTRACE_SETREGS) fails.
>>>>
>>>> I assume that if a task can execute system calls and we get ptrace events
>>>> for that, that we can do other ptrace operations too. Are you saying that
>>>> the kernel has this ptrace gap between SIGKILL and task exit where ptrace
>>>> doesn't work but the task continues executing system calls? That would be
>>>> a huge bug, but it seems very unlikely too, as the task is stopped and
>>>> shouldn't be able to disappear till it is continued by the tracer.
>>>>
>>>> I mean, really? That would be stupid.
>>
>> Okay, I tested this scenario and you're right, we're screwed.
>>
>> What the hell guys?
>
> Steady on :) ptrace() has never been sold as a technology upon which
> it's safe to build security solutions.
Well, that can be said of pretty much all kernel functionality.
That is no excuse for crazy behaviour.
I more or less fixed it by turning all SIGKILLs into SIGTERMs.
Perhaps I should use a more obscure signal instead.
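A minimal sketch of that workaround, assuming the jailer stops the tracee at entry to kill(), tkill() or tgkill() and can rewrite the signal argument before continuing. The helper name jail_filter_signal is hypothetical and not taken from the actual jailer:

```c
#include <signal.h>

/* Hypothetical helper: applied to the signal argument of an intercepted
 * kill()/tkill()/tgkill() before the syscall is allowed to proceed. */
int jail_filter_signal(int sig)
{
    /* SIGKILL can leave the tracee unreachable by ptrace while the pending
     * syscall still runs, so downgrade it to a catchable signal. */
    return sig == SIGKILL ? SIGTERM : sig;
}
```

The tracer would then write the returned value back into the register that holds the signal argument (with PTRACE_SETREGS, for instance) before resuming the tracee.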
>> What about other PID checks in the kernel, are they still
>> safe if the process looks dead but is still active? Or is it a ptrace-only
>> problem?
>>
>>>> If true we have to work around it by disallowing SIGKILL and just sending
>>>> them ourselves within the jail. Meh.
>>
>> I guess this helps a bit. It doesn't prevent external signals, but prisoners
>> don't have control over that.
>
> Well.... a prisoner may be able to play other tricks:
> - Allocate lots of memory... kernel may start spraying around SIGKILLs
> - Sending SIGKILL via prctl()
prctl is disallowed within our jail. Did you have PR_SET_PDEATHSIG in mind?
But doesn't the tracer become the parent while ptracing, or does that not apply here?
Or were you thinking about enabling SECCOMP and counting on the SIGKILL
being process-wide instead of thread-specific?
> - Sending SIGKILL via fcntl()
I haven't written the fcntl demultiplexor yet, but I had missed that fcntl
could be used for sending signals. I knew there was wacky stuff in there,
but didn't expect it to be that bad. Thanks.
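For illustration, a rough sketch of what the signal-related part of such a demultiplexor might check. The function name and the exact deny list are assumptions. F_SETOWN selects who receives I/O signals for a descriptor and F_SETSIG (Linux-specific) selects which signal is sent, so both are refused here:

```c
#define _GNU_SOURCE   /* exposes F_SETSIG and F_SETOWN_EX on glibc */
#include <fcntl.h>

/* Hypothetical policy check: returns 1 if the fcntl command may pass,
 * 0 if it could be used to arrange signal delivery. */
int jail_fcntl_allowed(int cmd)
{
    switch (cmd) {
    case F_SETOWN:      /* picks the process or group that gets the signal */
#ifdef F_SETOWN_EX
    case F_SETOWN_EX:   /* same, but can also target a single thread */
#endif
#ifdef F_SETSIG
    case F_SETSIG:      /* picks which signal is delivered */
#endif
        return 0;
    default:
        return 1;       /* everything else is left to other checks */
    }
}
```

A real jail would probably whitelist known-safe commands rather than blacklist the dangerous ones; the deny list is only used here to keep the example short.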
> - Sending SIGKILL via clone()
How? And can you send it to a process other than yourself?
>
>>
>> Is this SIGKILL specific or is it true for all task ending signals?
>
> Can't remember - try it?
Tried: It's safe with SIGTERM, so I assume the others are fine too.
I'll double check though...
>>
>>>> How will you avoid file path races with BPF?
>>>
>>> There is typically no need for file-path based access control in an FTP server.
>>> Take for example anonymous FTP, which will typically be inside a
>>> chroot() to /var/ftp. Inside that filesystem tree -- if you can open()
>>> it, you can have it.
>>
>> Ah, you count on having root access. We don't.
>>
>> Do you know any more crazy security destroying holes?
>
> Try spraying SIGCONT and / or SIGSTOP at tracees. It may be possible
> to confuse the tracer about whether a SIGTRAP event is syscall entry
> or exit.
Yes, heard about that weirdness before, but it's all ignored. We're
using PTRACE_O_TRACESYSGOOD.
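With that option set, syscall stops can be told apart from real SIGTRAPs by looking at the wait status. A minimal sketch, assuming PTRACE_O_TRACESYSGOOD was set with PTRACE_SETOPTIONS; the helper name is hypothetical:

```c
#include <signal.h>
#include <sys/wait.h>

/* Returns 1 if a waitpid() status describes a syscall stop as marked by
 * PTRACE_O_TRACESYSGOOD, i.e. the stop signal is SIGTRAP with bit 7 set. */
int is_syscall_stop(int status)
{
    return WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80);
}
```

This does not say whether the stop is a syscall entry or an exit, so the tracer still has to track that itself.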
> Try doing an execve() that fails. May cause similar state confusion in
> the tracer.
Our jailer pretty much ignores all signals and only handles syscalls
and task exits. We actually check execve's return value to know if we
have to do our stuff or not.
Greetings,
Indan