Re: [PATCH 1/1] ptrace: Get tracer PID without reliance on the proc FS

From: Eric W. Biederman
Date: Mon Sep 09 2024 - 12:46:30 EST


Roman Kisel <romank@xxxxxxxxxxxxxxxxxxx> writes:

> On 9/6/2024 1:26 PM, Linus Torvalds wrote:
>> On Fri, 6 Sept 2024 at 13:08, Roman Kisel <romank@xxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> When the process has run into a fatal error and is about to exit, having
>>> a way to break into the debugger at this exact moment wouldn't change
>>> anything about the logic of the data processing happening in the process.
>>> What's so horrible in that to have a way to land in the debugger to see
>>> what exactly is going on?
>> I don't buy it.
>> If you want to debug some fatal behavior, and a debugger *isn't*
>> attached, you want it to create a core-dump.
>> And if a debugger *is* attached, it will catch that.
>> This is basically how abort() has always worked, and it very much is
>> *not* doing some "let's check if we're being debugged" stuff. Exactly
>> because that would be a bad idea and an anti-pattern.
>> The other very traditional model - for example if you do *not* want to
>> do core-dumps for some reason, and just exit with an error message -
>> is to just put a breakpoint on the "fatal()" function (or whatever
>> your "fatal error happened" situation is) using the debugger.
>> Then the target will behave differently depending on whether it is
>> being debugged or not BECAUSE THE DEBUGGER ASKED FOR IT, not because
>> the program decided to act differently when debugged.
>> In other words, this is a long-solved problem with solid solutions
>> from before Linux even existed.
>
>
> Writing a core-dump might not be an option, as you have pointed out;
> hence the "fatal()" function might not be permitted to fault.

For that you just need to set the core file size rlimit to 0.
Then you can safely raise SIGABRT to terminate your process.
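
For illustration, a minimal sketch of that approach (hypothetical code,
not from the patch; fatal() is just an example name here):

#include <signal.h>
#include <sys/resource.h>

static void fatal(void)
{
        /* Disable core dumps so termination cannot trip over writing one. */
        struct rlimit rl = { .rlim_cur = 0, .rlim_max = 0 };
        setrlimit(RLIMIT_CORE, &rl);

        /* If a debugger is attached it sees the SIGABRT and stops;
         * otherwise the process simply terminates, with no core file. */
        raise(SIGABRT);
}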

A debugger can also stop at PTRACE_EVENT_EXIT. The process
is still available (not cleaned up), but already fatally dead.

So I think the scenario of a process exiting is safely handled.

So if that is the case you care about, I would say please look at
PTRACE_EVENT_EXIT.
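
Roughly, a tracer that wants that stop can do something like this
(a sketch only, error handling omitted; pid is the tracee):

#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Sketch: stop the tracee at PTRACE_EVENT_EXIT and look at it before
 * it is torn down. */
static void inspect_exit(pid_t pid)
{
        int status;

        ptrace(PTRACE_ATTACH, pid, 0, 0);
        waitpid(pid, &status, 0);
        ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACEEXIT);
        ptrace(PTRACE_CONT, pid, 0, 0);

        /* When the tracee is about to exit it stops, and waitpid reports
         * SIGTRAP | (PTRACE_EVENT_EXIT << 8) in the upper status bits. */
        waitpid(pid, &status, 0);
        if (WIFSTOPPED(status) &&
            status >> 8 == (SIGTRAP | (PTRACE_EVENT_EXIT << 8))) {
                /* the dying process is still there to be examined here */
        }
        ptrace(PTRACE_CONT, pid, 0, 0);
}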

> Breaking into the debugger if it is attached saves a step of setting
> a breakpoint and doesn't require the knowledge of the guts of the
> standard library and/or the journey of the trap exception from the
> CPU to the debugger. The very name of the "fatal()" function is a
> tight contract, and something akin to the onion in the varnish for
> the uninitiated.
>
> Libraries like Google Breakpad, Boost.Test, the C++ standard library,
> the AWS C SDK, and Unreal Engine have that "breakpoint_if_debugging"
> facility; folks find it useful. They all read and parse the
> "/proc/self/status" file, whereas, it seems, a one-line ptrace syscall
> could save them that trouble. The kernel helps the user space do work,
> as I understand it, so why police this? There is "fork()", and threads
> can deadlock. Quite horrible, yet user space still has access to those.
> Here, there is evidence that folks want to detect whether a debugger is
> present, there is evidence of how the kernel can help the user space
> compute that with much less effort, and the patch is trivial. Why not
> let the userspace burn less electricity?

If you want more than just stopping when a process exits, such as the
not-yet-standardized std::is_debugger_present and
std::breakpoint_if_debugging calls, then a real-world justification
needs to be shown for why the kernel should optimize for the uncommon case.

Especially because it will take time and energy and maintenance to keep
that going, all to support a set of facilities that seem highly
dubious.

I suspect the best way to support breakpoint_if_debugging (in the
general case) is to make those calls nops, and then have something
scan the source code, collect those locations, and feed them into the
debugger as breakpoints.  Perhaps the compiler can be that scanner and
record the list of suggested breakpoints somewhere a debugger can read.
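
As a sketch of that idea (illustrative only, the name is made up): the
marker is an empty, non-inlined function, and the debugger, not the
program, supplies the behaviour:

/* Deliberately empty marker the debugger can break on:
 *   (gdb) break breakpoint_if_debugging
 * noinline plus the empty asm keep the compiler from optimizing the
 * function and its call sites away. */
__attribute__((noinline))
void breakpoint_if_debugging(void)
{
        /* Costs one call and return when nothing is attached. */
        __asm__ __volatile__("");
}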

As for std::is_debugger_present, I suspect the best way to handle it
is to keep the current implementation, making it expensive to use so
people won't use it unthinkingly.  If people are using this facility
enough that they are wasting electricity, an expensive operation at
least has the potential to make them stop and think about what they
are doing.
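
For reference, the current userspace implementations boil down to
roughly this (a sketch):

#include <stdio.h>

/* Parse the TracerPid: line of /proc/self/status, which is what the
 * libraries mentioned above do.  Returns nonzero if a tracer is
 * attached. */
static int is_debugger_present(void)
{
        char line[256];
        int tracer = 0;
        FILE *f = fopen("/proc/self/status", "r");

        if (!f)
                return 0;
        while (fgets(line, sizeof(line), f)) {
                if (sscanf(line, "TracerPid: %d", &tracer) == 1)
                        break;
        }
        fclose(f);
        return tracer != 0;
}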

From a "don't introduce heisenbugs" perspective, I think optimizing any
of this in the kernel looks like a bad idea.

Am I missing something?

Eric