Re: OS stopping stack buffer overflow exploits

From: Khimenko Victor (khim@sch57.msk.ru)
Date: Sun Jun 04 2000 - 19:03:00 EST


In <00060417014100.18678@tabby> Jesse Pollard (pollard@cats-chateau.net) wrote:
JP> On Sun, 04 Jun 2000, Khimenko Victor wrote:
>>On Sun, 4 Jun 2000, Jesse Pollard wrote:
>>
>>> On Sun, 04 Jun 2000, Khimenko Victor wrote:
>>> >In <200006041042.MAA12176@oboe.it.uc3m.es> Peter T. Breuer (ptb@it.uc3m.es) wrote:
>>> >> "A month of sundays ago Florian Weimer wrote:"
>>> >>> The C and C++ frontends only generate trampolines if you use a GNU
>>> >>> extension (nested functions).
>>> >
>>> >> And I have no idea why they should want to: nesting is purely a
>>> >> question of namespaces and syntactic scoping. It should impact
>>> >> the implementation semantics not at all.
>>> >
>>> >Oh, yeah ? I eagerly wait your implementation of draw_table for sample
>>> >below without trampolines (just remember: you can not alter for_each or
>>> >use any information about it's implementation - there are exist sparate
>>> >compilation units, you know). Oh, and of course such implementation
>>> >should work in multi-threaded environment as well...
>>>
>>> Only piece I haven't covered is the multi-threaded indirect... Though that
>>> could be done by addressing an array via the thread id...
>>>
>>Yeah ? How ? How well it'll work if I'll pass pointer to function from one
>>thread to another (where for_each will create new thread and pass pointer
>>in it) ?

JP> See below, and it is already handled for the case of errno.

Yeah ? Last time I checked it was impossible to call read(2) in one thread
and the analyse errno in other (ok it was possible in very old libc
where you have one global errno and no per-thread errno - poor choice here
as well). Something changed ? I've said: "where for_each will create new
thread and pass pointer in it" (this new thread that is). So one_step will
be called in DIFFERENT thread (not in the same thread where for_each was
called).

>>What about recursion call of draw_table from one_step ?

JP> Already handled - the y = y1 and y1 = y in "one_step" provides a stack of
JP> pointers.

Not true. You forgot to add this "stack" handling in draw_table so it'll not
work but it's minor error, I agree. So here is sample (without threads) to
make your life miserable:
-- cut --
type
  proc_ptr=procedure;
var
  proc_array:array[0..99] of proc_ptr;
prcoedure store_proc(p:proc_ptr);
begin
  proc_array[trunc(rand()*100)]:=p;
end;
procedure play_procs;
var
  i:integer;
begin
  for i:=0 to 99 do
    proc_array[i];
end;
procedure foo;
var
  x:real;
  procedure print_x;
  begin
    writeln(x);
  end;
begin
  x:=random();
  store_proc(print_x);
  if random()<0.001
    play_procs
  else
    foo;
end;
procedure dummy;
begin
end;
var
  i:integer;
begin
  for i:=0 to 99 do
    proc_array[i]:=dummy;
end.
-- cut --
And, as usual, all procedures are compiled separately :-)

>>Yeah, you
>>can do this with some tables (for example you can keep address of for_each
>>call in hash table and use this to find out "right" y), but it's just
>>plain ugly: kludge over kludge.

JP> 1. So is the trampoline code.

Perhaps. One small difference: there you have ONE simple and reliable kludge
to handle all situations at once. Here you have either incomplete handling or
the whole heap of kludges and slow handling after all.

JP> 2. Only without a good optimizer. The old lisp compilers used to do this as
JP> part of "function elimination". (along with tail-recursion elimination, jump
JP> optimization, ...)

And what can "good optimizer" do when for_each and draw_table compiled
separately (usual situation for Pascal and Ada) ?

JP> After all, this is already handled for the case of errno. Nested functions
JP> could be implemented by the compiler by including the storage needed for
JP> "*y1" in the thread specific data structure, as is errno.

1. Compiler is NOT thread-aware now (glibc is not part of compiler). It'll be
real shame to add threads support in compiler just to make non-exec stack
possible.
2. errno is NOT handled right (ok it handled just like POSIX says; just
POSIX and Pascal/Ada specification says different things about local
variables).

JP> I believe some of the lisp compilers used thread specific arrays to allow
JP> all modules with thread specific data to be appended to the end of the
JP> array. (sometimes got long), but occasionally, the optimizer could even
JP> eliminate the storage.

You DO NOT need "thread specific" storage. You need "function incarnation
specific" storage. And "thread specific" storage is poor substitute here.
See example above.

>>Good as prof that "it's possible".

JP> That was all that it was intended for.

Oops. So you interpreted it as challenge by itself not as part of big
discussion ? Sorry then. It was not a such kind of challege (and you NOT
done it right even for challenge when threads or procedure arrays are
involved BTW; perhaps this can be fixed but code is ugly enough as it is).
I wanted feasible generic technique and concrete draw_table function was used
as just simple example.

>>Non-usable as general compiler technique.

JP> I didn't claim it was opimal in this particular instantiation. THIS code
JP> could be opimized by eliminated the "one_step_a" function entirely.

It's minor optimization.

JP> Most such code can be done that way. All the trampoline call does is what
JP> is included with the "static int *y1" combined with the function "one_step".

Yeah. Exacly. COMBINED is key word. You can NEVER mistook information provided
for one call with information provided for other call. With ANY thread
implemenrtation (and there are quite a few thread packages for linux) and ANY
number of procedures involved. It's hard to do with your global variables.
Real hard. And "most such code" is wrong word when you are developing compiler.
Real wrong.

JP> I have manually translated a number of nested procedures from Algol W (and
JP> modula) into C using this technique. It works quite well, and with the
JP> function elimination it ends up quite clean, and fast.

We are NOT talking about manual translation by klowleadgeable person. We are
talking about compiler for complex situation (as well as simple ones :-) where
bunch of threads libraries and separate compilation involved.

JP> One of the reasons nested functions were dropped from the original C definition
JP> (K&R) was a proof that any nested function could be implemented without the
JP> nesting and with no significant increase in generated code. Unfortunately I
JP> don't have a reference to that proof (done somewhere beween 1971 and 1975 I
JP> think, I was still in school at the time).

I'm not sure that such proof is still valid when functions without available
sources are involved - impossigble for human translation in open source world,
typical for compiler. And I'm pretty sure such compilers without trampolines
worked wrong for some weird examples as well: when logically bound information
is artifactually splitted and sent via separate channels usual end result is
wrong delivery in some cases :-))

>>
>>>
>>> /* example without tramploine */
>>>
>>> #include <stdio.h>
>>>
>>> /*
>>> procedure for_each(_from,_to:integer;procedure _do(x:integer));
>>> var
>>> i:integer;
>>> begin
>>> for i:=_from to _to do
>>> _do(i);
>>> end;
>>>
>>> procedure draw_table;
>>> var
>>> y:integer;
>>> procedure one_step(x:integer);
>>> begin
>>> write(x*y:4);
>>> end;
>>> begin
>>> for y:=1 to 10 do begin
>>> for_each(1,10,one_step);
>>> writeln;
>>> end;
>>> end;
>>> begin
>>> draw_table;
>>> end.
>>> */
>>>
>>> void for_each( int from, int to, void _do( int))
>>> {
>>> int i;
>>>
>>> for (i = from; i <= to; i++) {
>>> _do(i);
>>> }
>>> }
>>>
>>> void one_step_a(int *y, int x)
>>> {
>>> printf("%4d",x * *y);
>>> }
>>>
>>> static int *y1; /* pointer to data of function */
>>> void one_step (int x) /* substitution for trampoline */
>>> {
>>> int *y;
>>>
>>> y = y1;
>>> one_step_a(y,x);
>>> y1 = y;
>>> }
>>>
>>> void draw_table(void)
>>> {
>>> int y;
>>>
>>> y1 = & y; /* new */
>>> for (y = 1; y <= 10; y++) {
>>> for_each(1,10,one_step);
>>> printf("\n");
>>> }
>>> }
>>>
>>> int main(void)
>>> {
>>> draw_table();
>>> return(0);
>>> }
>>>
>>> output is:
>>>
>>> tabby$ ./a.out
>>> 1 2 3 4 5 6 7 8 9 10
>>> 2 4 6 8 10 12 14 16 18 20
>>> 3 6 9 12 15 18 21 24 27 30
>>> 4 8 12 16 20 24 28 32 36 40
>>> 5 10 15 20 25 30 35 40 45 50
>>> 6 12 18 24 30 36 42 48 54 60
>>> 7 14 21 28 35 42 49 56 63 70
>>> 8 16 24 32 40 48 56 64 72 80
>>> 9 18 27 36 45 54 63 72 81 90
>>> 10 20 30 40 50 60 70 80 90 100
>>>
>>> This goes back many years to the original "how is recursion implemented in
>>> pascal". There ARE compilers that don't use trampoline code, just as there
>>> are processors that don't have a hardware stack. Take a look at some of
>>> the old lisp interpreters.
>>>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Wed Jun 07 2000 - 21:00:19 EST