Re: Thread-private mappings and graphics (was Re: Per-Processor Data Page)

Jim Gettys (jg@pa.dec.com)
Wed, 15 Dec 1999 07:45:20 -0800 (PST)


> Also as Jon pointed out the API requires that these contexts be
> identified by some thread-specific mechanism available to the graphics
> library, not by explicit stack pointers in the application. Either
> the threaded OpenGL API is broken or DRE for linux-SGI is. If this the
> case then linux will need its own special threaded OpenGL library
> compared to all the other platforms which don't require this special
> rewrite.
>

Let me give some perspective here, and historical data on design decisions: I
think it may help introduce some light, rather than heat, into the
discussion.

Both X and OpenGL are "direct rendering" systems: that is, you describe
what you want on the screen by saying "draw this", followed by "draw that".
So there was a conceptual match from the beginning. For 2D graphics,
X has generally been able to keep up with display hardware (in decently
optimized implementations): it has been "fast enough.

Then the question is what the APIs looks like the way the do in X, and
in OpenGL.

When I was doing Xlib for X11, I had a multiprocessor workstation in my
office (at least for part of the time); this was really unusual in those
days, with real threads support and Bob Scheifler was programming in CLU
and Lisp, both of which do threads. So my API design was deliberately
designed "thread safe" from the beginning (the implementation took a while
to catch up, though I started to put in locking calls as macros on day
1 as I wrote the code).

So basic Xlib rendering calls look like:
XDrawSomething(display, graphicscontext, window, args....)
XDrawSomething(display, graphicscontext, window, args....)
etc.

This all works easily and naturally in a threaded world, and presents
no challenges.

OK, the SGI folks went to town to do 3D (having done it before), and make
it run FAST, this time under X.

They noted that the save/restore register times as you went through the
rendering pipeline (which is longer for 3d than for 2d, by a number of
steps, so you end up with a deeper call stack) of procedure calls for
the triple "display, graphicscontext, window", when the actual arguments
might be as small as 2 16 bit integers, came to be a very significant
fraction of both the instructions compliers of the day generated (the
day was 1988-89 or so), and bandwidth consumed doing the computation (due
to the store/load traffic from the additional arguments). As you go through
the pipe, at different stages you also may need the GC and window data.
Bandwidth is just about everything in 3D graphics.

So OpenGL does (conceptually):
SetContext(display, graphicscontextg, window)
ODraw(args)
ODraw(args)
ODraw(args)

This interface style clearly presents hard problems for threaded systems,
unless there is thread specific data (at which point it works ok).

So when OpenGL came out, I asked why was it was this way, and this is
the answer I was given as I remember, independent of the direct rendering
pageing tricks (which, I believe, came later, along with threading OpenGL).

So if I had it to do over again, I'd certainly do something more like
what SGI did (the wire cost of sending both GC and window long ago convinced
me we made a mistake in the protocol, and the mapping that resulted; that
would have removed one argument out of the X call list and on the wire,
and I might have gone further, to a single context pointer which would
have had all of three associated with it).

If SGI had it to do over again, (and given the much better compiler
technology, which may use registers much better), they might do something
slightly differently (they might have added one extra argument on calls,
maybe, if the compilers didn't spill the arguments out of registers,
generating the memory traffic that was a signficant performance issue
on the write back caches of the day).

But the genesis of the API's has, I believe, little to do with the DRI
case now, and should be removed from the flame war. And there is too much
code to change the interfaces, though I'd certainly do so with 20-20
hindsight. There wasn't enough threads experience then in the community (nor
compilers good enough) to understand what the right API was.

So let's figure out how to do what is needed "cleanly", and run like a
bandit, and not argue about whether the API's are stupid or not (I'm arguing
they both are, but the people doing the work weren't stupid, just with
limited experience and inability to crystal ball gaze correctly)....
Somehow, to do OpenGL (and some other API's), you need really cheap thread
specific data, and DRI presents other challenges, as the conversation
shows.

- Jim Gettys

--
Jim Gettys
Technology and Corporate Development
Compaq Computer Corporation
jg@pa.dec.com

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/