Re: Speeding up screen output for dialog

Brad Pepers (pepersb@cuug.ab.ca)
Tue, 3 Sep 1996 16:56:17 -0600 (MDT)


> Date: Mon, 2 Sep 1996 09:57:33 +0200 (MET DST)
> From: Bernd Schmidt <crux@pool.informatik.rwth-aachen.de>
>
> I've been wondering why some dialog-based applications
> (e.g. menuconfig) take so long even on a P90 to draw the screen. So I
> made some investigations with strace. I found that ncurses writes to
> the screen in rather large blocks of bytes, but these get broken up
> in write_chan() in n_tty.c into single byte writes to the console,
> which is inefficient (I assume that setting the cursor after each
> write is the most time-consuming part). So I came up with a hack:
> Instead of calling opost() for each byte, I wrote a new function
> opost_block() that examines the buffer and returns the number of
> bytes that can be directly written. These are then passed to
> con_write() instead of put_char().
>
> You've definitely identified a real problem in the slowness of the
> console. However, the solution isn't really the best one, as it
> requires that characters be read an extra time --- once in opost_block,
> and once in the driver's write routine. When you consider that most
> interrupt driven devices have to buffer characters themselves (what BSD
> calls "pseudo-dma"), that means that there will be three copies, and
> that's non-optimal.
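
To make the opost_block() idea concrete, here is a minimal sketch of that kind of scan, not Bernd's actual function. The name opost_block_len() is made up, and it assumes that only control characters (bytes below 0x20) need the per-character opost() path; the real rules depend on the tty's output flags.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return how many leading bytes of buf could be
 * handed to the driver in one go.
 * Assumption: only control characters (< 0x20) need special output
 * processing; everything else is passed through unchanged.
 */
size_t opost_block_len(const unsigned char *buf, size_t n)
{
    size_t i = 0;

    /* Stop at the first byte that needs the slow path. */
    while (i < n && buf[i] >= 0x20)
        i++;
    return i;
}
```

The caller would write that many bytes directly, push the next byte through the slow path, and repeat. This is also where the extra read of the data that Ted objects to comes from.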

I have written a patch for this which implements put_char() and
flush_chars() for the console. It seems to work quite well. A
test program that writes 4000-byte blocks, 1000 times over,
takes about 9.5 seconds on my patched kernel and 55.8 seconds on the
old kernel. The only problem now is that I just grabbed 2.0.17 to
make my patches against that and now my X server won't start (and
of course I have it going straight to xdm from inittab so I get
stuck). The X server starts but the monitor just can't seem to sync
to the video signal (monitor is blank and all the lights are
flashing!). It's a 21" Hitachi and the graphics card is a #9 Imagine
128 with 4MB video RAM. I'm running a heavily modified Caldera
system and the Accelerated-X server. I'm going to reverse the
2.0.17 patches until I can figure out what caused the problem...

Anyway, as soon as I get my system up in a normal fashion I will
post the patches. A couple of questions came up while doing the patch:

1. I looked at the serial driver to see how it does the buffering. In
one place it defines the serial buffer size as 4096 and then uses a
page_alloc routine to get a page. Is it guaranteed in Linux that
the page size will always be 4096 on all archs? I used kmalloc
for now - does it have overhead (should I make the buffer a bit
smaller than 4096)?

2. There seem to be a number of structs involved with virtual
consoles. There is vc_cons and vt_cons and some others. I put
the transmit buffer into vt_cons since I couldn't see much reason
to prefer either one. Anyone have an opinion on which would be better?

3. I'm using a buffer of 4096 right now - too large? My guess is
that it might as well be a full page.
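
For what it's worth on questions 1 and 3: page size is architecture-dependent (Alpha, for instance, uses 8 KB pages), so "a full page" and "4096" are not always the same thing. Here is a small userspace sketch of sizing a buffer from the page size rather than a literal. tx_buffer_size() and FALLBACK_BUF_SIZE are made-up names, and sysconf() is only the userspace analogue; kernel code would use the PAGE_SIZE macro instead.

```c
#include <stddef.h>
#include <unistd.h>

#define FALLBACK_BUF_SIZE 4096  /* assumed default if the query fails */

/*
 * Hypothetical helper: pick a transmit buffer size equal to one page,
 * asking the system instead of assuming 4096.
 */
size_t tx_buffer_size(void)
{
    long page = sysconf(_SC_PAGESIZE);

    return page > 0 ? (size_t)page : FALLBACK_BUF_SIZE;
}
```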

> In the long run, opost() should probably be written so that it takes a
> pointer, a from_user boolean, and a count. This will eliminate one of
> the procedure activations per character that's in the current system.
> However, I think that fixing the console driver so that it supports
> put_char() and flush_chars() will provide nearly all of the procedure
> improvements you were looking for, while not impacting the performance
> of the other tty drivers.

The changes do seem to make a big difference (about 5.9 times faster
with large blocks: 55.8 versus 9.5 seconds) and it's a fairly simple
change.
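
Until the real patch is posted, here is a rough userspace sketch of the put_char()/flush_chars() idea: characters accumulate in a buffer and go out in one write. This is not the kernel code. The tx_* names, struct tx_buf, and the use of a file descriptor are all assumptions for illustration; the 4096 matches the buffer size discussed above.

```c
#include <stddef.h>
#include <unistd.h>

#define TX_BUF_SIZE 4096  /* assumed size, as discussed above */

/* Hypothetical transmit buffer sitting in front of a file descriptor. */
struct tx_buf {
    int fd;
    size_t len;
    char data[TX_BUF_SIZE];
};

void tx_init(struct tx_buf *tx, int fd)
{
    tx->fd = fd;
    tx->len = 0;
}

/*
 * Push everything buffered so far out in as few writes as possible.
 * Returns 0 on success, -1 on error. On error the buffer is left
 * untouched; a real implementation would track partial writes.
 */
int tx_flush_chars(struct tx_buf *tx)
{
    size_t done = 0;

    while (done < tx->len) {
        ssize_t n = write(tx->fd, tx->data + done, tx->len - done);
        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    tx->len = 0;
    return 0;
}

/* Queue one character, flushing first if the buffer is already full. */
int tx_put_char(struct tx_buf *tx, char c)
{
    if (tx->len == TX_BUF_SIZE && tx_flush_chars(tx) < 0)
        return -1;
    tx->data[tx->len++] = c;
    return 0;
}
```

The point of the split is that the expensive part (here the write(), on the console the cursor update) happens once per flush instead of once per character.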

> - Ted

======================================================================
Brad Pepers                          Proud supporter of Linux and
Ramparts Management Group Ltd.       Caldera in Canada!
ramparts@agt.net
http://www.agt.net/public/ramparts   Linux rules!