Asynchronous read-ahead

Gerard Roudier (groudier@iplus.fr)
Sat, 6 Apr 1996 14:06:16 +0000 (GMT)


Read-ahead seems to work better under Linux 1.3.8X, but it appears to be
SYNCHRONOUS only: read-ahead is done only when we try to read a page that
is locked.
Generally, the device is plugged at that moment, and we have to wait for the
I/O to complete before moving any data to user space.

For example, normal FAST SCSI-2 devices can read linear data at about 4 MB/sec.
When the device has the data cached, it can be moved to core memory at about
8 MB/sec.
In that situation, an application that can process 4 MB/sec at 100% CPU
cannot actually use 100% of the CPU, since it spends about 33% of its time
waiting for the next data from the disk.
With only SYNCHRONOUS read-ahead, the speed of such an application is about:
2.67 MB/sec 67% CPU LOAD.

With ASYNCHRONOUS read-ahead, such applications can run at full speed:
4 MB/sec 100% CPU LOAD. (theoretically)
(In practice 100% CPU LOAD is not reachable, but we can get about 90%.)

In order to run at full speed (4 MB/sec) with only SYNCHRONOUS read-ahead,
an application must be able to process 8 MB/sec at 100% CPU,
since it spends about 50% of its time waiting for the next data from the disk:
4 MB/sec 50% CPU LOAD.

From this example, we can guess that ASYNCHRONOUS read-ahead increases the
performance of applications that use more than 50% CPU LOAD when they get
data at the maximum speed of the hard disk.

In order to verify that, I have written an experimental patch for Linux-1.3.84
that implements ASYNCHRONOUS read-ahead for accesses through the file system.

Configuration: P90 / 24MB / IBMS12 / NCR53C810 / ncrBsd2Linux

Bonnie results:
               -------Sequential Output-------- ---Sequential Input-- --Random--
               -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine     MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
SyncReada  200  3118 90.7  3492 37.5  1266 26.0  2162 65.0  3952 28.0
AsyncReada 200  3099 90.6  3563 38.1  1595 32.6  2946 91.9  3960 31.2

(I skipped the random seeks test for these runs.)

ASYNCHRONOUS read-ahead improved "sequential input per char" performance
by 36%. CPU LOAD is greater than 90%, as expected.

(The rewrite result is 26% better. Does anyone have an idea why?)
The other results are nearly identical, as expected.

With this patch, Linux is as fast as some other free Unix systems for
sequential input.
However, this patch is a bit obscure, and I invite Linus to rewrite it as
cleaner code that does the same work.

Here is the patch for linux-1.3.84:

This patch adds some fields to the "file" structure (fs.h).
Do "make clean" first if you have doubts about dependencies.

-------------------------- CUT HERE ----------------------------------
--- linux/mm/filemap.c.00 Fri Apr 5 23:00:30 1996
+++ linux/mm/filemap.c Sat Apr 6 12:13:01 1996
@@ -296,10 +296,14 @@
* of the logic when it comes to error handling etc.
*/
#define MAX_READAHEAD (PAGE_SIZE*8)
+#define MIN_READAHEAD (PAGE_SIZE*2)
int generic_file_read(struct inode * inode, struct file * filp, char * buf, int count)
{
int error, read;
unsigned long pos, page_cache;
+#ifndef ONLY_SYNCHRONOUS_READ_AHEAD
+ int try_async;
+#endif

if (count <= 0)
return 0;
@@ -308,6 +312,17 @@
page_cache = 0;

pos = filp->f_pos;
+#ifndef ONLY_SYNCHRONOUS_READ_AHEAD
+ /*
+ * Dont beleive f_reada
+ */
+ if (pos + count < MIN_READAHEAD)
+ filp->f_reada = 0;
+ else if (pos <= filp->f_rapos && pos + filp->f_ralen > filp->f_rapos)
+ filp->f_reada = 1;
+ try_async = filp->f_reada ? 1 : 0;
+#endif
+
for (;;) {
struct page *page;
unsigned long offset, addr, nr;
@@ -350,6 +365,8 @@
if (nr > count)
nr = count;

+#ifdef ONLY_SYNCHRONOUS_READ_AHEAD
+/* SYNCHRONOUS only read-ahead code */
/*
* We may want to do read-ahead.. Do this only
* if we're waiting for the current page to be
@@ -373,6 +390,73 @@
__wait_on_page(page);
}
unlocked_page:
+#else /* ASYNCHRONOUS read-ahead code */
+{
+ unsigned long max_ahead, ahead, rapos, ppos;
+
+ ppos = pos & PAGE_MASK;
+
+ /* Do some synchronous read-ahead */
+ if (page->locked) {
+ if (!filp->f_reada) {
+ max_ahead = MIN_READAHEAD - PAGE_SIZE;
+ try_async = 0;
+ }
+ else {
+ max_ahead = MAX_READAHEAD - PAGE_SIZE;
+ try_async = 1;
+ }
+ rapos = ppos;
+ }
+ /* Do some asynchronous read-ahead */
+ else if (try_async == 1 && filp->f_rapos <= ppos + filp->f_ralen) {
+ struct page *a_page;
+
+ if (!filp->f_reada)
+ max_ahead = MIN_READAHEAD;
+ else
+ max_ahead = MAX_READAHEAD;
+
+ rapos = filp->f_rapos & PAGE_MASK;
+
+ if (rapos < (inode->i_size & PAGE_MASK)) {
+ a_page = find_page(inode, rapos);
+ if (a_page) {
+ if (a_page->locked)
+ max_ahead = 0;
+ a_page->count--;
+ }
+ }
+ else
+ max_ahead = 0;
+ try_async = 2;
+ }
+ else {
+ rapos = ppos;
+ max_ahead = 0;
+ }
+
+ ahead = 0;
+ while (ahead < max_ahead) {
+ ahead += PAGE_SIZE;
+ page_cache = try_to_read_ahead(inode, rapos + ahead, page_cache);
+ }
+ if (ahead > 0) {
+ filp->f_rapos = rapos + ahead;
+ filp->f_ralen = ahead;
+
+ /* Force unplug device in order to start asynchronous */
+ if (try_async == 2) {
+ schedule();
+ try_async = 1;
+ }
+ }
+ if (page->locked) {
+ __wait_on_page(page);
+ }
+}
+#endif
+
if (!page->uptodate)
goto read_page;
if (nr > inode->i_size - pos)
--- linux/include/linux/fs.h.00 Wed Mar 27 00:50:36 1996
+++ linux/include/linux/fs.h Sat Apr 6 11:26:22 1996
@@ -321,6 +321,12 @@
unsigned short f_flags;
unsigned short f_count;
off_t f_reada;
+
+/* Added for asynchronous read-ahead patch */
+ loff_t f_rapos; /* Last read-ahead position */
+ unsigned long f_ralen; /* Length of previous read-ahead */
+ unsigned long f_ramax; /* Current max read-ahead (futur use) */
+
struct file *f_next, *f_prev;
int f_owner; /* pid or -pgrp where SIGIO should be sent */
struct inode * f_inode;
-------------------------- CUT HERE ----------------------------------

Regards, Gerard.