Re: mmap/write vs read/write surprise
Fri, 11 Jul 1997 08:41:22 -0400

If we had madvise, the responsibility for determining the access
pattern could be put on the application. We could still use a better
default access pattern for mmap'd pages, though. Is the VM code
sophisticated enough yet to take advantage of an madvise syscall?
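
On a system that does provide madvise(), the application-side hint
could look roughly like the sketch below. It is only an illustration:
map_sequential() is a made-up helper name, and the hint is advisory, so
the kernel remains free to ignore it.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): map a file read-only
 * and tell the kernel we intend to walk it sequentially.
 * Returns the mapping and stores its length in *len, or NULL on failure. */
static void *map_sequential(const char *path, size_t *len)
{
    struct stat st;
    void *p;
    int fd;

    assert(path != NULL && len != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    /* An empty file cannot be mapped, so treat it as a failure here. */
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    /* Advisory only: a failure here does not make the mapping unusable. */
    (void)madvise(p, (size_t)st.st_size, MADV_SEQUENTIAL);

    *len = (size_t)st.st_size;
    return p;
}
```

The caller reads through the returned region front to back and releases
it with munmap() when done.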

>>>>> ">" == Mark Hemment <> writes:


>> For sequential access to the file, I would expect the read()
>> (file I/O) method to be faster. This is because the kernel
>> performs much more read-ahead for files accessed by this method
>> than mmap(). For mmap(), only one page is read ahead of the
>> faulting address.

>> It is possible to implement page-fault prediction per vm-area,
>> with the kernel reading ahead further as it becomes more sure
>> of the faulting pattern.
>>  o When a fault occurs, the faulting address is stored in the
>>    vm-area structure.
>>  o If the faulting address is the one expected, then increase
>>    the read-ahead distance (or read-behind if the file is being
>>    accessed backwards), and start I/O on the predicted pages if
>>    they are not already incore (or I/O locked, which indicates
>>    they are "on their way"). Based upon the success, calculate
>>    the next faulting address.
>>  o If the faulting address is not the one expected, then
>>    decrease (throttle back) the read-ahead/behind distance.
>>    (Or maybe even change the fault prediction direction).

>> If the mmap()ed file has no (determinable) access pattern, then
>> the read-ahead/behind will not kick in. (Note: Because of
>> VM_CLONE the faulting stats are not really per vm_area, but per
>> reference to a vm_area - nasty!).

>> With the current design of the page-cache, this has a small
>> problem. Unmapped pages (that is, pages which are not part of any
>> user address-space) are not 'aged' in the way (currently)
>> mapped pages are. Their only defence against being reaped is
>> the 'PG_referenced' bit. This means pages read in with the
>> hope they will be needed soon are quickly shredded if memory
>> becomes low. (This, of course, also happens with traditional
>> file I/O pages). To compound this, more free-pages are needed
>> for the read-ahead. A partial solution here is to add another
>> allocation priority that does not try very hard to find a free
>> page. (In fact, the priority should decay as the 'distance' from
>> the original fault increases).