Re: Address spaces on a i386 - Getting Confused (fwd)

Stephen C. Tweedie (sct@redhat.com)
Wed, 7 Apr 1999 00:18:19 +0100 (BST)


Hi,

On Sun, 4 Apr 1999 20:56:59 +0200, Jamie Lokier
<lkd@tantalophile.demon.co.uk> said:

>> Block devices are not mmapable, so there is no issue here.

> Files on block devices _are_ mmapable. Or they should be.

They are always cached through the page cache, not the buffer cache,
so the data always lives in standard addressable memory. We already
use buffer-cache aliasing to populate those pages, so everything is
set up to work correctly here.
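
Very roughly, the aliasing works like this (the helper names below are
made up for illustration; the real code lives in the buffer-cache
layer):

/*
 * Illustrative sketch only: temporary buffer_heads are pointed at the
 * page-cache page's own memory, so the block IO lands directly in the
 * page.  make_temp_bh(), submit_block_read() and wait_on_page_io()
 * are invented names, not real interfaces.
 */
static void read_page_via_buffers(struct page *page, kdev_t dev,
                                  int blocknr[], int nblocks, int blocksize)
{
        int i;

        for (i = 0; i < nblocks; i++) {
                /* b_data aliases the page's memory at offset i * blocksize */
                struct buffer_head *bh =
                        make_temp_bh(page, i * blocksize, blocksize);
                bh->b_dev = dev;
                bh->b_blocknr = blocknr[i];
                submit_block_read(bh);  /* queue the read for this block */
        }
        wait_on_page_io(page);          /* wait for all blocks to arrive */
}

The page-cache page is always the final destination; the buffer heads
are only transient aliases onto it.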

> One imagines that files on /dev/ramdisk map the memory directly

No. Think about a 1k-blocksize filesystem on /dev/ram1: we _must_
expect to copy the file data to get the alignment correct for mmap().
You'd need a 4k-blocksize filesystem to allow direct mapping, and we
don't optimise that case yet.
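
To make the alignment problem concrete (numbers picked purely for
illustration):

#include <stdio.h>

#define PAGE_SIZE       4096
#define BLOCK_SIZE      1024

int main(void)
{
        /* Say the filesystem puts a file's first data block at block
         * 1234 of /dev/ram1 (an arbitrary example number). */
        unsigned long blocknr = 1234;
        unsigned long byte_offset = blocknr * BLOCK_SIZE;

        if (byte_offset % PAGE_SIZE)
                /* 1234 * 1024 lands 2048 bytes into a page, and the
                 * next three 1k blocks of the file need not even be
                 * contiguous, so the data must be copied into a fresh,
                 * page-aligned page-cache page before mmap() can see it. */
                printf("block %lu: not page aligned, copy needed\n",
                       blocknr);
        return 0;
}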

> And on non-Intel? This fine distinction (bitops vs. arithmetic) does
> not seem to be consistent across architectures.

True, but as long as MAP_NR() does the Right Thing and doesn't alias
IO space with physical memory, the rest of the kernel won't care.
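
For reference, the i386 flavour is essentially the following (sketched
from memory, so check asm-i386/page.h for the authoritative
definitions):

/* Sketch of the i386 definitions, from memory. */
#define PAGE_SHIFT      12
#define PAGE_OFFSET     0xC0000000UL    /* base of the kernel direct mapping */

#define __pa(x)         ((unsigned long)(x) - PAGE_OFFSET)
#define MAP_NR(addr)    (__pa(addr) >> PAGE_SHIFT)

/*
 * mem_map[MAP_NR(addr)] is the struct page for a directly-mapped
 * kernel virtual address.  "Doing the Right Thing" means an ioremap()ed
 * IO address must never produce a MAP_NR value that collides with the
 * mem_map entry of a real RAM page.
 */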

>> Me too. It gets worse: I really want to be able to do block device
>> IO anywhere in the first 4G of physical memory. ... it does
>> complicate matters and requires us to live with temporary virt/phys
>> mappings for IOs in progress.

> Fortunately temporary mappings are required for the minority of IOs on
> <1G machines.

We can still optimise things by grouping memory types. For example,
until we support 64-bit PCI on Intel, IO above 4G will require bounce
buffers (done transparently below the ll_rw_block level). We can
minimise the impact by preferring to allocate page-cache pages below
the 4G boundary and anonymous pages above it, on the basis that most
IO will be to cache pages (we don't expect to be swapping much on such
a machine!).
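
The bounce path itself would look something like this (every helper
name below is invented; the real hooks would sit just below
ll_rw_block):

/*
 * Illustrative sketch: substitute a below-4G page for any buffer the
 * device cannot reach; copy in for writes, copy back on completion
 * for reads.
 */
#define PCI32_LIMIT     (1ULL << 32)    /* what 32-bit PCI can address */

struct io_buf {
        unsigned long long phys;        /* physical address of the data page */
        void *virt;                     /* kernel mapping of the data page */
};

extern struct io_buf *alloc_page_below_4g(void);                    /* invented */
extern void copy_page_data(struct io_buf *dst, struct io_buf *src); /* invented */

static struct io_buf *maybe_bounce(struct io_buf *orig, int is_write)
{
        struct io_buf *bounce;

        if (orig->phys < PCI32_LIMIT)
                return orig;            /* device can DMA to/from it directly */

        bounce = alloc_page_below_4g(); /* guaranteed below the 4G boundary */
        if (is_write)
                copy_page_data(bounce, orig);   /* device will read the copy */
        return bounce;  /* for reads, copy back to orig on completion */
}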

> And the mappings can be made very fast when required. We can avoid
> flushing the TLB contexts by using a circular buffer of mappings, and
> only flushing when we wrap around that buffer.

Exactly. *Precisely* the solution I outlined with Linus. You'd like
to be able to use a single buffer per CPU, but that is too incomplete
a solution to work: for programmed IO you don't know which CPUs will
be fielding the interrupts that access the data, so the mappings need
to be visible across CPUs. Using a largish buffer minimises the
inter-CPU callbacks required when we invalidate, and as long as we are
careful to do a local TLB invalidate after we release a mapping, we
avoid trashing the TLB in the common case where the mapping is only
ever accessed from one CPU.
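
In sketch form (invented names throughout, and the SMP locking on the
slot counter left out):

/*
 * Illustrative sketch of the mapping ring: temporary mappings are
 * handed out from a fixed window of kernel virtual addresses, and the
 * expensive cross-CPU flush is only paid when the ring wraps.
 */
#define PAGE_SIZE       4096
#define RING_SLOTS      1024            /* slots in the reserved VA window */

extern void flush_tlb_all_cpus(void);                            /* invented */
extern void local_flush_tlb_page(unsigned long va);              /* invented */
extern void set_ring_pte(unsigned int slot, unsigned long phys); /* invented */

static unsigned long ring_base;         /* start of the reserved VA window */
static unsigned int ring_next;          /* next slot to hand out */

static void *map_io_page(unsigned long phys)
{
        unsigned int slot;

        if (ring_next == RING_SLOTS) {
                /* Wrap: the only point where every CPU has to
                 * invalidate its TLB entries for the whole window. */
                flush_tlb_all_cpus();
                ring_next = 0;
        }
        slot = ring_next++;
        set_ring_pte(slot, phys);       /* point the slot's PTE at phys */
        return (void *)(ring_base + slot * PAGE_SIZE);
}

static void unmap_io_page(void *addr)
{
        /* Local invalidate only: cheap, and sufficient in the common
         * case where only this CPU ever touched the mapping. */
        local_flush_tlb_page((unsigned long)addr);
}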

> I was about to complain about the dynamic mapping overhead.
> But then I thought of the brilliant TLB-flush-avoiding strategy above :)

Great minds think alike!

> You haven't said how to get the _bus_ addresses for DMA though.
> I haven't seen a portable phys_to_bus yet...

Correct; that is something we will need.
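
On i386 it would be the identity, since the PCI bus sees the same
physical addresses as the CPU; other ports would plug in their own
offset or translation. Something like (names invented for the
example):

/* Sketch only: a per-port constant offset covers the easy cases;
 * the awkward ports would need a real translation here. */
typedef unsigned long example_phys_addr_t;
typedef unsigned long example_bus_addr_t;

#define PHYS_TO_BUS_OFFSET      0UL     /* 0 on i386; per-port elsewhere */

static inline example_bus_addr_t phys_to_bus(example_phys_addr_t phys)
{
        return phys + PHYS_TO_BUS_OFFSET;
}

static inline example_phys_addr_t bus_to_phys(example_bus_addr_t bus)
{
        return bus - PHYS_TO_BUS_OFFSET;
}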

> 1. Sometimes an address passed to a system call does not have an associated
> pte. Dave Miller once pointed out that the Sparc port handles the
> direct-mapped kernel area in the TLB miss handler without tables.
> Occasionally, a system call is called with such an address.

Raw IO is designed to operate out of user VA. If it supports ioremapped
device driver pages or mapped kernel pages, then that is nice, but it is
not _required_ for the kernel mechanism to be useful.

> I agree that with a few ifs and elses, getting the physical address is
> easy, for all addresses that a system call may be passed.

More than that: for all of the addresses which a raw IO implementation
is required to support, it is guaranteed to be easy.
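
For the user VA case it really is just a page-table walk, roughly
(treat this as pseudocode: the exact pgd/pmd/pte helpers and the
locking vary, and pte_phys() is an invented name):

/*
 * Illustrative sketch: translate a user virtual address to a physical
 * address by walking the page tables.  Assumes the page is already
 * present; the raw IO code would fault it in and pin it first.
 */
static unsigned long user_va_to_phys(struct mm_struct *mm, unsigned long va)
{
        pgd_t *pgd = pgd_offset(mm, va);
        pmd_t *pmd;
        pte_t *pte;

        if (pgd_none(*pgd))
                return 0;               /* nothing mapped here */
        pmd = pmd_offset(pgd, va);
        if (pmd_none(*pmd))
                return 0;
        pte = pte_offset(pmd, va);
        if (!pte_present(*pte))
                return 0;               /* caller must fault it in first */

        /* pte_phys() stands in for "physical address of the frame this
         * pte points at"; add the offset within the page. */
        return pte_phys(*pte) | (va & ~PAGE_MASK);
}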

Cheers,
Stephen
