Re: Address spaces on a i386 - Getting Confused (fwd)

Stephen C. Tweedie (sct@redhat.com)
Thu, 1 Apr 1999 02:17:19 +0100 (BST)


Hi,

On Thu, 1 Apr 1999 02:55:18 +0200, Jamie Lokier
<lkd@tantalophile.demon.co.uk> said:

>> No problem. Finding the physical address of a kernel virtual address
>> is easy, and in fact we don't usually even need to do that, since
>> interrogating the page tables to find the user page's address in
>> memory gives us a true physical address directly.

> Hold on! :-)

> To provide transparent user-space DMA, i.e. user does read() and a DMA
> is done directly, you also have to allow for those things happening
> into kernel space. As that's the API read() et al. provide, and it is
> used on occasion. (Hence set_fs()).

That's not really an issue. It's very much like mmap(): the general
case is IO to/from a local block device (handled by generic_readpage(),
which calls the filesystem's own bmap function to map the data to disk).
For that, we are always talking buffer_head IO, in which b_data is
always a kernel virtual address, and the device drivers already take
care of any necessary translations.

> When you take this into account, walking the page tables using the plain
> macros doesn't work consistently across architectures. You have to
> check for the kernel "virt" address range specifically, and walk page
> tables for the rest. Otherwise you hit the 4M pages on i386,

No, we don't, because we are always talking about user virtual
addresses here. We don't hit the 4M pages.

> and the pages-with-no-tables on Sparc.

Again, we're using exactly the same techniques which (say) ptrace uses
to walk the user VA. It had _better_ work!

>> Finding the bus address of a kernel physical address is easy.

> virt_to_bus(phys_to_virt()) doesn't work on i386 due to the bit masking
> operations used.

I can't see any way in which it should fail for any valid VA page.

> The "ISA legacy area" thing butts in here too.

Again, we are _only_ talking about valid VA entries. This will be fine.
These pages already have kernel virtual addresses (or large parts of the
kernel would stop working), and they already have valid bus addresses.
It's quite true that once we start doing very large memory stuff, there
is a problem: virt_to_bus isn't good enough to emulate phys_to_bus if
the physical region is larger than the virtual region. Right now, the
conversions will be lossless on Intel.
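
To make the "lossless right now, lossy with very large memory" point
concrete, here is a minimal user-space sketch. It is not kernel code. It
assumes a PAGE_OFFSET of 0xC0000000, the mask-style conversions Jamie
alludes to, and that bus and physical addresses coincide for RAM on i386.
All toy_* names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the kernel direct map starts at 3 GB, as on a typical i386 setup. */
#define TOY_PAGE_OFFSET 0xC0000000u

/* Toy model of phys_to_virt(): set the top two address bits. */
static uint32_t toy_phys_to_virt(uint32_t phys)
{
    return phys | TOY_PAGE_OFFSET;
}

/* Toy model of virt_to_bus(): clear the top two address bits.
 * Assumes bus address == physical address for RAM. */
static uint32_t toy_virt_to_bus(uint32_t virt)
{
    return virt & ~TOY_PAGE_OFFSET;
}

/* Nonzero if the physical address survives the round trip unchanged. */
static int toy_round_trip_ok(uint32_t phys)
{
    uint32_t bus = toy_virt_to_bus(toy_phys_to_virt(phys));

    /* With this mask the result can never exceed the low 1 GB. */
    assert(bus < 0x40000000u);
    return bus == phys;
}
```

Under these assumptions toy_round_trip_ok() holds for every physical
address below 1 GB, which is all the direct map can cover, and fails at
0x40000000 and above, where the top bits are lost.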

> For the moment, I use this and hope for the best:

> #ifndef __i386__ /* I only know this is right for i386. */
> return __pa(pte_page(*pte));
> #else
> return virt_to_bus((void *) pte_page(*pte));
> #endif

Fair enough, but I'm not talking to the devices directly. Buffer_heads
are already defined to have a virtual address in bh->b_data, and the
block device driver layer is already required to do the right thing with
that.

>> > The device driver must be written to divide its DMA requirements into
>> > regions that don't cross non-contiguous page boundaries.
>>
>> For block device IO, that's just fine: we either split up the IO or
>> submit it as a single block using scatter-gather DMA.

> Of course. I meant to imply that you can't do it for any old device
> driver, you have to have the driver specifically support user-space DMA.
> That's me jumping on "set kernel address for this range" thing again.

No. There are no device drivers right now in Linux which "specifically"
support user-space DMA: until recently there simply hasn't been such a
thing for them to support. They just support buffer_head IO. However,
the SCSI layers already naturally support scatter-gather to
buffer_heads, as we usually cache contiguous regions of disk in
discontiguous buffers in memory. We simply take advantage of the
existing mechanisms: no extra support is required.
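
As a rough illustration of splitting a buffer at page boundaries and
submitting the pieces as a scatter-gather list, here is a user-space
sketch. It is not the block layer's actual code: struct toy_seg,
toy_build_sg() and the 4 KB page size are assumptions made for the
example, and the caller is assumed to have already looked up the
physical address of each page backing the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u  /* assumption: 4 KB pages */

/* One physically contiguous piece of the transfer. */
struct toy_seg {
    uint32_t phys;  /* physical start address */
    uint32_t len;   /* length in bytes */
};

/*
 * frames[i] is the page-aligned physical address of the i-th page of
 * the buffer; offset is the starting offset inside the first page.
 * Returns the number of segments written to out, or -1 if the input is
 * inconsistent or out is too small.
 */
static int toy_build_sg(const uint32_t *frames, size_t nframes,
                        uint32_t offset, uint32_t len,
                        struct toy_seg *out, size_t max_segs)
{
    size_t n = 0, page = 0;

    assert(frames != NULL && out != NULL);
    if (offset >= TOY_PAGE_SIZE)
        return -1;

    while (len > 0) {
        uint32_t chunk = TOY_PAGE_SIZE - offset;  /* bytes left in this page */
        uint32_t phys;

        if (page >= nframes)
            return -1;
        if (chunk > len)
            chunk = len;
        phys = frames[page] + offset;

        if (n > 0 && out[n - 1].phys + out[n - 1].len == phys) {
            /* Physically adjacent to the previous piece: extend it. */
            out[n - 1].len += chunk;
        } else {
            if (n == max_segs)
                return -1;
            out[n].phys = phys;
            out[n].len = chunk;
            n++;
        }
        len -= chunk;
        offset = 0;  /* every later page is used from its start */
        page++;
    }
    return (int)n;
}
```

For example, an 8 KB transfer starting 2 KB into a page whose neighbour
is physically adjacent, followed by a page elsewhere in memory, comes
out as two segments rather than three.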

>> There are perfectly standard ways of doing this inside the kernel: the
>> macros to walk over page tables are already architecture-independent.

> However macros to find the bus address for any bus address, from any
> address passed to a system call, are not.

Translating a system call address to a kernel virtual address is a
standard operation in the VM. Converting a virtual address to a bus
address is a standard function in the architecture-dependent code.
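
For readers who want to see the shape of the first step, here is a
user-space toy model of a two-level page table walk for a user virtual
address. It only mimics the classic i386 layout (10 bits of directory
index, 10 bits of table index, 12 bits of offset). The structures, the
toy_* names and the 3 GB user/kernel split are assumptions for
illustration, not the kernel's own macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT  12
#define TOY_PAGE_SIZE   (1u << TOY_PAGE_SHIFT)
#define TOY_PTRS        1024u        /* entries per directory or table */
#define TOY_PRESENT     0x1u         /* "page present" bit in a PTE */
#define TOY_PAGE_OFFSET 0xC0000000u  /* assumption: user space ends at 3 GB */

struct toy_pgtable { uint32_t pte[TOY_PTRS]; };
struct toy_pgdir   { struct toy_pgtable *table[TOY_PTRS]; };

/*
 * Translate a user virtual address to a physical address.
 * Returns 0 and stores the result in *phys, or -1 if the address is in
 * the kernel range or is not mapped.
 */
static int toy_user_va_to_phys(const struct toy_pgdir *dir, uint32_t va,
                               uint32_t *phys)
{
    const struct toy_pgtable *pt;
    uint32_t pte;

    assert(dir != NULL && phys != NULL);

    /* Only user addresses are walked, so large kernel mappings never arise. */
    if (va >= TOY_PAGE_OFFSET)
        return -1;

    pt = dir->table[va >> 22];       /* top 10 bits: directory index */
    if (pt == NULL)
        return -1;

    pte = pt->pte[(va >> TOY_PAGE_SHIFT) & (TOY_PTRS - 1)];  /* middle 10 bits */
    if (!(pte & TOY_PRESENT))
        return -1;

    /* Frame address from the PTE plus the offset within the page. */
    *phys = (pte & ~(TOY_PAGE_SIZE - 1)) | (va & (TOY_PAGE_SIZE - 1));
    return 0;
}
```

The physical address found this way would then go through the
architecture's own virtual-to-bus conversion. That step is left out of
the sketch.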

> I assume your stuff is happening totally in the block device layer then?
> (Sorry haven't read your code yet). That would be rather nice, to give
> generic user-space DMA support to all block devices in one smooth go.

Exactly.

> The stuff I have written, and not yet published, provides for clean,
> generic user-space DMA services to device drivers that want it.
> E.g. the video capture boards, 3d video boards and custom network
> hacks could benefit from it.

OK, we probably need to talk about this offline. I'll follow up on this
tomorrow.

--Stephen
