Re: Address spaces on a i386 - Getting Confused (fwd)

Jamie Lokier (lkd@tantalophile.demon.co.uk)
Thu, 1 Apr 1999 02:55:18 +0200


Stephen C. Tweedie wrote:
> >> c) Then trap into the kernel using an ioctl to set up kernel address
> >> for this range. ( I bet there must be some VM calls to achieve that
> >> ). What I mean is that get the pages behind the locked user-address
> >> behind the kernel address.
>
> > You make it sound so simple. Unfortunately it isn't.
>
> > c) as you describe not possible:
>
> That's odd, because I already have it working. :)

Yes I know :)
Me too :)

I understood Ramakrishna's "set up kernel address for this range" to
mean set up a kernel space mapping of the kind device drivers are used
to doing DMA with. Rereading, I see I was jumping a bit presumptuously
on the words "kernel address".

> > In the kernel, a "kernel address" that can be DMA'd using the
> > documented method (virt_to_bus) is in the direct mapped region. On
> > i386, that's the region from 0xc0000000 up.
>
> No problem. Finding the physical address of a kernel virtual address is
> easy, and in fact we don't usually even need to do that, since
> interrogating the page tables to find the user page's address in memory
> gives us a true physical address directly.

Hold on! :-)

To provide transparent user-space DMA, i.e. user does read() and a DMA
is done directly, you also have to allow for the buffer being in kernel
space, since that's the API read() et al. provide, and it is used on
occasion (hence set_fs()).

When you take this into account, walking the page tables using the plain
macros doesn't work consistently across architectures. You have to
check for the kernel "virt" address range specifically, and walk page
tables for the rest. Otherwise you hit the 4M pages on i386, and the
pages-with-no-tables on Sparc.
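
A minimal sketch of that two-way check, written as a user-space toy
model rather than kernel code. Every name here (toy_pgtable,
toy_addr_to_phys, the TOY_* constants) is hypothetical, and the flat
one-level table only stands in for the real page table walk. The
0xc0000000 boundary is the i386 value mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout: a toy model, not the kernel's API. */
#define TOY_PAGE_SHIFT   12
#define TOY_PAGE_SIZE    (1u << TOY_PAGE_SHIFT)
#define TOY_PAGE_OFFSET  0xC0000000u  /* start of the direct-mapped region on i386 */
#define TOY_NO_FRAME     UINT32_MAX   /* marks an unmapped page in the toy table */

/* Flat stand-in for a page table: frames[i] is the physical frame
 * number backing virtual page i, or TOY_NO_FRAME if nothing is mapped. */
struct toy_pgtable {
    const uint32_t *frames;
    size_t nr_pages;
};

/* Translate an address handed to a system call into a physical address.
 * Returns 0 and stores the result in *phys, or -1 if nothing is mapped. */
int toy_addr_to_phys(const struct toy_pgtable *pt, uint32_t vaddr, uint32_t *phys)
{
    assert(pt != NULL && phys != NULL);

    /* Direct-mapped kernel address: a fixed offset, so never walk the
     * tables here (this is where 4M pages would otherwise bite). */
    if (vaddr >= TOY_PAGE_OFFSET) {
        *phys = vaddr - TOY_PAGE_OFFSET;
        return 0;
    }

    /* Anything else: look the page up in the table. */
    uint32_t vpn = vaddr >> TOY_PAGE_SHIFT;
    if (vpn >= pt->nr_pages || pt->frames[vpn] == TOY_NO_FRAME)
        return -1;

    *phys = (pt->frames[vpn] << TOY_PAGE_SHIFT) | (vaddr & (TOY_PAGE_SIZE - 1));
    return 0;
}
```

The point of the design is only the ordering: test for the direct-mapped
range first, and fall back to a table lookup for everything else.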

> Finding the bus address of a kernel physical address is easy.

I don't think so. There isn't a phys_to_bus() macro.
I think it is messily architecture-dependent.

virt_to_bus(phys_to_virt()) doesn't work on i386 due to the bit masking
operations used. The "ISA legacy area" thing butts in here too.
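
A small user-space sketch of how the masking can go wrong. It assumes
the conversions are implemented as an OR with, and an AND against the
complement of, the 0xc0000000 offset; that is an assumption made for
illustration, not something checked against every kernel version. The
toy_* names are hypothetical stand-ins, not the real macros.

```c
#include <stdint.h>

#define TOY_PAGE_OFFSET 0xC0000000u  /* i386 direct-map base */

/* Assumed mask-style conversions (hypothetical stand-ins). */
static inline uint32_t toy_phys_to_virt(uint32_t phys)
{
    return phys | TOY_PAGE_OFFSET;   /* sets the top two bits */
}

static inline uint32_t toy_virt_to_bus(uint32_t virt)
{
    return virt & ~TOY_PAGE_OFFSET;  /* clears the top two bits */
}

/* Round trip: only the low 30 bits of the physical address survive. */
uint32_t toy_phys_to_bus_roundtrip(uint32_t phys)
{
    return toy_virt_to_bus(toy_phys_to_virt(phys));
}
```

Under these assumptions a physical address below 1GB comes back
unchanged, but 0x50000000 comes back as 0x10000000, because the two
bits covered by the mask are lost.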

See the contortions done on the Alpha, though
virt_to_bus(phys_to_virt()) probably works fine on 64-bit architectures
with <4GB of memory because there's enough spare address space. I
wouldn't count on it without checking, though.

For the moment, I use this and hope for the best:

#ifdef __i386__ /* I only know this is right for i386. */
return __pa(pte_page(*pte));
#else
return virt_to_bus((void *) pte_page(*pte));
#endif

> > The device driver must be written to divide its DMA requirements into
> > regions that don't cross non-contiguous page boundaries.
>
> For block device IO, that's just fine: we either split up the IO or
> submit it as a single block using scatter-gather DMA.

Of course. I meant to imply that you can't do it for any old device
driver; the driver has to specifically support user-space DMA.
That's me jumping on the "set kernel address for this range" thing again.
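
To illustrate the splitting requirement quoted above, here is a
user-space sketch that turns a page-by-page list of physical frames
into DMA segments, merging pages that happen to be physically
contiguous. The struct and function names are hypothetical, and the
frame list is assumed to come from whatever lookup the driver uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1u << TOY_PAGE_SHIFT)

/* One physically contiguous region to hand to the device (hypothetical). */
struct toy_dma_seg {
    uint32_t phys;
    uint32_t len;
};

/* frames[i] is the physical frame number backing the i-th page of the
 * buffer; offset is where the buffer starts within the first page;
 * len is the buffer length in bytes.
 * Returns the number of segments written, or -1 if max_segs is too few. */
int toy_build_segments(const uint32_t *frames, uint32_t offset, uint32_t len,
                       struct toy_dma_seg *segs, int max_segs)
{
    assert(offset < TOY_PAGE_SIZE);

    int n = 0;
    size_t page = 0;

    while (len > 0) {
        /* Bytes of the buffer that live in the current page. */
        uint32_t chunk = TOY_PAGE_SIZE - offset;
        if (chunk > len)
            chunk = len;

        uint32_t phys = (frames[page] << TOY_PAGE_SHIFT) + offset;

        if (n > 0 && segs[n - 1].phys + segs[n - 1].len == phys) {
            /* Physically adjacent to the previous segment: extend it. */
            segs[n - 1].len += chunk;
        } else {
            /* Discontinuity: a new segment is needed. */
            if (n == max_segs)
                return -1;
            segs[n].phys = phys;
            segs[n].len = chunk;
            n++;
        }

        len -= chunk;
        offset = 0;  /* later pages are used from their start */
        page++;
    }
    return n;
}
```

A driver without scatter-gather hardware would issue one transfer per
segment; one with it could pass the whole list down in a single request.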

> > It must look up the physical address from user-space (an architecture
> > specific thing for which there is no reliable macro yet).
>
> There are perfectly standard ways of doing this inside the kernel: the
> macros to walk over page tables are already architecture-independent.

However, macros to find the bus address, for any bus, from any address
passed to a system call, are not.

> > As a future problem, some user-space pages will not be reachable by
> > DMA anyway, because they are outside the bus address range of the
> > device doing DMA. (cf. 32-bit PCI cards on >4GB memory machines, or
> > complexities of multi-bus machines).
>
> Indeed, and this is one of the problems I'll have to deal with for the
> large memory support. IO bounce buffers will be necessary, but that
> will be entirely transparent above the block device request layer.

I assume your stuff is happening totally in the block device layer then?
(Sorry, I haven't read your code yet.) That would be rather nice, to give
generic user-space DMA support to all block devices in one smooth go.
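
The reachability problem quoted above boils down to a range check
against whatever addresses the device can generate. A minimal sketch,
assuming the device's limit can be expressed as a single highest
reachable address (dma_limit, a hypothetical parameter, e.g.
0xffffffff for a 32-bit PCI card):

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if any byte of [phys, phys + len) lies above dma_limit,
 * i.e. the transfer would have to go through a bounce buffer.
 * Hypothetical helper for illustration only. */
bool toy_needs_bounce(uint64_t phys, uint64_t len, uint64_t dma_limit)
{
    if (len == 0)
        return false;

    uint64_t last = phys + len - 1;

    /* last < phys means the range wrapped around the address space. */
    return last < phys || last > dma_limit;
}
```

With a 32-bit limit, a page ending exactly at 4GB is still reachable,
while a buffer that extends even one byte past it is not.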

The stuff I have written, and not yet published, provides clean,
generic user-space DMA services to device drivers that want them.
For example, video capture boards, 3D video boards and custom network
hacks could benefit from it.

Have a nice day,
-- Jamie
