Re: CONFIG_BIGMEM and rawio

Stephen C. Tweedie (sct@redhat.com)
Fri, 3 Sep 1999 14:15:14 +0100 (BST)


Hi,

On Fri, 3 Sep 1999 12:50:35 +0200 (CEST), Andrea Arcangeli
<andrea@suse.de> said:

> If you instead need to do I/O from/to shm memory, then you can't expect
> raw-io to work correctly if you want to allocate up to 4GB of shm memory
> by using the BIGMEM feature.

> I sure agree that the rule is: "if your application is using raw-io and
> you enabled CONFIG_BIGMEM, then you may expect your application to not
> work anymore".

Unfortunately, one of the major users of raw IO -- Oracle -- requires
raw IO to shared memory. We need to make this work for them.

> What do you mean exactly by bounce-buffers? I guess you don't mean copying
> the contents of every bigmem buffer to another regular buffer and doing I/O
> with that. That would seem too slow to me (IMHO it would be faster to avoid
> raw-io completely ;).

Then you don't understand why we want raw IO. It isn't just for
zero-copy speed. It is primarily to avoid caching. Caching is an
unnecessary waste of memory in the case of a database. For Oracle
Parallel Server, the requirement is even stricter: we must guarantee
precise synchronisation with the disk, not with the cache, because the
database is doing clever locking behind the scenes to allow more than
one machine to access the shared SCSI disk at once. Reading a stale
value from the cache may mean missing an update made by another
machine.
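
(For reference, the pattern Oracle needs is simply a read from the raw
device landing directly in the shared segment.  A minimal userspace
sketch is below; the device path, segment size and error handling are
purely illustrative, not how Oracle actually does it.)

#include <sys/ipc.h>
#include <sys/shm.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Illustrative only: attach a SysV shared memory segment and read from
 * a raw (uncached) device straight into it, so that every read reflects
 * what is physically on the shared disk rather than a cached copy.
 */
int main(void)
{
        int shmid = shmget(IPC_PRIVATE, 1 << 20, IPC_CREAT | 0600);
        char *buf = shmat(shmid, NULL, 0);      /* page-aligned */
        int fd = open("/dev/raw1", O_RDWR);     /* path is illustrative */

        if (shmid < 0 || buf == (char *) -1 || fd < 0)
                return 1;

        /* Raw I/O wants sector-aligned buffers and lengths. */
        return read(fd, buf, 512) == 512 ? 0 : 1;
}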

> I just tried to put a wrapper in ll_rw_block to remap all b_data and
> req->buffer in the fixmap area, but then you have the problem that you
> can have lots of buffers queued in only one request, and so you would
> need lots of kmap slots per request (more kmap slots than
> NR_REQUEST).

You only actually need the mapping when you copy the buffer. The IO
then takes place in normal kernel VA space.
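
To make that concrete, the write-side bounce boils down to something
like the sketch below.  I'm writing it against the later highmem-style
interface (alloc_page(), kmap_atomic() and friends) purely for
illustration; the helper name is made up and the exact primitives in
the bigmem patch are not these.

#include <linux/mm.h>
#include <linux/highmem.h>
#include <linux/string.h>

/*
 * Illustrative only: copy a high-memory buffer's data into a freshly
 * allocated low-memory ("bounce") page.  The mapping is held only for
 * the duration of the memcpy; the I/O itself is then issued against
 * the bounce page, which lives in the normal kernel virtual address
 * space and needs no mapping at interrupt time.
 */
static char *copy_to_bounce(struct page *highpage, unsigned int offset,
                            unsigned int size)
{
        struct page *bounce = alloc_page(GFP_NOIO);     /* low memory */
        char *vfrom, *vto;

        if (!bounce)
                return NULL;

        vto = page_address(bounce);
        vfrom = kmap_atomic(highpage, KM_USER0);        /* short-lived map */
        memcpy(vto, vfrom + offset, size);
        kunmap_atomic(vfrom, KM_USER0);

        return vto;     /* safe to hand to the driver as req->buffer */
}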

> kmap_request was doing a _SMP_ TLB flush of the kmap in question (since
> you don't know on which CPU the irq-completion handler will run). At the
> moment we don't have a way to issue an SMP invlpg TLB flush, so I was
> using the equally safe (but slower) smp_flush_all().

Yes, the completion interrupt for reads is nastier. However, we don't
need the original kmap in this case: we just need a local interrupt kmap
(to avoid stomping on any foreground kmap going on at the time). So we
do need a pool of VAs to use for this, but we don't need to make them
persist (at least not until we extend bigmem support to the page cache).
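
Schematically, the read-completion copy would go something like the
fragment below.  Again this borrows the later kmap_atomic() naming and
a made-up function name purely to illustrate the "local interrupt kmap"
idea: the mapping comes from a small per-CPU window of virtual
addresses, so no cross-CPU invlpg or smp_flush_all() is ever needed.

#include <linux/highmem.h>
#include <linux/interrupt.h>
#include <linux/string.h>

/*
 * Illustrative read-completion path: copy the bounce buffer back into
 * the high-memory page from interrupt context.  The atomic kmap slot
 * is strictly local to this CPU, so no SMP-wide TLB flush is required,
 * and nothing needs to persist once the memcpy is done.
 */
static void bounce_read_done(struct page *highpage, unsigned int offset,
                             char *bounce, unsigned int size)
{
        unsigned long flags;
        char *vto;

        local_irq_save(flags);          /* don't re-enter the slot */
        vto = kmap_atomic(highpage, KM_BOUNCE_READ);
        memcpy(vto + offset, bounce, size);
        kunmap_atomic(vto, KM_BOUNCE_READ);
        local_irq_restore(flags);
}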

> BTW, I have two questions about the ll_rw_block/request internals.

> 1) can req->bh be null? IMHO no, and it seems lots of code is
> wasting time checking for null there.

It used to be possible: ll_rw_swap didn't use temporary buffer heads
once upon a time, and req->bh would have been null in that case. Right
now I expect every request to have a bh: swapping goes via brw_page
now, and that uses buffer_heads.

> 2) what is the difference between req->buffer and req->bh->b_data?
> I can't see the difference. (if the answer to (1) is yes, then once
> I've understood (1) I'll probably solve (2) too ;)

req->buffer was used for the swapping case above where we didn't have a
bh.
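
(The block layer keeps the two in step for the bh-backed case: when
make_request() builds a fresh request from a buffer_head, the data
pointer is simply copied across.  Roughly, with field names as in
drivers/block/ll_rw_blk.c but the fragment paraphrased rather than
quoted:)

        req->sector = sector;
        req->nr_sectors = count;
        req->current_nr_sectors = count;
        req->buffer = bh->b_data;       /* same pointer as req->bh->b_data */
        req->bh = bh;
        req->bhtail = bh;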

--Stephen
