Re: Status of AIO

From: Benjamin LaHaise
Date: Mon Mar 06 2006 - 20:42:02 EST


On Mon, Mar 06, 2006 at 04:51:29PM -0800, David S. Miller wrote:
> I think any such VM tricks need serious thought. It has serious
> consequences as far as cost especially on SMP. Evgivny has some data
> that shows this, and chapter 5 of Networking Algorithmics has a lot of
> good analysis and paper references on this topic.

VM tricks do suck, so you just have to use the tricks that nobody else
is... My thinking is to do something like the following: have a structure
to reference a set of pages. When it is first created, it takes a reference
on the pages in question, and it is added to the vm_area_struct of the user
so that the vm can poke it for freeing when memory pressure occurs. The
sk_buff dataref also has to have a pointer to the pageref added. Now, the
trick to making it useful is as follows:

struct pageref {
atomic_t free_count;
int use_count; /* protected by socket lock */
...
unsigned long user_address;
unsigned long length;
struct socket *sock; /* backref for VM */
struct page *pages[];
};

The fast path in network transmit becomes:

if (sock->pageref->... overlaps buf) {
for each packet built {
use_count++;
<add pageref to skb's dataref happily without atomics
or memory copying>
}
}

Then the kfree_skb() path does an atomic_dec() on pageref->free_count
instead of the page. (Or get rid of the atomic using knowledge about the
fact that a given pageref could only be freed by the network driver it was
given to.) That would make the transmit path bloody cheap, and the tx irq
context no more expensive than it already is.

It's probably easier to show this tx path with code that gets the details
right.

-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <dont@xxxxxxxxx>.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/