RE: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM

From: Dan Magenheimer
Date: Thu Mar 15 2012 - 15:37:07 EST


> From: Avi Kivity [mailto:avi@xxxxxxxxxx]
> Sent: Thursday, March 15, 2012 12:11 PM
> To: Konrad Rzeszutek Wilk
> Cc: Dan Magenheimer; Akshay Karle; linux-kernel@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; ashu tripathi;
> nishant gulhane; amarmore2006; Shreyas Mahure; mahesh mohan
> Subject: Re: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM
>
> On 03/15/2012 08:02 PM, Konrad Rzeszutek Wilk wrote:
> > >
> > > Nice. This takes care of the tail-end of the streaming (the more
> > > important one - since it always involves a cold copy). What about the
> > > other side? Won't the read code invoke cleancache_get_page() for every
> > > page? (this one is just a null hypercall, so it's cheaper, but still
> > > expensive).
> >
> > That is something we should fix - I think it was mentioned in the frontswap
> > email thread the need for batching and it certainly seems required as those
> > hypercalls aren't that cheap.
>
> In fact when tmem was first proposed I asked for two changes - make it
> batchable, and make it asynchronous (so we can offload copies to a dma
> engine, etc). Of course that would have made tmem significantly more
> complicated.

(Sorry, I'm not typing fast enough to keep up with the thread...)

Hi Avi --

In case it wasn't clear from my last reply, RAMster shows
that tmem CAN be used asynchronously... by making the backend
more complicated, but without making the core kernel changes
more complicated.

In RAMster, pages are locally cached (compressed using zcache)
and then, depending on policy, a separate thread sends the pages
to a remote machine. So the first part (compress and store locally)
still must be synchronous, but the second part (transmit to
another -- remote or possibly host? -- system) can be done
asynchronously. The RAMster code has to handle all the race
conditions, which is a pain but seems to work.
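
To make the two phases concrete, here is a minimal userspace sketch in
Python. It is not RAMster code: the class, its method names, and the
dict standing in for the remote machine are all hypothetical, and a
real backend has many more races to handle than the single one shown.

```python
import queue
import threading
import zlib


class TwoStageStore:
    """Toy model: synchronous local put, asynchronous remote transfer."""

    def __init__(self):
        self.local = {}    # key -> compressed bytes held locally
        self.remote = {}   # hypothetical stand-in for a remote machine
        self.lock = threading.Lock()
        self.pending = queue.Queue()
        worker = threading.Thread(target=self._ship, daemon=True)
        worker.start()

    def put(self, key, page):
        # Synchronous part: compress and store locally before returning.
        data = zlib.compress(page)
        with self.lock:
            self.local[key] = data
        # Asynchronous part: only queued here, done later by the worker.
        self.pending.put(key)

    def get(self, key):
        # The page may still be local, already remote, or gone.
        with self.lock:
            data = self.local.get(key)
            if data is None:
                data = self.remote.get(key)
        return None if data is None else zlib.decompress(data)

    def flush(self, key):
        with self.lock:
            self.local.pop(key, None)
            self.remote.pop(key, None)

    def _ship(self):
        while True:
            key = self.pending.get()
            with self.lock:
                # Race: the page may have been flushed since it was queued.
                data = self.local.pop(key, None)
                if data is not None:
                    self.remote[key] = data
            self.pending.task_done()

    def drain(self):
        # Block until every queued transfer has been processed.
        self.pending.join()
```

The caller of put() only ever waits for the compress-and-store step;
whether and when the page moves is decided behind its back.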

This is all working today in RAMster (which is in linux-next).
Batching is still not implemented by any tmem backend, but RAMster
demonstrates how the backend implementation COULD do batching without
any additional core kernel changes. I.e. no changes necessary
to frontswap or cleancache.

So, you see, I *was* listening. I just wasn't willing to fight
the uphill battle of much more complexity in the core kernel
for a capability that could be implemented differently.

That said, I still think it remains to be proven that
reducing the number of hypercalls by 2x or 3x (or whatever
batching factor you choose) will make a noticeable
performance difference. But if it does, batching can
be done... and completely hidden in the backend.
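
As a rough illustration of what "hidden in the backend" could look
like, the Python sketch below keeps a one-page-at-a-time put interface
and lets the backend decide when to cross into the host. Everything
here is an assumption made for illustration: FakeHost, put_batch and
the batch_size parameter are hypothetical and do not correspond to any
existing tmem backend or hypercall. Only puts are batched; a get has
to check the not-yet-sent buffer first.

```python
class FakeHost:
    """Hypothetical host side; counts how often it is entered."""

    def __init__(self):
        self.put_calls = 0
        self.pages = {}

    def put_batch(self, items):
        # One "hypercall" carrying many (key, page) pairs.
        self.put_calls += 1
        self.pages.update(items)


class BatchingBackend:
    """Per-page interface on top, batched host calls underneath."""

    def __init__(self, host, batch_size):
        self.host = host
        self.batch_size = batch_size
        self.buffer = {}

    def put_page(self, key, page):
        # Copy synchronously so the caller may reuse its page at once.
        self.buffer[key] = bytes(page)
        if len(self.buffer) >= self.batch_size:
            self.flush_batch()

    def get_page(self, key):
        # Pages not yet sent to the host must still be visible.
        if key in self.buffer:
            return self.buffer[key]
        return self.host.pages.get(key)

    def flush_batch(self):
        if self.buffer:
            self.host.put_batch(list(self.buffer.items()))
            self.buffer = {}


host = FakeHost()
backend = BatchingBackend(host, batch_size=3)
for n in range(9):
    backend.put_page(n, b"p" * 4096)
print(host.put_calls)  # 3 host entries for 9 puts
```

With a batching factor of 3, nine puts cost three host entries instead
of nine. Whether that saving is noticeable is exactly the open
question above.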

(I hope Andrea is listening ;-)

Dan
