Re: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM

From: Konrad Rzeszutek Wilk
Date: Thu Mar 15 2012 - 15:50:50 EST


On Thu, Mar 15, 2012 at 12:36:48PM -0700, Dan Magenheimer wrote:
> > From: Avi Kivity [mailto:avi@xxxxxxxxxx]
> > Sent: Thursday, March 15, 2012 12:11 PM
> > To: Konrad Rzeszutek Wilk
> > Cc: Dan Magenheimer; Akshay Karle; linux-kernel@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; ashu tripathi;
> > nishant gulhane; amarmore2006; Shreyas Mahure; mahesh mohan
> > Subject: Re: [RFC 0/2] kvm: Transcendent Memory (tmem) on KVM
> >
> > On 03/15/2012 08:02 PM, Konrad Rzeszutek Wilk wrote:
> > > >
> > > > Nice. This takes care of the tail-end of the streaming (the more
> > > > important one - since it always involves a cold copy). What about the
> > > > other side? Won't the read code invoke cleancache_get_page() for every
> > > > page? (this one is just a null hypercall, so it's cheaper, but still
> > > > expensive).
> > >
> > > That is something we should fix - I think the need for batching
> > > was mentioned in the frontswap email thread, and it certainly
> > > seems required as those hypercalls aren't that cheap.
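
[To make the read-side cost concrete, the pattern in question looks
roughly like this - an illustrative sketch only, not the actual
fs/mpage.c code, and submit_page_read() is a made-up helper:]

#include <linux/cleancache.h>
#include <linux/pagemap.h>

static void read_pages_sketch(struct page **pages, int nr)
{
        int i;

        for (i = 0; i < nr; i++) {
                /* One call - and on KVM, one hypercall - per page,
                 * even when the page is not in the backend and the
                 * call is effectively a null round trip. */
                if (cleancache_get_page(pages[i]) == 0) {
                        /* hit: page filled from tmem, skip disk I/O */
                        SetPageUptodate(pages[i]);
                        continue;
                }
                /* miss: fall through to normal block I/O */
                submit_page_read(pages[i]);     /* hypothetical helper */
        }
}
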
> >
> > In fact when tmem was first proposed I asked for two changes - make it
> > batchable, and make it asynchronous (so we can offload copies to a dma
> > engine, etc). Of course that would have made tmem significantly more
> > complicated.
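
[Interjecting with a sketch for concreteness: nothing below exists in
the tree and all the names are made up. It is only the shape that a
batched, asynchronous interface could take - submit a vector of
operations, get a completion callback later, e.g. from a DMA engine:]

#include <linux/types.h>

struct tmem_op_entry {
        u32 pool_id;
        u64 oid[3];             /* object id */
        u32 index;              /* page index within the object */
        unsigned long pfn;      /* page to put or get */
};

struct tmem_batch {
        unsigned int nr;
        struct tmem_op_entry ops[32];
        /* called when the backend (or its DMA engine) is done */
        void (*complete)(struct tmem_batch *b, int nr_done);
        void *private;
};

/* queues the batch and returns immediately */
int tmem_submit_batch(struct tmem_batch *b);
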
>
> (Sorry, I'm not typing fast enough to keep up with the thread...)
>
> Hi Avi --
>
> In case it wasn't clear from my last reply, RAMster shows
> that tmem CAN be used asynchronously... by making it more
> complicated, but without making the core kernel changes more
> complicated.
>
> In RAMster, pages are locally cached (compressed using zcache)
> and then, depending on policy, a separate thread sends the pages
> to a remote machine. So the first part (compress and store locally)
> still must be synchronous, but the second part (transmit to
> another -- remote or possibly host? -- system) can be done
> asynchronously. The RAMster code has to handle all the race
> conditions, which is a pain but seems to work.
>
> This is all working today in RAMster (which is in linux-next).
> Batching is still not implemented by any tmem backend, but RAMster
> demonstrates how the backend implementation COULD do batching without
> any additional core kernel changes. I.e. no changes necessary
> to frontswap or cleancache.
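
[For anyone following along, the split Dan describes has roughly this
shape - a minimal sketch of the pattern with made-up helper names,
not RAMster's actual code:]

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

static void *compress_and_store_locally(struct page *page, size_t *lenp); /* hypothetical */
static void send_to_remote(void *data, size_t len);                       /* hypothetical */

struct remote_xmit {
        struct work_struct work;
        void *zdata;            /* compressed copy, made synchronously */
        size_t zlen;
};

/* runs later in process context: the asynchronous half */
static void xmit_fn(struct work_struct *w)
{
        struct remote_xmit *rx = container_of(w, struct remote_xmit, work);

        /* must handle races: the page may have been flushed or
         * overwritten locally while this work item was queued */
        send_to_remote(rx->zdata, rx->zlen);
        kfree(rx->zdata);
        kfree(rx);
}

/* the synchronous half, called from the frontswap/cleancache hook */
static int ramster_style_put(struct page *page)
{
        struct remote_xmit *rx = kmalloc(sizeof(*rx), GFP_ATOMIC);

        if (!rx)
                return -ENOMEM;
        rx->zdata = compress_and_store_locally(page, &rx->zlen);
        if (!rx->zdata) {
                kfree(rx);
                return -ENOMEM;
        }
        INIT_WORK(&rx->work, xmit_fn);
        schedule_work(&rx->work);       /* transmit happens asynchronously */
        return 0;                       /* caller saw a synchronous put */
}
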
>
> So, you see, I *was* listening. I just wasn't willing to fight
> the uphill battle of much more complexity in the core kernel
> for a capability that could be implemented differently.

Dan, please stop this.

The frontswap work is going through me and my goal is to provide
the batching and asynchronous option. It might take longer than
anticipated b/c it might require redoing some of the code - that
is OK. We can do this in steps too - first do the synchronous
version (as it is implemented right now) and then add on the
batching and asynchronous work. That means breaking the ABI/API,
and I believe Avi would like the ABI to be as baked as possible
so that he does not have to provide a v2 (or v3) of the tmem
support in KVM.

I appreciate you having done that in RAMster, but the "transmit"
step is what we need to batch. Think of scatter-gather DMA.
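
Something like this, say (hypothetical layout, nothing like it is
implemented anywhere yet):

#include <linux/types.h>

/* One hypercall carries a scatter-gather style list of tmem
 * operations instead of one hypercall per page. */
struct tmem_sg_entry {
        u64 oid[3];     /* object id */
        u32 index;      /* page index within the object */
        u32 pad;
        u64 gpa;        /* guest physical address of the page */
};

struct tmem_sg_list {
        u32 nr;         /* number of valid entries */
        struct tmem_sg_entry entry[32];
};

/* one guest exit for the whole list, not one per page */
long kvm_tmem_put_list(struct tmem_sg_list *list);      /* made-up name */
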

>
> That said, I still think it remains to be proven that
> reducing the number of hypercalls by 2x or 3x (or whatever
> the batching factor you choose) will make a noticeable

I was thinking 32 - about the same batch size that we use in
Xen for the PV MMU hypercalls. We also batch those there with
multicalls.
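
For reference, this is what Xen's multicall batching looks like
(layout quoted from memory - see xen/interface/xen.h for the
authoritative definition):

/* Each entry encodes one hypercall; an array of them is handed to
 * HYPERVISOR_multicall(call_list, nr_calls), so a batch of 32 PV
 * MMU updates costs one guest exit instead of 32. */
struct multicall_entry {
        unsigned long op;       /* hypercall number */
        unsigned long result;   /* per-entry return value */
        unsigned long args[6];  /* hypercall arguments */
};
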

> performance difference. But if it does, batching can
> be done... and completely hidden in the backend.
>
> (I hope Andrea is listening ;-)
>
> Dan