On Fri, Mar 22, 2019 at 10:44:33AM +0000, Oleksandr Andrushchenko wrote:This is really great, looking forward to it: any date in your mind
On 3/20/19 5:50 AM, Munehisa Kamata wrote:Hi Oleksandr,
On 3/18/2019 3:02 AM, Oleksandr Andrushchenko wrote:Thank you for your work!
+AmazonHi Oleksandr,
pls see inline
Let me add some comments as the original author of the series.
Looks like my previous email did not make it to mailing list. May be some issues with myYes, I saw that code, thank youOn 3/14/19 9:00 PM, Julien Grall wrote:AFAIK .suspend and .resume callbacks in frontend drivers are
Hi,Julien, thanks a lot for bringing these patches to our attention which we obviously missed.
On 3/14/19 3:40 PM, Boris Ostrovsky wrote:
On 3/14/19 11:10 AM, Oleksandr Andrushchenko wrote:I remember a series from amazon [1] that plays around suspend and hibernation. The patch [2] leads me to think that guest triggered suspend/resume does not work properly. It looks like the series has never been fully reviewed. Not sure why...
On 3/14/19 5:02 PM, Boris Ostrovsky wrote:xl save / xl restore
On 3/14/19 10:52 AM, Oleksandr Andrushchenko wrote:Ah, the difference can be of the way we get the guest enter
On 3/14/19 4:47 PM, Boris Ostrovsky wrote:My tree is about a month old so perhaps there is some sort of regression
On 3/14/19 9:17 AM, Oleksandr Andrushchenko wrote:Exactly, if you take a look at the .resume callback as it is now
From: Oleksandr Andrushchenko <oleksandr_andrushchenko@xxxxxxxx>What do you mean? Are you saying that after resume you lose
Currently on driver resume we remove all the network queues and
destroy shared Tx/Rx rings leaving the driver in its current state
and never signaling the backend of this frontend's state change.
This leads to the number of consequences:
- when frontend withdraws granted references to the rings etc. it
cannot
???????? be cleanly done as the backend still holds those (it was not
told to
???????? free the resources)
- it is not possible to resume driver operation as all the
communication
???????? means with the backned were destroyed by the frontend, thus
???????? making the frontend appear to the guest OS as functional, but
???????? not really.
connectivity?
what it does it destroys the rings etc. and never notifies the backend
of that, e.g. it stays in, say, connected state with communication
channels destroyed. It never goes into any other Xen bus state, so
there is
no way its state machine can help recovering.
but this certainly works for me. After resume netfront gets
XenbusStateInitWait from backend which causes xennet_connect().
the suspend state. I am making my guest to suspend with:
echo mem > /sys/power/state
And then I use an interrupt to the guest (this is a test code)
to wake it up.
Could you please share your exact use-case when the guest enters suspend
and what you do to resume it?
I can see no way backend may want enter XenbusStateInitWait in myYours looks like ACPI path, I don't know how well it was tested TBH.
use-case
as it simply doesn't know we want him to.
Anyway, from my understanding this series may solve Oleksandr issue. However, this would only address the common code side. AFAIK Oleksandr is targeting Arm platform. If so, I think this would require more work than this series. Arm code still miss few bits properly suspend/resume arch specific code (see [2]).First of all, let me make it clear that we are interested in hibernation long term, so it would be
I have a branch on my git to track the series. However, they never have been resent after Ian Campbell left Citrix. I would be happy to review them if someone wants to pick them up and repost them.
desirable to re-use as much work form resume/suspend as we can. But, we see it as a step by
step work, e.g. first S2RAM and later on hibernation.
Let me clarify the immediate use-case that we have, so it is easier to understand what we want
and what we don't at the moment. We are about to continue work started by Mirela/Xilinx on
Suspend-to-RAM for ARM [3] and we made number of assumptions:
1. We are talking about *system* suspend, e.g. the goal is to suspend all the components
of the system and Xen itself at once. Think about this as fast-boot and/or energy saving
feature if you will.
2. With suspend/resume there is no intention to migrate VMs to any other host.
3. Most probably configuration of the back/front won't change between suspend/resume.
But long term we are also thinking for supporting suspend/resume in its broader meaning,
e.g. what is probably what you mean by suspend/resume.
specifically for xl save/restore case rather than the normal "system"
suspend. i.e. The former is Boris' case and something I called "Xen
suspend" in the patch series, the latter should be your interest and
called "ACPI path" here, and I referred to as "PM suspend". They are
very different code paths, see drivers/xen/manage.c for details of
Xen suspend.
For the immediate task we have at the moment we think we can re-useGiven that, we think that we don't need Xen support to save grants, page tables and otherIf you just (re)implement .suspend and .resume so without taking care
VM's context on suspend at least at the first stage as we are implementing not a fully
blown suspend/resume, but only S2RAM part of it which is much more simpler than a generic
suspend implementation. We only need changes to Linux kernel frontend drivers from [1] - the
piece that we miss is suspend/resume implementation in the netfront driver. What is more, as
we are not changing back/front configuration, we can even live with empty .resume/.suspend
frontend's callbacks because event channels, rings etc. are "statically" allocated in our
use-case at the first system start (cold boot). And indeed, tests show that waking domains
in the right order do allow that.
So, frankly, from [3] we are immediately interested in implementing .resume/.suspend, not
of Xen suspend, you can easily break the existing functionality. The
patch series introduced .freeze and .restore callbacks for both PM
suspend and hibernation, and kept .suspend (not implemented in most
frontend though) and .resume with no changes for Xen suspend.
Note that xenbus has mapped freeze/thaw/restore events to suspend,
resume and cancel callbacks to handle "checkpoint" case[4]. This was a
bit tricky and led me to the design to have the separate set of
callbacks at each frontend driver level[5]. You might need to consider
a similar approach even if your immediate interest at the moment is PM
suspend.
your work and implement .suspend/.resume based on it (we are targeting
S2RAM as the first stage).
But long term - we do support the idea of fully implemented
suspend and *hibernate* functionality as you describe it.
So, yes, we are also thinking about that.
Great, could you please let us know what is the progress and further planseven freeze/thaw/restore callbacks: if Amazon has will and capacity to continue working on [3]In fact, Anchal has taken over my initial work and she may want to chime
then once that gets into the upstream it also solves our S2RAM use-case, but if not then we
can probably re-work netfront patch and only provide .resume/.suspend callbacks which we need
for now (remember our very specific use-case which can survive suspend without callbacks
implemented).
IMO, patches at [2] seem to be useful while implementing generic suspend/resume and can
be postponed for S2RAM.
Julien/Juergen/Boris/Amazon - could you please express your view on the above?
Is it acceptable that for now we only take re-worked netfront patch from [3] with full
implementation in mind for later (we reuse code for .resume/.suspend)?
in here.
on that, so we do not work on the same code and can coordinate our
efforts somehow? Anchal, could you please shed some light on this?
email server settings. Giving it another shot.
Yes, I am working on those patches and plan to re-post them in an effort to upstream.
I agree with Munehisa here on considering the patches that are already out there asOk, so I'll plan my efforts accordingly
I plan to keep the same model to distinguish PM SUSPEND and PM HIBERNATION from xen
suspend and resume. There may be minor fixes here and there however, the overall
idea will still remain the same.
As the previous patches there will be support forThank you,
only xen-blkfront and xen-netfront in the initial patchset.
Thanks,That said, I'd be very happy to review patches if you come up with yourSure, thank you!
own ones, so feel free to add me in that case.
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b3e96c0c756211e805c6941d4a6e5f6e1995cb6bCheers,[3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg01093.html
[1] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00823.html
[2] http://xenbits.xen.org/gitweb/?p=people/julieng/linux-arm.git;a=shortlog;h=refs/heads/xen-migration/v2
[5] https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg00825.html
Thanks,Thank you,-boris
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
Oleksandr
Munehisa
Anchal