RE: [PATCH 1/1] Drivers: base: memory: Export symbols for onlining memory blocks

From: KY Srinivasan
Date: Wed Jul 24 2013 - 15:47:42 EST

Next message: Jason Cooper: "Re: [GIT PULL] clockevents/clocksource: Add Marvell Orion SoC timer"
Previous message: David Miller: "Re: [40/85] net/tg3: Avoid delay during MMIO access"
In reply to: Dave Hansen: "Re: [PATCH 1/1] Drivers: base: memory: Export symbols for onlining memory blocks"
Next in thread: Dave Hansen: "Re: [PATCH 1/1] Drivers: base: memory: Export symbols for onlining memory blocks"

> -----Original Message-----
> From: Dave Hansen [mailto:dave@xxxxxxxx]
> Sent: Wednesday, July 24, 2013 12:43 PM
> To: KY Srinivasan
> Cc: Dave Hansen; Michal Hocko; gregkh@xxxxxxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx; devel@xxxxxxxxxxxxxxxxxxxxxx; olaf@xxxxxxxxx;
> apw@xxxxxxxxxxxxx; andi@xxxxxxxxxxxxxx; akpm@xxxxxxxxxxxxxxxxxxxx; linux-
> mm@xxxxxxxxx; kamezawa.hiroyuki@xxxxxxxxx; hannes@xxxxxxxxxxx;
> yinghan@xxxxxxxxxx; jasowang@xxxxxxxxxx; kay@xxxxxxxx
> Subject: Re: [PATCH 1/1] Drivers: base: memory: Export symbols for onlining
> memory blocks
>
> On 07/23/2013 10:21 AM, KY Srinivasan wrote:
> >> You have allocated some large, physically contiguous areas of memory
> >> under heavy pressure. But you also contend that there is too much
> >> memory pressure to run a small userspace helper. Under heavy memory
> >> pressure, I'd expect large, kernel allocations to fail much more often
> >> than running a small userspace helper.
> >
> > I am only reporting what I am seeing. Broadly, I have two main failure
> > conditions to deal with: (a) resource related failure (add_memory()
> > returning -ENOMEM) and (b) not being able to online a segment that has
> > been successfully hot-added. I have seen both these failures under high
> > memory pressure. By supporting "in context" onlining, we can eliminate
> > one failure case. Our inability to online is not a recoverable failure
> > from the host's point of view - the memory is committed to the guest
> > (since hot add succeeded) but is not usable since it is not onlined.
>
> Could you please precisely report on what you are seeing in detail?
> Where are the -ENOMEMs coming from? Which allocation site? Are you
> seeing OOMs or page allocation failure messages on the console?

The -ENOMEM failure I see comes from the call that hot-adds the memory,
add_memory(). Usually I don't see any OOM messages on the console.

>
> The operation was split up in to two parts for good reason. It's
> actually for your _precise_ use case.

I agree; without this split, I could not implement the balloon driver with
hot-add.

>
> A system under memory pressure is going to have troubles doing a
> hot-add. You need memory to add memory. Of the two operations ("add"
> and "online"), "add" is the one vastly more likely to fail. It has to
> allocate several large swaths of contiguous physical memory. For that
> reason, the system was designed so that you could "add" and "online"
> separately. The intention was that you could "add" far in advance and
> then "online" under memory pressure, with the "online" having *VASTLY*
> smaller memory requirements and being much more likely to succeed.
>
> You're lumping the "allocate several large swaths of contiguous physical
> memory" failures in to the same class as "run a small userspace helper".
> They are _really_ different problems. Both prone to allocation
> failures for sure, but _very_ separate problems. Please don't conflate
> them.

I don't think I am conflating these two issues; I am sorry if I gave that
impression. All I am saying is that I see two classes of failures: (a) our
inability to allocate the memory needed to manage the memory that is being
hot added, and (b) our inability to bring the hot-added memory online within
a reasonable amount of time. I am not sure of the cause of (b); I was only
speculating that it could be memory related. What is interesting is that I
have seen failures to online the memory after the hot add itself succeeded.
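
For reference, case (b) can be observed from userspace by reading the
memory block's sysfs state file, which reads "online" once the block has
been onlined. What follows is only a sketch, not part of the patch under
discussion: the function name memory_block_is_online() is hypothetical, and
the block number in the example path is a placeholder.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical diagnostic helper (not from the patch): check whether a
 * hot-added memory block has been brought online by reading its sysfs
 * state file, e.g. /sys/devices/system/memory/memoryN/state.
 *
 * Returns 1 if the file reads "online", 0 if it reads anything else
 * (for example "offline"), or a negative errno value on error.
 */
int memory_block_is_online(const char *state_path)
{
    char buf[32];
    FILE *f = fopen(state_path, "r");

    if (!f)
        return -errno;

    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -EIO;
    }
    fclose(f);

    /* Strip the trailing newline that sysfs appends. */
    buf[strcspn(buf, "\n")] = '\0';

    return strcmp(buf, "online") == 0;
}
```

A block that still reads "offline" some time after add_memory() has
returned success would be an instance of failure (b).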

>
> >> It _sounds_ like you really want to be able to have the host retry the
> >> operation if it fails, and you return success/failure from inside the
> >> kernel. It's hard for you to tell if running the userspace helper
> >> failed, so your solution is to move what was previously done in
> >> userspace in to the kernel so that you can more easily tell if it failed
> >> or succeeded.
> >>
> >> Is that right?
> >
> > No; I am able to get the proper error code for recoverable failures
> > (hot add failures because of lack of memory). By doing what I am
> > proposing here, we can avoid one class of failures completely and I
> > think this is what resulted in a better "hot add" experience in the
> > guest.
>
> I think you're taking a huge leap here: "We could not online memory,
> thus we must take userspace out of the loop."
>
> You might be right. There might be only one way out of this situation.
> But you need to provide a little more supporting evidence before we all
> arrive at the same conclusion.

I am not even suggesting that. All I am saying is that, in addition to the
existing sysfs mechanism, there should be a mechanism for onlining memory
"in context", that is, directly from a kernel context. The Hyper-V balloon
driver can certainly use this functionality. I should be sending out the
patches for this shortly.
>
> BTW, it doesn't _require_ udev. There could easily be another listener
> for hotplug events.

Agreed; but structurally it is identical to having a udev rule.
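
To make the comparison concrete: whether it is a udev rule or some other
hotplug listener, the userspace side ends up writing "online" to the state
file of the new memory block in sysfs. A minimal sketch of that step,
assuming the listener has already worked out the path of the state file;
the function name online_memory_block() is hypothetical and the block
number in the example path is illustrative only.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical userspace helper (illustration only): request onlining of
 * one memory block by writing "online" to its sysfs state file, e.g.
 * /sys/devices/system/memory/memory32/state.
 *
 * Returns 0 on success or a negative errno value on failure.
 */
int online_memory_block(const char *state_path)
{
    int ret = 0;
    FILE *f = fopen(state_path, "w");

    if (!f)
        return -errno;

    if (fputs("online", f) == EOF)
        ret = -EIO;

    /*
     * stdio buffers the write, so an error from the kernel's store
     * handler may only surface when the stream is flushed on close.
     */
    if (fclose(f) == EOF && ret == 0)
        ret = errno ? -errno : -EIO;

    return ret;
}
```

Either way the kernel only learns whether the write itself succeeded; it
gets no indication if the helper never ran at all.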

Regards,

K. Y
