Re: Linux-VServer example results for sharing vs. separate mappings...

From: Andrew Morton
Date: Sun Mar 25 2007 - 00:30:45 EST


On Sun, 25 Mar 2007 04:21:56 +0200 Herbert Poetzl <herbert@xxxxxxxxxxxx> wrote:

> > a) slice the machine into 128 fake NUMA nodes, use each node as the
> > basic block of memory allocation, manage the binding between these
> > memory hunks and process groups with cpusets.
>
> 128 sounds a little small to me, considering that we
> already see 300+ Guests on older machines ....
> (or am I missing something here?)

Yes, you're missing something very significant. I'm talking about resource
management (ie: partitioning) and you're talking about virtual servers.
They're different applications, with quite a lot in common.

For resource management, a few fives or tens of containers is probably an
upper bound.

An impementation needs to address both requirements.

> > This is what google are testing, and it works.
> >
> > b) Create a new memory abstraction, call it the "software zone",
> > which is mostly decoupled from the present "hardware zones". Most of
> > the MM is reworked to use "software zones". The "software zones" are
> > runtime-resizeable, and obtain their pages via some means from the
> > hardware zones. A container uses a software zone.
> >
> > c) Something else, similar to the above. Various schemes can be
> > envisaged, it isn't terribly important for this discussion.
>
> for me, the most natural approach is the one with
> the least impact and smallest number of changes
> in the (granted quite complex) system: leave
> everything as is, from the 'entire system' point
> of view, and do adjustments and decisions with the
> additional Guest/Context information in mind ...
>
> e.g. if we decide to reclaim pages, and the 'normal'
> mechanism would end up with 100 'equal' candidates,
> the Guest badness can be a good additional criterion
> to decide which pages get thrown out ...
>
> OTOH, the Guest status should never control the
> entire system behaviour in a way which harms the
> overall performance or resource efficiency

On the contrary - if one container exceeds its allotted resource, we want
the processes in that container to bear the majority of the cost of that.
Ideally, all of the cost.

>
> > All doable, if we indeed have a demonstrable problem
> > which needs to be addressed.
>
> all in all I seem to be missing the 'original problem'
> which basically forces us to do all those things you
> describe instead of letting the Linux Memory System
> work as it works right now and just get the accounting
> right ...

The VM presently cannot satisfy resource management requirements, because
piggy activity from one job will impact the performance of all other jobs.

> > > note that the 'frowned upon' accounting Linux-VServer
> > > does seems to work for those cases quite fine .. here
> > > the relevant accounting/limits for three guests, the
> > > first two unified and started in strict sequence, the
> > > third one completely separate
> > >
> > > Limit current min/max soft/hard hits
> > > VM: 41739 0/ 64023 -1/ -1 0
> > > RSS: 8073 0/ 9222 -1/ -1 0
> > > ANON: 3110 0/ 3405 -1/ -1 0
> > > RMAP: 4960 0/ 5889 -1/ -1 0
> > > SHM: 7138 0/ 7138 -1/ -1 0
> > >
> > > Limit current min/max soft/hard hits
> > > VM: 41738 0/ 64163 -1/ -1 0
> > > RSS: 8058 0/ 9383 -1/ -1 0
> > > ANON: 3108 0/ 3505 -1/ -1 0
> > > RMAP: 4950 0/ 5912 -1/ -1 0
> > > SHM: 7138 0/ 7138 -1/ -1 0
> > >
> > > Limit current min/max soft/hard hits
> > > VM: 41738 0/ 63912 -1/ -1 0
> > > RSS: 8050 0/ 9211 -1/ -1 0
> > > ANON: 3104 0/ 3399 -1/ -1 0
> > > RMAP: 4946 0/ 5885 -1/ -1 0
> > > SHM: 7138 0/ 7138 -1/ -1 0
> >
> > Sorry, I tend to go to sleep when presented with rows and rows of
> > numbers. Sure, it's good to show the data but I much prefer it if the
> > sender can tell us what the data means: the executive summary.
>
> sorry, I'm more the technical person and I hate
> 'executive summaries' and similar stuff, but the
> message is simple and clear: accouting works even
> for shared/unified guests, all three guests show
> reasonably similar values ...

I don't see "accounting" as being useful for resource managment. I mean,
so we have a bunch of numbers - so what?

The problem is: what do we do when the jobs in a container exceed their
allotment?

With zone-based physical containers we already have pretty much all the
accounting we need, in the existing per-zone accounting.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/