Re: [benchmark] 1% performance overhead of paravirt_ops on native kernels

From: Nick Piggin
Date: Tue Jun 09 2009 - 08:11:18 EST


On Tue, Jun 09, 2009 at 01:17:19PM +0200, Ingo Molnar wrote:
>
> * Nick Piggin <npiggin@xxxxxxx> wrote:
>
> > On Thu, Jun 04, 2009 at 08:02:14AM -0700, Linus Torvalds wrote:
> > >
> > >
> > > On Thu, 4 Jun 2009, Rusty Russell wrote:
> > > > >
> > > > > Turn off HIGHMEM64G, please (and HIGHMEM4G too, for that matter - you
> > > > > can't compare it to a no-highmem case).
> > > >
> > > > Thanks, your point is demonstrated below. I don't think HIGHMEM4G is
> > > > unreasonable for a distro tho, so I turned that on instead.
> > >
> > > Well, I agree that HIGHMEM4G is a _reasonable_ thing to turn on.
> > >
> > > The thing I disagree with is that it's at all valid to then compare to
> > > some all-software feature thing. HIGHMEM doesn't expand any esoteric
> > > capability that some people might use - it's about regular RAM for regular
> > > users.
> > >
> > > And don't get me wrong - I don't like HIGHMEM. I detest the damn thing. I
> > > hated having to merge it, and I still hate it. It's a stupid, ugly, and
> > > very invasive config option. It's just that it's there to support a
> > > stupid, ugly and very annoying fundamental hardware problem.
> >
> > I was looking forward to being able to get rid of it... unfortunately
> > other 32-bit architectures are starting to use it again :(
> >
> > I guess it is not incredibly intrusive in generic mm code. A bit
> > of kmap() sprinkled around, which is actually quite a useful
> > delimiter of where pagecache is addressed via its kernel mapping.
> >
> > Do you hate the x86 code more? Maybe that can be removed?
>
> IMHO what hurts most about highmem isn't even its direct source code
> overhead, but three factors:
>
> - The buddy allocator allocates top down, with highmem pages first.
> So a lot of critical apps (the first ones started) will have a
> highmem footprint, and that shows up every time they use it for
> file IO or other ops. kmap() overhead and more.

Yeah, this really sucks about it. OTOH, we have basically the same
thing today with NUMA allocations and task placement.
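
To illustrate the kmap() overhead: roughly speaking (a simplified
sketch, not the actual implementation), touching pagecache via its
kernel mapping is a trivial address calculation for lowmem pages but
a global-lock slow path for highmem ones:

	static void *map_pagecache_page(struct page *page)
	{
		/*
		 * Lowmem pages sit in the permanent kernel mapping,
		 * so this is just arithmetic on the page address.
		 */
		if (!PageHighMem(page))
			return page_address(page);

		/*
		 * Highmem pages go through the shared pkmap pool:
		 * grab the global kmap lock, find a free virtual
		 * slot, install a PTE, maybe flush TLBs, possibly
		 * sleep until a slot frees up.
		 */
		return kmap(page);
	}

So the first-started (and thus highmem-heavy) apps pay the slow path
on every file IO touch of their pagecache.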


> - Highmem is not really a 'solvable' problem in terms of good VM
> balancing. It gives conflicting constraints and there's no single
> 'good VM' that can really work - just a handful of bad solutions
> that differ in their level and area of suckiness.

But we have other zones too. And you also run into similar (and
in some senses harder) choices with NUMA as well.


> - The kmap() cache itself can be depleted,

Yeah, the rule is that you're not allowed to do two nested ones.
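
i.e. something like this (illustrative only, not a real call site)
can deadlock once the pkmap pool runs dry:

	/*
	 * BAD: each task holds one pkmap slot while sleeping for a
	 * second one. With enough concurrent mappers the pool
	 * depletes and nobody can make progress.
	 */
	src = kmap(src_page);
	dst = kmap(dst_page);		/* may sleep forever */
	memcpy(dst, src, PAGE_SIZE);
	kunmap(dst_page);
	kunmap(src_page);

That's why page-to-page copies use kmap_atomic() with fixed per-cpu
slots (KM_USER0/KM_USER1) instead.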


> and using atomic kmaps
> is fragile and error-prone. I think we still have a FIXME of a
> possibly triggerable deadlock somewhere in the core MM code ...

Not that I know of. I fixed the last long-standing known one
with the write_begin/write_end changes a year or two ago. It
wasn't exactly related to kmap of the pagecache, but to a page
fault on the user address in copy_from_user.
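
For reference, the shape of that fix (a rough sketch of the generic
write path after write_begin/write_end, with error handling and
iov_iter setup trimmed):

	do {
		/*
		 * Fault the user source pages in *before* taking
		 * the page lock, so the atomic copy below cannot
		 * need to block on a fault.
		 */
		fault_in_pages_readable(buf, bytes);

		status = a_ops->write_begin(file, mapping, pos, bytes,
					    flags, &page, &fsdata);

		/*
		 * Copy via kmap_atomic() with pagefaults disabled;
		 * on a fault this returns a short copy instead of
		 * deadlocking on the locked pagecache page.
		 */
		copied = iov_iter_copy_from_user_atomic(page, &i,
							offset, bytes);

		status = a_ops->write_end(file, mapping, pos, bytes,
					  copied, page, fsdata);

		/*
		 * Short copy (source reclaimed in the meantime)?
		 * Loop around, fault it in again, retry.
		 */
	} while (iov_iter_count(&i));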


> OTOH, highmem is clearly a useful hardware enablement feature with a
> slowly receding upside and a constant downside. The outcome is
> clear: when a critical threshold is reached distros will stop
> enabling it. (or more likely, there will be pure 64-bit x86 distros)

Well, now lots of embedded-type archs are enabling it... So the
upside is slowly increasing again, I think.


> Highmem simply enables a sucky piece of hardware so the code itself
> has an intrinsic level of suckage, so to speak. There's not much to
> be done about it but it's not a _big_ problem either: this type of
> hw is moving fast out of the distro attention span.

Yes, but Linus really hated the code. I wonder whether it is the
generic code or the x86-specific code he hates. OTOH, with x86
you'd probably still have to support different page table formats,
at least, so you couldn't rip it all out.
