Re: [patch] epoll use a single inode ...

From: Linus Torvalds
Date: Wed Mar 07 2007 - 22:21:27 EST

Next message: Paul Mackerras: "Re: [patch 2/6 -rt] powerpc 2.6.20-rt8: to convert spinlocks to raw ones."
Previous message: Stephen Hemminger: "Re: [PATCH] tcp_cubic: use 32 bit math"
In reply to: Kyle Moffett: "Re: [patch] epoll use a single inode ..."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, 7 Mar 2007, Michael K. Edwards wrote:
>
> People's prejudices against prefetch instructions are sometimes
> traceable to the 3DNow! prefetch(w) botch, which some processors
> "support" as no-ops and others are too aggressive about (Opteron
> prefetches are reputed to be "strong", i.e., not dropped on DTLB
> miss).

No, I just checked, and Intel's own optimization manual makes it clear
that you should be careful. They talk about performance penalties due to
resource constraints - which makes tons of sense with a core that is good
at handling its own resources and could quite possibly use those resources
better to actually execute the loads and stores deeper down the
instruction pipeline.

So it's not just 3DNow! making AMD look bad, or Intel would obviously
suggest people use it out of the wazoo ;)

> XScale gets it right.

Blah. XScale isn't even an OoO CPU, *of*course* it needs prefetching.
Calling that "getting it right" is ludicrous. If anything, it gets things
so wrong that prefetching is *required* for good performance.

I'm talking about real CPUs with real memory pipelines that already do
prefetching in hardware. The better the core is, the less the prefetch
helps (and often the more it hurts in comparison to how much it helps).

But if you mean "doesn't try to fill the TLB on data prefetches", then
yes, that's generally the right thing to do.

> (Oddly, Prescott seems to have initiated a page table walk on DTLB miss
> during software prefetch -- just one of many weird Prescott flaws.)

NetBurst in general is *very* happy to do speculative TLB fills, I think.

> I'm guessing Pentium M and its descendants (Core Solo and Duo) get it
> right but I'm having a hell of a time finding out for sure. Can any of
> the x86 experts answer this?

I just suspect that the upside for Core 2 Duo is likely fairly low. The L2
cache is good, and the memory re-ordering is working. I doubt "prefetch"
helps that much in generic code for things like linked-list following; you
should probably limit it to code that has *known* access patterns, where
you know the data is not going to be in the cache already.

(In other words, I bet prefetching can help a lot with MMX/media kinds of
code, but I doubt it's a huge win for "for_each_entry()".)
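To make the contrast concrete, here is a minimal sketch assuming GCC or Clang, whose __builtin_prefetch(addr, rw, locality) is only a hint and never changes program results. The names array_sum_prefetch, list_sum_prefetch, struct node and PREFETCH_DISTANCE are hypothetical. The array loop has a known stride, so a hint can be issued many elements ahead; the list walk can hint at most one node ahead, because the next address is not known until the current node's next pointer has been loaded.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch in the
 * array case. The useful value, if any, depends on the CPU. */
#define PREFETCH_DISTANCE 16

/* Case 1: known, sequential access pattern (media-style streaming).
 * __builtin_prefetch(addr, rw, locality) is a GCC/Clang hint: rw 0 means
 * read, locality 0 means no expected temporal reuse. */
long array_sum_prefetch(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Only prefetch in-bounds elements; forming a pointer beyond
         * one-past-the-end of the array is undefined in C. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 0);
        sum += a[i];
    }
    return sum;
}

/* Hypothetical list node, standing in for what a for_each_entry()-style
 * iterator walks. */
struct node {
    struct node *next;
    long value;
};

/* Case 2: pointer chasing. The next node's address is only known after
 * n->next has been loaded, so the hint runs at most one hop ahead and
 * overlaps with very little work in a loop body this small. */
long list_sum_prefetch(const struct node *head)
{
    long sum = 0;
    for (const struct node *n = head; n != NULL; n = n->next) {
        if (n->next != NULL)
            __builtin_prefetch(n->next, 0, 3);
        sum += n->value;
    }
    return sum;
}
```

Whether either hint pays off is something to measure (for example with perf stat) on the target CPU rather than assume; on a core with good hardware prefetchers, even the array case may show no gain.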

Linus
