Re: Interesting scheduling times - NOT

Richard Gooch (rgooch@atnf.csiro.au)
Fri, 25 Sep 1998 21:08:10 +1000


Kurt Garloff writes:
>
> On Wed, Sep 23, 1998 at 01:55:57AM +0300, Jukka Tapani Santala wrote:
> > On Tue, 22 Sep 1998, Kurt Garloff wrote:
> > >                                 w/32 procs      per proc
> > >                  proc thread   proc thread   proc thread
> > > 2.1.120           6.5    2.8   28.3   22.0   0.68   0.60
> > > 2.1.122 FPU       6.0    3.9   28.1   22.1   0.69   0.57
> > > 2.1.122 both      4.7    2.5   16.4   11.2   0.37   0.27
> >
> > I'm surprised... It's my recollection that unaligned data is far slower
> > than cache misses. I guess accessing byte-aligned bytes isn't that bad,
> > though. Still, I'd be very interested to see statistics on different
> > computers, and (if the structures aren't specific to one architecture -
> > can't check just now. If they are, ignore this;) most importantly
> > architectures. Which is the unfortunate point in optimizations like
> > this; they're kinda architecture-dependent.
>
> As I pointed out in a private mail, the SMP fields (which I changed to
> bytes) are not accessed on my UP machine. So the quite good results have
> nothing to do with them.
> IIRC, accessing byte-aligned bytes on an IA32 is not that bad. Maybe PPro and
> P-II don't like it, I don't know, but Pentium, Cx6X86 and K6-2 are OK, AFAIK.
>
> > But if you're going to optimize for special cases, see the "Optimization
> > Manuals" on Intel's website - they give good insight into the cache- and
> > burst-loading sequences on Intel architectures. I would also try to
> > profile with ints instead of chars to see if it's possible to find an
> > even faster combination between cache-line use and misalignment costs.
> > But then, I don't have the references in question handy to say if that's
> > supposed to have any effect, either ;)
>
> It might be bad on other archs to use bytes, so I changed it back to ints.
> I had to move exec_domain to the third cache line in order to have enough
> space for the important variables.
> I could have done this in the first place, but I didn't want to touch too
> many fields.
>
> I append a current patch. It's not tested. I don't know if the kernel will
> crash (very unlikely, unless I messed up the order within INIT_TASK and by
> chance the compiler doesn't catch it, because the types are the same)
> or what the scheduling performance will be (I'm pretty sure it would be the
> same as with my previous patch on UP systems). It compiles, at least.
>
> I will provide results after the weekend, when I'm back.
>
> Linus, what do you think? Regardless of whether Richard's test is
> broken (as Larry claims) or not (as I think), it is certainly a good
> idea to have the task_struct ordered to be cache-friendly, isn't
> it? I really think that it would be a good idea to have it in the
> kernel.

Careful. I made further measurements and some other things slowed
down. Check http://www.atnf.csiro.au/~rgooch/benchmarks/ for the
details. That may be because I didn't try to optimise other cases:
your patch might not have the same problems. It would be worth your
trying out the benchmark to see what effect your patch has.

Regards,

Richard....
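
For anyone following the cache-line argument in the quoted message, here
is a minimal sketch of how one might check which cache line a structure
field lands in. Everything in it is an assumption made for illustration:
struct demo_task is a hypothetical stand-in and NOT the real task_struct,
the field names are only borrowed from the discussion, and CACHE_LINE is
set to 32 bytes as on Pentium-class CPUs. It is not taken from either
patch in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: 32-byte L1 cache lines (Pentium-class). Adjust per CPU. */
#define CACHE_LINE 32

/* Hypothetical stand-in for a task structure; NOT the real task_struct. */
struct demo_task {
    /* Fields assumed to be touched on every scheduling pass: kept first. */
    long state;
    long counter;
    long priority;
    int  has_cpu;
    int  processor;
    /* Filler standing in for rarely used fields. */
    char cold[64];
    /* Rarely used pointer, pushed to a later cache line. */
    void *exec_domain;
};

/* Cache line index of an offset, assuming the struct starts on a line boundary. */
static size_t cache_line_of(size_t offset)
{
    return offset / CACHE_LINE;
}

/* Returns 1 if all the "hot" fields end within the first cache line, else 0. */
static int hot_fields_share_first_line(void)
{
    size_t end = offsetof(struct demo_task, processor) + sizeof(int);
    return end <= CACHE_LINE;
}

/* Print one field's offset and the cache line it starts in. */
#define SHOW(field) \
    printf("%-12s offset %3zu  line %zu\n", #field, \
           offsetof(struct demo_task, field), \
           cache_line_of(offsetof(struct demo_task, field)))

/* Dump the layout so a reordering can be compared before and after. */
static void report_layout(void)
{
    SHOW(state);
    SHOW(counter);
    SHOW(priority);
    SHOW(has_cpu);
    SHOW(processor);
    SHOW(cold);
    SHOW(exec_domain);
}
```

Calling report_layout() from a small main() before and after reordering
the fields shows whether the hot fields still share a line. The offsets
depend on the compiler and the ABI, so this only describes layout; it
says nothing about scheduling times, which still have to be measured.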
