RE: [PATCH 09/31] x86/entry/32: Leave the kernel via trampoline stack

From: David Laight
Date: Sat Feb 10 2018 - 10:40:57 EST

Next message: Bjorn Helgaas: "Re: [PATCH v1] PCI: Make PCI_SCAN_ALL_PCIE_DEVS work for Root as well as Downstream Ports"
Previous message: Dr Musa Zongo: "With Due Respect !!!"
In reply to: Linus Torvalds: "Re: [PATCH 09/31] x86/entry/32: Leave the kernel via trampoline stack"
Next in thread: Denys Vlasenko: "Re: [PATCH 09/31] x86/entry/32: Leave the kernel via trampoline stack"

From: Linus Torvalds
> Sent: 09 February 2018 19:49
...
> I think the instruction scheduling ends up basically breaking around
> microcoded instructions, which is why you'll get something like 12+n
> cycles for "rep movs" on some uarchs, but at that point it's probably
> mostly in the noise compared to all the other nasty PTI things.

Or 48+n on P4
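
To put those two figures side by side, here is a rough sketch of the "fixed startup plus n" model in C. The thread does not say what n counts, so treating it as an iteration count is an assumption, and the function names are made up for illustration. It is a toy model, not a measurement.

```c
#include <assert.h>

/* Modelled cost of a "rep movs": a fixed microcode startup plus one
 * cycle per iteration.  "startup" is 12 or 48 in the figures quoted
 * above; treating n as an iteration count is an assumption. */
unsigned rep_movs_cycles(unsigned startup, unsigned n)
{
    return startup + n;
}

/* Share of the modelled cost that is pure startup, as a whole
 * percentage (rounded down). */
unsigned startup_share_pct(unsigned startup, unsigned n)
{
    unsigned total = rep_movs_cycles(startup, n);

    assert(total > 0);  /* avoid dividing by zero for 0+0 */
    return (100u * startup) / total;
}
```

Under this model a 4-iteration copy spends 75% of its time in startup with the 12+n figure and 92% with the 48+n figure, while at n = 1000 the startup drops to about 1%.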

> You won't see any of the _real_ advantages (which are about moving
> cachelines at a time), so with smallish copies you really only see the
> downsides of "rep movs", which is mainly that instruction scheduling
> hiccup with any microcode.

I thought that the hardware optimisation for 'rep movsb' on recent
Intel CPUs generated word sized memory accesses even for misaligned
short transfers.
My thoughts were that they'd implemented a cache line sized barrel
shift register.
If that isn't true then using it for all memcpy() is probably stupid
(but not as stupid as doing all memcpy backwards!)
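
For concreteness, a minimal sketch of what "using it for all memcpy()" versus avoiding it for short copies could look like. This is not the kernel's memcpy; copy_rep_movsb(), copy_hybrid() and the SMALL_COPY_MAX cutoff are hypothetical names and values chosen only for illustration, and the inline asm assumes GCC or Clang on x86 (other targets fall back to memcpy()).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes with a single "rep movsb".  The direction flag is
 * assumed clear, as the usual x86 ABIs guarantee at function entry. */
void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;

    assert(n == 0 || (dst != NULL && src != NULL));
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);  /* non-x86 fallback so the sketch still builds */
#endif
    return ret;
}

/* Hypothetical cutoff, not a measured value. */
#define SMALL_COPY_MAX 16

/* Short copies use a plain byte loop and skip the microcode startup;
 * longer ones go through "rep movsb". */
void *copy_hybrid(void *dst, const void *src, size_t n)
{
    if (n <= SMALL_COPY_MAX) {
        unsigned char *d = dst;
        const unsigned char *s = src;

        while (n--)
            *d++ = *s++;
        return dst;
    }
    return copy_rep_movsb(dst, src, n);
}
```

Where the cutoff should sit, or whether one is needed at all, depends on how the hardware really handles short misaligned transfers, which is the open question here.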

David
