Re: [RFC V2] mm: add the zero case to page[1].compound_nr in set_compound_order

From: Matthew Wilcox
Date: Thu Dec 15 2022 - 16:47:12 EST

Next message: kernel test robot: "drivers/clocksource/timer-clint.c:82:24: sparse: sparse: cast removes address space '__iomem' of expression"
Previous message: Wolfram Sang: "[PULL REQUEST] i2c-for-6.2-rc1"
In reply to: Nico Pache: "Re: [RFC V2] mm: add the zero case to page[1].compound_nr in set_compound_order"
Next in thread: Nico Pache: "Re: [RFC V2] mm: add the zero case to page[1].compound_nr in set_compound_order"

On Thu, Dec 15, 2022 at 02:38:28PM -0700, Nico Pache wrote:
> To expand a little more on the analysis:
> I computed the latency/throughput between <+24> and <+27> using
> Intel's manual (Appendix D):
>
> The bitmath solution shows a total latency of 2.5 and a throughput of 0.5.
> The branch solution shows a total latency of 4 and a throughput of 1.5.
>
> Given this is not a tight loop, and the next instruction requires
> the computed data, lower latency is the better outcome.
>
> Just wanted to add that little piece :)

I appreciate how hard you're working on this, but it really is straining
at gnats ;-) For a modern CPU, the most important thing is cache misses
and avoiding dirtying cachelines. Cycle counting isn't that important
when an L3 cache miss takes 2000 (or more) cycles.
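
For illustration, here is a minimal sketch of the two approaches being
compared and of the arithmetic behind the cache-miss point. The exact
expressions from the patch are not quoted in this message, so the helper
names (nr_branch, nr_bitmath, miss_to_saving_ratio) and both function
bodies are hypothetical. They only assume the goal stated in the subject
line: compound_nr should be 0 for order 0 and 1 << order otherwise.

```c
#include <stdint.h>

/* Hypothetical "branch" variant: test order explicitly.
 * Assumes order < 32 so the shift is well defined. */
static inline uint32_t nr_branch(unsigned int order)
{
	return order ? (UINT32_C(1) << order) : 0;
}

/* Hypothetical "bitmath" variant: 1 << 0 is 1, and masking off bit 0
 * turns that into 0. For order >= 1 bit 0 is already clear, so the
 * value is unchanged. */
static inline uint32_t nr_bitmath(unsigned int order)
{
	return (UINT32_C(1) << order) & ~UINT32_C(1);
}

/* How many times the latency saving fits into a single cache miss.
 * All inputs are cycle counts supplied by the caller. */
static inline double miss_to_saving_ratio(double miss_cycles,
					  double branch_latency,
					  double bitmath_latency)
{
	return miss_cycles / (branch_latency - bitmath_latency);
}
```

With the figures quoted above (latency 4 versus 2.5, and a 2000-cycle
miss), miss_to_saving_ratio(2000.0, 4.0, 2.5) comes out at roughly 1333:
under those assumed numbers, one L3 miss costs about as much as a
thousand-odd repetitions of the 1.5-cycle saving.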
