This implementation would require core pmu.c to be changed, but there isThanks John!
ways that this could be done without needing to change core pmu.c
Thanks,
John
You are right about broadwell, it is an extreme case of sharing. IIRC
BDX is the server core/uncore events, BDW is the consumer core/uncore
and BDW-DE is consumer core with server uncore - so the sharing is
inherent in this. Metrics become interesting as they may mix core and
uncore, but I'll ignore that here.
In the old code every event needs 15 char*s, with 64-bit that is 15*8
bytes per entry with 741 core and 23 uncore entries for BDW, and 372
core and 1284 uncore for BDX. I expect the strings themselves will be
shared by the C compiler, and so I just focus on the pointer sizes.
With the new code every event is just 1 32-bit int. So for BDW we go
from 15*8*(741+23)=91,680 to 4*(741+23)=3056, BDX is
15*8*(372+1284)=198720 to 4*(372+1284)=6624. This means we've gone
from 290,400bytes to 9,680bytes for BDW and BDX. BDW-DE goes from
243,000bytes to 8,100bytes -
we can ignore the costs of the strings as
they should be fully shared, especially for BDW-DE.
If we added some kind of table sharing, so BDW-DE was core from BDW
and uncore from BDX and the tables shared, then in the old code you
could save nearly 0.25MB but with the new code the saving is only
around 8KB. I think we can go after that 8KB but it is less urgent
after this change which gets 96% of the benefit. We have 72
architectures for jevents at the moment and so I'd ball park (assuming
they all saved as much as BDW-DE) the max saving as about 0.5MB, which
is 12% of what is saved here.
Longer term I'd like to make the pmu-events.c logic look closer to the
sysfs API. Perhaps we can unify the uncore events for BDX and BDW-DE
with some notion of a common PMU, or PMUs with common events and
tables, and automate deduction of this. It also isn't clear to me the
advantage of storing the metrics inside the events, separate tables
feel cleaner. Anyway, there's lots of follow up.