On Wed, Jul 02, 2014 at 03:41:21PM +0900, Yasuaki Ishimatsu wrote:
llc_shared_mask is not cleared even if a CPU is offlined or hot-removed.
So when a CPU is hot-plugged again, the mask has a wrong value. The mask
is used by the CFS scheduler, so this breaks the CFS scheduler.
Here is an example on my system.
My system has 4 sockets, each socket has 15 cores, and HT is enabled.
In this case, the CPUs of each socket are numbered as follows:
         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-44, 90-104
Socket#3 | 45-59, 105-119
Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.
It means that the cache of Socket#2 is shared among CPU#30-44 and 90-104.
When hot-removing socket#2 and #3, the CPUs of the remaining sockets are
numbered as follows:
         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30 still
has 0x3fff80000001fffc0000000.
After that, when hot-adding socket#2 and #3, the CPUs of each socket are
numbered as follows:
         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-59
Socket#3 | 90-119
Ok, this doesn't make too much sense to me. Why would the re-added cores
get new numbers?
If they kept their old numbers, you wouldn't have to correct the
LLC mask.
Shouldn't the hotplug code keep stable core ids based on APIC id and
node and whatever across physical hotplug operations?
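
For reference, the scenario from the quoted report can be replayed with a
small userspace toy model. This is only a sketch and not the kernel's code:
llc_mask, cpu_socket and all toy_* names are hypothetical stand-ins, and the
CPU numbering is copied from the tables above. It counts how many CPUs
CPU#30 believes it shares a last-level cache with after the remove/add
cycle, with and without clearing the mask on offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_CPUS 120

/* Hypothetical stand-in for the per-CPU llc_shared_mask:
 * llc_mask[a][b] is true when CPU a believes it shares its LLC with CPU b. */
static bool llc_mask[NR_CPUS][NR_CPUS];
static int cpu_socket[NR_CPUS];		/* -1 while the CPU is offline */

static void toy_init(void)
{
	memset(llc_mask, 0, sizeof(llc_mask));
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_socket[cpu] = -1;
}

/* Bring a CPU online and link it with every online CPU of its socket. */
static void toy_cpu_online(int cpu, int socket)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_socket[cpu] = socket;
	for (int other = 0; other < NR_CPUS; other++) {
		if (cpu_socket[other] == socket) {
			llc_mask[cpu][other] = true;
			llc_mask[other][cpu] = true;
		}
	}
}

/* Take a CPU offline; clear_mask == false models the reported behaviour. */
static void toy_cpu_offline(int cpu, bool clear_mask)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_socket[cpu] = -1;
	if (!clear_mask)
		return;
	for (int other = 0; other < NR_CPUS; other++) {
		llc_mask[cpu][other] = false;
		llc_mask[other][cpu] = false;
	}
}

/* Number of CPUs that 'cpu' believes share its LLC. */
static int toy_llc_weight(int cpu)
{
	int weight = 0;

	for (int other = 0; other < NR_CPUS; other++)
		weight += llc_mask[cpu][other];
	return weight;
}

/* Replay the report: boot, hot-remove sockets 2 and 3, hot-add them with
 * the new numbering, then return the LLC sibling count seen by CPU#30. */
int toy_replay(bool clear_mask)
{
	toy_init();
	for (int s = 0; s < 4; s++) {
		for (int i = 0; i < 15; i++) {
			toy_cpu_online(15 * s + i, s);
			toy_cpu_online(60 + 15 * s + i, s);
		}
	}
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu_socket[cpu] >= 2)
			toy_cpu_offline(cpu, clear_mask);
	for (int cpu = 30; cpu < 60; cpu++)
		toy_cpu_online(cpu, 2);		/* Socket#2 | 30-59 */
	for (int cpu = 90; cpu < 120; cpu++)
		toy_cpu_online(cpu, 3);		/* Socket#3 | 90-119 */
	return toy_llc_weight(30);
}
```

In this model, toy_replay(true) leaves CPU#30 with the 30 siblings of its
new socket, while toy_replay(false) leaves it with 45, because the stale
bits for CPU#90-104 survive the removal. With stable numbering across the
hotplug cycle, the stale bits would happen to match again.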