Re: [RFC PATCH 1/1] x86/sgx: Explicitly give up the CPU in EDMM's ioctl() to avoid softlockup
From: Reinette Chatre
Date: Tue Apr 23 2024 - 13:10:41 EST
Hi Kai,
On 4/23/2024 4:50 AM, Huang, Kai wrote:
>> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
>> index b65ab214bdf5..2340a82fa796 100644
>> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
>> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
>> @@ -806,6 +806,9 @@ sgx_enclave_restrict_permissions(struct sgx_encl *encl,
>> }
>>
>> mutex_unlock(&encl->lock);
>> +
>> + if (need_resched())
>> + cond_resched();
>> }
>>
>> ret = 0;
>> @@ -1010,6 +1013,9 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl,
>> entry->type = page_type;
>>
>> mutex_unlock(&encl->lock);
>> +
>> + if (need_resched())
>> + cond_resched();
>> }
>>
>> ret = 0;
>> @@ -1156,6 +1162,9 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
>> kfree(entry);
>>
>> mutex_unlock(&encl->lock);
>> +
>> + if (need_resched())
>> + cond_resched();
>> }
>>
>
> You can remove the need_resched() in all 3 places above and just call
> cond_resched() directly.
>
This change will call cond_resched() after handling each page in a
potentially large page range (the cover letter mentions 30GB, but we have
also had to make optimizations for enclaves larger than this). Adding a
cond_resched() here will surely placate the soft lockup detector, but we
need to take care that changes like this do not hurt system performance or
make operations on these page ranges take much longer than necessary.

For reference, please see 7b72c823ddf8 ("x86/sgx: Reduce delay and interference
of enclave release") that turned frequent cond_resched() into batches
to address performance issues.

It looks to me like need_resched() may be a quick check that can be used
to improve performance? I am not familiar with all the use cases that need
to be considered to determine whether a batching solution is needed.
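
To make the batching idea concrete, here is a rough userspace sketch, not
kernel code and not what the commit above does. RESCHED_BATCH,
cond_resched_stub() and process_pages_batched() are names I made up for
illustration, and the batch size of 64 is arbitrary. The stub only counts
calls, so it shows how often the loop would offer to yield, nothing more.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical batch size; not taken from any kernel source. */
#define RESCHED_BATCH 64

/* Userspace stand-in for the kernel's cond_resched(); it only counts calls. */
static unsigned long resched_calls;

static void cond_resched_stub(void)
{
	resched_calls++;
}

/*
 * Walk nr_pages pages and offer to yield once per batch rather than
 * once per page. Returns the number of pages processed.
 */
static size_t process_pages_batched(size_t nr_pages, size_t batch)
{
	size_t done = 0;

	assert(batch > 0);

	while (done < nr_pages) {
		/* The per-page work done under encl->lock would go here. */
		done++;

		/* Only offer to reschedule at batch boundaries. */
		if (done % batch == 0)
			cond_resched_stub();
	}

	return done;
}
```

Whether a fixed batch like this or a per-page need_resched() check is the
better trade-off would have to be measured on the affected workloads.
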
Reinette