Re: [PATCH bpf-next] uprobes: Fix the xol slots reserved for uretprobe trampoline
From: Liao, Chang
Date: Wed Jun 19 2024 - 22:59:23 EST
Hi, Jiri
On 2024/6/20 0:22, Jiri Olsa wrote:
> On Wed, Jun 19, 2024 at 01:34:11AM +0000, Liao Chang wrote:
>> When the new uretprobe system call was added [1], the xol slots reserved
>> for the uretprobe trampoline might be insufficient on some architectures.
>
> hum, uretprobe syscall is x86_64 specific, nothing was changed wrt slots
> or other architectures.. could you be more specific in what's changed?
I observed a significant performance degradation when using uprobes to trace Redis
on an arm64 machine. redis-benchmark showed a decrease of around 7% with uprobes
attached to two hot functions, and a much worse result with uprobes on more hot
functions. Here is a small snapshot of the benchmark results.
No uprobe
---------
SET: 73686.54 rps
GET: 73702.83 rps
Uprobes on two hot functions
----------------------------
SET: 68441.59 rps, -7.1%
GET: 68951.25 rps, -6.4%
Uprobes on three hot functions
------------------------------
SET: 40953.39 rps, -44.4%
GET: 41609.45 rps, -43.5%
To investigate potential improvements, I ported the uretprobe syscall and
trampoline feature to arm64. The trampoline code used on arm64 looks like this:
uretprobe_trampoline_for_arm64:
	str x8, [sp, #-8]!
	mov x8, __NR_uretprobe
	svc #0
Since arm64 uses fixed-length 4-byte instructions, the trampoline is 12 bytes in
total. The xol slot size on arm64 is typically 4 bytes, so the trampoline does not
fit in a single slot and more than one slot has to be reserved for it.
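
The slot arithmetic can be checked with a minimal user-space sketch. Note that
trampoline_slots() is a hypothetical helper written only for illustration; it is
not part of the posted patch or of the kernel. It rounds up, whereas the patch
uses plain division; the two agree whenever the trampoline size is a multiple of
the slot size, as in the 12-byte / 4-byte case described here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (illustration only): number of xol slots needed to
 * hold a trampoline of insns_size bytes when each slot is slot_bytes wide.
 */
static size_t trampoline_slots(size_t insns_size, size_t slot_bytes)
{
	size_t slots;

	assert(slot_bytes > 0);
	/* Round up so that a partially filled last slot is still reserved. */
	slots = (insns_size + slot_bytes - 1) / slot_bytes;
	/* Always reserve at least one slot, as the existing code does. */
	return slots ? slots : 1;
}
```

With the numbers above, trampoline_slots(12, 4) gives 3 slots, while a trampoline
that fits in one slot still reserves exactly 1.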
Thanks.
>
> thanks,
> jirka
>
>> For example, on arm64, the trampoline consists of at least three
>> instructions. So it should mark enough bits in area->bitmap and
>> area->slot_count for the reserved slots.
>>
>> [1] https://lore.kernel.org/all/20240611112158.40795-4-jolsa@xxxxxxxxxx/
>>
>> Signed-off-by: Liao Chang <liaochang1@xxxxxxxxxx>
>> ---
>> kernel/events/uprobes.c | 11 +++++++----
>> 1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
>> index 2816e65729ac..efd2d7f56622 100644
>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -1485,7 +1485,7 @@ void * __weak arch_uprobe_trampoline(unsigned long *psize)
>>  static struct xol_area *__create_xol_area(unsigned long vaddr)
>>  {
>>  	struct mm_struct *mm = current->mm;
>> -	unsigned long insns_size;
>> +	unsigned long insns_size, slot_nr;
>>  	struct xol_area *area;
>>  	void *insns;
>>
>> @@ -1508,10 +1508,13 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
>>
>>  	area->vaddr = vaddr;
>>  	init_waitqueue_head(&area->wq);
>> -	/* Reserve the 1st slot for get_trampoline_vaddr() */
>> -	set_bit(0, area->bitmap);
>> -	atomic_set(&area->slot_count, 1);
>>  	insns = arch_uprobe_trampoline(&insns_size);
>> +	/* Reserve enough slots for the uretprobe trampoline */
>> +	for (slot_nr = 0;
>> +	     slot_nr < max((insns_size / UPROBE_XOL_SLOT_BYTES), 1);
>> +	     slot_nr++)
>> +		set_bit(slot_nr, area->bitmap);
>> +	atomic_set(&area->slot_count, slot_nr);
>>  	arch_uprobe_copy_ixol(area->pages[0], 0, insns, insns_size);
>>
>>  	if (!xol_add_vma(mm, area))
>> --
>> 2.34.1
>>
--
BR
Liao, Chang