Re: [PATCH v10 1/5] kasan: support backing vmalloc space with real shadow memory

From: Daniel Axtens
Date: Thu Oct 31 2019 - 05:36:55 EST


Uladzislau Rezki <urezki@xxxxxxxxx> writes:

> Hello, Daniel
>
>>
>> @@ -1294,14 +1299,19 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>>  	spin_lock(&free_vmap_area_lock);
>>  	llist_for_each_entry_safe(va, n_va, valist, purge_list) {
>>  		unsigned long nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
>> +		unsigned long orig_start = va->va_start;
>> +		unsigned long orig_end = va->va_end;
>>
>>  		/*
>>  		 * Finally insert or merge lazily-freed area. It is
>>  		 * detached and there is no need to "unlink" it from
>>  		 * anything.
>>  		 */
>> -		merge_or_add_vmap_area(va,
>> -			&free_vmap_area_root, &free_vmap_area_list);
>> +		va = merge_or_add_vmap_area(va, &free_vmap_area_root,
>> +					    &free_vmap_area_list);
>> +
>> +		kasan_release_vmalloc(orig_start, orig_end,
>> +				      va->va_start, va->va_end);
>>
> I have some questions here. I have not analyzed the kasan_release_vmalloc()
> logic in detail, so sorry if I have missed something. __purge_vmap_area_lazy()
> deals with a big address space, so it does not free only vmalloc addresses
> here; basically it can be any address, from 1 up to ULONG_MAX, whereas the
> vmalloc space spans from VMALLOC_START to VMALLOC_END:
>
> 1) Should it be checked that only a vmalloc address is freed, or do you
> handle it somewhere else?
>
> 	if (is_vmalloc_addr(va->va_start))
> 		kasan_release_vmalloc(...)

In kasan_release_vmalloc we only free the region covered by the shadow
of orig_start to orig_end, and possibly one page to either side, so it
will never attempt to free an enormous area. It will also do nothing if
called for a region where no shadow backing is installed.
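
To make the "shadow of orig_start to orig_end, plus possibly one page
either side" idea concrete, here is a rough userspace sketch. It is not
the code from the patch. It assumes 4 KiB pages and the generic KASAN
scale of one shadow byte per 8 bytes, so one shadow page covers 32 KiB
of address space. The names releasable_span, struct range and
SHADOW_PAGE_SPAN are made up for illustration.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL             /* assumption: 4 KiB pages */
#define KASAN_SHADOW_SCALE_SHIFT 3   /* one shadow byte per 8 bytes */
/* Bytes of address space whose shadow fits in exactly one shadow page. */
#define SHADOW_PAGE_SPAN (PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT)

#define ALIGN_UP(x, a)   ((((x) + (a) - 1) / (a)) * (a))
#define ALIGN_DOWN(x, a) (((x) / (a)) * (a))

struct range {
	unsigned long start;
	unsigned long end;
};

/*
 * Hypothetical helper: given the area being freed [start, end) and the
 * merged free area around it [free_start, free_end), return the span
 * whose shadow pages could be released. An empty result has
 * start >= end.
 */
static struct range releasable_span(unsigned long start, unsigned long end,
				    unsigned long free_start,
				    unsigned long free_end)
{
	struct range r;

	assert(start <= end);

	/* Shadow pages that lie entirely inside the freed area. */
	r.start = ALIGN_UP(start, SHADOW_PAGE_SPAN);
	r.end = ALIGN_DOWN(end, SHADOW_PAGE_SPAN);

	/*
	 * A partially covered shadow page at either edge may also go,
	 * but only if the surrounding free area covers all of it.
	 */
	if (start != r.start &&
	    ALIGN_UP(free_start, SHADOW_PAGE_SPAN) < r.start)
		r.start -= SHADOW_PAGE_SPAN;
	if (end != r.end &&
	    ALIGN_DOWN(free_end, SHADOW_PAGE_SPAN) > r.end)
		r.end += SHADOW_PAGE_SPAN;

	return r;
}
```

However large the merged free area is, the result grows by at most one
shadow page's worth of address space on each side of the freed area.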

Having said that, there should be a test on orig_start, and I've added
that in v11 - good catch.
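
For illustration only, one possible shape of such a test, written as a
standalone userspace sketch rather than kernel code. The bounds are
stand-ins for VMALLOC_START and VMALLOC_END, and every demo_* name is
hypothetical; the actual v11 change may look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-ins for the real VMALLOC_START / VMALLOC_END. */
#define DEMO_VMALLOC_START UINT64_C(0xffffc90000000000)
#define DEMO_VMALLOC_END   UINT64_C(0xffffe8ffffffffff)

/* Counts how many times the stand-in release routine ran. */
static int demo_release_calls;

/* Same range test as is_vmalloc_addr(), on the stand-in bounds. */
static bool demo_is_vmalloc_addr(uint64_t addr)
{
	return addr >= DEMO_VMALLOC_START && addr < DEMO_VMALLOC_END;
}

/* Stand-in for the shadow release; it only records the call. */
static void demo_release_shadow(uint64_t start, uint64_t end)
{
	assert(start <= end);
	demo_release_calls++;
}

/* Release shadow only when the freed area starts in vmalloc space. */
static void demo_purge_one(uint64_t orig_start, uint64_t orig_end)
{
	if (demo_is_vmalloc_addr(orig_start))
		demo_release_shadow(orig_start, orig_end);
}
```

With this shape, an area outside the vmalloc range never reaches the
release routine at all.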

> 2) Have you run any benchmarking just to see how much overhead it adds?
> I am asking because it probably makes sense to add those figures to the
> backlog (commit message). For example you can run:
>
> <snip>
> sudo ./test_vmalloc.sh performance
> and
> sudo ./test_vmalloc.sh sequential_test_order=1
> <snip>

I have now done that:

Testing with test_vmalloc.sh on an x86 VM with 2 vCPUs shows that:

- Turning on KASAN, inline instrumentation, without this feature, introduces
a 4.1x-4.2x slowdown in vmalloc operations.

- Turning this on introduces the following slowdowns over KASAN:
* ~1.76x slower single-threaded (test_vmalloc.sh performance)
* ~2.18x slower when both cpus are performing operations
simultaneously (test_vmalloc.sh sequential_test_order=1)

This is unfortunate, but given that this is a debug-only feature, it is
not the end of the world.

The full figures are:


Performance

                                No KASAN  KASAN original  x baseline  KASAN vmalloc  x baseline  x KASAN

fix_size_alloc_test              1697913        14229459        8.38       22981983       13.54     1.62
full_fit_alloc_test              1841601        15152633        8.23       17902922        9.72     1.18
long_busy_list_alloc_test       17874082        58856758        3.29      103925371        5.81     1.77
random_size_alloc_test           9356047        29544085        3.16       57871338        6.19     1.96
fix_align_alloc_test             3188968        19821620        6.22       37979436       11.91     1.92
random_size_align_alloc_te       3033507        17584339        5.80       32588942       10.74     1.85
align_shift_alloc_test               325            1154        3.55           7263       22.35     6.29
pcpu_alloc_test                   231952          278181        1.20         318977        1.38     1.15
Total Cycles                235852824254    985040965542        4.18  1733258779416        7.35     1.76

Sequential, 2 cpus

                                No KASAN  KASAN original  x baseline  KASAN vmalloc  x baseline  x KASAN

fix_size_alloc_test              2505806        17989253        7.18       39651038       15.82     2.20
full_fit_alloc_test              3579676        18829862        5.26       21142645        5.91     1.12
long_busy_list_alloc_test       21594983        74766736        3.46      140701363        6.52     1.88
random_size_alloc_test          10884695        34282077        3.15       91945108        8.45     2.68
fix_align_alloc_test             4133226        26304745        6.36       76163270       18.43     2.90
random_size_align_alloc_te       4261175        22927883        5.38       55236058       12.96     2.41
align_shift_alloc_test               948            4827        5.09           4144        4.37     0.86
pcpu_alloc_test                   371789          307654        0.83         374412        1.01     1.22
Total Cycles                 99965417402    412710461642        4.13   897968646378        8.98     2.18

fix_size_alloc_test              2502718        17921542        7.16       39893515       15.94     2.23
full_fit_alloc_test              3547996        18675007        5.26       21330495        6.01     1.14
long_busy_list_alloc_test       21522579        74610739        3.47      139822907        6.50     1.87
random_size_alloc_test          10881507        34317349        3.15       91110531        8.37     2.65
fix_align_alloc_test             4119755        26180887        6.35       75818927       18.40     2.90
random_size_align_alloc_te       4297708        23058344        5.37       55969004       13.02     2.43
align_shift_alloc_test               956            5574        5.83           4591        4.80     0.82
pcpu_alloc_test                   306340          347014        1.13         571289        1.86     1.65
Total Cycles                 99642832084    412084074628        4.14   896497227762        9.00     2.18
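
The headline slowdowns are just ratios of the Total Cycles rows above.
A small sketch of that arithmetic follows, in case anyone wants to
recompute them from their own runs. The helper names slowdown and
print_slowdowns are made up for this example; only the cycle counts
come from the tables.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of a measured cycle count to a reference cycle count. */
static double slowdown(double cycles, double reference_cycles)
{
	assert(reference_cycles > 0.0);
	return cycles / reference_cycles;
}

/* Print the three derived columns for one "Total Cycles" row. */
static void print_slowdowns(const char *label, double no_kasan,
			    double kasan, double kasan_vmalloc)
{
	printf("%s: KASAN %.2fx baseline, KASAN vmalloc %.2fx baseline, "
	       "%.2fx KASAN\n", label,
	       slowdown(kasan, no_kasan),
	       slowdown(kasan_vmalloc, no_kasan),
	       slowdown(kasan_vmalloc, kasan));
}
```

For example, print_slowdowns("performance", 235852824254.0,
985040965542.0, 1733258779416.0) recomputes the three ratio columns of
the single-threaded Total Cycles row.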


Regards,
Daniel

> Thanks!
>
> --
> Vlad Rezki