Re: [PATCH v2] kernel: add kcov code coverage

From: Dmitry Vyukov
Date: Fri Jan 15 2016 - 09:08:26 EST

On Fri, Jan 15, 2016 at 2:05 PM, Andrey Ryabinin <ryabinin.a.a@xxxxxxxxx> wrote:
> 2016-01-14 17:30 GMT+03:00 Dmitry Vyukov <dvyukov@xxxxxxxxxx>:
>> On Thu, Jan 14, 2016 at 11:50 AM, Andrey Ryabinin
>> <ryabinin.a.a@xxxxxxxxx> wrote:
>>> 2016-01-13 15:48 GMT+03:00 Dmitry Vyukov <dvyukov@xxxxxxxxxx>:
>>>> + /* Read number of PCs collected. */
>>>> + n = __atomic_load_n(&cover[0], __ATOMIC_RELAXED);
>>>> + /* PCs are shortened to uint32_t, so we need to restore the upper part. */
>>>> + for (i = 0; i < n; i++)
>>>> + printf("0xffffffff%08lx\n", (unsigned long)cover[i + 1]);
>> Thanks for the review!
>> Mailed v3 with fixes.
>> Comments inline.
>>> This works only for x86-64.
>>> Probably there is no simple way to make this arch-independent with
>>> 32-bit values.
>> We probably could add an ioctl that returns the base of the stripped PCs.
> You forgot about modules. With stripped PCs you'll start mixing
> kernel's and module's PC (if distance between module and kernel > 4G).

It's just that on x86-64 kernel text and modules are within 4GB of each other.

I've checked that on arm64 it also seems to be the case; the comment
in the arm64 headers says:

 * The module space lives between the addresses given by TASK_SIZE
 * and PAGE_OFFSET - it must be within 128MB of the kernel text.
 */

Again, we can store either u32s or u64s and expose this info via an ioctl...

>>> Note that this works only for cache-coherent architectures.
>>> For incoherent arches you'll need to flush_dcache_page() somewhere.
>>> Perhaps it could be done on exit to userspace, since flushing here is
>>> certainly an overkill.
>> I can't say that I understand the problem. Does it have to do with the
>> fact that the buffer is shared between kernel and user-space?
>> Current code is OK from the plain multi-threading side, as the user
>> must not read the buffer concurrently with writing (that would not
>> yield anything useful).
> It's not about SMP.
> This problem is about virtually indexed aliasing D-caches and could be
> observed on uniprocessor system.
> You have 3 virtual addresses (user-space, linear mapping and vmalloc)
> mapped to the same physical page.
> With aliasing cache it's possible to have multiple cache-lines
> representing the same physical page.
> So the kernel might not see the update made by userspace and vice
> versa because kernel/userspace use different virtual addresses.
> And btw, flush_dcache_page() would be a wrong choice, since kcov_area
> is a vmalloc address, not a linear address.
> So we need something that flushes vmalloc addresses.
> Alternatively we could simply mlock that memory and talk to user space
> via get/put_user(). No flush will be required.
> And we will avoid another potential problem - lack of vmalloc address
> space on 32-bits.

Do you mean that user-space allocates a buffer and passes this buffer
to ioctl(KCOV_INIT); the kernel locks this range and then directly
writes to it?

I'm afraid it becomes prohibitively expensive with put_user/get_user.

Also, won't it require the same flush, since the region is mmapped into
several processes (and the process that reads is not the one that sets
up the region)?

The coverage buffer that I currently use is 64K. I hope that is not
a problem for 32-bit archs.

>> We could add an ioctl that does the flush. But I would prefer if it is
>> done when we port kcov to such an arch. Does arm64 require the flush?
> I think it doesn't. AFAIK arm64 has non-aliasing D-cache.
> arm64/include/asm/cacheflush.h says:
> Please note that the implementation assumes non-aliasing VIPT D-cache
> However, I wonder why it implements flush_dcache_page(). Per my
> understanding it is not needed for non-aliasing caches.
> And Documentation/cachetlb.txt agrees with me:
> void flush_dcache_page(struct page *page)
> If D-cache aliasing is not an issue, this routine may
> simply be defined as a nop on that architecture.
> Catalin, Will, could you please shed light on this?