Re: [BUG] selftests/iommu: runaway ./iommufd consuming 99% CPU after a failed assert()
From: Joao Martins
Date: Mon Mar 25 2024 - 16:25:25 EST
On 23/03/2024 20:13, Mirsad Todorovac wrote:
>
>
> On 3/19/24 14:58, Jason Gunthorpe wrote:
>> On Tue, Mar 12, 2024 at 07:35:40AM +0100, Mirsad Todorovac wrote:
>>> Hi,
>>>
>>> (This is verified on the second test box.)
>>>
>>> In the most recent 6.8.0 release of torvalds tree kernel with selftest
>>> configs on,
>>> process ./iommufd appears to consume 99% of a CPU core for quite a while in an
>>> endless loop:
>>
>> There is a "bug" in the kselftest framework where if you call a
>> kselftest assertion from the setup/teardown it infinite loops
>>
>> The fix I know is to replace kselftest assertions with normal assert()
>>
>> But I don't see an obvious thing here saying you are hitting that..
>>
>> Jason
>
> Hi,
>
> I'm not that deep into kselftest for that intervention.
>
> Yet, with the v6.8-11743-ga4145ce1e7bc build, ./iommufd did not get stuck.
> Instead I got these 10 failed tests:
>
> # # RUN iommufd_dirty_tracking.domain_dirty128M_huge.enforce_dirty ...
> # iommufd: iommufd.c:1749: iommufd_dirty_tracking_setup: Assertion `vrc == self->buffer' failed.
> # # enforce_dirty: Test terminated by assertion
> # # FAIL iommufd_dirty_tracking.domain_dirty128M_huge.enforce_dirty
> # not ok 156 iommufd_dirty_tracking.domain_dirty128M_huge.enforce_dirty
> # # RUN iommufd_dirty_tracking.domain_dirty128M_huge.set_dirty_tracking ...
> # iommufd: iommufd.c:1749: iommufd_dirty_tracking_setup: Assertion `vrc == self->buffer' failed.
> # # set_dirty_tracking: Test terminated by assertion
> # # FAIL iommufd_dirty_tracking.domain_dirty128M_huge.set_dirty_tracking
> # not ok 157 iommufd_dirty_tracking.domain_dirty128M_huge.set_dirty_tracking
> # # RUN iommufd_dirty_tracking.domain_dirty128M_huge.device_dirty_capability ...
> # iommufd: iommufd.c:1749: iommufd_dirty_tracking_setup: Assertion `vrc == self->buffer' failed.
> # # device_dirty_capability: Test terminated by assertion
> # # FAIL iommufd_dirty_tracking.domain_dirty128M_huge.device_dirty_capability
> # not ok 158 iommufd_dirty_tracking.domain_dirty128M_huge.device_dirty_capability
> # # RUN iommufd_dirty_tracking.domain_dirty128M_huge.get_dirty_bitmap ...
> # iommufd: iommufd.c:1749: iommufd_dirty_tracking_setup: Assertion `vrc == self->buffer' failed.
> # # get_dirty_bitmap: Test terminated by assertion
> # # FAIL iommufd_dirty_tracking.domain_dirty128M_huge.get_dirty_bitmap
> # not ok 159 iommufd_dirty_tracking.domain_dirty128M_huge.get_dirty_bitmap
> # # RUN iommufd_dirty_tracking.domain_dirty128M_huge.get_dirty_bitmap_no_clear ...
> # iommufd: iommufd.c:1749: iommufd_dirty_tracking_setup: Assertion `vrc == self->buffer' failed.
> # # get_dirty_bitmap_no_clear: Test terminated by assertion
> # # FAIL iommufd_dirty_tracking.domain_dirty128M_huge.get_dirty_bitmap_no_clear
> # not ok 160 iommufd_dirty_tracking.domain_dirty128M_huge.get_dirty_bitmap_no_clear
> .
> .
> .
> # # RUN iommufd_dirty_tracking.domain_dirty256M_huge.enforce_dirty ...
> # iommufd: iommufd.c:1749: iommufd_dirty_tracking_setup: Assertion `vrc == self->buffer' failed.
> # # enforce_dirty: Test terminated by assertion
> # # FAIL iommufd_dirty_tracking.domain_dirty256M_huge.enforce_dirty
> # not ok 166 iommufd_dirty_tracking.domain_dirty256M_huge.enforce_dirty
> # # RUN iommufd_dirty_tracking.domain_dirty256M_huge.set_dirty_tracking ...
> # iommufd: iommufd.c:1749: iommufd_dirty_tracking_setup: Assertion `vrc == self->buffer' failed.
> # # set_dirty_tracking: Test terminated by assertion
> # # FAIL iommufd_dirty_tracking.domain_dirty256M_huge.set_dirty_tracking
> # not ok 167 iommufd_dirty_tracking.domain_dirty256M_huge.set_dirty_tracking
> # # RUN iommufd_dirty_tracking.domain_dirty256M_huge.device_dirty_capability ...
> # iommufd: iommufd.c:1749: iommufd_dirty_tracking_setup: Assertion `vrc == self->buffer' failed.
> # # device_dirty_capability: Test terminated by assertion
> # # FAIL iommufd_dirty_tracking.domain_dirty256M_huge.device_dirty_capability
> # not ok 168 iommufd_dirty_tracking.domain_dirty256M_huge.device_dirty_capability
> # # RUN iommufd_dirty_tracking.domain_dirty256M_huge.get_dirty_bitmap ...
> # iommufd: iommufd.c:1749: iommufd_dirty_tracking_setup: Assertion `vrc == self->buffer' failed.
> # # get_dirty_bitmap: Test terminated by assertion
> # # FAIL iommufd_dirty_tracking.domain_dirty256M_huge.get_dirty_bitmap
> # not ok 169 iommufd_dirty_tracking.domain_dirty256M_huge.get_dirty_bitmap
> # # RUN iommufd_dirty_tracking.domain_dirty256M_huge.get_dirty_bitmap_no_clear ...
> # iommufd: iommufd.c:1749: iommufd_dirty_tracking_setup: Assertion `vrc == self->buffer' failed.
> # # get_dirty_bitmap_no_clear: Test terminated by assertion
> # # FAIL iommufd_dirty_tracking.domain_dirty256M_huge.get_dirty_bitmap_no_clear
> # not ok 170 iommufd_dirty_tracking.domain_dirty256M_huge.get_dirty_bitmap_no_clear
> .
> .
> .
> # # FAILED: 170 / 180 tests passed.
> # # Totals: pass:170 fail:10 xfail:0 xpass:0 skip:0 error:0
> not ok 1 selftests: iommu: iommufd # exit=1
>
> It seems like the same assertion failed in all 10 failed tests?
>
It means that the hugetlb mmap() failed. These specific tests require it,
because they need to allocate a larger IOVA range, backed by hugepages, to
exercise the dirty tracking code.
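One way to check whether this is what happened on your box is to look at how many hugetlb pages are free before running the tests. The following is only a rough sketch: meminfo_value() and hugepages_free() are names made up for illustration and are not part of the selftest. Assuming 2 MiB huge pages, a 128M buffer would need 64 free pages and a 256M buffer 128.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the number following `key` (e.g. "HugePages_Free:") in a
 * meminfo-style stream, or -1 if the key is not present.
 */
static long meminfo_value(FILE *f, const char *key)
{
	char line[256];
	size_t klen;

	assert(f && key);	/* caller error, not a runtime condition */
	klen = strlen(key);

	while (fgets(line, sizeof(line), f)) {
		long val;

		if (strncmp(line, key, klen) == 0 &&
		    sscanf(line + klen, "%ld", &val) == 1)
			return val;
	}
	return -1;
}

/* Free hugetlb pages of the default size, or -1 if unknown. */
static long hugepages_free(void)
{
	FILE *f = fopen("/proc/meminfo", "r");
	long val;

	if (!f)
		return -1;
	val = meminfo_value(f, "HugePages_Free:");
	fclose(f);
	return val;
}
```

If hugepages_free() reports 0 (or fewer pages than the buffer needs), the mmap() in the setup is expected to fail.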
> However, I am not smart enough to figure out why ...
>
> Apparently, from the source, mmap() fails to allocate pages on the desired address:
>
> 1746 assert((uintptr_t)self->buffer % HUGEPAGE_SIZE == 0);
> 1747 vrc = mmap(self->buffer, variant->buffer_size, PROT_READ | PROT_WRITE,
> 1748 mmap_flags, -1, 0);
> → 1749 assert(vrc == self->buffer);
> 1750
>
> But I am not that deep into the source to figure out what was intended and what
> went wrong :-/
I can SKIP() the test rather than assert() here if it helps, though there are
other tests that also fail if no hugetlb pages are reserved.
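Roughly, the idea would be to report a failed hugetlb mmap() back to the caller instead of aborting. The sketch below is standalone and is not the selftest code: map_hugetlb_buffer() and enum setup_result are hypothetical names, the mapping flags are an assumption, and it lets the kernel pick the address rather than mapping over self->buffer as the real setup does. In the fixture itself the SETUP_SKIP branch would become a SKIP() from the kselftest harness.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical result type: the setup either worked or should be skipped. */
enum setup_result { SETUP_OK, SETUP_SKIP };

/*
 * Try to map `len` bytes backed by hugetlb pages. On failure (typically
 * ENOMEM when no huge pages are reserved) report a skip rather than
 * asserting, and leave *out set to NULL.
 */
static enum setup_result map_hugetlb_buffer(size_t len, void **out)
{
	void *p;

	assert(out);	/* caller error, not a runtime condition */

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED) {
		fprintf(stderr, "SKIP: hugetlb mmap of %zu bytes failed: %s\n",
			len, strerror(errno));
		*out = NULL;
		return SETUP_SKIP;
	}

	*out = p;
	return SETUP_OK;
}
```

With this shape a box without reserved huge pages would show the huge variants as skipped instead of as 10 failures.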
But I am not sure this is the problem here, as the initial bug email had an
entirely different set of failures. Maybe all it takes is a failed assert() for
it to get into this state?
Joao