Re: mmiotracer hangs the system

From: Karol Herbst
Date: Thu Nov 24 2016 - 15:50:40 EST


Sorry about that, I forgot to attach the patch.

2016-11-19 11:56 GMT+01:00 Karol Herbst <karolherbst@xxxxxxxxx>:
> This is odd. I found a bug related to nouveau (modprobe/bind doesn't
> return). At first I thought it was unrelated to your issue, but it may
> be exactly this: the binding of the device doesn't return and,
> depending on the kind of driver, that would hang the system. So yes,
> maybe it is the same issue.
>
> Anyway, could you try to trace with the attached patch? Maybe the
> additional output would help me to verify it. Currently I am working
> on the bugfix I mentioned above and this may also fix your issue. I
> was still able to get a working mmiotrace file, even if the device
> binding didn't finish. Is this the same for you? (Try
> "cat /sys/kernel/debug/tracing/trace_pipe > some_file" and see if this
> contains anything useful.)
>
> This really looks like an odd issue, because the mmiotracer still
> behaves as expected.
>
> 2016-10-22 18:02 GMT+02:00 Andy Shevchenko <andy.shevchenko@xxxxxxxxx>:
>> On Fri, Oct 14, 2016 at 12:12 AM, Karol Herbst <karolherbst@xxxxxxxxx> wrote:
>>> Sorry for the delay in fixing that bug. I got occupied with other
>>> things and didn't really get back to the issue. It is the next item
>>> on my todo list, though, and I hope I will be able to get a fix ready
>>> this weekend. I think I might know where the issue is, but I haven't
>>> confirmed it yet.
>>
>> Thanks. I'm still using the revert. Feel free to Cc me when you have
>> some material to test.
>>
>>>
>>> Again, sorry for the delay.
>>>
>>> Karol
>>>
>>> 2016-08-19 22:46 GMT+02:00 Karol Herbst <karolherbst@xxxxxxxxx>:
>>>> Hi again,
>>>>
>>>> I was able to get a crash/freeze/something while unbinding/binding my
>>>> nvidia gpu from nouveau.
>>>>
>>>> Guess that means something is odd. I will investigate this more over
>>>> the weekend.
>>>>
>>>> 2016-08-19 17:35 GMT+02:00 Andy Shevchenko <andy.shevchenko@xxxxxxxxx>:
>>>>> On Fri, Aug 19, 2016 at 6:08 PM, karol herbst <karolherbst@xxxxxxxxx> wrote:
>>>>>> 2016-08-19 15:02 GMT+02:00 Andy Shevchenko <andy.shevchenko@xxxxxxxxx>:
>>>>>>> On Fri, Aug 19, 2016 at 1:35 PM, karol herbst <karolherbst@xxxxxxxxx> wrote:
>>>>>>>> is there any update on that issue I missed somehow? I really don't
>>>>>>>> want to leave the mmiotracer in a state, where it breaks something
>>>>>>>> while fixing other issues.
>>>>>>>
>>>>>>> No updates. I'm busy right now with higher-priority tasks and the
>>>>>>> revert works for me. The issue is 100% reproducible in my case.
>>>>>>>
>>>>>>
>>>>>> Is there something I could do with a "normal" haswell desktop system
>>>>>> to reproduce this issue?
>>>>>
>>>>> Try LPSS UART device(s)
>>>>>
>>>>>>
>>>>>> I'll try to play around a bit over the next few days and maybe I'll
>>>>>> find something that works out here as well. It seems to be related
>>>>>> to unmapping-mapping cycles.
>>>>>
>>>>> That is the only thing I would think of.
>>>>>
>>>>>>
>>>>>> Because if this only happens with the pwm-lpss driver,
>>>>>
>>>>> It has nothing to do with pwm-lpss since it's a HS UART and served by
>>>>> intel-lpss driver.
>>>>>
>>>>>> it may be
>>>>>> really troublesome to debug, because I don't really know the code that
>>>>>> well to be sure where the issue might be.
>>>>>>
>>>>>>> So, I would be able to attach dmesg in case it would be helpful.
>>>>>>> Otherwise tell me exact instructions for how to debug the issue.
>>>>>>>
>>>>>>> Here you are:
>>>>>>> http://pastebin.com/raw/VfTZENt7
>>>>>>>
>>>>>>>> But for now, without being able to even reproduce the issue, I can't
>>>>>>>> really do much, because the code in the current state looks sane to
>>>>>>>> me. Maybe this case involves the mmiotracer cleaning things up and
>>>>>>>> arming a new region for mmiotracing, and that's why it fails? Besides
>>>>>>>> that, I have no idea and no way to reproduce this, so I can't help this way.
>>>>>>>
>>>>>>> Maybe. The first thing that happened is iounmap().
>>>>>
>>>>>
>>>>> --
>>>>> With Best Regards,
>>>>> Andy Shevchenko
>>
>>
>>
>> --
>> With Best Regards,
>> Andy Shevchenko
From 92aea447a776f10aad0a2e971b5f2b208a1161d2 Mon Sep 17 00:00:00 2001
From: Karol Herbst <nouveau@xxxxxxxxxxxxxx>
Date: Thu, 24 Nov 2016 21:46:27 +0100
Subject: [PATCH] temp hack

---
arch/x86/mm/kmmio.c | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/kmmio.c b/arch/x86/mm/kmmio.c
index afc47f5c9531..a002ee314a0c 100644
--- a/arch/x86/mm/kmmio.c
+++ b/arch/x86/mm/kmmio.c
@@ -97,11 +97,16 @@ static DEFINE_PER_CPU(struct kmmio_context, kmmio_ctx);
static struct kmmio_probe *get_kmmio_probe(unsigned long addr)
{
struct kmmio_probe *p;
+ struct kmmio_probe *result = NULL;
list_for_each_entry_rcu(p, &kmmio_probes, list) {
- if (addr >= p->addr && addr < (p->addr + p->len))
- return p;
+ if (addr >= p->addr && addr < (p->addr + p->len)) {
+ if (!result)
+ result = p;
+ else
+ printk(KERN_ERR " %s collision detected %lu", __FUNCTION__, addr);
+ }
}
- return NULL;
+ return result;
}

/* You must be holding RCU read lock. */
@@ -109,6 +114,7 @@ static struct kmmio_fault_page *get_kmmio_fault_page(unsigned long addr)
{
struct list_head *head;
struct kmmio_fault_page *f;
+ struct kmmio_fault_page *result = NULL;
unsigned int l;
pte_t *pte = lookup_address(addr, &l);

@@ -116,11 +122,16 @@ static struct kmmio_fault_page *get_kmmio_fault_page(unsigned long addr)
return NULL;
addr &= page_level_mask(l);
head = kmmio_page_list(addr);
+
list_for_each_entry_rcu(f, head, list) {
- if (f->addr == addr)
- return f;
+ if (f->addr == addr) {
+ if (!result)
+ result = f;
+ else
+ printk(KERN_ERR " %s collision detected %lu", __FUNCTION__, addr);
+ }
}
- return NULL;
+ return result;
}

static void clear_pmd_presence(pmd_t *pmd, bool clear, pmdval_t *old)
@@ -375,6 +386,7 @@ static int add_kmmio_fault_page(unsigned long addr)
{
struct kmmio_fault_page *f;

+ printk(KERN_WARNING " %s %lx", __FUNCTION__, addr);
f = get_kmmio_fault_page(addr);
if (f) {
if (!f->count)
@@ -406,6 +418,7 @@ static void release_kmmio_fault_page(unsigned long addr,
{
struct kmmio_fault_page *f;

+ printk(KERN_WARNING " %s %lx", __FUNCTION__, addr);
f = get_kmmio_fault_page(addr);
if (!f)
return;
@@ -445,6 +458,8 @@ int register_kmmio_probe(struct kmmio_probe *p)
}

pte = lookup_address(p->addr, &l);
+ printk(KERN_WARNING " %s %lx %u", __FUNCTION__, p->addr, l);
+
if (!pte) {
ret = -EINVAL;
goto out;
@@ -537,6 +552,8 @@ void unregister_kmmio_probe(struct kmmio_probe *p)
if (!pte)
return;

+ printk(KERN_WARNING " %s %lx %u", __FUNCTION__, p->addr, l);
+
spin_lock_irqsave(&kmmio_lock, flags);
while (size < size_lim) {
release_kmmio_fault_page(p->addr + size, &release_list);
--
2.11.0.rc2
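
For anyone who wants to try the lookup logic outside the kernel, the collision check that the hack adds to get_kmmio_probe() can be sketched in plain user-space C. This is only an illustration and not part of the patch: struct probe_range, find_probe() and the collisions counter are hypothetical names, a plain array stands in for the RCU-protected kmmio_probes list, and fprintf() stands in for printk().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct kmmio_probe: only the range fields. */
struct probe_range {
	unsigned long addr;
	unsigned long len;
};

/*
 * Return the first range that contains addr, as get_kmmio_probe() does,
 * but keep scanning and report every additional range that also matches.
 * The number of extra matches is stored in *collisions.
 */
static const struct probe_range *
find_probe(const struct probe_range *ranges, size_t n,
	   unsigned long addr, unsigned int *collisions)
{
	const struct probe_range *result = NULL;
	size_t i;

	*collisions = 0;
	for (i = 0; i < n; i++) {
		const struct probe_range *p = &ranges[i];

		/* addr - p->addr cannot wrap because addr >= p->addr holds */
		if (addr >= p->addr && addr - p->addr < p->len) {
			if (!result) {
				result = p;	/* remember the first match */
			} else {
				(*collisions)++;
				fprintf(stderr, "%s collision detected %lx\n",
					__func__, addr);
			}
		}
	}
	return result;
}
```

With two overlapping ranges such as {0x1000, 0x100} and {0x1080, 0x100}, a lookup of 0x1090 returns the first range and reports one collision, which is the situation the extra printk output in the patch is meant to make visible.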