Re: possible deadlock in sk_clone_lock

From: Michal Hocko
Date: Tue Mar 02 2021 - 05:12:36 EST


On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> On 3/1/21 9:23 AM, Michal Hocko wrote:
> > On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> >> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > [...]
> >>> Then how come this can ever be a problem? in_task() should exclude soft
> >>> irq context unless I am mistaken.
> >>>
> >>
> >> If I take the following example of syzbot's deadlock scenario then
> >> CPU1 is the one freeing the hugetlb pages. It is in the process
> >> context but has disabled softirqs (see __tcp_close()).
> >>
> >> CPU0 CPU1
> >> ---- ----
> >> lock(hugetlb_lock);
> >> local_irq_disable();
> >> lock(slock-AF_INET);
> >> lock(hugetlb_lock);
> >> <Interrupt>
> >> lock(slock-AF_INET);
> >>
> >> So, this deadlock scenario is very much possible.
> >
> > OK, I see the point now. I was focusing on the IRQ context and hugetlb
> > side too much. We do not need to be freeing from there. All it takes is
> > to get a dependency chain over a common lock held here. Thanks for
> > bearing with me.
> >
> > Let's see whether we can make hugetlb_lock irq safe.
>
> I may be confused, but it seems like we have a general problem with
> calling free_huge_page (as a result of put_page) with interrupts
> disabled.
>
> Consider the current free_huge_page code. Today, we drop the lock
> when processing gigantic pages because we may need to block on a mutex
> in cma code. If our caller has disabled interrupts, then it doesn't
> matter if the hugetlb lock is irq safe, when we drop it interrupts will
> still be disabled we can not block . Right? If correct, then making
> hugetlb_lock irq safe would not help.
>
> Again, I may be missing something.
>
> Note that we also are considering doing more with the hugetlb lock
> dropped in this path in the 'free vmemmap of hugetlb pages' series.
>
> Since we need to do some work that could block in this path, it seems
> like we really need to use a workqueue. It is too bad that there is not
> an interface to identify all the cases where interrupts are disabled.

Wouldn't something like this help? It is quite ugly but it would be
simple enough and backportable while we come up with a more rigorous
solution. What do you think?

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4bdb58ab14cb..c9a8b39f678d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
void free_huge_page(struct page *page)
{
/*
- * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
+ * Defer freeing if in non-task context or when put_page is called
+ * with IRQ disabled (e.g from via TCP slock dependency chain) to
+ * avoid hugetlb_lock deadlock.
*/
- if (!in_task()) {
+ if (!in_task() || irqs_disabled()) {
/*
* Only call schedule_work() if hpage_freelist is previously
* empty. Otherwise, schedule_work() had been called but the
--
Michal Hocko
SUSE Labs