Re: kernel BUG at mm/huge_memory.c:2613!

From: Andrea Arcangeli
Date: Thu Jun 18 2020 - 22:40:37 EST


Hello,

On Thu, Jun 18, 2020 at 06:14:49PM -0700, Roman Gushchin wrote:
> I agree. The whole
>
> page = alloc_pages_node(nid, alloc_flags, order);
> if (!page)
> continue;
> if (!order)
> break;
> if (!PageCompound(page)) {
> split_page(page, order);
> break;
> } else if (!split_huge_page(page)) {
> break;
> }
>
> looks very suspicious to me.
> My wild guess is that gfp flags changed somewhere above, so we hit
> the branch which was never hit before.

Right to be suspicious about the above: split_huge_page on a regular
page allocated by a driver was never meant to work.

The PageLocked BUG_ON is just a symptom of a bigger issue, basically
split_huge_page it may survive, but it'll stay compound and in turn it
must be freed as compound.

The respective free method doesn't even contemplate freeing compound
pages, the only way the free method can survive, is by removing
__GFP_COMP forcefully in the allocation that was perhaps set here
(there are that many __GFP_COMP in that directory):

static void snd_malloc_dev_pages(struct snd_dma_buffer *dmab, size_t size)
{
gfp_t gfp_flags;

gfp_flags = GFP_KERNEL
| __GFP_COMP /* compound page lets parts be mapped */

And I'm not sure what the comment means here, compound or non compound
doesn't make a difference when you map it, it's not a THP, the
mappings must be handled manually so nothing should check PG_compound
anyway in the mapping code.

Something like this may improve things, it's an untested quick hack,
but this assumes it's always a bug to setup a compound page for these
DMA allocations and given the API it's probably a correct
assumption.. Compound is slower, unless you need it, you can avoid it
and then split_page will give contiguous memory page granular. Ideally
the code shouldn't call split_page at all and it should free it all at
once by keeping track of the order and by returning the order to the
caller, something the API can't do right now as it returns a plain
array that can only represent individual small pages.

Once this is resolved, you may want to check your config, iommu passthrough
sounds more optimal for a soundcard.

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f68a62c3c32b..3dfbc010fa83 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -499,6 +499,10 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,

/* IOMMU can map any pages, so himem can also be used here */
gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
+ if (unlikely(gfp & __GFP_COMP)) {
+ WARN();
+ gfp &= ~__GFP_COMP;
+ }

while (count) {
struct page *page = NULL;
@@ -522,13 +526,8 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
continue;
if (!order)
break;
- if (!PageCompound(page)) {
- split_page(page, order);
- break;
- } else if (!split_huge_page(page)) {
- break;
- }
- __free_pages(page, order);
+ split_page(page, order);
+ break;
}
if (!page) {
__iommu_dma_free_pages(pages, i);
diff --git a/sound/core/memalloc.c b/sound/core/memalloc.c
index 6850d13aa98c..378f5a36ec5f 100644
--- a/sound/core/memalloc.c
+++ b/sound/core/memalloc.c
@@ -28,7 +28,6 @@ static void snd_malloc_dev_pages(struct snd_dma_buffer *dmab, size_t size)
gfp_t gfp_flags;

gfp_flags = GFP_KERNEL
- | __GFP_COMP /* compound page lets parts be mapped */
| __GFP_NORETRY /* don't trigger OOM-killer */
| __GFP_NOWARN; /* no stack trace print - this call is non-critical */
dmab->area = dma_alloc_coherent(dmab->dev.dev, size, &dmab->addr,