Re: 82fef0ad811f "x86/mm: unencrypted non-blocking DMA allocations use coherent pools" was Re: next-0519 on thinkpad x60: sound related? window manager crash

From: David Rientjes
Date: Sun Jun 07 2020 - 20:58:14 EST


On Sun, 7 Jun 2020, Alex Xu (Hello71) wrote:

> > On Sun, 7 Jun 2020, Pavel Machek wrote:
> >
> >> > I have a similar issue, caused between aaa2faab4ed8 and b170290c2836.
> >> >
> >> > [ 20.263098] BUG: unable to handle page fault for address: ffffb2b582cc2000
> >> > [ 20.263104] #PF: supervisor write access in kernel mode
> >> > [ 20.263105] #PF: error_code(0x000b) - reserved bit violation
> >> > [ 20.263107] PGD 3fd03b067 P4D 3fd03b067 PUD 3fd03c067 PMD 3f8822067 PTE 8000273942ab2163
> >> > [ 20.263113] Oops: 000b [#1] PREEMPT SMP
> >> > [ 20.263117] CPU: 3 PID: 691 Comm: mpv Not tainted 5.7.0-11262-gb170290c2836 #1
> >> > [ 20.263119] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450 Pro4, BIOS P4.10 03/05/2020
> >> > [ 20.263125] RIP: 0010:__memset+0x24/0x30
> >> > [ 20.263128] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
> >> > [ 20.263131] RSP: 0018:ffffb2b583d07e10 EFLAGS: 00010216
> >> > [ 20.263133] RAX: 0000000000000000 RBX: ffff8b8000102c00 RCX: 0000000000004000
> >> > [ 20.263134] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb2b582cc2000
> >> > [ 20.263136] RBP: ffff8b8000101000 R08: 0000000000000000 R09: ffffb2b582cc2000
> >> > [ 20.263137] R10: 0000000000005356 R11: ffff8b8000102c18 R12: 0000000000000000
> >> > [ 20.263139] R13: 0000000000000000 R14: ffff8b8039944200 R15: ffffffff9794daa0
> >> > [ 20.263141] FS: 00007f41aa4b4200(0000) GS:ffff8b803ecc0000(0000) knlGS:0000000000000000
> >> > [ 20.263143] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> > [ 20.263144] CR2: ffffb2b582cc2000 CR3: 00000003b6731000 CR4: 00000000003406e0
> >> > [ 20.263146] Call Trace:
> >> > [ 20.263151] ? snd_pcm_hw_params+0x3f3/0x47a
> >> > [ 20.263154] ? snd_pcm_common_ioctl+0xf2/0xf73
> >> > [ 20.263158] ? snd_pcm_ioctl+0x1e/0x29
> >> > [ 20.263161] ? ksys_ioctl+0x77/0x91
> >> > [ 20.263163] ? __x64_sys_ioctl+0x11/0x14
> >> > [ 20.263166] ? do_syscall_64+0x3d/0xf5
> >> > [ 20.263170] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> > [ 20.263173] Modules linked in: uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev snd_usb_audio videobuf2_common snd_hwdep snd_usbmidi_lib input_leds snd_rawmidi led_class
> >> > [ 20.263182] CR2: ffffb2b582cc2000
> >> > [ 20.263184] ---[ end trace c6b47a774b91f0a0 ]---
> >> > [ 20.263187] RIP: 0010:__memset+0x24/0x30
> >> > [ 20.263190] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
> >> > [ 20.263192] RSP: 0018:ffffb2b583d07e10 EFLAGS: 00010216
> >> > [ 20.263193] RAX: 0000000000000000 RBX: ffff8b8000102c00 RCX: 0000000000004000
> >> > [ 20.263195] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb2b582cc2000
> >> > [ 20.263196] RBP: ffff8b8000101000 R08: 0000000000000000 R09: ffffb2b582cc2000
> >> > [ 20.263197] R10: 0000000000005356 R11: ffff8b8000102c18 R12: 0000000000000000
> >> > [ 20.263199] R13: 0000000000000000 R14: ffff8b8039944200 R15: ffffffff9794daa0
> >> > [ 20.263201] FS: 00007f41aa4b4200(0000) GS:ffff8b803ecc0000(0000) knlGS:0000000000000000
> >> > [ 20.263202] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> > [ 20.263204] CR2: ffffb2b582cc2000 CR3: 00000003b6731000 CR4: 00000000003406e0
> >> >
> >> > I bisected this to 82fef0ad811f "x86/mm: unencrypted non-blocking DMA
> >> > allocations use coherent pools". Reverting 1ee18de92927 resolves the
> >> > issue.
> >> >
> >> > Looks like Thinkpad X60 doesn't have VT-d, but could still be DMA
> >> > related.
> >>
> >> Note that newer -next releases seem to behave okay for me. The commit
> >> pointed out by bisection is really simple:
> >>
> >> AFAIK you could verify it is responsible by turning off
> >> CONFIG_AMD_MEM_ENCRYPT on latest kernel...
> >>
> >> Best regards,
> >> Pavel
> >>
> >> index 1d6104ea8af0..2bf2222819d3 100644
> >> --- a/arch/x86/Kconfig
> >> +++ b/arch/x86/Kconfig
> >> @@ -1520,6 +1520,7 @@ config X86_CPA_STATISTICS
> >>  config AMD_MEM_ENCRYPT
> >>  	bool "AMD Secure Memory Encryption (SME) support"
> >>  	depends on X86_64 && CPU_SUP_AMD
> >> +	select DMA_COHERENT_POOL
> >>  	select DYNAMIC_PHYSICAL_MASK
> >>  	select ARCH_USE_MEMREMAP_PROT
> >>  	select ARCH_HAS_FORCE_DMA_UNENCRYPTED
> >
> > Thanks for the report!
> >
> > Besides CONFIG_AMD_MEM_ENCRYPT, do you have CONFIG_DMA_DIRECT_REMAP
> > enabled? If so, it may be caused by the virtual address passed to the
> > set_memory_{decrypted,encrypted}() functions.
> >
> > And I assume you are enabling SME by using mem_encrypt=on on the kernel
> > command line or CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT is enabled.
> >
> > We likely need an atomic pool for devices that support DMA to addresses in
> > sme_me_mask as well. I can test this tomorrow, but wanted to get it out
> > early to see if it helps?
>
> This patch doesn't seem to help. I have the same problem (kernel page
> fault, __memset, snd_pcm_hw_params...).
>
> I don't have CONFIG_DMA_DIRECT_REMAP enabled, and AFAICT it doesn't seem
> to be selectable currently on x86, unless there are some patches
> floating around for that.
>

Thanks for trying it out, Alex. Would you mind sending your .config and
command line? I assume either mem_encrypt=on or
CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT is enabled.

Could you also give this a try?

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -99,10 +99,11 @@ static inline bool dma_should_alloc_from_pool(struct device *dev, gfp_t gfp,
 static inline bool dma_should_free_from_pool(struct device *dev,
 					     unsigned long attrs)
 {
-	if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
+	if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
+		return false;
+	if (force_dma_unencrypted(dev))
 		return true;
-	if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-	    !force_dma_unencrypted(dev))
+	if (attrs & DMA_ATTR_NO_KERNEL_MAPPING)
 		return false;
 	if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP))
 		return true;
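
To make the effect of the hunk above easier to see, here is a minimal
user-space sketch that models the old and new decision logic as plain
boolean functions. It is not kernel code: struct model,
free_from_pool_before(), free_from_pool_after() and count_differences()
are hypothetical names, the config options and force_dma_unencrypted()
are replaced by plain flags, and the trailing "return false" is an
assumption, since the hunk ends before the function's last statement.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel config options and per-call state. */
struct model {
	bool coherent_pool;     /* models IS_ENABLED(CONFIG_DMA_COHERENT_POOL) */
	bool direct_remap;      /* models IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) */
	bool force_unencrypted; /* models force_dma_unencrypted(dev) */
	bool no_kernel_mapping; /* models attrs & DMA_ATTR_NO_KERNEL_MAPPING */
};

/* Logic on the '-' side of the hunk. */
static bool free_from_pool_before(const struct model *m)
{
	if (m->coherent_pool)
		return true;
	if (m->no_kernel_mapping && !m->force_unencrypted)
		return false;
	if (m->direct_remap)
		return true;
	return false; /* assumed: not visible in the hunk */
}

/* Logic on the '+' side of the hunk. */
static bool free_from_pool_after(const struct model *m)
{
	if (!m->coherent_pool)
		return false;
	if (m->force_unencrypted)
		return true;
	if (m->no_kernel_mapping)
		return false;
	if (m->direct_remap)
		return true;
	return false; /* assumed: not visible in the hunk */
}

/*
 * Walk all 16 flag combinations and count those where the two versions
 * disagree. Kconfig dependencies are ignored, so some combinations may
 * not be reachable in a real build.
 */
static int count_differences(void)
{
	int diff = 0;

	for (int bits = 0; bits < 16; bits++) {
		struct model m = {
			.coherent_pool     = bits & 1,
			.direct_remap      = bits & 2,
			.force_unencrypted = bits & 4,
			.no_kernel_mapping = bits & 8,
		};

		if (free_from_pool_before(&m) != free_from_pool_after(&m))
			diff++;
	}
	return diff;
}
```

In this model, a build with the coherent pool enabled but with no
forced-unencrypted DMA, no DMA_ATTR_NO_KERNEL_MAPPING and no direct
remap goes from "free from pool" to "do not free from pool". Whether
that matches the configuration behind the reported oops is not
something the sketch can tell.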