Re: [WARNING][AMDGPU] WQ_MEM_RECLAIM with Radeon RX 6600

From: Tejun Heo
Date: Wed Dec 18 2024 - 13:09:11 EST


Hello, sorry about the delay.

On Mon, Dec 16, 2024 at 04:34:00PM -0800, Matthew Brost wrote:
> > However, after further discussion, I think the warning is actually a
> > false positive. See this discussion:
> > https://lists.freedesktop.org/archives/amd-gfx/2024-November/117349.html
> >
> > From the thread:
> > "Question is - does check_flush_dependency() need to skip the
> > !WQ_MEM_RECLAIM flushing WQ_MEM_RECLAIM warning *if* the work is already
> > running *and* it was called from cancel_delayed_work_sync()?"
> >
>
> See my reply just now [1] — I’m going to have to disagree with AMD's
> assessment, but I’m not certain.
>
> Again, I believe Tejun is the authority here.

I think we can skip the warning if the flush is coming from
cancel*_work_sync(), as the flush takes place iff the work item already has a
worker running, i.e. it can't be blocked by lack of memory. Tvrtko, can
you write up a patch to exclude the condition from check_flush_dependency()?
I think it can just skip check_flush_dependency() when @from_cancel is set.
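
A minimal userspace sketch of the proposed skip, not the actual kernel
patch. It assumes the existing check warns when a worker of a
WQ_MEM_RECLAIM workqueue flushes work on a workqueue without
WQ_MEM_RECLAIM. It leaves out the PF_MEMALLOC and legacy-workqueue
handling. struct wq_model, flush_dependency_warns() and the flag value are
hypothetical names used only for illustration:

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag value for this model only. */
#define WQ_MEM_RECLAIM (1u << 3)

/* Hypothetical stand-in for the workqueue fields the check looks at. */
struct wq_model {
	unsigned int flags;
};

/*
 * Returns true if the flush would trigger the dependency warning.
 * @flusher_wq: workqueue of the worker doing the flush, NULL if the
 *              caller is not a workqueue worker
 * @target_wq:  workqueue owning the work item being flushed
 * @from_cancel: flush was issued from a cancel*_work_sync() path
 */
static bool flush_dependency_warns(const struct wq_model *flusher_wq,
				   const struct wq_model *target_wq,
				   bool from_cancel)
{
	/*
	 * A cancel only waits on a work item that already has a worker
	 * running, so it cannot be stuck waiting for a worker to be
	 * created under memory pressure.
	 */
	if (from_cancel)
		return false;

	/* The target has a rescuer, so forward progress is guaranteed. */
	if (target_wq->flags & WQ_MEM_RECLAIM)
		return false;

	/* A reclaim-safe workqueue is waiting on one that is not. */
	return flusher_wq != NULL && (flusher_wq->flags & WQ_MEM_RECLAIM);
}
```

With this model, a WQ_MEM_RECLAIM worker flushing a !WQ_MEM_RECLAIM work
item still warns on a plain flush, but not when the flush comes from a
cancel.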

Taking a step back, if an actual dependency develops in the future, i.e. memory
reclaim actually blocking on GPU work items, one way to handle that would be
adding subsystem-wide workqueues so that the rescuer can be shared across
GPU drivers / devices. As long as they don't depend on each other for making
forward progress, which they most likely wouldn't, sharing a rescuer across
them is completely fine.

Thanks.

--
tejun