On 3/13/25 3:49 PM, David Hildenbrand wrote:
On 12.03.25 16:21, Christoph Hellwig wrote:
On Fri, Mar 07, 2025 at 08:23:08PM +0000, Matthew Wilcox wrote:
However, the problem is real.
What is the problem?
I think the problem is the CMA allocation failure, not the latency.
"if a large amount of direct IO is requested constantly, this can make
pages in CMA pageblocks pinned and unable to migrate outside of the
pageblock"
We'd need a more reliable way to make CMA allocation -> page migration
make progress. For example, after we isolated the pageblocks and
migration starts doing its thing, we could disallow any further GUP
pins (e.g., make GUP spin or wait for migration to end).
We could detect in GUP code that a folio is soon expected to be migrated
by checking the pageblock (isolated) and/or whether the folio is locked.
Jason Gunthorpe and Matthew both had some ideas about how to fix this [1],
which were very close (maybe the same) to what you're saying here: sleep
and spin in a killable loop.
It turns out to be a little difficult to do this--I had trouble making
the folio's "has waiters" bit work for this, for example. And then...squirrel!
However, I still believe, so far, this is the right approach. I'm just not
sure which thing to wait on, exactly.
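
To make the shape of the idea concrete, here is a rough userspace sketch of
"check whether migration is expected, then wait in a killable loop before
pinning". This is only a toy model, not kernel code: struct sim_folio,
migration_expected(), sim_migration_tick() and sim_try_pin() are all
hypothetical names made up for illustration, and the "fatal signal" is faked
with a tick counter. In the kernel the bail-out condition would presumably be
something along the lines of fatal_signal_pending(), and the open question
above (what exactly to wait on) is papered over here by a simple tick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a folio; not a kernel structure. */
struct sim_folio {
    bool pageblock_isolated; /* pageblock isolated for a CMA allocation */
    bool locked;             /* folio locked, e.g. by migration */
    int  migration_ticks;    /* ticks until the simulated migration ends */
    int  pin_count;          /* number of simulated GUP pins taken */
};

enum pin_result { PIN_OK = 0, PIN_INTERRUPTED = -1 };

/* The detection idea: isolated pageblock and/or locked folio. */
static bool migration_expected(const struct sim_folio *f)
{
    return f->pageblock_isolated || f->locked;
}

/* Advance the simulated migration by one step; clear state when done. */
static void sim_migration_tick(struct sim_folio *f)
{
    if (f->migration_ticks > 0)
        f->migration_ticks--;
    if (f->migration_ticks == 0) {
        f->pageblock_isolated = false;
        f->locked = false;
    }
}

/*
 * Try to pin, waiting while migration is expected. The wait is "killable":
 * if fatal_signal_after is >= 0, the loop gives up after that many ticks,
 * which stands in for a fatal signal arriving. Pass -1 for "no signal".
 */
static enum pin_result sim_try_pin(struct sim_folio *f, int fatal_signal_after)
{
    int waited = 0;

    assert(f != NULL);
    while (migration_expected(f)) {
        if (fatal_signal_after >= 0 && waited >= fatal_signal_after)
            return PIN_INTERRUPTED;  /* bail out without taking a pin */
        sim_migration_tick(f);       /* stands in for sleeping/spinning */
        waited++;
    }
    f->pin_count++;
    return PIN_OK;
}
```

The point of the sketch is only the ordering: no new pin is taken while the
folio looks like a migration target, and the waiter can still be killed
instead of blocking forever if migration never finishes.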