Re: [PATCH 2/2] block: allow different-pgmap pages as separate bvecs in bio_add_page
From: Naman Jain
Date: Thu Apr 02 2026 - 05:01:27 EST
On 4/2/2026 11:00 AM, Christoph Hellwig wrote:
On Thu, Apr 02, 2026 at 10:51:05AM +0530, Naman Jain wrote:
When a direct I/O request spans pages from different chunks (different
pgmaps), the current code rejected the second page entirely:
if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
return 0; // Rejection - forces bio split or I/O error
Both chunks are regular RAM from the DMA perspective
(MEMORY_DEVICE_GENERIC, not P2PDMA). The only requirement is that they not
be merged into the same bvec segment, which patch 1/2 enforces by adding
the pgmap check to biovec_phys_mergeable().
This patch allows pages from different pgmaps to be added as separate bvec
entries in the same bio, eliminating bio splits and I/O failures
when buffers span pgmap boundaries.
Which as I said we can't do in general, as different pgmaps can have
different DMA mapping requirements. We might be able to relax this
if we know multiple pgmaps can be mapped in the same way. I.e.
replace zone_device_pages_have_same_pgmap with
zone_device_pages_compatible and add additional conditions to it.
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -231,6 +231,9 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
if (bip->bip_vcnt > 0) {
struct bio_vec *bv = &bip->bip_vec[bip->bip_vcnt - 1];
+ if (is_pci_p2pdma_page(bv->bv_page) !=
+ is_pci_p2pdma_page(page))
+ return 0;
if (zone_device_pages_have_same_pgmap(bv->bv_page, page) &&
The above is implied by not having the same pgmap.
Thanks. If I understand correctly, here is what this would look like.
Please let me know if this is what you suggested.
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index e79eaf0477943..e54c6e06e1cbb 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -231,10 +231,10 @@ int bio_integrity_add_page(struct bio *bio, struct page *page,
 	if (bip->bip_vcnt > 0) {
 		struct bio_vec *bv = &bip->bip_vec[bip->bip_vcnt - 1];
 
-		if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
+		if (!zone_device_pages_compatible(bv->bv_page, page))
 			return 0;
-
-		if (bvec_try_merge_hw_page(q, bv, page, len, offset)) {
+		if (zone_device_pages_have_same_pgmap(bv->bv_page, page) &&
+		    bvec_try_merge_hw_page(q, bv, page, len, offset)) {
 			bip->bip_iter.bi_size += len;
 			return len;
 		}
diff --git a/block/bio.c b/block/bio.c
index 77067fa346d35..0e70bb912338c 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1034,10 +1034,10 @@ int bio_add_page(struct bio *bio, struct page *page,
 	if (bio->bi_vcnt > 0) {
 		struct bio_vec *bv = &bio->bi_io_vec[bio->bi_vcnt - 1];
 
-		if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
+		if (!zone_device_pages_compatible(bv->bv_page, page))
 			return 0;
-
-		if (bvec_try_merge_page(bv, page, len, offset)) {
+		if (zone_device_pages_have_same_pgmap(bv->bv_page, page) &&
+		    bvec_try_merge_page(bv, page, len, offset)) {
 			bio->bi_iter.bi_size += len;
 			return len;
 		}
diff --git a/block/blk.h b/block/blk.h
index 0cb3441638284..c5710ba4c81b9 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -136,6 +136,23 @@ static inline bool biovec_phys_mergeable(struct request_queue *q,
 	return true;
 }
 
+/*
+ * Check if two pages from potentially different zone device pgmaps can
+ * coexist as separate bvec entries in the same bio.
+ *
+ * The block DMA iterator (blk_dma_map_iter_start) caches the P2PDMA mapping
+ * state from the first segment and applies it to all subsequent segments, so
+ * P2PDMA and non-P2PDMA pages must never be mixed in the same bio.
+ *
+ * Other zone device types (FS_DAX, GENERIC) use the same dma_map_phys() path
+ * as normal RAM. PRIVATE and COHERENT pages never appear in bios.
+ */
+static inline bool zone_device_pages_compatible(const struct page *a,
+						const struct page *b)
+{
+	return is_pci_p2pdma_page(a) == is_pci_p2pdma_page(b);
+}
+
 static inline bool __bvec_gap_to_prev(const struct queue_limits *lim,
 		struct bio_vec *bprv, unsigned int offset)
 {
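
To sanity-check the intended behaviour, here is a minimal userspace sketch of
the add-page logic proposed above. It is not kernel code. Every model_* name
is a hypothetical stand-in invented for illustration, a NULL pgmap pointer
stands in for ordinary RAM, and simple pfn adjacency stands in for
bvec_try_merge_page(). It only shows the two-step decision: reject the page
when the P2PDMA state differs, merge only within the same pgmap, otherwise
start a new bvec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
enum model_pgmap_type { MODEL_GENERIC, MODEL_FS_DAX, MODEL_P2PDMA };

struct model_pgmap { enum model_pgmap_type type; };

struct model_page {
	const struct model_pgmap *pgmap;	/* NULL models ordinary RAM */
	unsigned long pfn;
};

struct model_bvec {
	const struct model_page *first;		/* first page of the segment */
	unsigned long nr_pages;
};

#define MODEL_MAX_VECS 8

struct model_bio {
	struct model_bvec vec[MODEL_MAX_VECS];
	size_t vcnt;
};

static bool model_is_p2pdma(const struct model_page *p)
{
	return p->pgmap && p->pgmap->type == MODEL_P2PDMA;
}

/* Models zone_device_pages_have_same_pgmap(): RAM matches RAM. */
static bool model_same_pgmap(const struct model_page *a,
			     const struct model_page *b)
{
	return a->pgmap == b->pgmap;
}

/* Models the proposed zone_device_pages_compatible(). */
static bool model_pages_compatible(const struct model_page *a,
				   const struct model_page *b)
{
	return model_is_p2pdma(a) == model_is_p2pdma(b);
}

/* Returns 1 if the page was added (merged or as a new bvec), 0 if rejected. */
static int model_bio_add_page(struct model_bio *bio,
			      const struct model_page *page)
{
	assert(bio && page);

	if (bio->vcnt > 0) {
		struct model_bvec *bv = &bio->vec[bio->vcnt - 1];

		/* Never mix P2PDMA and non-P2PDMA pages in one bio. */
		if (!model_pages_compatible(bv->first, page))
			return 0;

		/* Merge only within one pgmap; adjacency is a simplification. */
		if (model_same_pgmap(bv->first, page) &&
		    bv->first->pfn + bv->nr_pages == page->pfn) {
			bv->nr_pages++;
			return 1;
		}
	}

	if (bio->vcnt >= MODEL_MAX_VECS)
		return 0;

	/* Compatible but not mergeable: start a separate bvec. */
	bio->vec[bio->vcnt].first = page;
	bio->vec[bio->vcnt].nr_pages = 1;
	bio->vcnt++;
	return 1;
}
```

In this model, two adjacent pages from the same GENERIC pgmap share one bvec,
a page from a second GENERIC pgmap gets its own bvec in the same bio, and a
P2PDMA page following any of them is rejected.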