Re: [dm-devel] new patchset to eliminate DM's use of BIOSET_NEED_RESCUER

From: NeilBrown
Date: Tue Nov 21 2017 - 23:01:09 EST


On Tue, Nov 21 2017, Mikulas Patocka wrote:

> On Tue, 21 Nov 2017, Mike Snitzer wrote:
>
>> On Tue, Nov 21 2017 at 4:23pm -0500,
>> Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
>>
>> > This is not correct:
>> >
>> > 2206 static void dm_wq_work(struct work_struct *work)
>> > 2207 {
>> > 2208 struct mapped_device *md = container_of(work, struct mapped_device, work);
>> > 2209 struct bio *bio;
>> > 2210 int srcu_idx;
>> > 2211 struct dm_table *map;
>> > 2212
>> > 2213 if (!bio_list_empty(&md->rescued)) {
>> > 2214 struct bio_list list;
>> > 2215 spin_lock_irq(&md->deferred_lock);
>> > 2216 list = md->rescued;
>> > 2217 bio_list_init(&md->rescued);
>> > 2218 spin_unlock_irq(&md->deferred_lock);
>> > 2219 while ((bio = bio_list_pop(&list)))
>> > 2220 generic_make_request(bio);
>> > 2221 }
>> > 2222
>> > 2223 map = dm_get_live_table(md, &srcu_idx);
>> > 2224
>> > 2225 while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
>> > 2226 spin_lock_irq(&md->deferred_lock);
>> > 2227 bio = bio_list_pop(&md->deferred);
>> > 2228 spin_unlock_irq(&md->deferred_lock);
>> > 2229
>> > 2230 if (!bio)
>> > 2231 break;
>> > 2232
>> > 2233 if (dm_request_based(md))
>> > 2234 generic_make_request(bio);
>> > 2235 else
>> > 2236 __split_and_process_bio(md, map, bio);
>> > 2237 }
>> > 2238
>> > 2239 dm_put_live_table(md, srcu_idx);
>> > 2240 }
>> >
>> > You can see that if we are in dm_wq_work in __split_and_process_bio, we
>> > will not process md->rescued list.
>>
>> Can you elaborate further? We cannot be "in dm_wq_work in
>> __split_and_process_bio" simultaneously. Do you mean as a side-effect
>> of scheduling away from __split_and_process_bio?
>>
>> The more detail you can share the better.
>
> Suppose this scenario:
>
> * dm_wq_work calls __split_and_process_bio
> * __split_and_process_bio eventually reaches the function snapshot_map
> * snapshot_map attempts to take the snapshot lock
>
> * the snapshot lock could be released only if some bios submitted by the
> snapshot driver to the underlying device complete
> * the bios submitted to the underlying device were already offloaded by
> some other task and they are waiting on the list md->rescued
> * the bios waiting on md->rescued are not processed, because dm_wq_work is
> blocked in snapshot_map (called from __split_and_process_bio)

Yes, I think you are right.

I think the solution is to get rid of the dm_offload() infrastructure
and make it not necessary.
i.e. discard my patches
dm: prepare to discontinue use of BIOSET_NEED_RESCUER
and
dm: revise 'rescue' strategy for bio-based bioset allocations

And build on "dm: ensure bio submission follows a depth-first tree walk"
which was written after those and already makes dm_offload() less
important.

Since that "depth-first" patch, every request to the dm device, after
the initial splitting, allocates just one dm_target_io structure, and
makes just one __map_bio() call, and so will behave exactly the way
generic_make_request() expects and copes with - thus avoiding awkward
dependencies and deadlocks. Except....

a/ If any target defines ->num_write_bios() to return >1,
__clone_and_map_data_bio() will make multiple calls to alloc_tio()
and __map_bio(), which might need rescuing.
But no target defines num_write_bios, and none have since it was
removed from dm-cache 4.5 years ago.
Can we discard num_write_bios??

b/ If any target sets any of num_{flush,discard,write_same,write_zeroes}_bios
to a value > 1, then __send_duplicate_bios() will also make multiple
calls to alloc_tio() and __map_bio().
Some do.
dm-cache-target: flush=2
dm-snap: flush=2
dm-stripe: discard, write_same, write_zeroes all set to 'stripes'.

These will only be a problem if the second (or subsequent) alloc_tio()
blocks waiting for an earlier allocation to complete. This will only
be a problem if multiple threads are each trying to allocate multiple
dm_target_io from the same bioset at the same time.
This is rare and should be easier to address than the current
dm_offload() approach.
One possibility would be to copy the approach taken by
crypt_alloc_buffer() which needs to allocate multiple entries from a
mempool.
It first tries the with GFP_NOWAIT. If that fails it take a mutex and
tries with GFP_NOIO. This mean only one thread will try to allocate
multiple bios at once, so there can be no deadlock.

Below are two RFC patches. The first removes num_write_bios.
The second is incomplete and makes a stab are allocating multiple bios
at once safely.
A third would be needed to remove dm_offload() etc... but I cannot quite
fit that in today - must be off.

Thanks,
NeilBrown

From: NeilBrown <neilb@xxxxxxxx>
Date: Wed, 22 Nov 2017 14:25:18 +1100
Subject: [PATCH] DM: remove num_write_bios target interface.

No target provides num_write_bios and none has done
since 2013.
Having the possibility of num_write_bios > 1 complicates
bio allocation.
So remove the interface and assume there is only one bio
needed.
If a target ever needs more, it must provide a suitable
bioset and allocate itself based on its particular needs.

Signed-off-by: NeilBrown <neilb@xxxxxxxx>
---
drivers/md/dm.c | 22 ++++------------------
include/linux/device-mapper.h | 15 ---------------
2 files changed, 4 insertions(+), 33 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index b20febd6cbc7..8c1a05609eea 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1323,27 +1323,13 @@ static int __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti,
{
struct bio *bio = ci->bio;
struct dm_target_io *tio;
- unsigned target_bio_nr;
- unsigned num_target_bios = 1;
int r = 0;

- /*
- * Does the target want to receive duplicate copies of the bio?
- */
- if (bio_data_dir(bio) == WRITE && ti->num_write_bios)
- num_target_bios = ti->num_write_bios(ti, bio);
-
- for (target_bio_nr = 0; target_bio_nr < num_target_bios; target_bio_nr++) {
- tio = alloc_tio(ci, ti, target_bio_nr);
- tio->len_ptr = len;
- r = clone_bio(tio, bio, sector, *len);
- if (r < 0) {
- free_tio(tio);
- break;
- }
+ tio = alloc_tio(ci, ti, 0);
+ tio->len_ptr = len;
+ r = clone_bio(tio, bio, sector, *len);
+ if (r >= 0)
__map_bio(tio);
- }
-
return r;
}

diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index a5538433c927..5a68b366e664 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -220,14 +220,6 @@ struct target_type {
#define DM_TARGET_WILDCARD 0x00000008
#define dm_target_is_wildcard(type) ((type)->features & DM_TARGET_WILDCARD)

-/*
- * Some targets need to be sent the same WRITE bio severals times so
- * that they can send copies of it to different devices. This function
- * examines any supplied bio and returns the number of copies of it the
- * target requires.
- */
-typedef unsigned (*dm_num_write_bios_fn) (struct dm_target *ti, struct bio *bio);
-
/*
* A target implements own bio data integrity.
*/
@@ -291,13 +283,6 @@ struct dm_target {
*/
unsigned per_io_data_size;

- /*
- * If defined, this function is called to find out how many
- * duplicate bios should be sent to the target when writing
- * data.
- */
- dm_num_write_bios_fn num_write_bios;
-
/* target specific data */
void *private;

--
2.14.0.rc0.dirty


-----------------------------------
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 8c1a05609eea..8762661df2ef 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1265,8 +1265,7 @@ static int clone_bio(struct dm_target_io *tio, struct bio *bio,
}

static struct dm_target_io *alloc_tio(struct clone_info *ci,
- struct dm_target *ti,
- unsigned target_bio_nr)
+ struct dm_target *ti)
{
struct dm_target_io *tio;
struct bio *clone;
@@ -1276,34 +1275,66 @@ static struct dm_target_io *alloc_tio(struct clone_info *ci,

tio->io = ci->io;
tio->ti = ti;
- tio->target_bio_nr = target_bio_nr;
+ tio->target_bio_nr = 0;

return tio;
}

-static void __clone_and_map_simple_bio(struct clone_info *ci,
- struct dm_target *ti,
- unsigned target_bio_nr, unsigned *len)
+static void alloc_multiple_bios(struct bio_list *blist, struct clone_info *ci,
+ struct dm_target *ti, unsigned num_bios)
{
- struct dm_target_io *tio = alloc_tio(ci, ti, target_bio_nr);
- struct bio *clone = &tio->clone;
+ int try;

- tio->len_ptr = len;
+ for (try = 0; try < 2; try++) {
+ int bio_nr;
+ struct bio *bio;
+
+ if (try)
+ mutex_lock(&ci->md->table_devices_lock);
+ for (bio_nr = 0; bio_nr < num_bios; bio_nr++) {
+ bio = bio_alloc_bioset(try ? GFP_NOIO : GFP_NOWAIT,
+ 0, ci->md->bs);
+ if (bio) {
+ struct dm_target_io *tio;
+ bio_list_add(blist, bio);
+ tio = container_of(bio, struct dm_target_io, clone);

- __bio_clone_fast(clone, ci->bio);
- if (len)
- bio_setup_sector(clone, ci->sector, *len);
+ tio->io = ci->io;
+ tio->ti = ti;
+ tio->target_bio_nr = bio_nr;
+ } else
+ break;
+ }
+ if (try)
+ mutex_unlock(&ci->md->table_devices_lock);
+ if (bio_nr == num_bios)
+ return;

- __map_bio(tio);
+ while ((bio = bio_list_pop(blist)) != NULL)
+ bio_put(bio);
+ }
}

static void __send_duplicate_bios(struct clone_info *ci, struct dm_target *ti,
unsigned num_bios, unsigned *len)
{
- unsigned target_bio_nr;
+ struct bio_list blist = BIO_EMPTY_LIST;
+ struct bio *bio;

- for (target_bio_nr = 0; target_bio_nr < num_bios; target_bio_nr++)
- __clone_and_map_simple_bio(ci, ti, target_bio_nr, len);
+ if (num_bios == 1)
+ bio_list_add(&blist, &alloc_tio(ci, ti)->clone);
+ else
+ alloc_multiple_bios(&blist, ci, ti, num_bios);
+
+ while ((bio = bio_list_pop(&blist)) != NULL) {
+ struct dm_target_io *tio = container_of(
+ bio, struct dm_target_io, clone);
+ tio->len_ptr = len;
+ __bio_clone_fast(bio, ci->bio);
+ if (len)
+ bio_setup_sector(bio, ci->sector, *len);
+ __map_bio(tio);
+ }
}

static int __send_empty_flush(struct clone_info *ci)
@@ -1325,7 +1356,7 @@ static int __clone_and_map_data_bio(struct clone_info *ci, struct dm_target *ti,
struct dm_target_io *tio;
int r = 0;

- tio = alloc_tio(ci, ti, 0);
+ tio = alloc_tio(ci, ti);
tio->len_ptr = len;
r = clone_bio(tio, bio, sector, *len);
if (r >= 0)

Attachment: signature.asc
Description: PGP signature