Re: [PATCH] usb: gadget: f_mass_storage: break IO operations via configfs

From: Maxim Devaev
Date: Thu Apr 07 2022 - 13:48:50 EST


В Thu, 7 Apr 2022 12:06:01 -0400
Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:

> On Wed, Apr 06, 2022 at 09:36:34PM +0300, Maxim Devaev wrote:
> > В Wed, 6 Apr 2022 13:51:40 -0400
> > Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > > On Wed, Apr 06, 2022 at 07:52:34PM +0300, Maxim Devaev wrote:
> > > > > It's not clear to me how breaking I/O operations allows you to do a
> > > > > "force eject". It seems that what you would need is something like
> > > > > fsg_store_file() that omits the curlun->prevent_medium_removal check.
> > > > > Interrupting a lengthy I/O operation doesn't really have anything to do
> > > > > with this.
> > > >
> > > > Perhaps I chose the wrong path, it's just how my userspace code works now.
> > > > If the drive is connected to a Linux host, then in order to clear
> > > > the "file" and extract the image, I sent a SIGUSR1 signal to the "file-storage"
> > > > thread. This interrupted long IO operations, reset curlun->prevent_medium_removal
> > > > and I got the ability to extract.
> > >
> > > Oh, I see. That's kind of an unintended side effect of not calling
> > > raise_exception().
> > >
> > > And while it does interrupt long I/O operations, it does so in
> > > non-sanctioned way. To the host it will appear as though the gadget's
> > > firmware has crashed, since the gadget will stop sending or receiving
> > > data. Eventually the host will time out and reset the gadget.
> > >
> > > Maybe that's the sort of thing you want, but I rather doubt it.
> >
> > It's hard to say how it actually should work in case of force removing.
> > At least the currect approach with SIGUSR1 is really working on thousands
> > systems and with Linux, Mac and Windows. I believe that the criterion
> > of the experiment is quite important here. I know of several other utilities
> > that use SIGUSR1 for similar purposes.
>
> This merely means that the current unintended behavior of userspace USR1
> signals must not be changed. But it doesn't mean you have to continue
> to rely on that behavior; you can implement something better.

So I suggest break_io :) I haven't come up with anything better.

> > > > Will masking the curlun->prevent_medium_removal flag be enough?
> > >
> > > I think so. But it will be blocked to some extent by long-running I/O
> > > operations, because those operations acquire the filesem rw-semaphore
> > > for reading.
> > >
> > > More precisely, each individual command holds the rw-semaphore. But the
> > > semaphore is dropped between commands, and a long-running I/O operation
> > > typically consists of many separate commands. So the blocking may be
> > > acceptable.
> >
> > It is very important for KVM-over-IP to be able to command "turn it off immediately".
>
> Why is this? A lot of actual devices (DVD drives, for instance) don't
> give you the ability to eject the media when the host has prevented it.
> Why should f-mass-storage be different?

The DVD drive has the ability to physically eject the disc. It's not too good
for the drive itself, but it's just there. We can also urgently remove
the USB flash drive.

At least there is one situation where the behavior of f_mass_storage differs
from the behavior of a real drive. What happens when you click on the physical
"eject" button? Yes, the OS can block this, but the problem is that we don't have
an "eject" here. If I connect the gadget to the Linux host and don't even mount
the image, Linux won't let me change the image in the "file", since the gadget
will be constantly busy with some IO.

But I believe creating a virtual "eject" button is a separate task that
does not depend on "break_io".

> > In this context, I would prefer "break_io" rather than "allow_force_remove".
>
> Okay. But what about the 30-second host timeout I mentioned above?
> Does this actually happen with your approach? It seems like the kind of
> thing you don't want in a "turn it off immediately" situation. (I
> haven't tried doing this myself -- maybe I should.)

Neither I nor my users noticed any problems related to this. After extracting
the image using SIGUSR1/"file", I can just assign a new "file"image
and everything will work.

> > > > > You should not call send_sig_info() directly; instead call
> > > > > raise_exception(). It already does the work you need (including some
> > > > > things you left out).
> > > >
> > > > raise_exception() assumes the setting of a new state, and I did not want to do this,
> > > > since the same does not happen when throwing a signal from userspace.
> > >
> > > Userspace isn't supposed to send the USR1 signal, only the INT, TERM, or
> > > KILL signals. USR1 is supposed to be reserved for the driver's internal
> > > use. Unfortunately, AFAIK there's no way to allow the driver to send a
> > > signal to itself without also allowing the signal to be sent by
> > > userspace. :-(
> >
> > It's funny that you actually helped me solve my problem thanks to this undocumented
> > behavior. If it were not for the ability to send a signal, I would not be able to make
> > the necessary code, and my software would always be waiting for the completion of IO.
> >
> > So here I am grateful to you - I didn't have to patch the kernel a few years ago,
> > and now I just want to turn it into a clear feature :)
> >
> > Given the needs of the userspace code, maybe the suggested "break_io"
> > would be the best choice?
>
> It sounds like what you really want is a combination of both "interrupt
> I/O" and "forced eject".

Indeed. But I didn't want to introduce some complex entities into the "file" attribute
or make magic prefixes for the image name or something. So I suggested
"echo > break_io && echo > file". This will not break the current behavior of the drive.

> > > And sending the signal _does_ set a new state, whether you intended to
> > > or not. Although in this case, the new state is always the same as the
> > > old state, i.e., FSG_STATE_NORMAL.
> >
> > So I could call raise_exception(fsg->common, FSG_STATE_NORMAL) instead of sending
> > the signal from break_io handler. There will be a slight difference
> > in exception_req_tag and exception_arg, but it does not seem to cause any side effects.
> > Please correct me if I'm wrong.
>
> In fact, the best approach would be to introduce a new state (let's call
> it FSG_STATE_FORCED_EJECT) with priority just above
> FSG_STATE_ABORT_BULK_OUT. You would call raise_exception with
> FSG_STATE_FORCED_EJECT, not FSG_STATE_NORMAL. handle_exceptions() would
> treat this state partially like ABORT_BULK_OUT in that it would avoid
> resetting all the LUN data values and would call send_status_common() if
> a command had been underway. But in addition it would do the forced
> eject.

Do you mean something like this?

if (old_state != FSG_STATE_ABORT_BULK_OUT) {
for (i = 0; i < ARRAY_SIZE(common->luns); ++i) {
curlun = common->luns[i];
if (!curlun)
continue;
curlun->prevent_medium_removal = 0;
if (old_state != FSG_STATE_FORCED_EJECT) {
curlun->sense_data = SS_NO_SENSE;
curlun->unit_attention_data = SS_NO_SENSE;
curlun->sense_data_info = 0;
curlun->info_valid = 0;
}
}
}

> Also, the sysfs routine should be careful to see whether the command
> currently being executed is for the LUN being ejected. I guess you
> have never tried issuing your USR1 signal to a mass-storage gadget
> running more than one LUN. If you did, you would find that it clears
> the prevent_medium_removal flag for all of them, not just the one that
> you wanted.

I haven't tried it, but I figured it out along the way when I discovered
the SIGUSR1 feature. I perceive it as something that should work that way.
Like, we hit the whole device.