Re: [PATCH] s390/vfio-ap: handle response code 01 on queue reset

From: Anthony Krowiak
Date: Thu Dec 07 2023 - 10:31:29 EST



On 12/6/23 12:17 PM, Halil Pasic wrote:
On Tue, 05 Dec 2023 09:04:23 +0100
Harald Freudenberger <freude@xxxxxxxxxxxxx> wrote:

On 2023-12-04 17:15, Halil Pasic wrote:
On Mon, 4 Dec 2023 16:16:31 +0100
Christian Borntraeger <borntraeger@xxxxxxxxxxxxx> wrote:
Am 04.12.23 um 15:53 schrieb Tony Krowiak:

On 11/29/23 12:12, Christian Borntraeger wrote:
Am 29.11.23 um 15:35 schrieb Tony Krowiak:
In the current implementation, response code 01 (AP queue number not valid)
is handled as a default case along with other response codes returned from
a queue reset operation that are not handled specifically. Barring a bug,
response code 01 will occur only when a queue has been externally removed
from the host's AP configuration; in this case, the queue must
be reset by the machine in order to avoid leaking crypto data if/when the
queue is returned to the host's configuration. The response code 01 case
will be handled specifically by logging a WARN message followed by cleaning
up the IRQ resources.
To me it looks like this can be triggered by the LPAR admin, correct? So it
is not desirable but possible.
In that case I prefer to not use WARN, maybe use dev_warn or dev_err instead.
WARN can be a disruptive event if panic_on_warn is set.
Yes, it can be triggered by the LPAR admin. I can't use dev_warn here because we don't have a reference to any device, but I can use pr_warn if that suffices.
Ok, please use pr_warn then.
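
A minimal userspace sketch of the handling agreed on above, under stated assumptions: the enum names, `struct mock_queue`, `log_warn()` and `handle_reset_rc()` are hypothetical stand-ins invented for this illustration and are not the vfio_ap driver's identifiers. `log_warn()` plays the role of pr_warn, which only logs, whereas WARN can bring the system down when panic_on_warn is set.

```c
#include <stdio.h>

/* Hypothetical names for this sketch; not the kernel's identifiers. */
enum reset_rc {
	RC_NORMAL          = 0x00,
	RC_QUEUE_NOT_VALID = 0x01, /* "AP queue number not valid" */
};

enum reset_action {
	ACT_DONE,             /* reset completed normally */
	ACT_WARN_AND_CLEANUP, /* log a warning, then drop IRQ resources */
	ACT_DEFAULT,          /* every response code not handled specifically */
};

/* Mock of the per-queue state; the real driver keeps far more. */
struct mock_queue {
	int irq_held; /* non-zero while IRQ resources are still allocated */
};

/* Stand-in for pr_warn(): records the event without halting anything. */
static void log_warn(const char *msg)
{
	fprintf(stderr, "warning: %s\n", msg);
}

static enum reset_action handle_reset_rc(struct mock_queue *q, unsigned int rc)
{
	switch (rc) {
	case RC_NORMAL:
		return ACT_DONE;
	case RC_QUEUE_NOT_VALID:
		/* Queue was removed from the host's AP configuration. */
		log_warn("queue removed from host AP configuration");
		q->irq_held = 0; /* clean up the IRQ resources */
		return ACT_WARN_AND_CLEANUP;
	default:
		return ACT_DEFAULT;
	}
}
```

The point of the sketch is only that response code 01 gets its own case instead of falling through to the default, and that the logging call used there is one an LPAR admin cannot turn into a panic.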
Shouldn't we rather make this an 'info'? I mean we probably do not want
people complaining about this condition. Yes, it should be a best practice
to coordinate such things with the guest, and ideally remove the resource
from the guest first. But AFAIU our stack is supposed to be able to
handle something like this. IMHO issuing a warning is an excessive
measure.
I know Reinhard and Tony probably disagree with the last sentence
though.
Halil, Tony, the thing about info versus warning versus error is our
own stuff. Keep in mind that these messages end up in the "debug feature"
as FFDC data. So it comes to the point which FFDC data do you/Tony want to
see there? It should be enough to explain to a customer what happened
without the need to "recreate with higher debug level" if something serious
happened. So my private decision table is:
1) is it something serious, something exceptional, something which may not
   come up again if tried to recreate? Yes -> make it visible on the first
   occurrence as error msg.
2) is it something you want to read when a customer hits it and you tell him
   to extract and examine the debug feature data? Yes -> make it a warning
   and make sure your debug feature by default records warnings.
3) still serious, but may flood the debug feature. Good enough and high
   probability to reappear on a recreate? Yes -> make it an info message
   and live with the risk that you may not be able to explain to a customer
   what happened without a recreate and higher debug level.
4) not 1-3, -> maybe a debug msg but still think about what happens when a
   customer enables "debug feature" with highest level. Does it squeeze out
   more important stuff? Maybe make it dynamic debug with pr_debug() (see
   kernel docu admin-guide/dynamic-debug-howto.rst).
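
The decision table above can be sketched as a small function. This is one possible reading of the four rules, not an existing kernel helper: `struct event_traits`, `enum msg_level` and `pick_level()` are hypothetical names, and treating "may flood" as the thing that separates rule 2 from rule 3 is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical severity levels for this sketch. */
enum msg_level { MSG_ERROR, MSG_WARNING, MSG_INFO, MSG_DEBUG };

/* Hypothetical answers to the questions asked by the decision table. */
struct event_traits {
	bool serious;            /* serious or exceptional event */
	bool hard_to_recreate;   /* may not come up again on a recreate */
	bool wanted_in_ffdc;     /* worth reading in a customer's debug data */
	bool may_flood;          /* could flood the debug feature */
	bool likely_to_reappear; /* high probability to reappear on a recreate */
};

static enum msg_level pick_level(const struct event_traits *t)
{
	/* Rule 1: serious and possibly unrepeatable, so show it at once. */
	if (t->serious && t->hard_to_recreate)
		return MSG_ERROR;
	/* Rule 2: wanted in the FFDC data and not at risk of flooding. */
	if (t->wanted_in_ffdc && !t->may_flood)
		return MSG_WARNING;
	/* Rule 3: still serious, may flood, but easy to get again. */
	if (t->serious && t->may_flood && t->likely_to_reappear)
		return MSG_INFO;
	/* Rule 4: everything else is at most a debug message. */
	return MSG_DEBUG;
}
```

Under this reading, the queue-removed case would land on rule 1 or rule 2 depending on whether one considers it serious, which is the very point the thread is arguing about.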
AFAIU the default log level of the S390 Debug Feature is 3, that is,
error. So warnings do not help us there by default. And if we are
already asking the reporter to crank up the loglevel of the debug
feature, we can ask the reporter to crank it up to 5, assuming there
is not too much stuff at log level 5 in that area... How much
info stuff do we have for the 'ap' debug facility (I hope
that is the facility used by vfio_ap)?


No info logging is done via the S390 Debug Feature in vfio_ap. There are a few warning messages logged solely in the handle_pqap and vfio_ap_irq_enable functions. The question is, why are we talking about the S390 Debug Feature given the discussion is about using pr_warn versus pr_info? What am I missing here?



I think log levels are supposed to be primarily about severity, and
I'm not sure that a queue becoming unavailable in G1 without
first re-configuring the G2 so that it no longer has access to the
given queue is not really a warning severity thing. IMHO if we
really do want people complaining about this should they ever see it,
yes it should be a warning. If not then probably not.

Regards,
Halil