Re: [PATCHv4 1/2] usb: USB Type-C connector class

From: Guenter Roeck
Date: Sun Jul 03 2016 - 17:29:21 EST


On 07/03/2016 12:38 PM, Heikki Krogerus wrote:
On Fri, Jul 01, 2016 at 07:33:12AM -0700, Guenter Roeck wrote:
On Fri, Jul 01, 2016 at 03:05:35PM +0300, Heikki Krogerus wrote:
I've updated my github branch with a commit where both of these issues
should be fixed. Can you give it a try?


This is going to be very difficult. So far, calls such as
typec_set_data_role() were independent (asynchronous) of role change
requests through sysfs, meaning they happened whenever lower level
confirmed that a role change was complete, for whatever reason. Now
I have to check if a role change request through the class code
is pending and not call typec_set_data_role() and friends in that case.

I'm sorry about the misunderstanding.

What you want is basically that we only support non-blocking mode with
this interface, and we can't do that.


No, that is not what I said or asked for. The problems I am seeing are due
to locking in the class code. "Asynchronous" above referred to the state
machine vs. role change requests by the class code, which operates independent
of each other in my code. Set requests from the class code still wait for
the state machine to complete.

The problem is that the state machine now needs to know if the class code
has a set role request pending, because in that case it can no longer report
role changes directly to the class code. This includes role changes unrelated
to the actual set role request. That code is now much more complicated, especially
since each callback into the typec code is handled differently. For example,
typec_disconnect() does not require a lock (unless I missed it), but many
other functions do.

Would you even be able to make it work? How would you report errors
for example?


It worked just great when you had no locks. All I had to do is handle
role changes in the driver, and to implement the necessary locks there.
Actually, before you introduced the locks, you had suggested that this
was the way to go, which is why I implemented it that way.

This becomes even more complicated in situations with parallel role
change requests both from the class code and from the partner.
In such cases the class code may have acquired the lock, and before it
calls the driver, a role change complete is reported. The call to
typec_set_XXX_role() will then get stuck. So I'll have to add another
flag before calling typec_set_data_role() and friends, and then reject
the call to dr_set and all other set requests with -EBUSY.
race conditions.

Or we'll just export the port locks somehow, or provide separate API
for handling them.

..typec_get_port_lock(..
{
return &port->lock;
}

or

typec_lock(..
typec_test_and_lock(..
typec_unlock(..
...

Thinking more ... ultimately, this means that I'll have to reject role
change requests from the class code whenever the state machine is busy,
because I never know if the state machine activity would result in a
call into the typec class code.

Yes, why is that a problem? The other option would be that you queue
the requests from users, and I'm pretty sure we want to avoid that. It
would mean you have to consider a lot of conditions to avoid any
races, and you would also have to make a lot of extra decisions. As an
example, what do you do if two role change requests are in the queue
for the same role? If the first requests is successful, you probable
can just report success to the second request also, _maybe_! But if
the first one fails, do you try again for the second request, and in
that case do you wait for the result of the second request before
reporting to the first, or do you just fail both of them? etc.


No, that is not correct. Again, all I had to do was to implement a lock
in the driver which waited until the state machine was idle, and _then_
check if the set command needed to do anything. No queuing necessary,
no race conditions.

Keep in mind there are no role _change_ requests, the requests are to set
a specific role - and I can do that once I have the state machine lock.
The second request would again be a request to set a specific role. It waits
for the state machine lock, and after it is acquired decides if a role
change is necessary or not.

Sure, I can switch to your method, and I guess I'll have to.
I just think that the resulting -EBUSY responses are unnecessary.

I will reiterate, though, that this all worked just fine with the previous
version of your code, where sysfs role change requests did not require
or implement a lock in the class code.

At the same time, the protocol driver
will have to reject incoming role change requests while a locally
requested role change request is pending, even if that internal
role change request has not yet resulted in a PD message being sent.

Why?

Example:

Class code requests data role change, remote partner requests power role
change. Both are handled in parallel by the state machine. State machine
tries to report power role change to class code and gets stuck because the
class code holds a lock due to the pending data role change.

Therefore, a role change request from the partner now has to be rejected
while a (different) role change request from the class code is pending.

Is this really necessary ? It creates a lot of interdependencies.

In summary, this means a substantial amount of code and testing, which
is going to take a while.

You might want to mention all this in the API documentation.

OK.

I've also removed the supports_usb_power_delivery and id_header_vdo
attributes from partners and cables. We really have to put all the USB
PD specific attributes to its own group, and that group can't be in
control of this class. We will simply not always have access to the
kind of USB PD specific details you want to show, for example details
that you get from discover identity command, even when USB PD is fully
supported. For example on systems that use UCSI.


I think we should have a single unified ABI, independent of the underlying
driver, that informs the user about the partner device. I really don't
quite understand why the class code should not be able to report what device
it is connected to (while at the same time being able to report the alternate
modes it supports).

OK, let's plan this more then. Maybe we can make for example a layer
that creates the groups for the PD specific details to the class?

The problem is still that we can only provide results from for example
request identity command reliably when the PD protocol layer is
completely inside kernel, and that is not always the case. So we
really need to group those details in its own group, and that group
will basically indicate is the PD stack inside the kernel or not.

We should not forget also that the userspace can never rely on those
details because of the fact that they simply will not always be
available.

On the other side, not being able to rely on a well defined ABI makes the
ABI much less useful.

Guenter