Re: Linux 2.6.35/TIPC 2.0 ABI breaking changes

From: Leandro Lucarella
Date: Mon Oct 18 2010 - 22:17:53 EST


Neil Horman, on October 18 at 19:45 you wrote to me:
> > What I think has happened here (and I'll double check this
> > tomorrow, since it is before I started assisting with tipc)
> > is that a backwards incompatible change *did* inadvertently
> > creep in via these two (related) commits:
> >
> > --------------
> > commit d88dca79d3852a3623f606f781e013d61486828a
> > Author: Neil Horman <nhorman@xxxxxxxxxxxxx>
> > Date: Mon Mar 8 12:20:58 2010 -0800
> >
> > tipc: fix endianness on tipc subscriber messages
> > --------------
> >
> > and
> >
> > ---------------
> > commit c6537d6742985da1fbf12ae26cde6a096fd35b5c
> > Author: Jon Paul Maloy <jon.maloy@xxxxxxxxxxxx>
> > Date: Tue Apr 6 11:40:52 2010 +0000
> >
> > TIPC: Updated topology subscription protocol according to latest spec
> > ---------------
> >
> > Based on Leandro's info, I think it comes down to userspace
> > not knowing exactly where to find these bits anymore:
> >
> > #define TIPC_SUB_SERVICE 0x00 /* Filter for service availability */
> > #define TIPC_SUB_PORTS 0x01 /* Filter for port availability */
> > #define TIPC_SUB_CANCEL 0x04 /* Cancel a subscription */
> >
> That shouldn't be the case. Prior to the above changes the tipc implementation
> tracked the endianness of the hosts to which it was connected and swapped data
> that it sent to those hosts accordingly. With these changes the kernel client
> simply swaps the data to network byte order on send and swaps it back to local
> order on receive universally. That second commit added a bit from the reserved
> pool of one of the connection establishment messages to indicate that a peer was
> using this new protocol. If some non-local byte order information is making it
> into user space, that's a bug that needs fixing.
>
> What may be happening is some old client that doesn't know about the new bit
> might be communicating with a new client that does. IIRC the spec called for
> clients that set bits in the reserved field to drop frames from that client, so
> that condition shouldn't occur, but TIPC may just be ignoring reserved bits. I
> wouldn't be surprised.
>
> It's also possible that the payload data between applications using tipc follows
> the same broken byte swapping method that the protocol itself did, but if that
> were the case I would expect the application to continue running normally,
> unless user space had direct access to the protocol header in its entirety, and
> read it directly, in which case I think I would just cry.

I think there is some misunderstanding here. Compatibility was broken
only for subscription messages. Subscription messages are not sent
between tipc clients (or maybe they are, but that's not how tipc
developers normally use them AFAIK). You send a subscription message to
your host's tipc stack and the stack replies with event notifications.
Even though they are messages sent through a socket, they are used as
an API.

So, this has nothing to do with payload data transmitted by applications
using tipc. We are talking about the tipc API, which is "masked" into
a socket.

Here is a small example (~150 SLOC with comments). Using TIPC 2.0 API:
http://tipc.cslab.ericsson.net/cgi-bin/gitweb.cgi?p=people/allan/tipcutils.git;a=blob;h=efdfa3802e51d9a2a9091b3d97625de9e686b72e;hb=tipcutils2.0;f=demos/topology_subscr_demo/client_tipc.c

Using the "old" TIPC 1.6 API:
http://tipc.cslab.ericsson.net/cgi-bin/gitweb.cgi?p=people/allan/tipcutils.git;a=blob;h=ac5dfc5004b482372abb7905c90fe3073fc9165d;hb=15f57f7572898959e0aaa66293895a8255d77021;f=demos/topology_subscr_demo/subscriptions.c
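
To make the difference between the two demos concrete, here is a minimal
sketch of how the same subscription request could be filled in under each
convention. It is not taken from tipcutils. The demo_* names are
hypothetical stand-ins that mirror the layout of struct tipc_subscr as I
understand it, the filter values are the ones quoted above, and the
assumption is that the 1.6 API takes the fields in host byte order while
the 2.0 API expects them in network byte order.

```c
#include <arpa/inet.h>  /* htonl() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Filter bits, with the values quoted earlier in this thread. */
#define DEMO_SUB_SERVICE 0x00
#define DEMO_SUB_PORTS   0x01
#define DEMO_SUB_CANCEL  0x04

/* Hypothetical mirror of struct tipc_name_seq (assumed layout). */
struct demo_name_seq {
    uint32_t type;
    uint32_t lower;
    uint32_t upper;
};

/* Hypothetical mirror of struct tipc_subscr (assumed layout). */
struct demo_subscr {
    struct demo_name_seq seq; /* name sequence to watch */
    uint32_t timeout;         /* subscription lifetime */
    uint32_t filter;          /* DEMO_SUB_* bits */
    char usr_handle[8];       /* opaque to the stack */
};

/* Assumed "old" convention: fields are stored in host byte order. */
void demo_fill_host_order(struct demo_subscr *s, uint32_t type,
                          uint32_t lower, uint32_t upper,
                          uint32_t timeout, uint32_t filter)
{
    assert(s != NULL);
    memset(s, 0, sizeof(*s));
    s->seq.type = type;
    s->seq.lower = lower;
    s->seq.upper = upper;
    s->timeout = timeout;
    s->filter = filter;
}

/* Assumed "new" convention: every 32-bit field is in network byte order. */
void demo_fill_net_order(struct demo_subscr *s, uint32_t type,
                         uint32_t lower, uint32_t upper,
                         uint32_t timeout, uint32_t filter)
{
    demo_fill_host_order(s, htonl(type), htonl(lower), htonl(upper),
                         htonl(timeout), htonl(filter));
}
```

In a real program the filled structure would then be sent on the socket
connected to the topology service, as the linked demos do. On a
big-endian host both functions produce the same bytes, so the difference
only shows up on little-endian machines.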

> > ...because it doesn't know if there is the old auto endian
> > swap thing being done or not being done.
> >
> > Assuming it is possible to do so in some non-kludgy way,
> > it sounds like we want to be looking into an in-kernel change
> > that ensures the older user space binaries get their
> > functionality restored then?
> >
> Let's try to figure out exactly what data is getting mis-read first. Maybe we can
> fix it without having to go back to making a sending host figure out a receiving
> host's byte order. That would be nice. Can you describe the problem in more
> detail?

The problem is not between the tipc stacks on different hosts; it is
between the tipc stack and the applications using it (well, maybe there
is a problem somewhere else too).
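
A rough illustration of what I mean, as a sketch only: assume an old
binary writes the filter field in host byte order and the stack now
reads it as network byte order. The demo_* names are hypothetical and
nothing here is the kernel's actual code; it only shows why the
TIPC_SUB_PORTS bit is no longer found on a little-endian host.

```c
#include <arpa/inet.h>  /* ntohl(), htonl() */
#include <stdint.h>

#define DEMO_SUB_PORTS 0x01 /* value quoted earlier in this thread */

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int demo_is_little_endian(void)
{
    const uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* What a receiver sees if it assumes the field is in network byte order. */
uint32_t demo_read_as_net_order(uint32_t raw_field)
{
    return ntohl(raw_field);
}

/* Whether the PORTS bit is still found in the value the receiver sees. */
int demo_ports_bit_visible(uint32_t raw_field)
{
    return (demo_read_as_net_order(raw_field) & DEMO_SUB_PORTS) != 0;
}
```

With these helpers, demo_ports_bit_visible(htonl(DEMO_SUB_PORTS)) is
true on any host, while demo_ports_bit_visible(DEMO_SUB_PORTS), the
host-order value an old binary would send under this assumption, is
false on a little-endian host.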

This was a deliberate API change, not a subtle bug...

--
Leandro Lucarella (AKA luca) http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
For me to ask a woman out, I've got to get into a mental state like the karate
guys before they break the bricks.
-- George Costanza