Re: Why add the general notification queue and its sources

From: Andy Lutomirski
Date: Fri Sep 06 2019 - 13:14:21 EST

> On Sep 6, 2019, at 9:12 AM, Steven Whitehouse <swhiteho@xxxxxxxxxx> wrote:
>
> Hi,
>
>> On 06/09/2019 16:53, Linus Torvalds wrote:
>> On Fri, Sep 6, 2019 at 8:35 AM Linus Torvalds
>> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>> This is why I like pipes. You can use them today. They are simple, and
>>> extensible, and you don't need to come up with a new subsystem and
>>> some untested ad-hoc thing that nobody has actually used.
>> The only _real_ complexity is to make sure that events are reliably parseable.
>>
>> That's where you really want to use the Linux-only "packet pipe"
>> thing, because otherwise you have to have size markers or other things
>> to delineate events. But if you do that, then it really becomes
>> trivial.
>>
>> And I checked, we made it available to user space, even if the
>> original reason for that code was kernel-only autofs use: you just
>> need to make the pipe be O_DIRECT.
>>
>> This overly stupid program shows off the feature:
>>
>> #define _GNU_SOURCE
>> #include <fcntl.h>
>> #include <unistd.h>
>>
>> int main(int argc, char **argv)
>> {
>>     int fd[2];
>>     char buf[10];
>>
>>     pipe2(fd, O_DIRECT | O_NONBLOCK);
>>     write(fd[1], "hello", 5);
>>     write(fd[1], "hi", 2);
>>     read(fd[0], buf, sizeof(buf));
>>     read(fd[0], buf, sizeof(buf));
>>     return 0;
>> }
>>
>> and if you strace it (because I was too lazy to add error handling or
>> printing of results), you'll see
>>
>> write(4, "hello", 5) = 5
>> write(4, "hi", 2) = 2
>> read(3, "hello", 10) = 5
>> read(3, "hi", 10) = 2
>>
>> note how you got packets of data on the reader side, instead of
>> getting the traditional "just buffer it as a stream".
>>
>> So now you can even have multiple readers of the same event pipe, and
>> packetization is obvious and trivial. Of course, I'm not sure why
>> you'd want to have multiple readers, and you'd lose _ordering_, but if
>> all events are independent, this _might_ be a useful thing in a
>> threaded environment. Maybe.
>>
>> (Side note: a zero-sized write will not cause a zero-sized packet. It
>> will just be dropped).
>>
>> Linus
>
> The events are generally not independent - we would need ordering either implicit in the protocol or explicit in the messages. We also need to know if messages are dropped - it doesn't need to be anything fancy, just some indication that, since we last did a read, some messages were lost, most likely due to buffer overrun.
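One way to make ordering and loss explicit in the messages, as described above, is a per-message sequence number. This C sketch assumes a hypothetical header (`struct event_hdr`, not any existing kernel ABI) whose `seq` field the writer increments for every event, including events it had to drop. With a single reader a packet pipe preserves order, so any gap in `seq` is a count of lost events.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical event header; not part of any kernel ABI. */
struct event_hdr {
    uint32_t seq;   /* writer increments by one per event, even if dropped */
    uint16_t type;  /* hypothetical event type code */
    uint16_t len;   /* payload bytes following the header */
};

/* Reader-side state: the sequence number expected next. */
struct seq_tracker {
    uint32_t next;
    int started;
};

/* Check one received packet. Returns the number of events lost since the
 * previous packet (0 if none), or -1 if the packet is too short. */
long long seq_check(struct seq_tracker *t, const void *pkt, size_t n)
{
    struct event_hdr h;

    if (n < sizeof(h))
        return -1;
    memcpy(&h, pkt, sizeof(h));  /* copy out to avoid unaligned access */

    /* Unsigned subtraction handles wraparound of seq modulo 2^32. */
    uint32_t lost = t->started ? h.seq - t->next : 0;
    t->next = h.seq + 1;
    t->started = 1;
    return (long long)lost;
}
```

The first packet only establishes the baseline, so events lost before the reader attached are not counted.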

This could be a bit fancier: if the pipe recorded the bitwise OR of the first few bytes of each dropped message, then each message could set a bit in its header indicating its type, and readers could then learn which *types* of messages were dropped.
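The bitwise-OR idea can be modelled in user space. Real pipes record nothing about dropped writes, so this C sketch simulates a bounded packet queue instead; `struct sim_pipe`, `sim_write()`, `sim_read()` and the `MSG_TYPE_*` bits are all hypothetical. For simplicity it ORs only the first byte of each dropped message and treats that byte as a one-hot type bit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical one-hot type bits carried in each message's first byte. */
enum { MSG_TYPE_MOUNT = 0x01, MSG_TYPE_KEY = 0x02, MSG_TYPE_BLOCK = 0x04 };

#define SIM_SLOTS   4   /* queue depth, chosen arbitrarily */
#define SIM_MSG_MAX 32  /* maximum message size, chosen arbitrarily */

/* User-space model of a packet pipe that remembers what it dropped. */
struct sim_pipe {
    unsigned char msg[SIM_SLOTS][SIM_MSG_MAX];
    size_t len[SIM_SLOTS];
    size_t head, count;   /* ring buffer read index and fill level */
    uint8_t dropped_mask; /* OR of the first byte of every dropped message */
};

/* Writer side: enqueue, or drop and remember the type bit when full. */
void sim_write(struct sim_pipe *p, const void *buf, size_t n)
{
    if (n == 0 || n > SIM_MSG_MAX)
        return;
    if (p->count == SIM_SLOTS) {
        p->dropped_mask |= ((const unsigned char *)buf)[0];
        return;
    }
    size_t tail = (p->head + p->count) % SIM_SLOTS;
    memcpy(p->msg[tail], buf, n);
    p->len[tail] = n;
    p->count++;
}

/* Reader side: dequeue one message (truncating it if buf is too small).
 * *lost reports, and clears, the types dropped since the previous read.
 * Returns the number of bytes copied, or 0 if the queue is empty. */
size_t sim_read(struct sim_pipe *p, void *buf, size_t n, uint8_t *lost)
{
    *lost = p->dropped_mask;
    p->dropped_mask = 0;
    if (p->count == 0)
        return 0;
    size_t len = p->len[p->head];
    if (len > n)
        len = n;
    memcpy(buf, p->msg[p->head], len);
    p->head = (p->head + 1) % SIM_SLOTS;
    p->count--;
    return len;
}
```

Because the mask is an OR, it says which types were lost but not how many; a reader that needs counts would still want sequence numbers in the messages.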

Or they could just use multiple pipes.
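With one pipe per message type, an overrun on one type cannot hide losses of another, and the reader can wait on all of them at once with `poll()`. A minimal sketch, assuming two hypothetical event types (`PIPE_MOUNT`, `PIPE_KEY`) and packet-mode pipes created with `pipe2(..., O_DIRECT | O_NONBLOCK)` as in the program quoted above (pipes accept `O_DIRECT` since Linux 3.4):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical split: one packet pipe per event type. */
enum { PIPE_MOUNT, PIPE_KEY, NPIPES };

/* Create NPIPES packet-mode, non-blocking pipes. Returns 0, or -1 after
 * closing any pipes already created. */
int open_event_pipes(int fds[NPIPES][2])
{
    for (int i = 0; i < NPIPES; i++) {
        if (pipe2(fds[i], O_DIRECT | O_NONBLOCK) == -1) {
            while (i-- > 0) {
                close(fds[i][0]);
                close(fds[i][1]);
            }
            return -1;
        }
    }
    return 0;
}

/* Wait up to timeout_ms for any pipe to become readable, then read one
 * packet from the first ready pipe. Returns that pipe's index, or -1 on
 * timeout or error. */
int read_next_event(int fds[NPIPES][2], void *buf, size_t len,
                    ssize_t *got, int timeout_ms)
{
    struct pollfd pfd[NPIPES];

    for (int i = 0; i < NPIPES; i++) {
        pfd[i].fd = fds[i][0];
        pfd[i].events = POLLIN;
    }
    if (poll(pfd, NPIPES, timeout_ms) <= 0)
        return -1;
    for (int i = 0; i < NPIPES; i++) {
        if (pfd[i].revents & POLLIN) {
            *got = read(fds[i][0], buf, len);
            return *got >= 0 ? i : -1;
        }
    }
    return -1;
}
```

The trade-off is that ordering between types is lost: `poll()` only reports which pipes are readable, not which event happened first.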

If this whole mechanism catches on, I wonder if implementing recvmmsg() on pipes would be worthwhile.
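Without a recvmmsg()-style call on pipes, a reader drains a packet pipe with one `read()` per message, which is the per-message syscall cost such an interface would amortize. A sketch of that loop, assuming a hypothetical fixed cap `PKT_MAX` on message size (in packet mode, a packet larger than the read buffer has its excess bytes discarded):

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define PKT_MAX 64  /* hypothetical cap on message size */

/* Read up to max packets from a non-blocking O_DIRECT pipe, one read()
 * call per packet. Stores each packet's length in lens[]. Returns the
 * number of packets read, or -1 on an unexpected error. */
long drain_packets(int rfd, char bufs[][PKT_MAX], size_t lens[], size_t max)
{
    size_t n = 0;

    while (n < max) {
        ssize_t r = read(rfd, bufs[n], PKT_MAX);
        if (r == -1) {
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                break;      /* pipe is empty for now */
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;
        }
        if (r == 0)
            break;          /* every write end has been closed */
        lens[n++] = (size_t)r;
    }
    return (long)n;
}
```

Fed the same writes as the quoted program, plus a zero-length write in between, this returns two packets of 5 and 2 bytes, consistent with the note that a zero-sized write produces no packet.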