> My guess it's more the copies than the calls?

It's both. This is why we also created the sendgroup() implementation, which uses a tight loop of in-kernel calls to sendmsg(), as a means of evaluating the cost of mode switches. That cost is definitely not negligible (the exact numbers depend, of course, on the size of the group and the size of the payload).
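To make the two costs concrete, here is a minimal userspace sketch (not the sendgroup() patch itself) that contrasts a per-recipient sendto() loop with Linux's existing sendmmsg(), which also performs its per-message sendmsg() loop inside the kernel. The function names and the MAX_GROUP cap are hypothetical. In both variants the kernel copies the payload into a new datagram for every recipient; only the loop version also pays one mode switch per recipient.

```c
#define _GNU_SOURCE            /* needed for sendmmsg() and struct mmsghdr */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Baseline (hypothetical helper): one sendto() per recipient.
 * Every iteration is a separate user/kernel mode switch, and the kernel
 * copies the payload from user memory once per call. */
size_t send_group_loop(int fd, const void *buf, size_t len,
                       const struct sockaddr_in *dests, size_t n)
{
    size_t sent = 0;
    for (size_t i = 0; i < n; i++) {
        ssize_t r = sendto(fd, buf, len, 0,
                           (const struct sockaddr *)&dests[i], sizeof dests[i]);
        if (r == (ssize_t)len)
            sent++;
    }
    return sent;
}

/* Hypothetical cap that keeps the message array on the stack. */
#define MAX_GROUP 64

/* Batched (hypothetical helper): a single sendmmsg() call for the group.
 * The kernel loops over the messages itself, so there is one mode switch
 * per batch, but each datagram still gets its own copy of the payload. */
int send_group_batch(int fd, const void *buf, size_t len,
                     const struct sockaddr_in *dests, size_t n)
{
    struct mmsghdr msgs[MAX_GROUP];
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };

    if (n > MAX_GROUP)
        n = MAX_GROUP;          /* caller sees the shortfall in the return value */
    memset(msgs, 0, sizeof msgs);
    for (size_t i = 0; i < n; i++) {
        msgs[i].msg_hdr.msg_name = (void *)&dests[i];
        msgs[i].msg_hdr.msg_namelen = sizeof dests[i];
        msgs[i].msg_hdr.msg_iov = &iov;   /* all messages share one user buffer */
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    /* Number of messages sent, or -1 if the first one failed. */
    return sendmmsg(fd, msgs, (unsigned int)n, 0);
}
```

Comparing the two with a profiler or a syscall counter separates the per-call overhead from the per-recipient copy, which is the same split the in-kernel sendgroup() loop was built to measure.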
> It sounds like you want sendfile() for UDP.

Do you mean issuing a separate sendfile() call per recipient for the same file? Setting aside the cost of the system calls, this approach does not work well with the kind of real-time data we've been working with (live streaming, online games): you would have to write the payload to the file as it is being generated and call sendfile() after each such write.
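A minimal sketch of that write-then-sendfile() pattern, assuming Linux, a regular backing file, and one connect()ed UDP socket per recipient (sendfile() takes no destination address, so it cannot address peers from a single unconnected socket). publish_frame() is a hypothetical helper, not part of any existing API. Every frame costs one pwrite() plus one sendfile() per recipient, so the per-recipient system calls remain.

```c
#define _GNU_SOURCE            /* exposes pwrite() under strict -std modes */
#include <stddef.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append one freshly generated frame to the backing
 * file at *file_end, then sendfile() that byte range to every recipient.
 * peer_fds must be UDP sockets already connect()ed to their recipients.
 * Returns how many recipients received the whole frame. */
size_t publish_frame(int file_fd, off_t *file_end,
                     const void *frame, size_t len,
                     const int *peer_fds, size_t n)
{
    /* Step 1: the live payload has to land in the file first. */
    ssize_t w = pwrite(file_fd, frame, len, *file_end);
    if (w != (ssize_t)len)
        return 0;

    /* Step 2: one sendfile() system call per recipient. */
    size_t sent = 0;
    for (size_t i = 0; i < n; i++) {
        off_t off = *file_end;   /* sendfile() advances this local copy only */
        ssize_t r = sendfile(peer_fds[i], file_fd, &off, len);
        if (r == (ssize_t)len)
            sent++;
    }

    *file_end += (off_t)len;     /* next frame is appended after this one */
    return sent;
}
```

The backing file also grows without bound unless it is truncated or rotated, which is another mismatch with a continuous live stream.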