I'd assumed the move was primarily because of the difficulty of getting
correct semantics on a shared filesystem
...not even shared. It was hard to get correct semantics, full stop.
Which is a traditional problem. The thing is, the kernel always has some internal state, and it's hard to expose all the semantics that the kernel knows about to user space.
So no, performance is not the only reason to move to kernel space. It can easily be things like needing direct access to internal data queues (for an iSCSI target, this could be things like barriers or just tagged commands). Yes, you can probably emulate things like that without access to the actual IO queues, but are you sure the semantics will be entirely right?
The kernel/userland boundary is not just a performance boundary, it's an abstraction boundary too, and these kinds of protocols tend to break abstractions. NFS broke it by having "file handles" (which are not something that really exists in user space, and are almost impossible to emulate correctly), and I bet the same thing happens when emulating a SCSI target in user space.
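
To make the file handle point a bit more concrete, here is a rough sketch (not how any real NFS server does it) of what a user-space emulation runs into. It assumes a POSIX-like filesystem where a rename within one directory keeps the inode number. The names `make_handle` and `demo` are made up for illustration. The "handle" is just the (device, inode) pair that `stat` reports. The portable file API has no call to open a file by that pair, so a user-space server has to remember a path, and the path goes stale as soon as somebody renames the file.

```python
import os
import tempfile


def make_handle(path):
    """Hypothetical user-space 'file handle': the (device, inode) pair."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def demo():
    """Rename a file and report (same_object, old_path_still_valid)."""
    with tempfile.TemporaryDirectory() as d:
        old = os.path.join(d, "a.txt")
        new = os.path.join(d, "b.txt")
        with open(old, "w") as f:
            f.write("data")

        # What a user-space server might hand out to a client.
        handle = make_handle(old)

        # Another process renames the file behind the server's back.
        os.rename(old, new)

        # Assumption: a same-directory rename keeps the inode, so the
        # identity is unchanged and the handle still names a live object.
        same_object = make_handle(new) == handle

        # The remembered path, the only thing we could reopen by, is gone.
        old_path_still_valid = os.path.exists(old)

        return same_object, old_path_still_valid


if __name__ == "__main__":
    print(demo())
```

The kernel can resolve a handle straight to the inode because it owns that state. User space only sees names, which is exactly the abstraction the protocol cuts through.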