Should we treat it the same way? Always allocate it for each new thread
and register it with the kernel?
That would be an efficient way to do it, indeed. Having rseq registered
for all threads adds very little performance overhead, whether or not
they intend to run rseq critical sections.
People with slow or low-memory machines would prefer not to pay for
overhead they don't need...