I can comment at least a little on the -Os aspect (although I'm no
expert on this in particular). In general, for _most_ use cases, a kernel
compiled with CONFIG_CC_OPTIMIZE_FOR_SIZE will run slower than one compiled
without it. On rare occasions, though, it may actually run faster. The only
cases I've seen where this happens are specialized uses that are very
memory-pressure dependent and run almost entirely in userspace with almost no
syscalls (for example, math-related work on _very, very big_ multidimensional
matrices, as in >1 trillion elements, with complex memory constraints), and
even then it's usually a minuscule improvement in performance (generally less
than 1%, which can of course still be significant, depending on how long the
job takes to run in the first place).
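
To put a rough number on that last point, here is a quick back-of-the-envelope
sketch. The 30-day run time and the helper name hours_saved are made up purely
for illustration; the 1% figure is just the upper bound mentioned above, not a
measurement.

```python
def hours_saved(baseline_hours, improvement):
    """Wall-clock hours saved for a given fractional speedup."""
    return baseline_hours * improvement


# Hypothetical job: 30 days of wall-clock time without -Os.
baseline_hours = 30 * 24
# Assumed 1% improvement (illustrative upper bound, not measured).
saved = hours_saved(baseline_hours, 0.01)
print(f"saved: {saved:.1f} hours")
```

So on a month-long computation, even 1% works out to several hours, while on
a job that finishes in a few minutes the same 1% would be lost in the noise.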