I've seen different audio players react very differently in the same situations with the same kernel. Are people testing alternatives to make sure it's not just the program being bad? Maybe the people doing these scheduler tests are already using all the popular media players and several widely available GUI systems to make sure they're not tuning the kernel for a specific program. If so, that should probably be clarified.
I think it ought to be made clear that the gain is for a type of program, not a single one: a type of workload, not a workload consisting of this, that, and the other specific program. That can include different windowing systems (XFree86 vs. non-free X implementations or DirectFB) and GTK vs. Qt vs. no toolkit. That way obvious userspace bugs can be exposed, and all this tuning won't end up helping keep bugs and bad implementations in use.