CFS is fair even on SMP. Consider for example the worst-case 3-tasks-on-2-CPUs workload on a 2-CPU box:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 2658 mingo     20   0  1580  248  200 R   67  0.0  0:56.30 loop
 2656 mingo     20   0  1580  252  200 R   66  0.0  0:55.55 loop
 2657 mingo     20   0  1576  248  200 R   66  0.0  0:55.24 loop

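Each task gets about 66% of CPU time, which is the ideal share when 2 CPUs are split evenly across 3 tasks (2/3 of a CPU each). The 'TIME+' column shows a spread of about 2% between the slowest and the fastest loop after just 1 minute of runtime (and the spread gets narrower with time).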
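
The arithmetic behind these figures can be checked with a short script. This is only an illustrative sketch: the helper names (fair_share_percent, to_seconds, spread_percent) are made up for this example, and measuring the spread relative to the slowest task is an assumption, since the text does not say how the 2% figure was computed.

```python
def fair_share_percent(n_cpus, n_tasks):
    """Ideal per-task CPU share, as a percentage of one CPU."""
    # A single task can never use more than one full CPU.
    return 100.0 * min(1.0, n_cpus / n_tasks)


def to_seconds(time_plus):
    """Convert a top 'TIME+' value of the form M:SS.hh to seconds."""
    minutes, seconds = time_plus.split(":")
    return int(minutes) * 60 + float(seconds)


def spread_percent(times):
    """Gap between fastest and slowest task, relative to the slowest (assumed definition)."""
    secs = [to_seconds(t) for t in times]
    return 100.0 * (max(secs) - min(secs)) / min(secs)


# TIME+ values copied from the top output above.
times = ["0:56.30", "0:55.55", "0:55.24"]

print(f"ideal share: {fair_share_percent(2, 3):.1f}%")
print(f"spread:      {spread_percent(times):.1f}%")
```

With the values above, the ideal share comes out at roughly 66.7% and the spread at a little under 2%, consistent with the numbers quoted in the text.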