Thank you for sharing this test. I just ran a quick test, and yes, this is really interesting!
Without the scale option, the load is close to 100% on each CPU (so the P-states climb up to the turbo frequency), but with scale=320:208 the load oscillates (close to 50% on average), so the requested frequencies are lower (and the power is reduced as well).
I have a patch (not yet submitted) that reduces the gap for this kind of use case.
As a temporary solution, you can also switch to performance with:
sudo su
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
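To check that the governor really changed on every CPU, something like the following should work. It is only a rough sketch: the show_governors name is mine, and the optional directory argument is there just so the function can be pointed somewhere other than the standard sysfs location.

```shell
# Print the current cpufreq governor of every CPU, one per line.
# The optional first argument overrides the sysfs base directory.
show_governors() {
    base="${1:-/sys/devices/system/cpu}"
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs without a readable cpufreq entry (or an unmatched glob).
        [ -r "$f" ] || continue
        cpu=${f#"$base"/}   # strip the base directory
        cpu=${cpu%%/*}      # keep only the "cpuN" component
        printf '%s: %s\n' "$cpu" "$(cat "$f")"
    done
}

show_governors
```

Each line should read "performance" after the switch. Writing the previous governor name back the same way restores the original behaviour.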
However, I still don't know why the CPUs are idle about 50% of the time with the scale option (is it using a hardware accelerator?).