First, I was only commenting on one specific patch, nothing more.
My point is that fully rounding to 4K on all corners is wasteful, because
the CPUs have to handle that case anyway and every split costs precious
TLB entries in direct mapping accesses.
And I might be old-fashioned, but I still think minimizing TLB misses
in the kernel is quite important, since the TLBs of modern x86
CPUs are still comparatively small.
Btw, that is why I was also quite disappointed that the new CPA eliminated
reassembly. It means that on a long-uptime system, even with moderate
CPA page allocation/free traffic, eventually the complete direct mapping
will be all 4K. And there will be TLB misses galore on each system call
when user space is TLB intensive.
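
To put rough numbers on that, here is a minimal sketch. It is plain C,
not kernel code: pages_needed() is a hypothetical helper, and the 1 GiB
region below is just an illustrative size, not a measurement. It only
counts how many distinct translations are needed to cover a range at a
given page size.

```c
/* Illustrative only: count how many pages (and therefore how many
 * distinct TLB translations) are needed to map `bytes` of direct
 * mapping with a given page size. Rounds up for partial pages. */
static unsigned long pages_needed(unsigned long long bytes,
                                  unsigned long long page_size)
{
    return (unsigned long)((bytes + page_size - 1) / page_size);
}
```

For a 1 GiB range, pages_needed(1ULL << 30, 4096ULL) gives 262144 and
pages_needed(1ULL << 30, 2ULL << 20) gives 512, so a fully split mapping
needs 512 times as many translations to cover the same memory.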
OK, in that light Yinghai's patch is perhaps not so bad after longer
uptime in that scenario. Still, performance directly after boot-up is
also something that shouldn't be ignored, and I'm still hopeful that
reassembly will be re-added at some point anyway.