Hum, so I'm somewhat undecided whether this is worth the churn. For the free
blocks rb_tree we use rb_first() only in ext4_mb_generate_from_freelist(),
which gets called only when generating a new buddy bitmap from the on-disk
bitmap, and we traverse the whole tree right after that. Thus the extra cost
of rb_first() is a) well hidden in the total cost of the iteration and b)
incurred rather rarely anyway.
Similarly for the H-tree directory code, we call rb_first() in
ext4_dx_readdir() only to start an iteration over the whole rb_tree, and in
such a case I don't think optimizing rb_first() makes a big difference
(maintaining the cached value is going to cost about as much as we save by
using it).