On Thu, Apr 15, 2021 at 12:35:45PM -0400, Waiman Long wrote:That does make sense, though a little bit of performance may be lost. I will try that out to see how it work out performance wise.
On 4/15/21 12:30 PM, Johannes Weiner wrote:Yeah it's more obvious after getting to patch 5.
On Tue, Apr 13, 2021 at 09:20:24PM -0400, Waiman Long wrote:That will also work. However, the reason I did that was because of patch 5
In memcg_slab_free_hook()/pcpu_memcg_free_hook(), obj_cgroup_uncharge()The optimization makes sense.
is followed by mod_objcg_state()/mod_memcg_state(). Each of these
function call goes through a separate irq_save/irq_restore cycle. That
is inefficient. Introduce a new function obj_cgroup_uncharge_mod_state()
that combines them with a single irq_save/irq_restore cycle.
@@ -3292,6 +3296,25 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size)
refill_obj_stock(objcg, size);
}
+void obj_cgroup_uncharge_mod_state(struct obj_cgroup *objcg, size_t size,
+ struct pglist_data *pgdat, int idx)
But please don't combine independent operations like this into a
single function. It makes for an unclear parameter list, it's a pain
in the behind to change the constituent operations later on, and it
has a habit of attracting more random bools over time. E.g. what if
the caller already has irqs disabled? What if it KNOWS that irqs are
enabled and it could use local_irq_disable() instead of save?
Just provide an __obj_cgroup_uncharge() that assumes irqs are
disabled, combine with the existing __mod_memcg_lruvec_state(), and
bubble the irq handling up to those callsites which know better.
in the series. I could put the get_obj_stock() and put_obj_stock() code in
slab.h and allowed them to be used directly in various places, but hiding in
one function is easier.
But with the irq disabling gone entirely, is there still an incentive
to combine the atomic section at all? Disabling preemption is pretty
cheap, so it wouldn't matter to just do it twice.
I.e. couldn't the final sequence in slab code simply be
objcg_uncharge()
mod_objcg_state()
again and each function disables preemption (and in the rare case
irqs) as it sees fit?
You lose the irqsoff batching in the cold path, but as you say, hit
rates are pretty good, and it doesn't seem worth complicating the code
for the cold path.