On Mon, 15 Jul 2013, Waiman Long wrote:On 07/15/2013 10:41 AM, Thomas Gleixner wrote:And what's the point of that? I just explained it to you that you doOn Mon, 8 Jul 2013, Waiman Long wrote:I used the ARCH+GENERIC to mean using the generic code but with arch specific
Sigh. GENERIC means, that you use the generic implementation, ARCH
means the architecture has a private implementation of that code.
The generic implementation can use arch specific includes and if there
is none we simply fallback to the asm-generic one.
include.
not need the ARCH=y and GENERIC=y at all.
I asked you several times now to explain and document the whole thing> Let's start with a simple version because it IS simple and easyI understand that. I can live with try_cmpxchg_once, though doing itWell, you did not see a difference on your particular machine. Stillto analyze and debug and then incrementally build improvements on itI originally tried to do a cmpxchg without waiting and there was
instead of creating an heuristics monster in the first place, i.e.:
if (!spin_is_locked(&lr->lock)&& try_cmpxchg_once(lr))
return 0;
return 1;
Take numbers for this on a zoo of different machines: Intel and AMD,
old and new.
Then you can add an incremental patch on that, which add loops and
hoops. Along with numbers on the same zoo of machines.
practically no performance gain. I believe that as soon as
we don't have an idea how all of this works on a set of different
machines. There is a world beside 8 socket servers.
twice will give a slightly better performance. However, without
with numbers instead of your handwaving "slightly better performance"
arguments.
waiting for the lock to be free, this patch won't do much good. ToNo, it's not acceptable at all if you are not able to provide data for
keep it simple, I can remove the ability to do customization while
doing cmpxchg once and wait until the lock is free. Please let me
know if this is acceptable to you.
1,2,4,8 socket machines (from single core to your precious big
boxes). It's that simple. We are not accepting patches which optimize
for a single use case and might negatively affect 99,9999% of the
existing users which have no access to this kind of hardware unless
proven otherwise.
And I asked for a step by step approach in the first review, but youAlso what's the approach to tune that? Running some random testbenchAs I said above, I can remove the customization. I may reintroduce user
and monitor the results for various settings?
If that's the case we better have a that as variables with generic
initial values in the code, which can be modified by sysctl.
customization using sysctl as you suggested in the a follow up patch after
this one is merged.
decided to ignore that. And now you think that it's accetable for you
as long as you get what you want. That's not the way it works, really.
q> > > > Thats what tracepoints are for. Tracing is the only way to get proper+ getnstimeofday(&tv2);Using getnstimeofday() for timestamping and spamming the logs with
+ ns = (tv2.tv_sec - tv1.tv_sec) * NSEC_PER_SEC +
+ (tv2.tv_nsec - tv1.tv_nsec);
+ pr_info("lockref wait loop time = %lu ns\n", ns);
printouts? You can't be serious about that?
Did you even try to understand what I wrote? I did not ask you toThis code is not critical and I can certainly remove it.No, no, no! Again: That's what tracepoints are for.numbers which take preemption, interrupts etc. into account withoutThe _SHOW_WAIT_LOOP_TIME is for debugging and instrumentation purpose only
hurting runtime performace.
during development cycle. It is not supposed to be turned on at production
system. I will document that in the code.
This kind of debugging is completely pointless. Tracepoints are
designed to be low overhead and can be enabled on production
systems.
Your debugging depends on slow timestamps against CLOCK_REALTIME and
an even slower output via printk. How useful is that if you want to
really instrument the behaviour of this code?
remove instrumentation. I asked you to use useful instrumentation
instead of some totally useless crap.