On 07.02.23 at 11:33, harald@xxxxxxxxx wrote:
> 2) A theoretical analysis about possible regressions depending on timer
> resolution as mentioned in an earlier message.
This sounds as if you did such an analysis for the original
version. Could you share that work so I can attempt to repeat it
for the modified algorithm?
> 3) Ideally figuring out why your version performs better than what we
> currently have. I have some suspicions, but better understanding might
> lead to a better approach. E.g. maybe recording the other edges isn't
> the problem so long as we ignore them during decoding?
> As I see it, the main thing we are losing with your current proposal is
> some diagnostic features. If we keep them as much as possible and have
> regressions understood and covered, I see no reason to reject your idea.
That's why I changed the script to separately count EIO and ETIMEDOUT.
The latter indicates missed edges, the former failure to interpret
the data read.
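For illustration, the separate counting can be sketched roughly as
follows. This is not the actual test script, only a minimal Python
sketch; the helper names and the sysfs path in the comment are
assumptions. It also shows how the "err per succ" column below follows
from the other columns.

```python
import errno


def count_read_errors(read_once, wanted_successes, max_attempts=1000):
    """Call read_once() until it succeeded wanted_successes times.

    read_once is any callable that returns on success and raises OSError
    on failure. Returns (successes, eio, timeouts, other).
    """
    successes = eio = timeouts = other = 0
    attempts = 0
    while successes < wanted_successes and attempts < max_attempts:
        attempts += 1
        try:
            read_once()
            successes += 1
        except OSError as exc:
            if exc.errno == errno.EIO:
                eio += 1        # data was read but could not be interpreted
            elif exc.errno == errno.ETIMEDOUT:
                timeouts += 1   # not enough edges seen, i.e. missed IRQs
            else:
                other += 1
    return successes, eio, timeouts, other


def errors_per_success(successes, eio, timeouts):
    """Value of the 'err per succ' column: all errors per successful read."""
    return (eio + timeouts) / successes


def read_sysfs(path):
    """Read one value from a sysfs attribute; raises OSError on failure."""
    with open(path) as f:
        return f.read().strip()


# Hypothetical usage (the path is an assumption, adjust to your device):
# path = "/sys/bus/iio/devices/iio:device0/in_temp_input"
# print(count_read_errors(lambda: read_sysfs(path), 10))
```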
What I see is that the patched driver's errors mostly result from missed
IRQs (note that, in contrast to the last results, I cut the number of reads):
 #  real[s]  user[s]  sys[s]  success  EIO  timeout  err per succ
 1    20.57     0.25    0.03       10    0        0           0.0
 2    24.74     0.25    0.07       10    0        4           0.4
 3    21.55     0.20    0.07       10    0        0           0.0
 4    25.81     0.25    0.08       10    0        5           0.5
 5    21.56     0.23    0.05       10    0        0           0.0
 6    21.58     0.22    0.05       10    1        0           0.1
 7    25.86     0.24    0.08       10    1        5           0.6
 8    22.69     0.27    0.05       10    1        1           0.2
 9    23.67     0.26    0.04       10    0        2           0.2
10    20.55     0.23    0.04       10    0        0           0.0
Whereas the original driver has more errors resulting from
mis-interpreted data:
 #  real[s]  user[s]  sys[s]  success  EIO  timeout  err per succ
 1    24.88     0.26    0.07       10    5        4           0.9
 2    25.91     0.26    0.07       10    4        5           0.9
 3    31.27     0.31    0.10       10    6       10           1.6
 4    29.17     0.32    0.11       10    7        8           1.5
 5    22.73     0.24    0.08       10    4        2           0.6
 6    46.46     0.35    0.25       10   19       24           4.3
 7    23.79     0.23    0.09       10    3        3           0.6
 8    30.17     0.27    0.11       10    6        9           1.5
 9    23.77     0.26    0.06       10    3        2           0.5
10    20.58     0.24    0.06       10    1        0           0.1
I tried a variant that reads falling and rising edges and
uses the redundancy of information to eliminate some errors.
This did not work out at all.
It seems a relevant source of
trouble is a delayed call to the IRQ handler. The problem is
that only then do you try to find out whether this IRQ is due to
a rising or a falling edge, by reading the current GPIO level. If
you are too late, the level might already have changed, so you read
a level, but the edge that produced _this_ level will be reported
by another IRQ a few us later.
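To make the race concrete, here is a small Python simulation. It is
only a sketch: the edge times are made-up, roughly DHT11-like values in
us, and the function names are hypothetical. Each edge toggles the
line, and the "handler" guesses the edge direction from the level it
reads some latency after the edge occurred.

```python
def level_at(edges, t, initial_level=1):
    """Line level at time t; every edge at or before t toggles the line."""
    level = initial_level
    for edge_time in edges:
        if edge_time > t:
            break
        level ^= 1
    return level


def classify_edges(edges, latency_us, initial_level=1):
    """Return a list of (true_kind, guessed_kind), one entry per edge.

    guessed_kind is what a handler infers when it reads the GPIO level
    latency_us after the edge actually happened.
    """
    result = []
    level = initial_level
    for edge_time in edges:
        level ^= 1
        true_kind = "rising" if level else "falling"
        seen = level_at(edges, edge_time + latency_us, initial_level)
        guessed_kind = "rising" if seen else "falling"
        result.append((true_kind, guessed_kind))
    return result


# Illustrative timestamps in us: falling, rising, falling, rising, falling.
edges = [0, 50, 77, 127, 197]
fast = classify_edges(edges, latency_us=5)
slow = classify_edges(edges, latency_us=30)
```

With these numbers a latency of 5 us classifies every edge correctly,
while a latency of 30 us reads the level after the short high pulse has
already ended and takes the rising edge at 50 us for a falling one.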
So the reason this patch shows
lower error rates seems to be that halving the number of IRQs
to be handled lowers the probability of such things happening,
_plus_ the information from the hardware that the IRQ was due
to a falling edge.
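As an illustration of why falling edges alone carry enough information,
bits can be recovered from the distance between consecutive falling
edges. The following Python sketch is not the code from the patch; the
threshold assumes nominal DHT11 timings (about 50 us low followed by
about 27 us high for a 0 and about 70 us high for a 1), and the
function name is hypothetical.

```python
def decode_falling_edges(timestamps_us, threshold_us=99):
    """Decode bits from falling-edge timestamps only.

    Assumption: a 0 bit spans roughly 78 us and a 1 bit roughly 120 us
    between falling edges, so the threshold sits halfway between them.
    """
    bits = []
    for prev, cur in zip(timestamps_us, timestamps_us[1:]):
        bits.append(1 if cur - prev > threshold_us else 0)
    return bits


# Made-up timestamps: intervals of 78, 120, 78 and 120 us.
example_bits = decode_falling_edges([0, 78, 198, 276, 396])
```

No GPIO level has to be read in the handler here, so a delayed handler
only shifts the timestamps instead of misclassifying the edge.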