* Kyle Huey <me@xxxxxxxxxxxx> wrote:

> On Wed, Jul 5, 2017 at 10:07 PM, Robert O'Callahan <robert@xxxxxxxxxxxxx> wrote:
> > On Tue, Jul 4, 2017 at 3:21 AM, Mark Rutland <mark.rutland@xxxxxxx> wrote:
> > > Should any of those be moved into the "should be dropped" pile?
> >
> > Why not be conservative and clear every sample you're not sure about?
> >
> > We'd appreciate a fix sooner rather than later here, since rr is
> > currently broken on every stable Linux kernel and our attempts to
> > implement a workaround have failed.
> >
> > (We have separate "interrupt" and "measure" counters, and I thought we
> > might work around this regression by programming the "interrupt"
> > counter to count kernel events as well as user events (interrupting
> > early is OK), but that caused our (completely separate) "measure"
> > counter to report off-by-one results (!), which seems to be a
> > different bug present on a range of older kernels.)
>
> This seems to have stalled out here unfortunately.
>
> Can we get a consensus (from ingo or peterz?) on Mark's question? Or,
> alternatively, can we move the patch at the top of this thread forward
> on the stable branches until we do reach an answer to that question?
>
> We've abandoned hope of working around this problem in rr and are
> currently broken for all of our users with an up-to-date kernel, so
> the situation for us is rather dire at the moment I'm afraid.

Sorry about that - I've queued up a revert for the original commit and will send
the fix to Linus later today. I've added a -stable tag as well so it can be
forwarded to Greg the moment it hits upstream.
We should do the original fix as well, but in a version that does not skip the
sample but zeroes out the RIP and registers (or sets them all to -1LL) - and also
covers other possible places where skid-RIP is exposed, such as LBR.
Thanks,
Ingo