[PATCH v1 0/3] collect LBR callstack together with thread stack data

From: Alexey Budankov
Date: Fri Aug 09 2019 - 11:16:09 EST



This patch set makes it possible to collect LBR call stack data together with
the raw thread stack data captured by the --call-graph dwarf,SIZE option:

$ perf record -g --call-graph dwarf,1024 -j stack,u -- stack_test
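For readers who drive perf_event_open(2) directly, here is a rough sketch of the perf_event_attr fields that such a command line is expected to request. It assumes the default cycles event at 4000 Hz and the field choices perf usually makes in dwarf mode. HYPOTHETICAL_USER_REGS_MASK is a placeholder: perf uses an architecture-specific register mask. The exact attribute set may also differ between perf versions.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>

/*
 * Placeholder user register mask (hypothetical). perf requests an
 * architecture-specific register set for dwarf unwinding instead.
 */
#define HYPOTHETICAL_USER_REGS_MASK 0xffULL

/*
 * Fill attr roughly the way
 *   perf record -g --call-graph dwarf,<dump_size> -j stack,u
 * is expected to: dwarf mode asks for user registers plus a raw copy of
 * dump_size bytes of user stack, and "-j stack,u" asks for the LBR
 * call stack restricted to user-space branches.
 */
static void setup_dwarf_plus_lbr_attr(struct perf_event_attr *attr,
                                      unsigned int dump_size)
{
    /* The kernel rejects stack dump sizes that are not 8-byte aligned. */
    assert(dump_size % 8 == 0);

    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;          /* assumed default event */
    attr->config = PERF_COUNT_HW_CPU_CYCLES;
    attr->freq = 1;                           /* assumed default rate */
    attr->sample_freq = 4000;

    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID |
                        PERF_SAMPLE_CALLCHAIN |
                        PERF_SAMPLE_REGS_USER |    /* CTX: ip, sp, bp, ... */
                        PERF_SAMPLE_STACK_USER |   /* THS(K): raw stack bytes */
                        PERF_SAMPLE_BRANCH_STACK;  /* HWS: LBR entries */

    attr->sample_regs_user = HYPOTHETICAL_USER_REGS_MASK;
    attr->sample_stack_user = dump_size;
    /* User frames are unwound later from the stack dump, not by the kernel. */
    attr->exclude_callchain_user = 1;

    attr->branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
                               PERF_SAMPLE_BRANCH_USER;
}
```

Calling setup_dwarf_plus_lbr_attr(&attr, 1024) corresponds to the dwarf,1024 part of the command above.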

The collected LBR call stack can be used to augment the dwarf call stack
computed from the raw thread stack data, providing more complete call stack
information when the collected SIZE is not large enough to cover the whole
thread stack.

Such cases are typical for workloads that allocate large data arrays on
their thread stacks, or where SIZE cannot be made large enough because of the
nature of the workload or the system configuration. In these cases, LBR call
stacks captured by hardware can supply the missing stack frames. A possible
algorithm for consolidating dwarf and LBR call stacks is described below.

With this patch set, the perf report UI still ignores the collected LBR call
stack data and shows dwarf-based call stack information only.

===========================================================================

Overview:

Legend:

THS - thread stack
CTX - thread register context
SWS - software stack
SSF - skipped stack frames
PSS - Perf sample stack

ip, sp, bp - hardware register values
d - allocated stack regions
kip - ip address in kernel space
K - captured thread stack size

     THS

    -----
    |   |<-stack bottom
     ...
    |---|
    |ip4|
    |---|         PSS = SWS(THS(K))
    |   |
--> |   |
 |  |d3 |                    user/
 |  |---|         user PSS   kernel PSS
 |  |ip3|         ------     ------
 |  |---|         |SSF |     |SSF |
 |  |   |          ....       ....
 |  |   |         ------     ------
 |  |d2 |         | -1 |     | -1 |
    |---|  user   ------     ------
 K  |ip2|  CTX    |ip3 |     |ip3 |
    |---|         |----|     |----|
 |  |d1 |   ...   |ip2 |  ,  |ip2 |
 |  |---|  |---|  |----|     |----|
 |  |ip1|  |bp0|  |ip1 |     |ip1 |
 |  |---|  |---|  |----|     |----|
 |  |   |  |ip0|->|ip0 |     |ip0 |<-user stack top
 |  |   |  |---|  ------     ------
 |  |   |<-|sp0|<-stack      |kip0|<-kernel stack bottom
--> -----  -----  top        |----|
                             |kip1|
                             |----|
                             |kip2|
                             |----|
                              ....
                             |    |<-kernel stack top
                             ------
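Here is a deliberately simplified model of PSS = SWS(THS(K)). Only frames whose records lie within the K bytes copied from sp0 can be recovered, and the rest collapse into the -1 entry. For brevity, the sketch walks a frame-pointer chain (at each bp: saved bp, then return ip, as in the x86-64 frame-pointer convention). It does not perform the DWARF CFI unwinding that perf actually does. CHAIN_TRUNCATED and unwind_dump_fp are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the "-1" entry that marks skipped frames. */
#define CHAIN_TRUNCATED ((uint64_t)-1)

/*
 * Walk a frame-pointer chain over a user stack dump of k bytes copied
 * starting at address sp0. Each frame record is assumed to be two 8-byte
 * words at address bp: the caller's saved bp, then the return ip.
 * Writes at most max entries to out and returns the number written.
 */
static size_t unwind_dump_fp(const uint8_t *dump, size_t k, uint64_t sp0,
                             uint64_t ip0, uint64_t bp0,
                             uint64_t *out, size_t max)
{
    size_t n = 0;
    uint64_t bp = bp0;

    assert(dump != NULL && out != NULL);
    if (max == 0)
        return 0;
    out[n++] = ip0;                    /* innermost frame comes from CTX */

    while (n < max) {
        uint64_t saved_bp, ret_ip;

        /* Frame record not inside the captured K bytes: mark and stop. */
        if (bp < sp0 || bp - sp0 > k || k - (bp - sp0) < 16) {
            out[n++] = CHAIN_TRUNCATED;
            break;
        }
        memcpy(&saved_bp, dump + (bp - sp0), sizeof(saved_bp));
        memcpy(&ret_ip, dump + (bp - sp0) + 8, sizeof(ret_ip));
        if (ret_ip == 0)               /* outermost frame reached */
            break;
        out[n++] = ret_ip;
        if (saved_bp <= bp)            /* chain must move toward stack bottom */
            break;
        bp = saved_bp;
    }
    return n;
}
```

With a small K, the walk reaches a frame record beyond the dump and emits CHAIN_TRUNCATED. The same stack captured in full ends cleanly at the outermost frame.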

Algorithm details:

Legend:

HWS - hardware stack
K-SWS - kernel software stack

                    BRANCH
                     TABLE

          HWS       ip   ip
                   from  to
         ------   -----------
         |ip7`|   |ip7`|    |
         |----|   |----|----|
         |ip6`|   |ip6`|    |
user PSS |----|   |----|----|
         |ip5`|   |ip5`|    |
------   |----|   |----|----|
| -1 |   |ip4`|   |ip4`|    |
------   |----|   |----|----|
|ip3 |~~~|ip3`|   |ip3`|    |
|----|   |----|   |----|----|
|ip2 |~~~|ip2`|   |ip2`|    |
|----|   |----|   |----|----|
|ip1 |~~~|ip1`|   |ip1`|ip0`|
|----|   |----|   -----------
|ip0 |~~~|ip0`|<---------'
------   ------
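The HWS column can be derived from the branch table: its first element is the target (to) of the most recent call, and the following elements are the call sites (from) of successive calls. This sketch assumes that entries[0] is the most recent call, as in perf branch stack samples. struct lbr_entry is a simplified, hypothetical stand-in for the kernel's struct perf_branch_entry, with the flag bits omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for struct perf_branch_entry. */
struct lbr_entry {
    uint64_t from;  /* address of the call instruction */
    uint64_t to;    /* call target */
};

/*
 * Build HWS from LBR call-stack entries, most recent call first:
 * hws[0] is the target of entries[0] (ip0` in the table above), and
 * hws[i + 1] is the call site recorded in entries[i].
 * Writes at most max entries and returns the number written.
 */
static size_t lbr_to_hws(const struct lbr_entry *entries, size_t nr,
                         uint64_t *hws, size_t max)
{
    size_t n = 0;

    assert(entries != NULL || nr == 0);
    if (nr == 0 || max == 0)
        return 0;
    hws[n++] = entries[0].to;
    for (size_t i = 0; i < nr && n < max; i++)
        hws[n++] = entries[i].from;
    return n;
}
```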

1. for j = 0..3: if (sym(ipj) == sym(ipj`)) ===> ipj goes to user PSS
2. for j = 4..7: ipj` ===> user PSS
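The two rules can be read as a merge of the dwarf chain (SWS) with the LBR chain (HWS). Symbols rather than raw addresses are compared, presumably because the two sources record different addresses inside the same function. Dwarf unwinding yields return addresses, while LBR records call instructions (from) and call targets (to). This is a minimal interpretation, not the author's implementation. sym() is modeled as a hypothetical callback, and toy_sym is a toy resolver used only for the example. Falling back to the unmodified dwarf chain on a symbol mismatch is an assumption, because the rules above do not cover that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHAIN_TRUNCATED ((uint64_t)-1)  /* hypothetical "-1" marker */

/* Hypothetical resolver: maps an ip to the start of its enclosing symbol. */
typedef uint64_t (*sym_fn)(uint64_t ip);

/*
 * Rule 1: dwarf frames sws[0..j) are kept if sym(sws[i]) == sym(hws[i])
 *         for every i < j, where sws[j] is the truncation marker.
 * Rule 2: LBR frames hws[j..] are appended after them.
 * If the dwarf chain is complete or a symbol check fails, sws is copied
 * unchanged (assumption). Returns the number of entries written to out.
 */
static size_t augment_user_chain(const uint64_t *sws, size_t sws_nr,
                                 const uint64_t *hws, size_t hws_nr,
                                 sym_fn sym, uint64_t *out, size_t max)
{
    size_t j = 0, n = 0, i;
    bool can_augment;

    assert(sym != NULL);
    while (j < sws_nr && sws[j] != CHAIN_TRUNCATED)
        j++;

    /* Augment only a truncated chain whose frames all have LBR peers. */
    can_augment = j < sws_nr && j <= hws_nr;
    for (i = 0; can_augment && i < j; i++)
        if (sym(sws[i]) != sym(hws[i]))
            can_augment = false;

    if (!can_augment) {
        for (i = 0; i < sws_nr && n < max; i++)
            out[n++] = sws[i];
        return n;
    }
    for (i = 0; i < j && n < max; i++)
        out[n++] = sws[i];              /* rule 1: verified dwarf frames */
    for (i = j; i < hws_nr && n < max; i++)
        out[n++] = hws[i];              /* rule 2: frames only LBR kept */
    return n;
}

/* Toy resolver for the example only: symbols start on 256-byte boundaries. */
static uint64_t toy_sym(uint64_t ip)
{
    return ip & ~(uint64_t)0xff;
}
```

With sws = {ip0, ip1, ip2, ip3, -1} and an eight-entry HWS, the result has the shape of the augmented user PSS: ip0..ip3 followed by ip4`..ip7`.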

Augmented PSS = A_SWS(SWS(THS(K)), HWS):

                user/
     user PSS   kernel PSS

      ------      ------
      |ip7`|      |ip7`|<-user PSS bottom
      |----|      |----|
      |ip6`|      |ip6`|
      |----|      |----|
HWS   |ip5`|      |ip5`|
      |----|      |----|
      |ip4`|      |ip4`|
      ------      ------
      |ip3 |      |ip3 |
      |----|      |----|
SWS   |ip2 |      |ip2 |
      |----|      |----|
      |ip1 |      |ip1 |
      |----|      |----|
      |ip0 |      |ip0 |<-user PSS top
      ------      ------
                  |kip0|<-kernel PSS bottom
                  |----|
                  |kip1|
            K-SWS |----|
                  |kip2|
                  |----|
                  |kip3|<-kernel PSS top
                  ------

            APSS

===========================================================================

---
Alexey Budankov (3):
perf record: enable LBR callstack capture jointly with thread stack
perf report: dump LBR callstack data by -D jointly with thread stack
perf report: prefer dwarf callstacks to LBR ones when captured both

tools/perf/builtin-report.c | 2 ++
tools/perf/util/parse-branch-options.c | 1 +
tools/perf/util/session.c | 31 ++++++++++++++++----------
3 files changed, 22 insertions(+), 12 deletions(-)

--
2.20.1