Re: [RFC 1/2] kernel patch for dump user space stack tool

From: Yanmin Zhang
Date: Tue Apr 24 2012 - 22:57:56 EST


On Tue, 2012-04-24 at 12:10 +0200, Peter Zijlstra wrote:
> On Tue, 2012-04-24 at 09:30 +0800, Yanmin Zhang wrote:
> > > > +static inline void __save_stack_trace_user_task(struct task_struct *task,
> > > > + struct stack_trace *trace)
> > > > +{
> > > > + const struct pt_regs *regs = task_pt_regs(task);
> > > > + const void __user *fp;
> > > > + unsigned long addr;
> > > > +
> > > > + if (task != current && task->state == TASK_RUNNING && task->on_cpu) {
> > > > + /* To trap into kernel at least once */
> > > > + smp_send_reschedule(task_cpu(task));
> > > > + }
> > >
> > > This doesn't make any sense at all..
> > ptrace could put the task into either a STOPPED or a TRACED state.
> > But it's time-consuming.
>
> Yeah, but what is the above meant to achieve? it doesn't actually stop
> the task or anything, it will just trap the remote cpu, by the time you
> do your stack walk below the cpu might be running another task entirely
> or you're walking a live stack with all the 'fun' issues that'll bring.
The user space stack is accessed per _task_, not per cpu.

The IPI is there to make sure the task traps into the kernel at least once,
so that we can get a recent regs->bp. If the task has been running on another
cpu for a long time, the saved regs->bp might be stale. I am also a little
worried that if the task returns to user space quickly after the IPI,
regs->bp might already be out of date by the time we walk the stack. If that
is the case, we might get bad data, or no useful data at all.

See the code below.
+ const struct pt_regs *regs = task_pt_regs(task);
The pt_regs above belongs to task, not to current.

+ const void __user *fp;
+ unsigned long addr;
+
+ if (task != current && task->state == TASK_RUNNING && task->on_cpu) {
+ /* To trap into kernel at least once */
+ smp_send_reschedule(task_cpu(task));
+ }
+
+ fp = (const void __user *)regs->bp;
+ if (trace->nr_entries < trace->max_entries)
+ trace->entries[trace->nr_entries++] = regs->ip;
+
+ while (trace->nr_entries < trace->max_entries) {
+ struct stack_frame_user frame;
+
+ frame.next_fp = NULL;
+ frame.ret_addr = 0;
+
+ addr = (unsigned long)fp;
+ if (!access_process_vm(task, addr, (void *)&frame,
+ sizeof(frame), 0))
The call above reads the task's user space stack.
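
To make the walk easier to discuss, here is a minimal user-space sketch of
the same frame-pointer loop. It is not the patch itself: struct fake_mm,
fake_read(), struct frame_user, struct trace_buf and walk_user_frames() are
hypothetical names made up for illustration, and the target address space is
simulated by a plain byte array. fake_read() follows the convention, which as
far as I know access_process_vm() also uses, of returning the number of bytes
copied, so the sketch compares the result against sizeof(frame) rather than
only testing for zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one saved frame: caller's fp, then return address. */
struct frame_user {
	uint64_t next_fp;
	uint64_t ret_addr;
};

/* Hypothetical stand-in for the trace buffer used in the patch. */
struct trace_buf {
	unsigned int nr_entries;
	unsigned int max_entries;
	uint64_t *entries;
};

/* Simulated target address space: 'len' bytes mapped at address 'base'. */
struct fake_mm {
	uint64_t base;
	const unsigned char *bytes;
	size_t len;
};

/* Copy up to n bytes from the simulated space; return bytes actually copied. */
static size_t fake_read(const struct fake_mm *mm, uint64_t addr,
			void *dst, size_t n)
{
	size_t off, avail;

	if (addr < mm->base || addr - mm->base >= mm->len)
		return 0;
	off = (size_t)(addr - mm->base);
	avail = mm->len - off;
	if (n > avail)
		n = avail;		/* short read at the end of the mapping */
	memcpy(dst, mm->bytes + off, n);
	return n;
}

/* Record ip, then follow the saved frame pointers starting at fp. */
static void walk_user_frames(const struct fake_mm *mm, uint64_t ip,
			     uint64_t fp, struct trace_buf *trace)
{
	assert(trace && trace->entries);

	if (trace->nr_entries < trace->max_entries)
		trace->entries[trace->nr_entries++] = ip;

	while (trace->nr_entries < trace->max_entries) {
		struct frame_user frame = { 0, 0 };

		/* A partial frame is as useless as no frame: stop. */
		if (fake_read(mm, fp, &frame, sizeof(frame)) != sizeof(frame))
			break;
		if (frame.ret_addr == 0)
			break;
		trace->entries[trace->nr_entries++] = frame.ret_addr;

		/* Callers sit at higher addresses; anything else is a
		 * corrupt or stale chain, so do not loop on it. */
		if (frame.next_fp <= fp)
			break;
		fp = frame.next_fp;
	}
}
```

The "next_fp <= fp" check is one cheap way to avoid looping forever when the
chain is stale or the stack is changing underneath us. It does not make the
data trustworthy, it only bounds the walk.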

We implemented the tool to meet a real requirement, and it's not perfect, so
we need your expert help.

Thanks for the comments.

Yanmin

