Re: [PATCH 00/33] [RFC] Non disruptive application core dump infrastructure

From: Pádraig Brady
Date: Thu Mar 20 2014 - 10:46:03 EST


On 03/20/2014 09:39 AM, Janani Venkataraman wrote:
> Hi all,
>
> The following series implements an infrastructure for capturing the core of an
> application without disrupting its process.
>
> Kernel Space Approach:
>
> 1) Posted an RFD to LKML explaining the various kernel-methods being analysed.
>
> https://lkml.org/lkml/2013/9/3/122
>
> 2) Went ahead to implement the same using the task_work_add approach and posted an
> RFC to LKML.
>
> http://lwn.net/Articles/569534/
>
> Based on the responses, the present approach implements the same in User-Space.
>
> User Space Approach:
>
> We didn't adopt the CRIU approach because our method would give us a head
> start: all that the distro would need is the PTRACE functionality and nothing
> more, which is available from kernel versions 3.4 and above.
>
> Basic Idea of User Space:
>
> 1) The threads are held using PTRACE_SEIZE and PTRACE_INTERRUPT.
>
> 2) The dump is then taken using the following:
>    1) The register sets, namely general purpose, floating point and the arch
>       specific register sets, are collected through PTRACE_GETREGSET calls by
>       passing the appropriate register type as parameter.
>    2) The virtual memory maps are collected from /proc/pid/maps.
>    3) The auxiliary vector is collected from /proc/pid/auxv.
>    4) Process state information for filling the notes such as PRSTATUS and
>       PRPSINFO is collected from /proc/pid/stat and /proc/pid/status.
>    5) The actual memory is read through the process_vm_readv syscall as
>       suggested by Andi Kleen.
>    6) Command line arguments are collected from /proc/pid/cmdline.
>
> 3) The threads are then released using PTRACE_DETACH.
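
For anyone following step 2.2 above, here is a minimal sketch of how one
line of /proc/pid/maps could be parsed before the memory is read. This is
only an illustration under my own assumptions: struct map_entry and
parse_maps_line() are hypothetical names and are not taken from the posted
patches, and the device and inode fields are parsed but discarded.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one line of /proc/<pid>/maps. */
struct map_entry {
    unsigned long start;   /* first address of the mapping */
    unsigned long end;     /* one past the last address */
    char perms[5];         /* e.g. "r-xp" */
    unsigned long offset;  /* offset into the backing file */
    char path[256];        /* empty for anonymous mappings */
};

/*
 * Parse a single maps line of the form
 *   start-end perms offset major:minor inode [path]
 * Returns 0 on success, -1 if the mandatory fields are missing.
 */
static int parse_maps_line(const char *line, struct map_entry *e)
{
    unsigned int maj, min;
    unsigned long inode;
    int n;

    e->path[0] = '\0';
    n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255[^\n]",
               &e->start, &e->end, e->perms, &e->offset,
               &maj, &min, &inode, e->path);

    /* Seven fields are mandatory; the path is optional. */
    return n >= 7 ? 0 : -1;
}
```

A dumper would presumably call this for each line read with fgets() from
/proc/pid/maps and use start/end to decide which ranges to pass to
process_vm_readv.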
>
> Self Dump:
>
> A self dump is implemented with the following approach which was adapted
> from CRIU:
>
> Gencore Daemon
>
> The programs can request a dump using gencore() API, provided through
> libgencore. This is implemented through a daemon which listens on a UNIX File
> socket. The daemon is started immediately post installation.
>
> We have provided service scripts for integration with systemd.
>
> NOTE:
>
> On systems with systemd, we could make use of socket option, which will avoid
> the need for running the gencore daemon always. The systemd can wait on the
> socket for requests and trigger the daemon as and when required. However, since
> the systemd socket APIs are not exported yet, we have disabled the supporting
> code for this feature.
>
> libgencore:
>
> 1) The client interface is a standard library call. All that the dump requester
> does is open the library and call the gencore() API, and the dump will be
> generated in the path specified (relative or absolute).
>
> To Do:
>
> 1) Presently we wait indefinitely for all the threads to be seized. We can add
> a time-out to limit how long we wait for the threads to be seized. This can be
> passed as a command line argument in the case of a third party dump, and
> through the library call in the case of a self-dump. We still need to decide
> how long to wait.
>
> 2) As mentioned before, the systemd socket APIs are not exported yet and
> hence this option is disabled now. Once these APIs are available we can enable
> the socket option.
>
> We would like to push this to one of the following packages:
> a) util-linux
> b) coreutils
> c) procps-ng
>
> We are not sure which one would suit this application the best.
> Please let us know your views on the same.

Well, from the coreutils perspective, its tools are generally
non-Linux-specific _commands_, and so it wouldn't be
a natural home for this (despite the _core_ in the name :)).

thanks,
Pádraig.