[PATCH trace-cmd V6 4/7] trace-cmd/virt-server: Add virt-server mode for a virtualization environment

From: Masami Hiramatsu
Date: Tue May 26 2015 - 03:03:12 EST


From: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@xxxxxxxxxxx>

Add the virt-server mode for a virtualization environment,
based on the listen mode. This mode works as a client/server
mode over a virtio-serial channel instead of TCP/UDP. Since the
throughput of trace data can be huge, a traditional IP network
easily incurs high overhead. Using virtio-serial reduces that
overhead because it bypasses the guest/host TCP/IP network stack.

virt-server uses two kinds of virtio-serial I/Fs:
(1) agent-ctl-path (UNIX domain socket)
=> control path to the trace-cmd agent in each guest
(2) trace-path-cpuX (named pipe)
=> trace data path for each vCPU

These I/Fs must be located at the following paths:
(1) /tmp/trace-cmd/virt/agent-ctl-path
(2) /tmp/trace-cmd/virt/<guest domain>/trace-path-cpuX
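
As a rough illustration of this layout, the host-side FIFO path for one
vCPU can be formatted as below. This is a sketch only: trace_path_for_cpu()
is a hypothetical helper, not a function in this patch, and the ".out"
suffix follows the TRACE_PATH_DOMAIN_CPU format string in trace-listen.c.

```c
#include <stdio.h>

#define VIRT_DIR "/tmp/trace-cmd/virt/"

/*
 * Hypothetical helper: format the host-side FIFO path that carries trace
 * data for one guest vCPU. Returns 0 on success, -1 if buf is too small.
 */
static int trace_path_for_cpu(char *buf, size_t len,
			      const char *domain, int cpu)
{
	int n = snprintf(buf, len, VIRT_DIR "%s/trace-path-cpu%d.out",
			 domain, cpu);

	/* snprintf() reports the untruncated length; reject truncation */
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

Checking the snprintf() return value keeps an overly long domain name from
silently producing a truncated path.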

When virt-server runs, the agent-ctl-path I/F is created automatically because
virt-server acts as the server side of the UNIX domain socket. However,
trace-path-cpuX is not created automatically because trace data must be kept
separate for each guest.

Over virtio-serial, the V2 protocol is slightly changed because
the server cannot notice when the client connects. The details
are described in Documentation/Protocol.txt.
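
The server side of this exchange can be sketched as follows. The command
numbers are the ones this patch adds to enum tracecmd_msg_cmd in
trace-msg.c; reply_to_first_message() is a hypothetical helper that only
mirrors the checks made in tracecmd_msg_set_connection(), assuming the
command field arrives in network byte order.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>

/* Command numbers as defined by this patch in trace-msg.c */
enum {
	MSG_CLOSE    = 1,
	MSG_TCONNECT = 2,
	MSG_RCONNECT = 3,
};

/*
 * Hypothetical helper: decide how the server answers the first message
 * on the control channel. Returns the command to send back, or a
 * negative errno value if the connection must not proceed.
 */
static int reply_to_first_message(uint32_t cmd_be)
{
	uint32_t cmd = ntohl(cmd_be);

	if (cmd == MSG_CLOSE)
		return -ECONNABORTED;	/* client gave up before connecting */
	if (cmd != MSG_TCONNECT)
		return -EINVAL;		/* anything else is out of sequence */
	return MSG_RCONNECT;		/* acknowledge the connection request */
}
```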

NOTE:
This feature requires SELinux to be disabled (or set to permissive)
since qemu has to open a (non-registered) UNIX domain socket.

<How to set up>
1. Run virt-server on a host before booting guests
# trace-cmd virt-server

2. Make guest domain directory
# mkdir -p /tmp/trace-cmd/virt/<domain>
# chmod 710 /tmp/trace-cmd/virt/<domain>
# chgrp qemu /tmp/trace-cmd/virt/<domain>

3. Make FIFO on the host
# mkfifo /tmp/trace-cmd/virt/<domain>/trace-path-cpu{0,1,...,X}.{in,out}

4. Set up virtio-serial pipes of the guest on the host
Add the following tags to domain XML files.
# virsh edit <domain>
<channel type='unix'>
<source mode='connect' path='/tmp/trace-cmd/virt/agent-ctl-path'/>
<target type='virtio' name='agent-ctl-path'/>
</channel>
<channel type='pipe'>
<source path='/tmp/trace-cmd/virt/<domain>/trace-path-cpu0'/>
<target type='virtio' name='trace-path-cpu0'/>
</channel>
... (cpu1, cpu2, ...)

5. Boot the guest
# virsh start <domain>

6. Check I/F of virtio-serial on the guest
# ls /dev/virtio-ports
...
agent-ctl-path
...
trace-path-cpu0
...

Next, the user runs trace-cmd record with the --virt option (or other
virtualization options) on the guest.

This patch adds only the minimum features of virt-server, as follows:
<Features>
- virt-server subcommand
- Create I/F directory(/tmp/trace-cmd/virt/)
- Use named pipe I/Fs of virtio-serial for trace data paths
- Use UNIX domain socket for connecting clients on guests
- Use splice(2) for collecting trace data of guests

<Restrictions>
- libvirt is required for finding the guest domain name
- User must set up FIFOs by hand
- VCPU hotplug is not supported
- Interface directory is fixed
- SELinux should be disabled
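
The libvirt restriction comes from how the domain name is found: the
server matches the peer PID of the control socket against the
"<domain>.pid" files under /var/run/libvirt/qemu/. The sketch below shows
only the name extraction step. domain_from_pidfile() is a hypothetical
helper, and unlike the strstr() search in the patch it insists that
".pid" is a suffix of the file name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the domain part of a "<domain>.pid" file name
 * into out. Returns 0 on success, -1 if the name does not end in ".pid",
 * the domain part is empty, or out is too small.
 */
static int domain_from_pidfile(const char *name, char *out, size_t len)
{
	static const char suffix[] = ".pid";
	size_t nlen = strlen(name);
	size_t slen = sizeof(suffix) - 1;

	if (nlen <= slen || strcmp(name + nlen - slen, suffix) != 0)
		return -1;
	if (nlen - slen >= len)
		return -1;

	memcpy(out, name, nlen - slen);
	out[nlen - slen] = '\0';
	return 0;
}
```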

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@xxxxxxxxxxx>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@xxxxxxxxxxx>
---
Changes in V5: Change patch description
Update protocol document
Changes in V4: Fix some typos and cleanup
Changes in V3: Change _nw/_NW to _net/_NET
---
Documentation/Protocol.txt | 44 +++
Documentation/trace-cmd-virt-server.1.txt | 89 ++++++
trace-cmd.c | 3
trace-cmd.h | 2
trace-listen.c | 467 ++++++++++++++++++++++++-----
trace-msg.c | 105 ++++++-
trace-recorder.c | 50 ++-
trace-usage.c | 10 +
8 files changed, 667 insertions(+), 103 deletions(-)
create mode 100644 Documentation/trace-cmd-virt-server.1.txt

diff --git a/Documentation/Protocol.txt b/Documentation/Protocol.txt
index 49f7766..52df89e 100644
--- a/Documentation/Protocol.txt
+++ b/Documentation/Protocol.txt
@@ -6,6 +6,7 @@ Index
1. What is the trace-cmd protocol?
2. Trace-cmd Protocol V1 (Obsolete)
3. Trace-cmd Protocol V2
+4. Trace-cmd Protocol V2 in virt-server mode


1. What is the trace-cmd protocol?
@@ -117,3 +118,46 @@ or not by checking the first message from the client. If client
sends a positive number, it should be a V1 protocol client.


+4. Trace-cmd Protocol V2 in virt-server mode
+============================================
+
+In the virt-server mode, trace-cmd uses a control channel and
+trace data channels of virtio-serial to transfer trace data.
+
+Since the virtio-serial channel is just a character device
+on the guest, the server cannot notice when a client attaches
+to (that is, opens) the channel. Thus, the server waits for the
+connection message MSG_TCONNECT from the client on the control
+channel. The protocol flow is as follows:
+
+ <server> <client>
+ Open a control channel
+ wait for MSG_TCONNECT
+ open a virtio-serial channel
+ send MSG_TCONNECT
+ receive MSG_TCONNECT <----+
+ send MSG_RCONNECT
+ +---------------> receive MSG_RCONNECT
+ check "tracecmd-V2"
+ send MSG_TINIT with cpus, pagesize and options
+ receive MSG_TINIT <-------+
+ parse the parameters
+ send MSG_RINIT with port_array
+ +----------------> receive MSG_RINIT
+ get port_array
+ send meta data(MSG_SENDMETA)
+ receive MSG_SENDMETA <----+
+ record meta data
+ (snip)
+ send a message to finish sending meta data
+ | (MSG_FINMETA)
+ receive MSG_FINMETA <-----+
+ read block
+ --- start sending trace data on child processes ---
+
+ --- When client finishes sending trace data ---
+ send MSG_CLOSE
+ receive MSG_CLOSE <-------+
+ close the virtio-serial channel
+
+
diff --git a/Documentation/trace-cmd-virt-server.1.txt b/Documentation/trace-cmd-virt-server.1.txt
new file mode 100644
index 0000000..b775745
--- /dev/null
+++ b/Documentation/trace-cmd-virt-server.1.txt
@@ -0,0 +1,89 @@
+TRACE-CMD-VIRT-SERVER(1)
+========================
+
+NAME
+----
+trace-cmd-virt-server - listen for incoming connection to record tracing of
+ guests' clients
+
+SYNOPSIS
+--------
+*trace-cmd virt-server* ['OPTIONS']
+
+DESCRIPTION
+-----------
+The trace-cmd(1) virt-server sets up a UNIX domain socket I/F for communicating
+with guests' clients that run 'trace-cmd-record(1)' with the *--virt* option.
+When a connection is made and the guest's client sends data, it creates a
+file called 'trace.DOMAIN.dat', where DOMAIN is the name of the guest as named
+by libvirt.
+
+OPTIONS
+-------
+*-D*::
+ This option causes trace-cmd virt-server to go into daemon mode and run in
+ the background.
+
+*-d* 'dir'::
+ This option specifies a directory to write the data files into.
+
+*-o* 'filename'::
+ This option overrides the default 'trace' in the 'trace.DOMAIN.dat' that
+ is created when guest's client connects.
+
+*-l* 'filename'::
+ This option writes the output messages to a log file instead of standard output.
+
+SETTING
+-------
+An example setup is as follows:
+
+1. Run virt-server on a host
+ # trace-cmd virt-server
+
+2. Make guest domain directory
+ # mkdir -p /tmp/trace-cmd/virt/<DOMAIN>
+ # chmod 710 /tmp/trace-cmd/virt/<DOMAIN>
+ # chgrp qemu /tmp/trace-cmd/virt/<DOMAIN>
+
+3. Make FIFO on the host
+ # mkfifo /tmp/trace-cmd/virt/<DOMAIN>/trace-path-cpu{0,1,...,X}.{in,out}
+
+4. Set up the virtio-serial pipes of the guest on the host
+ Add the following tags to domain XML files.
+ # virsh edit <guest domain>
+ <channel type='unix'>
+ <source mode='connect' path='/tmp/trace-cmd/virt/agent-ctl-path'/>
+ <target type='virtio' name='agent-ctl-path'/>
+ </channel>
+ <channel type='pipe'>
+ <source path='/tmp/trace-cmd/virt/<DOMAIN>/trace-path-cpu0'/>
+ <target type='virtio' name='trace-path-cpu0'/>
+ </channel>
+ ... (cpu1, cpu2, ...)
+
+5. Boot the guest
+ # virsh start <DOMAIN>
+
+6. Run the guest's client(see trace-cmd-record(1) with the *--virt* option)
+ # trace-cmd record -e sched* --virt
+
+SEE ALSO
+--------
+trace-cmd(1), trace-cmd-record(1), trace-cmd-report(1), trace-cmd-start(1),
+trace-cmd-stop(1), trace-cmd-extract(1), trace-cmd-reset(1),
+trace-cmd-split(1), trace-cmd-list(1)
+
+AUTHOR
+------
+Written by Masami Hiramatsu <masami.hiramatsu.pt@xxxxxxxxxxx>
+
+RESOURCES
+---------
+git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/trace-cmd.git
+
+COPYING
+-------
+Copyright \(C) 2013,2014 Hitachi, Ltd. Free use of this software is
+granted under the terms of the GNU Public License (GPL).
+
diff --git a/trace-cmd.c b/trace-cmd.c
index 4c5b564..29a2bb8 100644
--- a/trace-cmd.c
+++ b/trace-cmd.c
@@ -425,7 +425,8 @@ int main (int argc, char **argv)
} else if (strcmp(argv[1], "mem") == 0) {
trace_mem(argc, argv);
exit(0);
- } else if (strcmp(argv[1], "listen") == 0) {
+ } else if (strcmp(argv[1], "listen") == 0 ||
+ strcmp(argv[1], "virt-server") == 0) {
trace_listen(argc, argv);
exit(0);
} else if (strcmp(argv[1], "split") == 0) {
diff --git a/trace-cmd.h b/trace-cmd.h
index 1261e23..a93920f 100644
--- a/trace-cmd.h
+++ b/trace-cmd.h
@@ -257,6 +257,7 @@ struct tracecmd_recorder *tracecmd_create_recorder_maxkb(const char *file, int c
struct tracecmd_recorder *tracecmd_create_buffer_recorder_fd(int fd, int cpu, unsigned flags, const char *buffer);
struct tracecmd_recorder *tracecmd_create_buffer_recorder(const char *file, int cpu, unsigned flags, const char *buffer);
struct tracecmd_recorder *tracecmd_create_buffer_recorder_maxkb(const char *file, int cpu, unsigned flags, const char *buffer, int maxkb);
+struct tracecmd_recorder *tracecmd_create_recorder_virt(const char *file, int cpu, int trace_fd);

int tracecmd_start_recording(struct tracecmd_recorder *recorder, unsigned long sleep);
void tracecmd_stop_recording(struct tracecmd_recorder *recorder);
@@ -270,6 +271,7 @@ int tracecmd_msg_finish_sending_metadata(int fd);
void tracecmd_msg_send_close_msg(void);

/* for server */
+int tracecmd_msg_set_connection(int fd, const char *domain);
int tracecmd_msg_initial_setting(int fd, int *cpus, int *pagesize);
int tracecmd_msg_send_port_array(int fd, int total_cpus, int *ports);
int tracecmd_msg_collect_metadata(int ifd, int ofd);
diff --git a/trace-listen.c b/trace-listen.c
index 17ab184..718680f 100644
--- a/trace-listen.c
+++ b/trace-listen.c
@@ -23,9 +23,13 @@
#include <stdlib.h>
#include <string.h>
#include <getopt.h>
+#include <grp.h>
+#include <sys/stat.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
+#include <sys/epoll.h>
+#include <sys/un.h>
#include <netdb.h>
#include <unistd.h>
#include <fcntl.h>
@@ -50,19 +54,42 @@ static int backlog = 5;

static int proto_ver;

-#define TEMP_FILE_STR "%s.%s:%s.cpu%d", output_file, host, port, cpu
-static char *get_temp_file(const char *host, const char *port, int cpu)
+enum {
+ NET = 1,
+ VIRT = 2,
+};
+
+#define TEMP_FILE_STR_NET "%s.%s:%s.cpu%d", output_file, host, port, cpu
+#define TEMP_FILE_STR_VIRT "%s.%s:%d.cpu%d", output_file, domain, virtpid, cpu
+static char *get_temp_file(const char *host, const char *port,
+ const char *domain, int virtpid, int cpu, int mode)
{
char *file = NULL;
int size;

- size = snprintf(file, 0, TEMP_FILE_STR);
- file = malloc_or_die(size + 1);
- sprintf(file, TEMP_FILE_STR);
+ if (mode == NET) {
+ size = snprintf(file, 0, TEMP_FILE_STR_NET);
+ file = malloc_or_die(size + 1);
+ sprintf(file, TEMP_FILE_STR_NET);
+ } else if (mode == VIRT) {
+ size = snprintf(file, 0, TEMP_FILE_STR_VIRT);
+ file = malloc_or_die(size + 1);
+ sprintf(file, TEMP_FILE_STR_VIRT);
+ }

return file;
}

+static char *get_temp_file_net(const char *host, const char *port, int cpu)
+{
+ return get_temp_file(host, port, NULL, 0, cpu, NET);
+}
+
+static char *get_temp_file_virt(const char *domain, int virtpid, int cpu)
+{
+ return get_temp_file(NULL, NULL, domain, virtpid, cpu, VIRT);
+}
+
static void put_temp_file(char *file)
{
free(file);
@@ -81,11 +108,15 @@ static void signal_setup(int sig, sighandler_t handle)
sigaction(sig, &action, NULL);
}

-static void delete_temp_file(const char *host, const char *port, int cpu)
+static void delete_temp_file(const char *host, const char *port,
+ const char *domain, int virtpid, int cpu, int mode)
{
char file[MAX_PATH];

- snprintf(file, MAX_PATH, TEMP_FILE_STR);
+ if (mode == NET)
+ snprintf(file, MAX_PATH, TEMP_FILE_STR_NET);
+ else if (mode == VIRT)
+ snprintf(file, MAX_PATH, TEMP_FILE_STR_VIRT);
unlink(file);
}

@@ -113,8 +144,12 @@ static int process_option(char *option)
return 0;
}

+static struct tracecmd_recorder *recorder;
+
static void finish(int sig)
{
+ if (recorder)
+ tracecmd_stop_recording(recorder);
done = 1;
}

@@ -184,7 +219,7 @@ static void process_udp_child(int sfd, const char *host, const char *port,

signal_setup(SIGUSR1, finish);

- tempfile = get_temp_file(host, port, cpu);
+ tempfile = get_temp_file_net(host, port, cpu);
fd = open(tempfile, O_WRONLY | O_TRUNC | O_CREAT, 0644);
if (fd < 0)
pdie("creating %s", tempfile);
@@ -225,6 +260,28 @@ static void process_udp_child(int sfd, const char *host, const char *port,
exit(0);
}

+#define SLEEP_DEFAULT 1000
+
+static void process_virt_child(int fd, int cpu, int pagesize,
+ const char *domain, int virtpid)
+{
+ char *tempfile;
+
+ signal_setup(SIGUSR1, finish);
+ tempfile = get_temp_file_virt(domain, virtpid, cpu);
+
+ recorder = tracecmd_create_recorder_virt(tempfile, cpu, fd);
+
+ do {
+ if (tracecmd_start_recording(recorder, SLEEP_DEFAULT) < 0)
+ break;
+ } while (!done);
+
+ tracecmd_free_recorder(recorder);
+ put_temp_file(tempfile);
+ exit(0);
+}
+
#define START_PORT_SEARCH 1500
#define MAX_PORT_SEARCH 6000

@@ -272,20 +329,37 @@ static int udp_bind_a_port(int start_port, int *sfd)
return num_port;
}

-static void fork_udp_reader(int sfd, const char *node, const char *port,
- int *pid, int cpu, int pagesize)
+static void fork_reader(int sfd, const char *node, const char *port,
+ int *pid, int cpu, int pagesize, const char *domain,
+ int virtpid, int mode)
{
*pid = fork();

if (*pid < 0)
- pdie("creating udp reader");
+ pdie("creating reader");

- if (!*pid)
- process_udp_child(sfd, node, port, cpu, pagesize);
+ if (!*pid) {
+ if (mode == NET)
+ process_udp_child(sfd, node, port, cpu, pagesize);
+ else if (mode == VIRT)
+ process_virt_child(sfd, cpu, pagesize, domain, virtpid);
+ }

close(sfd);
}

+static void fork_udp_reader(int sfd, const char *node, const char *port,
+ int *pid, int cpu, int pagesize)
+{
+ fork_reader(sfd, node, port, pid, cpu, pagesize, NULL, 0, NET);
+}
+
+static void fork_virt_reader(int sfd, int *pid, int cpu, int pagesize,
+ const char *domain, int virtpid)
+{
+ fork_reader(sfd, NULL, NULL, pid, cpu, pagesize, domain, virtpid, VIRT);
+}
+
static int open_udp(const char *node, const char *port, int *pid,
int cpu, int pagesize, int start_port)
{
@@ -305,6 +379,29 @@ static int open_udp(const char *node, const char *port, int *pid,
return num_port;
}

+#define TRACE_CMD_DIR "/tmp/trace-cmd/"
+#define VIRT_DIR TRACE_CMD_DIR "virt/"
+#define VIRT_TRACE_CTL_SOCK VIRT_DIR "agent-ctl-path"
+#define TRACE_PATH_DOMAIN_CPU VIRT_DIR "%s/trace-path-cpu%d.out"
+
+static int open_virtio_serial_pipe(int *pid, int cpu, int pagesize,
+ const char *domain, int virtpid)
+{
+ char buf[PATH_MAX];
+ int fd;
+
+ snprintf(buf, PATH_MAX, TRACE_PATH_DOMAIN_CPU, domain, cpu);
+ fd = open(buf, O_RDONLY | O_NONBLOCK);
+ if (fd < 0) {
+ warning("open %s", buf);
+ return fd;
+ }
+
+ fork_virt_reader(fd, pid, cpu, pagesize, domain, virtpid);
+
+ return fd;
+}
+
/* Setup client who is using the v1 protocol */
static int client_initial_setting(int fd, char *buf, int *cpus, int *pagesize)
{
@@ -369,7 +466,7 @@ static int client_initial_setting(int fd, char *buf, int *cpus, int *pagesize)
return 0;
}

-static int communicate_with_client(int fd, int *cpus, int *pagesize)
+static int communicate_with_client_net(int fd, int *cpus, int *pagesize)
{
char buf[BUFSIZ];
int n;
@@ -407,12 +504,32 @@ static int communicate_with_client(int fd, int *cpus, int *pagesize)
return 0;
}

-static int create_client_file(const char *node, const char *port)
+static int communicate_with_client_virt(int fd, const char *domain, int *cpus, int *pagesize)
+{
+ proto_ver = V2_PROTOCOL;
+
+ if (tracecmd_msg_set_connection(fd, domain) < 0)
+ return -1;
+
+ /* read the CPU count, the page size, and options */
+ if (tracecmd_msg_initial_setting(fd, cpus, pagesize) < 0)
+ return -1;
+
+ return 0;
+}
+
+static int create_client_file(const char *node, const char *port,
+ const char *domain, int pid, int mode)
{
char buf[BUFSIZ];
int ofd;

- snprintf(buf, BUFSIZ, "%s.%s:%s.dat", output_file, node, port);
+ if (mode == NET)
+ snprintf(buf, BUFSIZ, "%s.%s:%s.dat", output_file, node, port);
+ else if (mode == VIRT)
+ snprintf(buf, BUFSIZ, "%s.%s:%d.dat", output_file, domain, pid);
+ else
+ plog("create_client_file: Unsupported mode %d", mode);

ofd = open(buf, O_RDWR | O_CREAT | O_TRUNC, 0644);
if (ofd < 0)
@@ -421,7 +538,8 @@ static int create_client_file(const char *node, const char *port)
}

static void destroy_all_readers(int cpus, int *pid_array, const char *node,
- const char *port)
+ const char *port, const char *domain,
+ int virtpid, int mode)
{
int cpu;

@@ -429,42 +547,50 @@ static void destroy_all_readers(int cpus, int *pid_array, const char *node,
if (pid_array[cpu] > 0) {
kill(pid_array[cpu], SIGKILL);
waitpid(pid_array[cpu], NULL, 0);
- delete_temp_file(node, port, cpu);
+ delete_temp_file(node, port, domain, virtpid, cpu, mode);
pid_array[cpu] = 0;
}
}
}

static int *create_all_readers(int cpus, const char *node, const char *port,
- int pagesize, int fd)
+ const char *domain, int virtpid, int pagesize,
+ int fd, int mode)
{
char buf[BUFSIZ];
- int *port_array;
+ int *port_array = NULL;
int *pid_array;
int start_port;
int udp_port;
int cpu;
int pid;

- port_array = malloc_or_die(sizeof(int) * cpus);
+ if (mode == NET) {
+ port_array = malloc_or_die(sizeof(int) * cpus);
+ start_port = START_PORT_SEARCH;
+ }
pid_array = malloc_or_die(sizeof(int) * cpus);
memset(pid_array, 0, sizeof(int) * cpus);

- start_port = START_PORT_SEARCH;
-
- /* Now create a UDP port for each CPU */
+ /* Now create a reader for each CPU */
for (cpu = 0; cpu < cpus; cpu++) {
- udp_port = open_udp(node, port, &pid, cpu,
- pagesize, start_port);
- if (udp_port < 0)
- goto out_free;
- port_array[cpu] = udp_port;
+ if (node) {
+ udp_port = open_udp(node, port, &pid, cpu,
+ pagesize, start_port);
+ if (udp_port < 0)
+ goto out_free;
+ port_array[cpu] = udp_port;
+ /*
+ * Due to some bugging finding ports,
+ * force search after last port
+ */
+ start_port = udp_port + 1;
+ } else {
+ if (open_virtio_serial_pipe(&pid, cpu, pagesize,
+ domain, virtpid) < 0)
+ goto out_free;
+ }
pid_array[cpu] = pid;
- /*
- * Due to some bugging finding ports,
- * force search after last port
- */
- start_port = udp_port + 1;
}

if (proto_ver == V2_PROTOCOL) {
@@ -485,7 +611,7 @@ static int *create_all_readers(int cpus, const char *node, const char *port,
return pid_array;

out_free:
- destroy_all_readers(cpus, pid_array, node, port);
+ destroy_all_readers(cpus, pid_array, node, port, domain, virtpid, mode);
return NULL;
}

@@ -527,7 +653,8 @@ static void stop_all_readers(int cpus, int *pid_array)
}

static void put_together_file(int cpus, int ofd, const char *node,
- const char *port)
+ const char *port, const char *domain, int virtpid,
+ int mode)
{
char **temp_files;
int cpu;
@@ -536,25 +663,33 @@ static void put_together_file(int cpus, int ofd, const char *node,
temp_files = malloc_or_die(sizeof(*temp_files) * cpus);

for (cpu = 0; cpu < cpus; cpu++)
- temp_files[cpu] = get_temp_file(node, port, cpu);
+ temp_files[cpu] = get_temp_file(node, port, domain,
+ virtpid, cpu, mode);

tracecmd_attach_cpu_data_fd(ofd, cpus, temp_files);
free(temp_files);
}

-static void process_client(const char *node, const char *port, int fd)
+static void process_client(int fd, const char *node, const char *port,
+ const char *domain, int virtpid, int mode)
{
int *pid_array;
int pagesize;
int cpus;
int ofd;

- if (communicate_with_client(fd, &cpus, &pagesize) < 0)
- return;
-
- ofd = create_client_file(node, port);
-
- pid_array = create_all_readers(cpus, node, port, pagesize, fd);
+ if (mode == NET) {
+ if (communicate_with_client_net(fd, &cpus, &pagesize) < 0)
+ return;
+ } else if (mode == VIRT) {
+ if (communicate_with_client_virt(fd, domain, &cpus, &pagesize) < 0)
+ return;
+ } else
+ pdie("process_client: Unsupported mode %d", mode);
+
+ ofd = create_client_file(node, port, domain, virtpid, mode);
+ pid_array = create_all_readers(cpus, node, port, domain, virtpid,
+ pagesize, fd, mode);
if (!pid_array)
return;

@@ -573,9 +708,22 @@ static void process_client(const char *node, const char *port, int fd)
/* wait a little to have the readers clean up */
sleep(1);

- put_together_file(cpus, ofd, node, port);
+ put_together_file(cpus, ofd, node, port, domain, virtpid, mode);
+
+ destroy_all_readers(cpus, pid_array, node, port, domain, virtpid, mode);
+}
+
+static void process_client_net(int fd, const char *node, const char *port)
+{
+ process_client(fd, node, port, NULL, 0, NET);
+}

- destroy_all_readers(cpus, pid_array, node, port);
+static void process_client_virt(int fd, const char *domain, int virtpid)
+{
+ /* keep the connection to QEMU even after clients on guests finish */
+ do {
+ process_client(fd, NULL, NULL, domain, virtpid, VIRT);
+ } while (!done);
}

static int do_fork(int cfd)
@@ -602,32 +750,104 @@ static int do_fork(int cfd)
return 0;
}

-static int do_connection(int cfd, struct sockaddr_storage *peer_addr,
- socklen_t peer_addr_len)
+static int get_virtpid(int cfd)
{
- char host[NI_MAXHOST], service[NI_MAXSERV];
- int s;
+ struct ucred cr;
+ socklen_t cl;
int ret;

- ret = do_fork(cfd);
- if (ret)
+ cl = sizeof(cr);
+ ret = getsockopt(cfd, SOL_SOCKET, SO_PEERCRED, &cr, &cl);
+ if (ret < 0)
return ret;

- s = getnameinfo((struct sockaddr *)peer_addr, peer_addr_len,
- host, NI_MAXHOST,
- service, NI_MAXSERV, NI_NUMERICSERV);
+ return cr.pid;
+}

- if (s == 0)
- plog("Connected with %s:%s\n",
- host, service);
- else {
- plog("Error with getnameinfo: %s\n",
- gai_strerror(s));
- close(cfd);
- return -1;
+#define LIBVIRT_DOMAIN_PATH "/var/run/libvirt/qemu/"
+
+/* We can convert pid to domain name of a guest when we use libvirt. */
+static char *get_guest_domain_from_pid(int pid)
+{
+ struct dirent *dirent;
+ char file_name[NAME_MAX];
+ char *file_name_ret, *domain;
+ char buf[BUFSIZ];
+ DIR *dir;
+ size_t doml;
+ int fd;
+
+ dir = opendir(LIBVIRT_DOMAIN_PATH);
+ if (!dir) {
+ if (errno == ENOENT)
+ warning("Only support for using libvirt");
+ return NULL;
+ }
+
+ for (dirent = readdir(dir); dirent != NULL; dirent = readdir(dir)) {
+ snprintf(file_name, NAME_MAX, LIBVIRT_DOMAIN_PATH"%s",
+ dirent->d_name);
+ file_name_ret = strstr(file_name, ".pid");
+ if (file_name_ret) {
+ fd = open(file_name, O_RDONLY);
+ if (fd < 0)
+ return NULL;
+ if (read(fd, buf, BUFSIZ) < 0)
+ return NULL;
+
+ if (pid == atoi(buf)) {
+ /* not include /var/run/libvirt/qemu */
+ doml = (size_t)(file_name_ret - file_name)
+ - strlen(LIBVIRT_DOMAIN_PATH);
+ domain = strndup(file_name +
+ strlen(LIBVIRT_DOMAIN_PATH),
+ doml);
+ plog("start %s:%d\n", domain, pid);
+ return domain;
+ }
+ }
}

- process_client(host, service, cfd);
+ return NULL;
+}
+
+static int do_connection(int cfd, struct sockaddr *peer_addr,
+ socklen_t peer_addr_len, int mode)
+{
+ char host[NI_MAXHOST], service[NI_MAXSERV];
+ int s, ret, virtpid;
+ char *domain = NULL;
+
+ if (mode == VIRT) {
+ virtpid = get_virtpid(cfd);
+ if (virtpid < 0)
+ return virtpid;
+
+ domain = get_guest_domain_from_pid(virtpid);
+ if (!domain)
+ return -1;
+ }
+
+ ret = do_fork(cfd);
+ if (ret)
+ return ret;
+
+ if (mode == NET) {
+ s = getnameinfo(peer_addr, peer_addr_len, host, NI_MAXHOST,
+ service, NI_MAXSERV, NI_NUMERICSERV);
+
+ if (s == 0)
+ plog("Connected with %s:%s\n",
+ host, service);
+ else {
+ plog("Error with getnameinfo: %s\n",
+ gai_strerror(s));
+ close(cfd);
+ return -1;
+ }
+ process_client_net(cfd, host, service);
+ } else if (mode == VIRT)
+ process_client_virt(cfd, domain, virtpid);

close(cfd);

@@ -681,12 +901,11 @@ static void remove_process(int pid)

static void kill_clients(void)
{
- int status;
int i;

for (i = 0; i < saved_pids; i++) {
kill(client_pids[i], SIGINT);
- waitpid(client_pids[i], &status, 0);
+ waitpid(client_pids[i], NULL, 0);
}

saved_pids = 0;
@@ -705,31 +924,38 @@ static void clean_up(int sig)
} while (ret > 0);
}

-static void do_accept_loop(int sfd)
+static void do_accept_loop(int sfd, int mode)
{
- struct sockaddr_storage peer_addr;
- socklen_t peer_addr_len;
+ struct sockaddr addr;
+ socklen_t addrlen;
int cfd, pid;

- peer_addr_len = sizeof(peer_addr);
+ if (mode == NET)
+ addrlen = sizeof(struct sockaddr_storage);
+ else if (mode == VIRT)
+ addrlen = sizeof(struct sockaddr_un);
+ else
+ pdie("do_accept_loop: Unsupported mode %d", mode);

do {
- cfd = accept(sfd, (struct sockaddr *)&peer_addr,
- &peer_addr_len);
+ cfd = accept(sfd, &addr, &addrlen);
printf("connected!\n");
if (cfd < 0 && errno == EINTR)
continue;
if (cfd < 0)
pdie("connecting");

- pid = do_connection(cfd, &peer_addr, peer_addr_len);
+ if (mode == NET)
+ pid = do_connection(cfd, &addr, addrlen, mode);
+ else if (mode == VIRT)
+ pid = do_connection(cfd, NULL, 0, mode);
if (pid > 0)
add_process(pid);

} while (!done);
}

-static void do_listen(char *port)
+static void do_listen_net(char *port)
{
struct addrinfo hints;
struct addrinfo *result, *rp;
@@ -767,8 +993,64 @@ static void do_listen(char *port)
if (listen(sfd, backlog) < 0)
pdie("listen");

- do_accept_loop(sfd);
+ do_accept_loop(sfd, NET);
+
+ kill_clients();
+}
+
+static void make_virt_if_dir(void)
+{
+ struct group *group;
+
+ if (mkdir(TRACE_CMD_DIR, 0710) < 0) {
+ if (errno != EEXIST)
+ pdie("mkdir %s", TRACE_CMD_DIR);
+ }
+ /* QEMU operates as qemu:qemu */
+ chmod(TRACE_CMD_DIR, 0710);
+ group = getgrnam("qemu");
+ if (chown(TRACE_CMD_DIR, -1, group->gr_gid) < 0)
+ pdie("chown %s", TRACE_CMD_DIR);
+
+ if (mkdir(VIRT_DIR, 0710) < 0) {
+ if (errno != EEXIST)
+ pdie("mkdir %s", VIRT_DIR);
+ }
+ chmod(VIRT_DIR, 0710);
+ if (chown(VIRT_DIR, -1, group->gr_gid) < 0)
+ pdie("chown %s", VIRT_DIR);
+}
+
+static void do_listen_virt(void)
+{
+ struct sockaddr_un un_server;
+ struct group *group;
+ socklen_t slen;
+ int sfd;
+
+ make_virt_if_dir();
+
+ slen = sizeof(un_server);
+ sfd = socket(AF_UNIX, SOCK_STREAM, 0);
+ if (sfd < 0)
+ pdie("socket");
+
+ un_server.sun_family = AF_UNIX;
+ snprintf(un_server.sun_path, sizeof(un_server.sun_path), VIRT_TRACE_CTL_SOCK);
+
+ if (bind(sfd, (struct sockaddr *)&un_server, slen) < 0)
+ pdie("bind");
+ chmod(VIRT_TRACE_CTL_SOCK, 0660);
+ group = getgrnam("qemu");
+ if (chown(VIRT_TRACE_CTL_SOCK, -1, group->gr_gid) < 0)
+ pdie("chown %s", VIRT_TRACE_CTL_SOCK);
+
+ if (listen(sfd, backlog) < 0)
+ pdie("listen");
+
+ do_accept_loop(sfd, VIRT);

+ unlink(VIRT_TRACE_CTL_SOCK);
kill_clients();
}

@@ -782,17 +1064,33 @@ enum {
OPT_debug = 255,
};

+static void parse_args_net(int c, char **argv, char **port)
+{
+ switch (c) {
+ case 'p':
+ *port = optarg;
+ break;
+ default:
+ usage(argv);
+ }
+}
+
void trace_listen(int argc, char **argv)
{
char *logfile = NULL;
char *port = NULL;
int daemon = 0;
+ int mode = 0;
int c;

if (argc < 2)
usage(argv);

- if (strcmp(argv[1], "listen") != 0)
+ if (strcmp(argv[1], "listen") == 0)
+ mode = NET;
+ else if (strcmp(argv[1], "virt-server") == 0)
+ mode = VIRT;
+ else
usage(argv);

for (;;) {
@@ -812,9 +1110,6 @@ void trace_listen(int argc, char **argv)
case 'h':
usage(argv);
break;
- case 'p':
- port = optarg;
- break;
case 'd':
output_dir = optarg;
break;
@@ -831,11 +1126,14 @@ void trace_listen(int argc, char **argv)
debug = 1;
break;
default:
- usage(argv);
+ if (mode == NET)
+ parse_args_net(c, argv, &port);
+ else
+ usage(argv);
}
}

- if (!port)
+ if (!port && mode == NET)
usage(argv);

if ((argc - optind) >= 2)
@@ -863,7 +1161,12 @@ void trace_listen(int argc, char **argv)
signal_setup(SIGINT, finish);
signal_setup(SIGTERM, finish);

- do_listen(port);
+ if (mode == NET)
+ do_listen_net(port);
+ else if (mode == VIRT)
+ do_listen_virt();
+ else
+ ; /* Not reached */

return;
}
diff --git a/trace-msg.c b/trace-msg.c
index e3d4f3f..717089c 100644
--- a/trace-msg.c
+++ b/trace-msg.c
@@ -59,6 +59,9 @@ typedef __be32 be32;

#define CPU_MAX 256

+/* use CONNECT_MSG as a protocol version of trace-msg */
+#define CONNECT_MSG "tracecmd-V2"
+
/* for both client and server */
bool use_tcp;
int cpu_count;
@@ -78,6 +81,10 @@ struct tracecmd_msg_str {
char *buf;
} __attribute__((packed));

+struct tracecmd_msg_rconnect {
+ struct tracecmd_msg_str str;
+};
+
struct tracecmd_msg_opt {
be32 size;
be32 opt_cmd;
@@ -104,6 +111,7 @@ struct tracecmd_msg_error {
be32 size;
be32 cmd;
union {
+ struct tracecmd_msg_rconnect rconnect;
struct tracecmd_msg_tinit tinit;
struct tracecmd_msg_rinit rinit;
struct tracecmd_msg_meta meta;
@@ -111,7 +119,10 @@ struct tracecmd_msg_error {
} __attribute__((packed));

enum tracecmd_msg_cmd {
+ MSG_ERROR = 0,
MSG_CLOSE = 1,
+ MSG_TCONNECT = 2,
+ MSG_RCONNECT = 3,
MSG_TINIT = 4,
MSG_RINIT = 5,
MSG_SENDMETA = 6,
@@ -122,6 +133,7 @@ struct tracecmd_msg {
be32 size;
be32 cmd;
union {
+ struct tracecmd_msg_rconnect rconnect;
struct tracecmd_msg_tinit tinit;
struct tracecmd_msg_rinit rinit;
struct tracecmd_msg_meta meta;
@@ -159,6 +171,16 @@ static void bufcpy(void *dest, u32 offset, const void *buf, u32 buflen)
memcpy(dest+offset, buf, buflen);
}

+static int make_rconnect(const char *buf, int buflen, struct tracecmd_msg *msg)
+{
+ u32 offset = offsetof(struct tracecmd_msg, data.rconnect.str.buf);
+
+ msg->data.rconnect.str.size = htonl(buflen);
+ bufcpy(msg, offset, buf, buflen);
+
+ return 0;
+}
+
enum msg_opt_command {
MSGOPT_USETCP = 1,
};
@@ -236,11 +258,13 @@ static int make_rinit(struct tracecmd_msg *msg)

msg->data.rinit.cpus = htonl(cpu_count);

- for (i = 0; i < cpu_count; i++) {
- /* + rrqports->cpus or rrqports->port_array[i] */
- offset += sizeof(be32);
- port = htonl(port_array[i]);
- bufcpy(msg, offset, &port, sizeof(be32) * cpu_count);
+ if (port_array) {
+ for (i = 0; i < cpu_count; i++) {
+ /* + rrqports->cpus or rrqports->port_array[i] */
+ offset += sizeof(be32);
+ port = htonl(port_array[i]);
+ bufcpy(msg, offset, &port, sizeof(be32) * cpu_count);
+ }
}

return 0;
@@ -252,6 +276,9 @@ static u32 tracecmd_msg_get_body_length(u32 cmd)
u32 len = 0;

switch (cmd) {
+ case MSG_RCONNECT:
+ return sizeof(msg->data.rconnect.str.size)
+ + sizeof(CONNECT_MSG);
case MSG_TINIT:
len = sizeof(msg->data.tinit.cpus)
+ sizeof(msg->data.tinit.page_size)
@@ -288,6 +315,8 @@ static u32 tracecmd_msg_get_body_length(u32 cmd)
static int tracecmd_msg_make_body(u32 cmd, struct tracecmd_msg *msg)
{
switch (cmd) {
+ case MSG_RCONNECT:
+ return make_rconnect(CONNECT_MSG, sizeof(CONNECT_MSG), msg);
case MSG_TINIT:
return make_tinit(msg);
case MSG_RINIT:
@@ -423,6 +452,8 @@ static void *tracecmd_msg_buf_access(struct tracecmd_msg *msg, int offset)

static int tracecmd_msg_wait_for_msg(int fd, struct tracecmd_msg *msg)
{
+ int offset = TRACECMD_MSG_HDR_LEN;
+ char *buf;
u32 cmd;
int ret;

@@ -434,8 +465,20 @@ static int tracecmd_msg_wait_for_msg(int fd, struct tracecmd_msg *msg)
}

cmd = ntohl(msg->cmd);
- if (cmd == MSG_CLOSE)
+ switch (cmd) {
+ case MSG_RCONNECT:
+ offset += sizeof(msg->data.rconnect.str.size);
+ buf = tracecmd_msg_buf_access(msg, offset);
+ /* Make sure the server is the tracecmd server */
+ if (memcmp(buf, CONNECT_MSG,
+ ntohl(msg->data.rconnect.str.size) - 1) != 0) {
+ warning("server not tracecmd server");
+ return -EPROTONOSUPPORT;
+ }
+ break;
+ case MSG_CLOSE:
return -ECONNABORTED;
+ }

return 0;
}
@@ -494,7 +537,55 @@ static void error_operation_for_server(struct tracecmd_msg *msg)

cmd = ntohl(msg->cmd);

- warning("Message: cmd=%d size=%d\n", cmd, ntohl(msg->size));
+ if (cmd == MSG_ERROR)
+ plog("Receive error message: cmd=%d size=%d\n",
+ ntohl(msg->data.err.cmd), ntohl(msg->data.err.size));
+ else
+ warning("Message: cmd=%d size=%d\n", cmd, ntohl(msg->size));
+}
+
+int tracecmd_msg_set_connection(int fd, const char *domain)
+{
+ struct tracecmd_msg *msg;
+ char buf[TRACECMD_MSG_MAX_LEN] = {};
+ u32 cmd;
+ int ret;
+
+ msg = (struct tracecmd_msg *)buf;
+
+ /*
+ * Wait for the connection message from a client first.
+ * If a client uses virtio-serial, the connection message is
+ * not sent immediately after accept(). connect() is called
+ * in QEMU, so the client can send the connection message only
+ * after the guest boots. Therefore, the virt-server patiently
+ * waits for the client's connection request.
+ */
+ ret = tracecmd_msg_recv(fd, msg);
+ if (ret < 0) {
+ if (!buf[0]) {
+ /* No data means QEMU has already died. */
+ close(fd);
+ die("Connection refused: %s", domain);
+ }
+ return -ENOMSG;
+ }
+
+ cmd = ntohl(msg->cmd);
+ if (cmd == MSG_CLOSE)
+ return -ECONNABORTED;
+ else if (cmd != MSG_TCONNECT)
+ return -EINVAL;
+
+ ret = tracecmd_msg_send(fd, MSG_RCONNECT);
+ if (ret < 0)
+ goto error;
+
+ return 0;
+
+error:
+ error_operation_for_server(msg);
+ return ret;
}

#define MAX_OPTION_SIZE 4096
diff --git a/trace-recorder.c b/trace-recorder.c
index 66cad98..ad80d82 100644
--- a/trace-recorder.c
+++ b/trace-recorder.c
@@ -155,19 +155,23 @@ tracecmd_create_buffer_recorder_fd2(int fd, int fd2, int cpu, unsigned flags,
recorder->fd1 = fd;
recorder->fd2 = fd2;

- path = malloc_or_die(strlen(buffer) + 40);
- if (!path)
- goto out_free;
+ if (buffer) {
+ path = malloc_or_die(strlen(buffer) + 40);
+ if (!path)
+ goto out_free;

- if (flags & TRACECMD_RECORD_SNAPSHOT)
- sprintf(path, "%s/per_cpu/cpu%d/snapshot_raw", buffer, cpu);
- else
- sprintf(path, "%s/per_cpu/cpu%d/trace_pipe_raw", buffer, cpu);
- recorder->trace_fd = open(path, O_RDONLY);
- if (recorder->trace_fd < 0)
- goto out_free;
+ if (flags & TRACECMD_RECORD_SNAPSHOT)
+ sprintf(path, "%s/per_cpu/cpu%d/snapshot_raw",
+ buffer, cpu);
+ else
+ sprintf(path, "%s/per_cpu/cpu%d/trace_pipe_raw",
+ buffer, cpu);
+ recorder->trace_fd = open(path, O_RDONLY);
+ if (recorder->trace_fd < 0)
+ goto out_free;

- free(path);
+ free(path);
+ }

if ((recorder->flags & TRACECMD_RECORD_NOSPLICE) == 0) {
ret = pipe(recorder->brass);
@@ -190,8 +194,9 @@ tracecmd_create_buffer_recorder_fd(int fd, int cpu, unsigned flags, const char *
return tracecmd_create_buffer_recorder_fd2(fd, -1, cpu, flags, buffer, 0);
}

-struct tracecmd_recorder *
-tracecmd_create_buffer_recorder(const char *file, int cpu, unsigned flags, const char *buffer)
+static struct tracecmd_recorder *
+__tracecmd_create_buffer_recorder(const char *file, int cpu, unsigned flags,
+ const char *buffer)
{
struct tracecmd_recorder *recorder;
int fd;
@@ -254,6 +259,25 @@ tracecmd_create_buffer_recorder_maxkb(const char *file, int cpu, unsigned flags,
goto out;
}

+struct tracecmd_recorder *
+tracecmd_create_buffer_recorder(const char *file, int cpu, unsigned flags,
+ const char *buffer)
+{
+ return __tracecmd_create_buffer_recorder(file, cpu, flags, buffer);
+}
+
+struct tracecmd_recorder *
+tracecmd_create_recorder_virt(const char *file, int cpu, int trace_fd)
+{
+ struct tracecmd_recorder *recorder;
+
+ recorder = __tracecmd_create_buffer_recorder(file, cpu, 0, NULL);
+ if (recorder)
+ recorder->trace_fd = trace_fd;
+
+ return recorder;
+}
+
struct tracecmd_recorder *tracecmd_create_recorder_fd(int fd, int cpu, unsigned flags)
{
const char *tracing;
diff --git a/trace-usage.c b/trace-usage.c
index 520b14b..3d9b821 100644
--- a/trace-usage.c
+++ b/trace-usage.c
@@ -212,6 +212,16 @@ static struct usage_help usage_help[] = {
" -l logfile to write messages to.\n"
},
{
+ "virt-server",
+ "listen on a virtio-serial channel for trace clients",
+ " %s virt-server [-D][-o file][-d dir][-l logfile]\n"
+ " Creates a socket to listen for clients.\n"
+ " -D create it in daemon mode.\n"
+ " -o file name to use for clients.\n"
+ " -d directory to store client files.\n"
+ " -l logfile to write messages to.\n"
+ },
+ {
"list",
"list the available events, plugins or options",
" %s list [-e [regex]][-t][-o][-f [regex]]\n"

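The MSG_RCONNECT check in tracecmd_msg_wait_for_msg() takes the memcmp()
length from the size field received on the wire. A size of 0 would make
"size - 1" wrap to a very large value. The following is only a minimal
sketch of a bounded variant, not the patch's implementation. Every sketch_*
name, the flattened struct layout, and the greeting value are assumptions
made for illustration; the real CONNECT_MSG and message layout are defined
in trace-msg.c.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for the patch's CONNECT_MSG. */
#define SKETCH_CONNECT_MSG "tracecmd-V2"

/* Hypothetical flattened RCONNECT body: be32 size, then the string. */
struct sketch_rconnect {
	uint32_t size;	/* network byte order, counts the trailing NUL */
	char buf[32];
};

/* Build a reply the way a server might; wire_size is what goes on the wire. */
static void sketch_make_rconnect(struct sketch_rconnect *rc, uint32_t wire_size)
{
	assert(sizeof(SKETCH_CONNECT_MSG) <= sizeof(rc->buf));
	memset(rc, 0, sizeof(*rc));
	rc->size = htonl(wire_size);
	memcpy(rc->buf, SKETCH_CONNECT_MSG, sizeof(SKETCH_CONNECT_MSG));
}

/* Return 0 if the reply carries the expected greeting, else a negative errno. */
static int sketch_check_rconnect(const struct sketch_rconnect *rc)
{
	uint32_t size = ntohl(rc->size);

	/*
	 * Accept only the exact expected length, so a corrupted size
	 * can never drive memcmp() past either buffer.
	 */
	if (size != sizeof(SKETCH_CONNECT_MSG))
		return -EPROTONOSUPPORT;
	if (memcmp(rc->buf, SKETCH_CONNECT_MSG, size - 1) != 0)
		return -EPROTONOSUPPORT;

	return 0;
}
```

With this shape, a reply built with the correct size is accepted, while a
reply advertising size 0 or an oversized length is rejected before any
comparison takes place.
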