Re: [PATCH 2/3] Emulate simple x86 instructions in userspace

From: Alexander Graf
Date: Fri Jun 21 2019 - 09:28:51 EST



On 12.06.19 17:19, samcacc@xxxxxxxxxx wrote:
On 5/31/19 10:38 AM, Alexander Graf wrote:
On 21.05.19 17:39, Sam Caccavale wrote:

+static void dump_state_after(const char *desc, struct state *state)
+{
+ÂÂÂ debug(" -- State after %s --\n", desc);
+ÂÂÂ debug("mode: %s\n", x86emul_mode_string[state->ctxt.mode]);
+ÂÂÂ debug(" cr0: %lx\n", state->vcpu.cr[0]);
+ÂÂÂ debug(" cr3: %lx\n", state->vcpu.cr[3]);
+ÂÂÂ debug(" cr4: %lx\n", state->vcpu.cr[4]);
+
+ÂÂÂ debug("Decode _eip: %lu\n", state->ctxt._eip);
+ÂÂÂ debug("Emulate eip: %lu\n", state->ctxt.eip);
+
+ÂÂÂ debug("\n");
 }
  int step_emulator(struct state *state)
 {
-ÂÂÂ return 0;
+ÂÂÂ int rc, prev_eip = state->ctxt.eip;
+ÂÂÂ int decode_size = state->data_available - decode_offset;
+
+ÂÂÂ if (decode_size < 15) {
+ÂÂÂÂÂÂÂ rc = x86_decode_insn(&state->ctxt, &state->data[decode_offset],
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ decode_size);
+ÂÂÂ } else {
+ÂÂÂÂÂÂÂ rc = x86_decode_insn(&state->ctxt, NULL, 0);

Isn't this going to fetch instructions from data as well? Why do we need
the < 15 special case at all?

I've changed the method of acquiring data in v2, but the 15 limit is
still relevant. If x86_decode_insn is called with a NULL pointer and
instruction size 0, the bytes are fetched via the emulator_ops.fetch
function. This would be nice, but there is no way of limiting how many
bytes it will try and fetch-- and it usually grabs 15 since that is the
longest x86 instruction (as of yet?). When there are less than 15 bytes
left, limiting the fetch size to the remaining bytes is important.


You want to at least add a comment here, detailing the fact that where the magic 15 comes from and that you want to exercise the normal prefetch path while still allowing the buffer to shrink < 15 bytes :). Maybe move MAX_INST_SIZE from svm.c into a .h file and reuse that while at it.


[...]


diff --git a/tools/fuzz/x86_instruction_emulation/scripts/bin_fuzz
b/tools/fuzz/x86_instruction_emulation/scripts/bin_fuzz
new file mode 100755
index 000000000000..e570b17f9404
--- /dev/null
+++ b/tools/fuzz/x86_instruction_emulation/scripts/bin_fuzz
@@ -0,0 +1,23 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0+
+# This runs the afl-harness at $1, $2 times (or 100)
+# It runs uniq and sorts the output to give an idea of what is
causing the
+# most crashes. Useful for deciding what to implement next.
+
+if [ "$#" -lt 1 ]; then
+Â echo "Usage: './bin_fuzz path_to_afl-harness [number of times to run]"
+Â exit
+fi
+
+mkdir -p fuzz
+rm -f fuzz/*.in fuzz/*.out
+
+for i in $(seq 1 1 ${2:-100})
+do
+Â {
+Â head -c 500 /dev/urandom | tee fuzz/$i.in | ./$1
+Â } > fuzz/$i.out 2>&1
+
+done
+
+find ./fuzz -name '*.out' -exec tail -1 {} \; | sed 's/.*
Segmen/Segman/' | sed -r 's/^(\s[0-9a-f]{2})+$/misc instruction
output/' | sort | uniq -c | sort -rn

What is that Segman thing about?

This was for binning crashes-- check `tools/fuzz/x86ie/scripts/bin.sh`
in v2 for the updated version. Basically, it checks whether a
segmentation fault has happened, and if so, launches a gdb session to
see whether it was caused by an unimplemented x86_emulator_op. This is
useful in development for prioritizing the unimplemented features which
are causing the most fake crashes.


I can see why you want to combine them, but I don't understand where "Segman" comes from. Where is there a man here?



Alex