Re: [PATCH] perf script: turn AUTOCOMMIT off for bulk SQL inserts in event_analyzing_sample.py
From: Feng Tang
Date: Mon Jun 10 2013 - 10:34:23 EST
On Fri, Jun 07, 2013 at 11:58:53AM -0700, Emmet Caulfield wrote:
> The example script tools/perf/scripts/python/event_analyzing_sample.py
> contains a minor error. This script takes a perf.data file and
> populates a SQLite database with it.
>
> There's a long comment on lines 29-34 to the effect that it takes a
> long time to populate the database if the .db file is on disk, so it's
> done in the "ramdisk" (/dev/shm/perf.db), but the problem here is
> actually line 36:
>
> con.isolation_level=None
>
> This line turns on AUTOCOMMIT, making every INSERT statement into its
> own transaction, and greatly slowing down a bulk insert (25 minutes
> vs. a few seconds to insert 15,000 records). This is best solved by
> merely omitting this line or changing it to:
>
> con.isolation_level='DEFERRED'
>
> After making this change, if the database is in memory, it takes
> roughly 0.5 seconds to insert 15,000 records and 0.8 seconds if the
> database file is on disk, effectively solving the problem.
>
> Given that the whole purpose of having AUTOCOMMIT turned on is to
> ensure that individual insert/update/delete operations are committed
> to persistent storage, moving the .db file to a ramdisk defeats the
> purpose of turning this option on in the first place. Thus
> leaving/turning it *off* with the file on disk is no worse. It is
> pretty much standard practice to defer transactions and index updates
> for bulk inserts like this anyway.
>
> The following patch deletes the offending line and updates the
> associated comment.
>
> Emmet.
>
>
> --- tools/perf/scripts/python/event_analyzing_sample.py~	2013-06-03 15:38:41.762331865 -0700
> +++ tools/perf/scripts/python/event_analyzing_sample.py	2013-06-03 15:43:48.978344602 -0700
> @@ -26,14 +26,9 @@
> from perf_trace_context import *
> from EventClass import *
>
> -#
> -# If the perf.data has a big number of samples, then the insert operation
> -# will be very time consuming (about 10+ minutes for 10000 samples) if the
> -# .db database is on disk. Move the .db file to RAM based FS to speedup
> -# the handling, which will cut the time down to several seconds.
> -#
> +# Create/connect to a SQLite3 database:
> con = sqlite3.connect("/dev/shm/perf.db")
> -con.isolation_level = None
> +
>
> def trace_begin():
> print "In trace_begin:\n"
Thanks for root-causing the slowness of the SQLite3 operation.
Acked-by: Feng Tang <feng.tang@xxxxxxxxx>