Re: [PATCH 1/1] suspend: delete sys_sync()

From: Len Brown
Date: Sat Jun 20 2015 - 01:26:33 EST

>> Putting a function trace on sys_sync and executing sync manually,
>> I was able to see it take 100ms,
>> though function trace itself could be contributing to that...
> It would seem that way - you need to get the traces to dump to
> something that has no sync overhead....

I don't think that ftrace buffers have sync overhead --
at least not until they are copied into files.

>> Curiously, in another run, sync ran at 15ms, but sd suspend exploded to 300ms.
>> I've seen that in some other results. Sometimes sync if fast, but sd
>> then more than makes up for it by being slow:-(
> Oh, I see that too. Normally That's because the filesystem hasn't
> been told to enter an idle state and so is doing metadata writeback
> IO after the sync. When that happens the sd suspend has wait for
> request queues to drain, IO to complete and device caches to flush.
> This simply cannot be avoided because suspend never tells the
> filesytems to enter an idle state....

I captured a trace of a slow sd suspend.
Apparently, it does two operations -- first a sync_cache,
then a stop operation. The sync was fast. The stop command
was where all the time went.

I'll look at a more modern drive on the same system next week,
just for comparison.

> i.e. remember what I said initially in this thread about suspend
> actually needing to freeze filesystems, not just sync them?

I think with the complexity of file systems and the underlying
devices, yes, we need to think about how to efficiently
and safely suspend/resume them.

But sys_sync is too expensive to have hard-coded in the kernel suspend path.
Some machines can suspend and resume in under 10ms -- they
absolutely do not want sys_sync() hard-coded in the suspend path.

>> FYI,
>> I ran -x2
>> from current directory /tmp, which is mounted on tmpfs,
>> but still found the 2nd sync was very slow -- 200ms
>> vs 6 - 20 ms for the sync preceding the 1st suspend.
> So where did that time go? As I pointed out previously, function
> trace will only tell us if the delay is data writeback or not. We
> seem to have confirmed that the delay is, indeed, writeback of dirty
> data. Now we need to identify what the dirty data belongs to: we
> need to trace individual writeback events to see what dirty inodes
> are actually being written.

I expect that is moving data around between
the back-to-back suspends when the -x2 option is used -- will look into it.

Len Brown, Intel Open Source Technology Center
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at