2.4 has O_SYNC bug ?

From: Rob van Nieuwkerk (robn@verdi.et.tudelft.nl)
Date: Sat Mar 22 2003 - 09:48:10 EST


Hi all,

Summary: Linux performs unnecessary (incorrect ?) extra writes
         on partitions opened with O_SYNC after data is written.

I'm using a small SBC with an application that logs certain data
records. It runs from a CompactFlash. There are 2 partitions on
the CF: a root fs (which is mounted read-only) and a data partition.
There's no fs on the data partition: I manage the records myself,
they are written sequentially to the raw partition. The data
partition is opened with the "O_SYNC" flag by the application.

When a record is written (which is always approx 3 sequential disk blocks)
I see the disk-activity LED blink at the same moment. Good: we want
the data physically written to the CF at that moment; that's why the
O_SYNC is used.

But the strange thing is this: always after 30s the kernel performs
extra writes to the CF. It seems it's flushing some kind of dirty buffer
from the buffer cache. But there is not supposed to be any dirty buffer:
all data should have been written already to the CF because the partition
was opened with O_SYNC !

I don't know *what* data is written after the 30s. Maybe they are blocks
that were written 30s ago too (with the same content). Then they are
unnecessary and cause much higher wear on the CF. Maybe it is something
that should have been written 30s ago but which is delayed by the kernel
despite the O_SYNC flag. Then it is incorrect and I'll lose data when
a user switches off the system within 30s after logging a record.

Anyone know what's going on ?

        greetings,
        Rob van Nieuwkerk

PS: Yes I know about O_DIRECT, I'll try that. But I think there is
    a bug with O_SYNC anyway !

System details:
---------------
Kernel built from RH 2.4.18-27.7.x tree, Geode GX1 CPU. There are no
other applications accessing the CF besides my own app. There is nothing
written (or read) on the CF by the app besides the (small) data record.
The app even runs from a ramfs, so this even rules out that I'm seeing
paging activity. I can see the strange writes happening in /proc/stat
too. When running "sync" after the record was written (and the disk
activity has happened) you see disk activity. Running sync again produces
no new disk activity and the "normal" activity after 30s does not happen.