Re: large files unnecessary trashing filesystem cache?

From: Guido Fiala
Date: Thu Oct 20 2005 - 10:26:21 EST


On Tuesday 18 October 2005 22:48, Badari Pulavarty wrote:
> On Tue, 2005-10-18 at 22:01 +0200, Guido Fiala wrote:
> > Story:
> > Once in while we have a discussion at the vdr (video disk recorder)
> > mailing list about very large files trashing the filesystems memory cache
> > leading to unnecessary delays accessing directory contents no longer
> > cached.
> > [...]
> Is there a reason why those applications couldn't use O_DIRECT ?
>
> Thanks,
> Badari

I asked a vdr-expert on this and here is the reason why O_DIRECT is not
suitable:

O_DIRECT would be great if it were a simple option for opening files.
But as a matter of facts O_DIRECT completely changes the semantics of
file access. You have to read blocks of a defines size to memory that
is aligned to defined block borders. Memory provided by normal malloc()
or new() is not usable and results in IO errors. So the result is you
have to have a complete rewrite of the whole IO subsystem of the
affected program. Most maintainers of non-trivial applications are
completely resistant against such changes - for good reasons.

If there would be an O_DIRECT_EX32++ (or O_STREAMING) that doesn't have
this change in semantic it would be much easier to apply the necessary
changes.

BTW: In the case of the VDR program not even a per process limit in used
buffer caches would help: the same program reads huge files _and_ huge
directory trees with a lot of small files that should be cached. A
heuristic for this case has to work on per file base. It needs to
detect that some files are only used in a streaming manner - with very
seldom jumps in random directions (skipping commercials, review a
scene). I don't know if such a heuristic is possible and if it would
not break other things.

PS: using f_advise helps a bit. One can keep IO semantics but you have
to add a virtualisation layer for all streaming IO. And you can't
combine posix_fadvise(POSIX_FADV_DONTNEED) with
posix_fadvise(POSIX_FADV_WILLNEED) when you possibly have jumps in your
access pattern because you can't cancel (at least to my knowledge) the
POSIX_FADV_WILLNEED call when you see the read ahead is not needed any
more. It would be an interesting add on if POSIX_FADV_DONTNEED would
cancel the read of a region that has been requested by
POSIX_FADV_WILLNEED before.

Ralf (forwarded by me on his request)

---
Hopefully i did now correctly "reply all" - sorry if i accidently caused some
trouble.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/