Re: replace() system call needed (was Re: EXT4-ish "fixes" in UBIFS)

From: Bodo Eggert
Date: Wed Apr 01 2009 - 18:58:47 EST


On Wed, 1 Apr 2009, Pavel Machek wrote:
> On Tue 2009-03-31 20:06:57, Theodore Tso wrote:
> > On Tue, Mar 31, 2009 at 11:27:33PM +0200, Bodo Eggert wrote:

> > > This can be done using implicit logic:
> > >
> > > ->E.g. on close(), mark inodes without being sync()ed as poisoned.
> > > (I can think of more sophisticated logic, but ...)
> > > ->On completing the inode with the delayed allocations, unpoison it.
> > > ->Don't commit rename()s if the corresponding inode is poisoned.
> >
> > Send us patches if you think it's that easy to do what you are
> > proposing. I assure it's not easy.
>
> Well, implementing replace() syscall would be quite easy. Would you
> want such a patch?

You'd need - minus setting the flag on close - about the same logic:
- detecting whether the inode is dirty
- detecting when the inode gets into a clean state
- delaying the commit of the rename() until then
- not replaying parts of the journal
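
To make the first three points concrete, here is a toy user-space model of
the bookkeeping. It is only a sketch: every name in it (toy_inode,
toy_rename, toy_try_commit, ...) is made up for illustration and nothing
like it exists in the kernel, and the journal-replay part is not modelled
at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct toy_inode {
	bool dirty;		/* data written but not yet on disk */
};

struct toy_rename {
	struct toy_inode *src;	/* inode being renamed over the target */
	bool committed;		/* has the rename been committed? */
};

/* New data arrives: the inode becomes dirty ("poisoned"). */
static void toy_write(struct toy_inode *ino)
{
	ino->dirty = true;
}

/* Writeback of the delayed allocations completed: the inode is clean. */
static void toy_writeback_done(struct toy_inode *ino)
{
	ino->dirty = false;
}

/*
 * Commit the rename only if the source inode is clean; otherwise leave
 * it pending. Returns true once the rename has been committed.
 */
static bool toy_try_commit(struct toy_rename *r)
{
	assert(r != NULL && r->src != NULL);
	if (!r->committed && !r->src->dirty)
		r->committed = true;
	return r->committed;
}
```

A rename queued while the inode is dirty stays pending; calling
toy_try_commit() again after toy_writeback_done() lets it through.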

As I understand it, replace() should not behave differently from rename(),
except for making a guarantee. Since this guarantee is desired anyway, I
think making it automatic would be a good thing. Especially until
all applications are rewritten ...

Of course, if you call replace(), you know it's going to be OK, but you could
also use pathconf($configdir, _PC_SYNC_BEFORE_RENAME), returning:
  0: sync or barrier is implicit (data=ordered)
  1: small chance of data loss due to not syncing (vfat?)
  2: guaranteed data loss without sync (ext4)

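// userspace.c
#include <unistd.h>

// Returns non-zero if the caller should sync before rename().
static inline int sync_before_rename(const char *where, int is_important_data)
{
#ifndef _PC_SYNC_BEFORE_RENAME
	// No way to ask: sync important data, be optimistic for the rest.
	return is_important_data;
#else
	// 0: implicit, 1: small chance of loss, 2: loss guaranteed.
	// An error (-1) means "sync" for important data only.
	long ret = pathconf(where, _PC_SYNC_BEFORE_RENAME);
	if (is_important_data)
		return ret != 0;
	return (ret > 1);
#endif
}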
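
For completeness, a rough sketch of how an application might act on that
answer when replacing a file via the usual write-temp-then-rename dance.
replace_file() and its do_sync parameter are hypothetical names of mine,
not an existing API; the caller is assumed to pass in the result of
sync_before_rename(). EINTR handling and syncing the directory are left out.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write data to tmp_path, fsync() it if do_sync is
 * non-zero, then rename() it over final_path.
 * Returns 0 on success, -1 on failure (the temp file is removed).
 */
static int replace_file(const char *final_path, const char *tmp_path,
			const void *data, size_t len, int do_sync)
{
	assert(final_path != NULL && tmp_path != NULL);

	int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	/* write() may be partial, so loop until everything is out. */
	const char *p = data;
	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n < 0)
			goto fail_close;
		p += n;
		len -= (size_t)n;
	}

	/* Only pay for the sync when the caller decided it is needed. */
	if (do_sync && fsync(fd) != 0)
		goto fail_close;
	if (close(fd) != 0)
		goto fail_unlink;
	if (rename(tmp_path, final_path) != 0)
		goto fail_unlink;
	return 0;

fail_close:
	close(fd);
fail_unlink:
	unlink(tmp_path);
	return -1;
}
```

A caller would then do something like
replace_file(path, tmp, buf, len, sync_before_rename(dir, 1)).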
--
How do I set my laser printer on stun?