Re: [PATCH 0/4] ext3 block reservation patch set

From: Andrew Morton
Date: Wed Apr 14 2004 - 18:13:30 EST


Mingming Cao <cmm@xxxxxxxxxx> wrote:
>
> > It's not clear when we should free up the write_state. I guess we
> > could leave it around for the remaining lifetime of the inode - that'd
> > still be a net win.

> We could free up the write_state at the time of ext3_discard_allocation()
> (not at the time when we allocate a new reservation window),
>
> or later, if we preserve reservations for slow-growing files, we could
> release the write_state at the time the inode is released.

That sounds appropriate.
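
For concreteness, the sort of per-inode state being discussed might look
roughly like this (struct and field names are illustrative only, not the
actual patch's definitions):

struct ext3_reserve_window {
	unsigned long rsv_start;	/* first block of the reserved window */
	unsigned long rsv_end;		/* last block of the reserved window */
};

/* the per-inode "write_state": one window plus the sizing heuristics */
struct ext3_write_state {
	struct ext3_reserve_window rsv_window;
	unsigned long rsv_alloc_hit;	/* allocations served from this window */
	unsigned long rsv_goal_size;	/* window size to try next time */
};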

> > - You're performing ext3_discard_reservation() in ext3_release_file().
> > Note that the file may still have pending allocations at this stage: say,
> > open a file, map it MAP_SHARED, dirty some pages which lie over file
> > holes, then close the file again.
> >
> > Later, the VM will come along and write those dirty pages into the
> > file, at which point allocations need to be performed. But we have no
> > reservation data and, later, we may have no inode->write_state at all.
> >
> > What will happen?
> >
> In this case, we will allocate a new reservation window for it.
> Nothing bad will happen. We probably just waste a previously allocated
> reservation window...but I am not sure.
>
> My question is: if the file is opened for the first time, mapped, and we
> dirty pages in a file hole, will any real disk block allocation be
> involved at that point?

There might be, and there might not be. It depends on timing, memory
pressure, application activity, etc.
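
To make the scenario concrete, from user space it is something like this
(the file name is illustrative, and error checking is omitted):

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("testfile", O_RDWR | O_CREAT, 0644);
	char *p;

	ftruncate(fd, 1 << 20);			/* 1MB file, all holes */
	p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	p[0] = 'x';				/* dirty a page over a hole */
	munmap(p, 1 << 20);
	close(fd);				/* reservation discarded here... */
	/* ...but the block allocation may only happen later, when the VM
	   writes the dirty page back */
	return 0;
}

Whether the allocation has already happened by the time of close() is
exactly the timing question above.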

> The current implementation is more than O(n): every time it does not
> have a reservation window, it searches from the head of the per-filesystem
> reservation window list. If it fails within the group, it will
> move to the next group and start the search from the head of the list
> again.

Same problem exists in arch_get_unmapped_area(). We have a funny little
heuristic (free_area_cache) in there to speed up the common case.
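
Roughly this kind of thing (a stand-alone sketch over a plain bitmap, not
the actual mm or ext3 code):

#define NBLOCKS 32768
static unsigned char bitmap[NBLOCKS];	/* 1 = block in use */
static unsigned long search_cache;	/* where the last search left off */

static long scan_from(unsigned long start)
{
	unsigned long i;

	for (i = start; i < NBLOCKS; i++)
		if (!bitmap[i])
			return i;
	return -1;
}

long find_free_block(void)
{
	long blk = scan_from(search_cache);	/* fast path: resume */

	if (blk < 0)
		blk = scan_from(0);		/* slow path: full rescan */
	if (blk >= 0)
		search_cache = blk + 1;		/* remember for next time */
	return blk;
}

The common case never rescans from the start, which is what makes the
otherwise O(n) search cheap in practice.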

> This could be fixed by forgetting about the block group boundary
> altogether (remove the for loop in ext3_new_block) and making it search
> for a block filesystem-wide :)

I do think we should do this. Does it have any disadvantages?

> I have a concern about the red-black tree: it takes O(log(n)) to get to
> where you want to start, but it also takes an O(log(n)) step to find the
> hole size between two adjacent windows. And to find a reservable
> window, we need to browse the whole red-black tree in the worst case, so
> the complexity is
> O(log(n)) + O(n)*O(log(n)) = O(n*log(n))
>
> Am I right?

Think so. rbtrees are optimised for lookup, not for
get-me-a-suitably-sized-hole.
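
To spell the worst case out: even after an O(log(n)) lookup to find a
starting window, finding a big enough hole can mean stepping through every
window in order. Something like this, with a sorted array standing in for
the in-order rbtree walk (illustrative only):

struct window {
	unsigned long start;
	unsigned long end;		/* inclusive */
};

/* windows[] is sorted by start; nr is the number of windows */
long find_hole(const struct window *windows, int nr, unsigned long size)
{
	int i;

	for (i = 0; i < nr - 1; i++) {
		unsigned long gap = windows[i + 1].start - windows[i].end - 1;

		if (gap >= size)
			return windows[i].end + 1;	/* big enough hole */
	}
	return -1;	/* walked all n windows without finding one */
}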


> > - Why do we discard the file's reservation on every iput()? iput's are
> > relatively common operations. (see fs/fs-writeback.c)
> >
> Yes, you are right! I intended to call ext3_discard_allocation only
> when the usage count of the inode is 0. I looked at the ext2 preallocation
> code; it calls ext2_discard_preallocation in ext2_put_inode(), so I
> thought that was the place. But it seems ext3_put_inode() is called
> every time iput() is called. We should call ext3_discard_reservation in
> iput_final(), and we should fix this in ext2 as well.

Could be so.

> > - What locking protects rsv_alloc_hit? i_sem is not held during
> > VM-initiated writeout. Maybe an atomic_t there, or just say that if we
> > race and the number is a bit inaccurate, we don't care?
> >
> Currently no lock protects rsv_alloc_hit. The reason is that it is just a
> heuristic indicator of whether we should enlarge the reservation window
> size next time. Even the hit ratio (50%) is just a rough guess, so a
> little inaccuracy would not hurt much; adding another lock is probably
> not worth it.

I'd agree with that.
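
The heuristic, roughly (numbers and names illustrative; a lost increment
just skews the guess a little):

#define RSV_DEFAULT_SIZE	8
#define RSV_MAX_SIZE		1024

struct rsv_state {
	unsigned long rsv_alloc_hit;	/* blocks allocated from the window */
	unsigned long rsv_goal_size;	/* window size to ask for next time */
};

/* called (unlocked) when the current window is used up */
void rsv_window_exhausted(struct rsv_state *rsv, unsigned long old_size)
{
	if (rsv->rsv_alloc_hit > old_size / 2) {
		/* better than a 50% hit ratio: grow the next window */
		rsv->rsv_goal_size = old_size * 2;
		if (rsv->rsv_goal_size > RSV_MAX_SIZE)
			rsv->rsv_goal_size = RSV_MAX_SIZE;
	} else {
		rsv->rsv_goal_size = RSV_DEFAULT_SIZE;
	}
	rsv->rsv_alloc_hit = 0;
}

A racy, slightly-off rsv_alloc_hit at worst makes us pick the wrong window
size once; nothing gets corrupted.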