Re: question about ext3_find_goal with reservation

From: Andrew Morton
Date: Fri May 21 2004 - 20:37:19 EST


Mingming Cao <cmm@xxxxxxxxxx> wrote:
>
> I just wondering, how the goal block being determined in ext3?
>
> If we are doing sequential write, (the new logical block is next to the
> last logical block), it's easy. Just increase the last physical block
> recorded in i_next_alloc_goal and take that as the goal block for
> ext3_new_block() to try to do allocation there.
>
> If the pattern is random write, in the current implementation(with the
> goal fix), the ext3_find_near() will find a goal with good locality.( I
> have a hard time understand the ext3_find_near(), need some help
> here...)

The comments in ext3_find_near() pretty well cover things.

The "colouring" implementation in there is to address some problems which
were exacerbated by the Orlov inode allocator. When we're determining the
initial block for a new file the allocator was essentially doing first-fit
within the blockgroup. So multiple files kept on having their blocks
jumbled up.

So the colouring essentially tries to start a new file at a random position
in its blockgroup. But it ensures that if one task is creating a lot of
files (say, an untar), these are all nicely contiguous within the
blockgroup. That's why the pid was used as the colour. A reasonable
heuristic. It improves the many-processes-creating-fiels problem, but
doesn't do much for the one-process-creates-lots-of-slowly-growing-files
problem.


> Well I am thinking, with the ext3 reservation changes, would it make
> sense that to find the goal in the reservation window, if we are doing
> random write on a inode, and we have a reservation window for that
> inode? In another words, when we try to find a goal for block allocation
> and the write pattern is non-sequential, if the the filesystem has
> reservation turned on, before we try to find a goal somewhere else,
> shouldn't we try to locate the goal inside the reservation window first?

I think that'll probabyl work OK. If the application is seekily growing
the file the layout is awful already.

> This will avoid the problem that a reservation window for an inode being
> throwed away frequently when a single process open an file doing lseek
> and write frequently? (We have discussed this before.)

Yes, we definitely want to avoid that CPU-intensive search on every write.

> With reservation, we probably don't need ext3_find_near() to guide us to
> find a goal block. But we still need that in the case the filesystem is
> mounted without reservation on.

You might need it for the very first block allocation. For example, if the
app opens an existing file and starts appending to it, does the current
code correctly commence allocation immediately beyond the file's final
block?

> If all above make sense, then the goal should be the start block of the
> reservation window.

Well. It'll be some block within the reservation window - the code should
be recording how far across the reservation window we currently are. We
don't want to do a linear search across the entire reservation window for
each block allocation attempt.

> So when ext3_new_block() try to satisfy the goal
> block, it will try to do allocation inside the reservation window
> directly. This makes the goal outside of reservation almost impossible
> in ext3_try_to_allocate_with_rsv(), unless it is doing sequential writes
> and the goal is just next to the end of the reservation.
>
> Any thoughts?

It would be nice to separate the old allocation things (i_alloc_block,
i_alloc_goal, etc) from the new code completely. Making one somehow
dependent upon or aware fo the other is confusing, and ultimately we'd like
to remove those fields from the inode altogether.

So maybe it's best to avoid calling ext3_find_near() and ext3_find_goal()
at all if reservations are enabled.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/