hi!
I've been trying to implement a new locking scheme for JBD
(Journaling Block Device). JBD is a well-known bottleneck
for some configurations and loads.
The main ideas of the locking design:
1) we do not lock the whole journal to get access to a
buffer; we lock only the buffer. let's call this lock the
'bh lock'. in fact, this lock is a simple one-bit state in
the bh->b_state field. there are primitives to operate on this
lock: jbd_lock_bh(), jbd_unlock_bh() and jbd_bh_locked().
any operation on jh must be protected by this lock.
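To make the "one bit in b_state" idea concrete, here is a minimal userspace sketch using C11 atomics. It is not the code from the patch: struct demo_bh, the demo_* functions and the bit position are hypothetical stand-ins, and the real primitives presumably use the kernel's own bit operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct buffer_head: only the state word. */
struct demo_bh {
    atomic_ulong b_state;
};

/* Hypothetical bit position; the real one is whatever the patch reserves. */
#define DEMO_BH_LOCK_MASK (1UL << 0)

/* Spin until this caller is the one that flips the bit from 0 to 1. */
static void demo_lock_bh(struct demo_bh *bh)
{
    while (atomic_fetch_or_explicit(&bh->b_state, DEMO_BH_LOCK_MASK,
                                    memory_order_acquire) & DEMO_BH_LOCK_MASK)
        ; /* busy-wait: somebody else holds the bit */
}

/* Clear only the lock bit; all other state bits are left untouched. */
static void demo_unlock_bh(struct demo_bh *bh)
{
    atomic_fetch_and_explicit(&bh->b_state, ~DEMO_BH_LOCK_MASK,
                              memory_order_release);
}

/* Report whether the lock bit is currently set. */
static bool demo_bh_locked(struct demo_bh *bh)
{
    return (atomic_load_explicit(&bh->b_state, memory_order_relaxed)
            & DEMO_BH_LOCK_MASK) != 0;
}
```

The point of the sketch is that the lock costs no extra memory per buffer and that taking it never disturbs the neighbouring state bits.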
2) each transaction has its own lock to protect its buffer lists.
journal_file_buffer() and journal_unfile_buffer() use
jh->j_transaction to find that lock. jh->j_transaction is
protected by the bh lock. so, every time one tries to get write
access to a buffer, the following locking is used:
    get_write_access(bh)
    {
        jbd_lock_bh(bh);
        /* decide what to do with buffer: wait, file it, etc */
        journal_file_buffer(jh, th, BJ_Metadata);
        {
            spin_lock(&th->t_list_lock);
            /* add buffer to transaction's list */
            spin_unlock(&th->t_list_lock);
        }
        jbd_unlock_bh(bh);
    }
while a transaction is in the T_RUNNING state, all processing goes through
this lock order. invalidatepage(), releasepage() and dirty_data()
also use this order. journal_commit_transaction() accesses buffers
in another order:
    for_each_buffer_in_list(list) {
        jbd_lock_bh(bh);
        /* process it */
        jbd_unlock_bh(bh);
    }
so, it looks like a lock ordering violation. but it isn't, because
the buffer is owned by the committing transaction and must not be refiled
by the running transaction. the only exceptions are flushing ordered data in
journal_commit_transaction() against journal_releasepage(), and
journal_commit_transaction() against journal_dirty_data().
journal_commit_transaction() walks through the list of the transaction's
data buffers (list first, then bh lock), while journal_releasepage() first
looks at the buffer (so it gets the bh lock) and then refiles it (so it gets
t_list_lock) => possible deadlock.
at the moment I use the following scheme:
    lock(transaction->t_list_lock);
    for_each_buffer_in_list(bh) {
        get_bh(bh);
        /* put bh in special array */
    }
    unlock(transaction->t_list_lock);
    for_each_buffer_in_special_array(bh) {
        jbd_lock_bh(bh);
        jh = bh2jh(bh);
        if (buffer belongs to the same transaction AND
            buffer is on the same list) {
            /* process buffer */
        }
        jbd_unlock_bh(bh);
        put_bh(bh);
    }
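The snapshot-then-revalidate scheme in item 2 can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: the demo_* types, the fixed batch size and the spinlock helpers are all hypothetical, and a real implementation would keep looping until the list is drained rather than handling a single batch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum { DEMO_MAX_BATCH = 16 };       /* hypothetical size of the special array */

struct demo_txn;

struct demo_buf {
    atomic_flag lock;               /* stands in for the bh lock */
    atomic_int refcount;            /* stands in for get_bh()/put_bh() */
    struct demo_txn *txn;           /* owner, protected by lock */
    int list_id;                    /* which list, protected by lock */
    int processed;
    struct demo_buf *next;          /* linkage, protected by txn->list_lock */
};

struct demo_txn {
    atomic_flag list_lock;          /* stands in for t_list_lock */
    struct demo_buf *head;
};

static void demo_spin_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ; /* busy-wait */
}

static void demo_spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Returns how many buffers were still ours and got processed. */
static int demo_process_list(struct demo_txn *txn, int list_id)
{
    struct demo_buf *batch[DEMO_MAX_BATCH];
    size_t n = 0;
    int done = 0;

    /* Phase 1: list lock only. Pin each buffer and remember it. */
    demo_spin_lock(&txn->list_lock);
    for (struct demo_buf *b = txn->head; b && n < DEMO_MAX_BATCH; b = b->next) {
        atomic_fetch_add(&b->refcount, 1);
        batch[n++] = b;
    }
    demo_spin_unlock(&txn->list_lock);

    /* Phase 2: per-buffer lock only, so the two locks are never nested here. */
    for (size_t i = 0; i < n; i++) {
        struct demo_buf *b = batch[i];

        demo_spin_lock(&b->lock);
        /* The buffer may have been refiled since phase 1: recheck ownership. */
        if (b->txn == txn && b->list_id == list_id) {
            b->processed = 1;
            done++;
        }
        demo_spin_unlock(&b->lock);
        atomic_fetch_sub(&b->refcount, 1);
    }
    return done;
}
```

Because the list lock is dropped before any buffer lock is taken, the walker can never hold both in the order that conflicts with the bh-lock-then-list-lock path; the price is the recheck after relocking each buffer.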
3) a transaction's state and credits are protected by transaction->t_lock.
4) revoke list protection:
as we may have one running transaction and one committing transaction
at the same time, we simply need two revoke lists:
one for the running transaction and one for the committing transaction.
processes may modify the revoke list simultaneously, so we protect the current
revoke list with journal->j_revoke_lock.
5) every time journal_commit_transaction() starts to commit a new transaction,
journal->j_running_transaction is set to NULL, and several start_this_handle()
callers may try to allocate a new transaction. in order to make this SMP-safe,
get_transaction() uses journal->j_lock.
6) to protect the list of committed transactions, JBD uses journal->j_checkpoint_lock.
7) log_do_checkpoint() scans the list of transactions and the list of buffers to be
flushed. it competes with journal_commit_transaction(). once again, the
access order is incompatible here, so I use the scheme described in item 2.
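For the item about get_transaction() and journal->j_lock, the race between several start_this_handle() callers can be sketched in userspace like this. The demo_* names are hypothetical, and allocating before taking the lock (then freeing the loser's copy) is my assumption about one reasonable way to do it, not a description of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct demo_txn {
    int tid;
};

struct demo_journal {
    atomic_flag j_lock;             /* stands in for journal->j_lock */
    struct demo_txn *running;       /* stands in for j_running_transaction */
    int next_tid;
};

static void demo_spin_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ; /* busy-wait */
}

static void demo_spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/*
 * Return the running transaction, creating it if there is none.
 * Returns NULL only if allocation failed and nobody else installed one.
 */
static struct demo_txn *demo_get_transaction(struct demo_journal *j)
{
    /* Allocate up front so the lock is never held across an allocation. */
    struct demo_txn *fresh = malloc(sizeof(*fresh));
    struct demo_txn *t;

    demo_spin_lock(&j->j_lock);
    if (j->running == NULL && fresh != NULL) {
        fresh->tid = j->next_tid++;
        j->running = fresh;
        fresh = NULL;               /* ownership moved to the journal */
    }
    t = j->running;
    demo_spin_unlock(&j->j_lock);

    free(fresh);                    /* lost the race; free(NULL) is a no-op */
    return t;
}
```

Whoever takes the lock first installs its transaction; every later caller sees a non-NULL running transaction, reuses it and discards its own allocation.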
The patch I'm sending has been tested for dozens of hours with
fsx-linux & bash-shared-mapping & make -j8 bzImage on a dual
pIII-1GHz with 512MB RAM. Preempt was off. The patch is against
2.5.68-mm1.
I'd like to thank Andrew Morton for his huge help.
with best regards, Alex
PS. I would be happy to hear any comments/suggestions ;)
This archive was generated by hypermail 2b29 : Wed Apr 30 2003 - 22:00:27 EST