RE: Cleancache and shared filesystems

From: Dan Magenheimer
Date: Fri May 27 2011 - 11:32:07 EST


> From: Steven Whitehouse [mailto:swhiteho@xxxxxxxxxx]
> Sent: Friday, May 27, 2011 7:52 AM
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Subject: Cleancache and shared filesystems
>
> Hi,
>
> I'm trying to figure out what I would need to do in order to get GFS2
> to
> work with cleancache. Looking at the OCFS2 implementation leaves me
> with
> some questions about how this is supposed to work. The docs say that
> the
> cleancache_init_shared_fs() function is supposed to take a 128 bit UUID
> plus the sb.
>
> In OCFS2 it is passed a pointer to a 32 bit little endian quantity as
> the UUID:
>
> __le32 uuid_net_key;
>
> ...
>
> memcpy(&uuid_net_key, di->id2.i_super.s_uuid, sizeof(uuid_net_key));
>
> ...
>
> cleancache_init_shared_fs((char *)&uuid_net_key, sb);
>
> and in the Xen backend driver this then appears to be dereferenced as
> if
> its two 64 bit values, which doesn't look right to me.

Hi Steve --

Thanks for looking at cleancache!

Hmmm... yes... it looks like you are right and the cleancache
interface code for ocfs2 has bit-rotted over time and a bad value
is being used for the uuid. This would result in pages not being
shared between clustered VMs on the same physical machine, but
would only result in a lower performance advantage, not any
other visible effects, which would explain why it has gone
undiscovered for so long.

Thanks for pointing this out as it is definitely a bug!

> Also, since the sb has a UUID field in it anyway, is there some reason
> why that cannot be used directly rather than passing the uuid as a
> separate variable?

Forgive me but I am not a clustered FS expert (even for ocfs2):
If the UUID field in the sb is always 128-bits and is always
identical for all cluster nodes, and this fact is consistent
across all clustered FS's at mount time, I agree that there is
no need to pass the uuid as a parameter in
cleancache_init_shared_fs as it can be derived in the body of
cleancache_init_shared_fs and then passed to
__cleancache_init_shared_fs (which only cares that it gets
128-bits and probably has no business digging through a
superblock). OTOH, this call is made only once per mount
so there's no significant advantage in changing this... or am
I missing something?

Thanks,
Dan

====

Thanks... for the memory!
I really could use more / my throughput's on the floor
The balloon is flat / my swap disk's fat / I've OOM's in store
Overcommitted so much
(with apologies to Bob Hope)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/