Re: [GIT PULL] Ceph fixes for -rc7
From: Gregory Farnum
Date: Wed Mar 30 2016 - 14:09:57 EST
On Wed, Mar 30, 2016 at 1:04 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Wed, Mar 30, 2016 at 4:40 AM, NeilBrown <neilb@xxxxxxxx> wrote:
>> On Wed, Mar 30 2016, Yan, Zheng wrote:
>>
>>> On Wed, Mar 30, 2016 at 8:24 AM, NeilBrown <neilb@xxxxxxxx> wrote:
>>>> On Fri, Mar 25 2016, Ilya Dryomov wrote:
>>>>
>>>>> On Fri, Mar 25, 2016 at 5:02 AM, NeilBrown <neilb@xxxxxxxx> wrote:
>>>>>> On Sun, Mar 06 2016, Sage Weil wrote:
>>>>>>
>>>>>>> Hi Linus,
>>>>>>>
>>>>>>> Please pull the following Ceph patch from
>>>>>>>
>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git for-linus
>>>>>>>
>>>>>>> This is a final commit we missed to align the protocol compatibility with
>>>>>>> the feature bits. It decodes a few extra fields in two different messages
>>>>>>> and reports EIO when they are used (not yet supported).
>>>>>>>
>>>>>>> Thanks!
>>>>>>> sage
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------
>>>>>>> Yan, Zheng (1):
>>>>>>> ceph: initial CEPH_FEATURE_FS_FILE_LAYOUT_V2 support
>>>>>>
>>>>>> Just wondering, but was CEPH_FEATURE_FS_FILE_LAYOUT_V2 supposed to have
>>>>>> exactly the same value as CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING (and
>>>>>> CEPH_FEATURE_CRUSH_TUNABLES5)??
>>>>>
>>>>> Yes, that was the point of getting it merged into -rc7.
>>>>
>>>> I did wonder if that might be the case.
>>>>
>>>>>
>>>>>> Because when I backported this patch (and many others) to some ancient
>>>>>> enterprise kernel, it caused mounts to fail. If it really is meant to
>>>>>> be the same value, then I must have some other backported issue to find
>>>>>> and fix.
>>>>>
>>>>> It has to be backported in concert with changes that add support for
>>>>> the other two bits.
>>>>
>>>> I have everything from fs/ceph and net/ceph as of 4.5, with adjustments
>>>> for different core code.
>>>>
>>>>> How did mount fail?
>>>>
>>>> "can't read superblock".
>>>> dmesg contains
>>>>
>>>> [ 50.822479] libceph: client144098 fsid 2b73bc29-3e78-490a-8fc6-21da1bf901ba
>>>> [ 50.823746] libceph: mon0 192.168.1.122:6789 session established
>>>> [ 51.635312] ceph: problem parsing mds trace -5
>>>> [ 51.635317] ceph: mds parse_reply err -5
>>>> [ 51.635318] ceph: mdsc_handle_reply got corrupt reply mds0(tid:1)
>>>>
>>>> then a hex dump of header:, front: footer:
>>>>
>>>> Maybe my MDS is causing the problem? It is based on v10.0.5 which
>>>> contains
>>>>
>>>> #define CEPH_FEATURE_CRUSH_TUNABLES5 (1ULL<<58) /* chooseleaf stable mode */
>>>> // duplicated since it was introduced at the same time as CEPH_FEATURE_CRUSH_TUN
>>>> #define CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING (1ULL<<58) /* New, v7 encoding */
>>>>
>>>> in ceph_features.h i.e. two features using bit 58, but not
>>>> FS_FILE_LAYOUT_V2
>>>>
>>>> Should I expect Linux 4.5 to work with ceph 10.0.5 ??
>>>
>>> Sorry, cephfs in linux 4.5 does not work with 10.0.5. Please upgrade
>>> to ceph 10.1.0
>>>
>>
>> Ahhh.. I do wonder at the point of feature flags if they don't let you
>> run any client with any server...
>> Is there a compatibility matrix published somewhere?
>> If I have to stay with 10.0.5 (I don't know yet), is it safe to use
>> Linux-4.4 code?
>
> 10.0.* are all development cuts, we didn't even build packages for
> some of them. 10.1.0 is the first release candidate. You can think of
> 10.0.5 as a random pre-rc1 kernel snapshot, aimed at brave testers, so
> you do want to upgrade.
>
> The reason it doesn't work is those three features are all defined to
> the same value, but two of them got added earlier in the 10.0.* cycle.
> CEPH_FEATURE_FS_FILE_LAYOUT_V2 came in last, after 10.0.5.
A little more specifically: these feature bits do let you run any
client with any "real release" of Ceph that we expect non-testers to
be using. They *usually* work on our dev releases as well, but we've
gotten stingier about it as we come close to running out of feature
bits. We're now packing several features into the same actual bit
(we're also working on freeing bits up, but got started a little
later than is comfortable) while coordinating code merges between a
few different places. You got unlucky here.
-Greg