Re: [lustre-devel] [PATCH 41/80] staging: lustre: lmv: separate master object with master stripe

From: NeilBrown
Date: Sun Feb 11 2018 - 18:50:17 EST


On Sat, Feb 10 2018, James Simmons wrote:

>> > On Feb 8, 2018, at 10:10 PM, NeilBrown <neilb@xxxxxxxx> wrote:
>> >
>> > On Thu, Feb 08 2018, Oleg Drokin wrote:
>> >
>> >>> On Feb 8, 2018, at 8:39 PM, NeilBrown <neilb@xxxxxxxx> wrote:
>> >>>
>> >>> On Tue, Aug 16 2016, James Simmons wrote:
>> >>
>> >> my that's an old patch
>> >>
>> >>>
>> > ...
>> >>>
>> >>> Whoever converted it to "!strcmp()" inverted the condition. This is a
>> >>> perfect example of why I absolutely *loathe* the "!strcmp()" construct!!
>> >>>
>> >>> This causes many tests in the 'sanity' test suite to return
>> >>> -ENOMEM (that had me puzzled for a while!!).
>> >>
>> >> huh? I am not seeing anything of the sort and I was running sanity
>> >> all the time until a recent pause (but going to resume).
>> >
>> > That does surprise me - I reproduce it every time.
>> > I have two VMs running a SLE12-SP2 kernel with patches from
>> > lustre-release applied. These are servers. They have two 3G virtual
>> > disks each.
>> > I have two other VMs running current mainline. These are clients.
>> >
>> > I guess your 'recent pause' included the period between v4.15-rc1
>> > (8e55b6fd0660) and v4.15-rc6 (a93639090a27) - a full month when lustre
>> > wouldn't work at all :-(
>>
>> More than that, but I am pretty sure James Simmons is running tests all the time too
>> (he has a different config, I only have tcp).
>
> Yes I have been testing and haven't encountered this problem. Let me try
> the fix you pointed out.

Yeah, I guess I overreacted a bit in suggesting that no-one can have
been testing - sorry about that. It seemed really strange, though, as the
bug was so easy for me to hit.
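To make the "!strcmp()" inversion concrete, recall that strcmp() returns 0 when its arguments are equal. So "!strcmp(a, b)" means "equal", while "strcmp(a, b) != 0" (or a bare "strcmp(a, b)") means "different". Rewriting one form as the other flips the test.

What follows is a minimal userspace sketch. The helper names same_name() and different_name() are hypothetical and are not taken from the Lustre code under discussion.

```c
#include <assert.h>
#include <string.h>

/* Both helpers are hypothetical illustrations, not code from Lustre. */

/* strcmp() returns 0 when the strings are equal, so "!strcmp(a, b)"
 * reads as "a equals b", exactly like "strcmp(a, b) == 0". */
static int same_name(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	return !strcmp(a, b);
}

/* The opposite test: a nonzero strcmp() result means the strings differ.
 * Mechanically rewriting this as "!strcmp(a, b)" inverts the condition. */
static int different_name(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	return strcmp(a, b) != 0;
}
```

For any pair of strings, same_name() and different_name() always disagree. That is why a conversion in the wrong direction compiles cleanly but changes behaviour.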

Maybe - as you suggest in another email - it is due to some
client/server incompatibility. I guess it is unavoidable with an fs
like lustre to have incompatible protocol changes. Is there any
mechanism for detecting the version of other peers in the cluster and
refusing to run if versions are incompatible?

If you haven't hit the problem in testing, I suspect you aren't touching
that code path at all. Maybe put a BUG() call in there to see :-)
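A non-fatal tripwire is another way to act on the BUG() suggestion without taking the test node down. In the kernel, WARN_ON_ONCE() would be a gentler option than BUG().

The userspace sketch below assumes two hypothetical names:

- a PATH_HIT() macro, which counts hits and reports the first one on stderr;
- a check_name() function, which stands in for the suspect code path.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical coverage tripwire: counts how often a code path runs and
 * reports only the first hit, instead of crashing the way BUG() would. */
static unsigned long path_hits;

#define PATH_HIT()                                                  \
	do {                                                        \
		if (path_hits++ == 0)                               \
			fprintf(stderr, "path reached: %s:%d (%s)\n", \
				__FILE__, __LINE__, __func__);      \
	} while (0)

/* Hypothetical function standing in for the suspect code path. */
static int check_name(const char *name, const char *target)
{
	if (!strcmp(name, target)) {
		PATH_HIT(); /* do the tests ever get here? */
		return 0;
	}
	return -1;
}
```

If a full test run finishes with path_hits still at 0, the branch was never exercised. That would explain why the inverted condition went unnoticed.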

>
>> > Do you have a list of requested cleanups? I would find that to be
>> > useful.
>>
>> As Greg would tell you, "if you don't know what needs to be done,
>> let's just remove the whole thing from staging now".
>>
>> I assume you saw drivers/staging/lustre/TODO already, it's only partially done.
>
> Actually the complete list is at:
>
> https://jira.hpdd.intel.com/browse/LU-9679
>
> I need to move that to our TODO list. Sorry I have been short on cycles.

Just adding that link to TODO would be a great start. I might do that
when I next send some patches.

Thanks,
NeilBrown
