Re: Another version of cleanfile/cleanpatch

From: Oleg Verych
Date: Thu Jun 07 2007 - 18:56:29 EST


On Thu, Jun 07, 2007 at 04:36:33PM +0200, Jan Engelhardt wrote:
>
> On Jun 6 2007 21:14, Oleg Verych wrote:
> >[]
> >> > Many things in XXI century still can be done by tools founded 20-30
> >> > years ago. Why not try to?
> >>
> >> Because your shell script is unreadable by normal human beings[*]
> >> while the perl script for people with a bit of perl fu can read it
> >> and fix/modify it.

Actually, unreadable and challenging were my first messages, where i just
tried to show a different point of view. With the new subject and the
attached working, commented script with a test case, i hope it no longer is.

> And because at the end of the day, the perl script might be faster
> than the shell script.
(when run by bash ;)

In fact, when i applied the same solution and logic as in those scripts,
performance was very poor, even with dash. But let's think for a moment
about what must be done, and then how. Meaningful whitespace damage is:

- trailing whitespace everywhere;
- x*(eight spaces) at line start;
- spaces between tabs near line start, before code;
- empty lines at the end of the file (patches can't be handled, or can they? :).
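
The four kinds of damage listed above can be sketched as one small filter.
This is only an illustration for plain source files, not the attached
script: the function name clean_ws is hypothetical, and the use of awk for
the trailing empty lines is an assumption made here for brevity.

```shell
#!/bin/sh
# Hypothetical helper (not the attached script): clean whitespace damage
# in a source file read from stdin, writing the result to stdout.
clean_ws() {
    t=$(printf '\t')        # a literal tab, for seds that lack \t
    s8='        '           # eight spaces
    sed -e "s/[ $t]*\$//" \
        -e ":next" \
        -e "s/^\\($t*\\)$s8/\\1$t/" \
        -e "t next" \
        -e ":strip" \
        -e "s/^\\($t*\\) \\{1,\\}$t/\\1$t/" \
        -e "t strip" |
    # Hold blank lines back and print them only when a non-blank line
    # follows, so empty lines at the end of the file are dropped.
    awk '/^$/ { n++; next } { while (n > 0) { print ""; n-- } print }'
}
```

Usage would be something like `clean_ws <file.c >file.c.clean`. The first
sed command strips trailing whitespace, the `next` loop turns each run of
eight leading spaces into a tab, and the `strip` loop removes spaces that
sit between the leading tabs and the next tab.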

Finally, what the patches i replied to also did was notify the user about
visually long lines.

So, the script is interactive, isn't it? And that means:

- you have a human as the user;
- the user knows what the script can possibly do.

Because of that, i think the following is redundant:

- checking for binary files;
- scanning the whole file for long lines, with a useless bunch of messages
about them. Useless, because the script doesn't fix that; it can't!

Thus, going from the end of my version, you see a one-shot check for a
long line with a (i hope) user-friendly message.
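
A one-shot check of this kind might look roughly like the sketch below. It
is an assumption-laden illustration, not the code from the attached script:
the function name first_long_line, the default of 80 columns and the message
text are all made up here.

```shell
#!/bin/sh
# Hypothetical helper: report only the first line of file $1 that is
# visually longer than $2 columns (default 80), then stop reading.
first_long_line() {
    max=${2:-80}
    n=0
    # expand turns tabs into spaces, so the string length approximates the
    # visual width. Note that some shells count bytes, not characters,
    # in ${#line}.
    expand -- "$1" | while IFS= read -r line
    do
        n=$((n + 1))
        if [ "${#line}" -gt "$max" ]
        then
            echo "line $n is longer than $max columns; please wrap it by hand"
            break
        fi
    done
}
```

Because of the `break`, the user gets at most one message per file, however
many long lines there are.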

Then, there's no check for binary files at all.

The body is a commented sed script with shell variables for the source/patch
handling switch and for compatibility with versions of sed other than GNU.
If you like more tabs than i stated in whitespace damage, just use
"unexpand".

As a result there's one small (easy to maintain [1]), hopefully smart and
fast script.
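
The source/patch switch via shell variables, mentioned above, could be
sketched as follows. The variable names p and addr mirror those visible in
the patches under [1], but the function name clean_ws_mode and the exact sed
commands are assumptions; only two of the fixes are shown.

```shell
#!/bin/sh
# Hypothetical sketch: one sed script, switched between source and patch
# mode by shell variables. In patch mode only added lines ("+..." but not
# the "+++" header) are touched, and the leading "+" is kept.
clean_ws_mode() {
    t=$(printf '\t')        # a literal tab, for seds that lack \t
    s8='        '           # eight spaces
    case $1 in
    patch)  p='+' ; addr='/^+[^+]/' ;;
    *)      p=''  ; addr='' ;;
    esac
    sed -e "$addr{" \
        -e "s/[ $t]*\$//" \
        -e ":next" \
        -e "s/^$p\\($t*\\)$s8/$p\\1$t/" \
        -e "t next" \
        -e "}"
}
```

For example, `clean_ws_mode patch <fix.diff` leaves context and removed
lines alone, while `clean_ws_mode source <file.c` processes every line. As
with the address in [1], an added line whose own text starts with "+" is
skipped in patch mode.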

> Yes, UNIX was designed to handle fork-exec efficiently, thank God. But
> still.
[]
> > efforts to remove bashizms...
>
> I prefer bashisms over using a shell [referring to original sh or ksh]
> that can't do a sane thing.

I would like to know such cases, just to try to solve them.

Two off the top of my head are:

- `set -o pipefail' option -- not a problem at all [0];
- arrays.
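
For the first item, here is a sketch of one common descriptor-juggling way
to get the exit status of the first command in a pipeline without bash (not
necessarily the exact recipe from [0]). The function name pipe_status and
the choice to pass both commands as strings to eval are assumptions made to
keep the example short.

```shell
#!/bin/sh
# Hypothetical helper: run "$1 | $2" and return the exit status of $1,
# which a plain pipeline would throw away. Both arguments are command
# strings passed to eval, so never feed them untrusted input.
pipe_status() {
    exec 4>&1               # fd 4: a copy of the real stdout
    status=$(
        # fd 3 is the command substitution's capture pipe: only the
        # producer's exit status goes there, the consumer writes to fd 4.
        { { eval "$1"; echo "$?" >&3; } | eval "$2" >&4; } 3>&1
    )
    exec 4>&-               # close the helper descriptor again
    return "${status:-1}"
}
```

With this, `pipe_status 'make 2>&1' 'tee build.log' || echo failed` reacts
to the status of make, not of tee.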

Arrays. Well, that depends. My opinion is as follows.

If you really need some kind of indexed data, create an array by reading
a file line by line and storing the data in numbered variables, like

    i=0
    while :
    do  eval "read -r ${PREFIX}_$i" || break
        i=$(($i + 1))
    done

(stdin redirection is shown in [0]); thus you have it.
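
To make the idea concrete, here is a self-contained sketch with a reader and
an accessor. The names array_read, array_get and the prefix "line" are
hypothetical, and keeping the element count in ${PREFIX}_count is an
assumption added for convenience.

```shell
#!/bin/sh
# Hypothetical helpers: emulate an indexed array with numbered variables.
PREFIX=line

# Read stdin line by line into ${PREFIX}_0, ${PREFIX}_1, ... and leave the
# number of stored lines in ${PREFIX}_count. Call it with a redirection,
# not at the end of a pipeline, or the variables are lost in a subshell.
array_read() {
    i=0
    while eval "IFS= read -r ${PREFIX}_$i"
    do
        i=$((i + 1))
    done
    eval "${PREFIX}_count=$i"
}

# Print element number $1; refuse anything that is not a plain number,
# because the index ends up inside eval.
array_get() {
    case $1 in
    '' | *[!0-9]*) return 1 ;;
    esac
    eval "printf '%s\n' \"\${${PREFIX}_$1}\""
}
```

After `array_read </etc/hostname`, for instance, `array_get 0` prints the
first line of that file and `$line_count` holds the number of lines read.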

OK, so as not to go off-topic, i will just say here that if that temp file
is on tmpfs, then Linux directly helps you with its efficient memory
management, not libc (a good addition to fork/clone-execve, isn't it? ;)

Anything else may require something other than shell as the solution.

>
> Jan
> --

[0] 47.2.1.4 More Elaborate Combinations (UNIX Power Tools)
<http://unix.org.ua/orelly/unix/upt/ch47_02.htm>

[1] Just two maintenance patches, usability and bugfix, as examples:

|processing_based_on_patch_file_names.patch, not script name:

--- clean-whitespace.sh~v.00 2007-06-07 23:53:00.099249000 +0200
+++ clean-whitespace.sh 2007-06-07 23:54:43.025681500 +0200
@@ -7,5 +7,5 @@
not_patch_line='/^+[^+]/'

-case $0 in *diff* | *patch*)
+case $1 in *[.]diff | *[.]patch)
file=patch ; sp='+[!+]' ; p='+' ; addr="$not_patch_line" ;;
esac

|correctly_handle_lines_after_append_command.patch
|(and more visible resulting message):

--- clean-whitespace.sh~v.01 2007-06-07 23:54:43.025681500 +0200
+++ clean-whitespace.sh 2007-06-07 23:57:34.120374250 +0200
@@ -14,8 +14,8 @@
s|[$t$s]*$||; # trailing whitespace,
:next; # x*8 spaces on the line start -> x*tabs
-s|^$p\($t*\)$s4$s4|$p\1$t|;t next;
-s|^$p\($t*\)$s*$t|$p\1$t|g; # strip spaces between tabs
-};p" -- "$i" >"$o" &&
-echo "please, see clean ${file:=source} file: $o
+s|^\([\n]*\)$p\($t*\)$s4$s4|\1$p\2$t|;t next; # \n is needed after N command
+s|^\([\n]*\)$p\($t*\)$s*$t|\1$p\2$t|g; # strip spaces between tabs
+};p" -- "$i" >"$o" && echo "
+please, see clean ${file:=source} file: $o
"
exec expand $i | while read -r line # check for long line
