The fastest ... You can do this with gcc. Here is a version that reads data from a given file name if one is specified, otherwise from stdin. If it's still too slow, you can see whether you can do it faster by replacing getc() and putc() (which can be macros and should be very well optimized) with your own buffering code. If we want to get ridiculous, it could be even faster with three threads: the kernel can copy in the next block of data on one core while a second core is processing it, and the third copies the processed output back to the kernel.
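    #!/bin/bash
    set -e
    BINNAME=$(mktemp)
    trap 'rm -f "$BINNAME"' EXIT   # remove the temporary binary when done
    gcc -xc -O3 -o "$BINNAME" - <<"EOF"
    #include <ctype.h>   /* isdigit, isspace */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int sep = 0;   /* nonzero once a newline was written for the current run */
        const int bufsize = 1024*1024;

        /* Large, fully buffered streams to cut down on system calls. */
        setvbuf(stdin, malloc(bufsize), _IOFBF, bufsize);
        setvbuf(stdout, malloc(bufsize), _IOFBF, bufsize);

        int ch;
        while ((ch = getc(stdin)) != EOF) {
            if (isdigit(ch) || isspace(ch)) {
                /* Collapse a run of digits and whitespace into one newline. */
                if (!sep) {
                    if (putc('\n', stdout) == EOF) break;
                    sep = 1;
                }
            } else {
                sep = 0;
                if (putc(ch, stdout) == EOF) break;
            }
        }

        fflush(stdout);
        return ferror(stdin) || ferror(stdout);
    }
    EOF
    if [ -z "$1" ] ; then
        "$BINNAME"
    else
        "$BINNAME" <"$1"
    fi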
Edit: I also took a closer look at the GNU/Linux stdio.h. Some notes: putchar / getchar are not macros there, but putc / getc are, so using them instead can be a small optimization, probably avoiding a single function call. I changed the code to reflect that. I also added a check of the putc return value.
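For the "your own buffering code" idea mentioned above, here is a rough sketch of what it might look like, assuming a POSIX system with read() and write(). The names squeeze, write_all, filter_fd and the 64 KiB BUFSIZE are made up for illustration and are not part of the script above; whether this actually beats getc/putc is something you would have to measure.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

enum { BUFSIZE = 1 << 16 };   /* hypothetical block size; tune by measuring */

/* Copy n bytes from in to out, collapsing every run of digits and
 * whitespace into a single '\n'. *sep carries the "separator already
 * written" state across calls, so a run split over two blocks still
 * yields one newline. Returns the bytes stored in out (never more than n). */
size_t squeeze(const unsigned char *in, size_t n, unsigned char *out, int *sep)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char ch = in[i];
        if (isdigit(ch) || isspace(ch)) {
            if (!*sep) {
                out[o++] = '\n';
                *sep = 1;
            }
        } else {
            *sep = 0;
            out[o++] = ch;
        }
    }
    return o;
}

/* write() may accept fewer bytes than requested, so loop until done. */
static int write_all(int fd, const unsigned char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR) continue;   /* interrupted, retry */
            return -1;
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Filter everything from in_fd to out_fd, one block at a time.
 * Returns 0 on success, -1 on an I/O error. */
int filter_fd(int in_fd, int out_fd)
{
    static unsigned char inbuf[BUFSIZE], outbuf[BUFSIZE];
    int sep = 0;
    for (;;) {
        ssize_t r = read(in_fd, inbuf, sizeof inbuf);
        if (r < 0) {
            if (errno == EINTR) continue;   /* interrupted, retry */
            return -1;
        }
        if (r == 0) return 0;               /* end of input */
        size_t o = squeeze(inbuf, (size_t)r, outbuf, &sep);
        if (write_all(out_fd, outbuf, o) < 0) return -1;
    }
}
```

A main() that just returns filter_fd(STDIN_FILENO, STDOUT_FILENO) != 0 should behave like the program in the heredoc. Keeping squeeze free of I/O is also what would let a multi-threaded variant hand blocks between a reader, a worker and a writer.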