Sed: delete alphanumeric words from a file

Question

I have a file with a lot of text, and I want to delete all the alphanumeric words from it.

Example of words to be removed: gr8 2006 sdlfj435ljsa 232asa asld213 ladj2343asda asd!32

How can I do this?

+6

sed text-formatting

daydreamer Dec 13 '10 at 20:30

4 answers

Assuming the only output you wanted from your sample text is 2006, and you have one word per line:

  sed '/[[:alpha:]]\+/{/[[:digit:]]\+/d}' /path/to/alnum/file

Input

 $ cat alnum
 gr8
 2006
 sdlFj435ljsa
 232asa
 asld213
 ladj2343asda
 asd!32
 alpha

Output

 $ sed '/[[:alpha:]]\+/{/[[:digit:]]\+/d}' ./alnum
 2006
 alpha
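The same filter can be tried without creating a file by piping the words in with printf. This is a minimal sketch, assuming one word per line as above; the variable names `input` and `result` are only for illustration. The `\+` quantifiers are dropped because a single letter and a single digit anywhere on the line are enough to decide, and a `;` is added before `}` because some non-GNU sed implementations require it.

```shell
# Sample words from the question, one per line, plus "alpha" as a letters-only control.
input='gr8
2006
sdlFj435ljsa
232asa
asld213
ladj2343asda
asd!32
alpha'

# Delete every line that contains at least one letter AND at least one digit.
result=$(printf '%s\n' "$input" | sed '/[[:alpha:]]/{/[[:digit:]]/d;}')
printf '%s\n' "$result"
```

Note that this deletes whole lines, so it only fits input with one word per line.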

+2

Siegex Dec 14 '10 at 1:36

If the goal is to remove all alphanumeric words (strings consisting entirely of letters and numbers), then this sed command will work. It replaces all alphanumeric strings with nothing.

 sed 's/[[:alnum:]]*//g' < inputfile

Note that there are character classes other than alnum (see man 7 regex).

For the data in this example, this leaves only 6 blank lines and one ! (since this is the only non-alphanumeric character in the example data). Is this really what you are trying to do?
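The count of blank lines can be checked with a short pipeline. This is a sketch that assumes the question's sample words are stored one per line; the variable names `input`, `stripped`, `blank` and `nonblank` are illustrative only.

```shell
# The question's sample words, one per line (assumed layout).
input='gr8
2006
sdlfj435ljsa
232asa
asld213
ladj2343asda
asd!32'

# Strip every run of letters and digits; only non-alphanumeric characters survive.
stripped=$(printf '%s\n' "$input" | sed 's/[[:alnum:]]*//g')

# Count the lines that became empty and the lines that still hold something.
blank=$(printf '%s\n' "$stripped" | grep -c '^$')
nonblank=$(printf '%s\n' "$stripped" | grep -c '.')
printf 'blank=%s nonblank=%s\n' "$blank" "$nonblank"
```

The single non-blank line is the `!` left over from asd!32.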

0

Kamal Dec 13 '10 at 21:05

AWK solution:

 BEGIN {                 # Statement that will be executed once at the beginning.
     FS = "[ \t]"        # Set space and tab characters to be treated as word separators.
 }

 # Code below will execute for each line in the file.
 {
     x = 1               # Set initial word index to 1 ($0 is the original line).
     fw = 1              # Indicate that the next matched word is the first word.
                         # This is needed to put the newline and spaces correctly.
     while (x <= NF) {
         gsub(/[ \t]*/, "", $x)               # Strip the word: remove any leading and trailing whitespace.
         if (!match($x, "^[A-Za-z0-9]*$")) {  # Print the word only if it is not purely alphanumeric.
             if (fw == 0) {
                 printf (" %s", $x)           # Not the first match: offset the word with a space.
             } else {
                 printf ("%s", $x)            # Print the word as is...
                 fw = 0                       # ...and indicate that future matches are not first occurrences.
             }
         }
         x++                                  # Increase the word index.
     }
     if (fw == 0) {      # Print a newline only if we matched some words and printed something.
         printf ("\n")
     }
 }

Assuming you have this script in script.awk and data in data.txt, invoke awk as follows:

 awk -f ./script.awk ./data.txt

For your file, it will produce:

 asd!32

For more complex cases like this:

 gr8 2006 sdlfj435ljsa 232asa he!he lol asld213 f ladj2343asda asd!32 ab acd!s

... this will produce this:

 he!he asd!32 acd!s
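For comparison, the same filtering can be written more compactly by relying on awk's default field splitting. This is only a sketch, not a replacement for the script above: the function name `filter_words` is hypothetical, and it assumes words are separated by spaces or tabs. Like the script, it drops every purely alphanumeric word, including all-letter and all-digit ones.

```shell
# Hypothetical helper: keep only the words that contain a non-alphanumeric character.
filter_words() {
  awk '{
    out = ""
    for (i = 1; i <= NF; i++)
      if ($i !~ /^[A-Za-z0-9]+$/)        # word is not purely alphanumeric, so keep it
        out = (out == "" ? $i : out " " $i)
    if (out != "") print out             # skip lines where nothing survived
  }'
}

result=$(printf '%s\n' 'gr8 2006 sdlfj435ljsa 232asa he!he lol asld213 f ladj2343asda asd!32 ab acd!s' | filter_words)
printf '%s\n' "$result"
```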

Hope this helps. Good luck

0

user405725 Dec 13 '10 at 22:02

Dennis Williamson · Accepted Answer · 2010-12-13T23:15:39+0000

If you want to delete all words that mix letters and digits, leaving only words that are all digits or all letters:

 sed 's/\([[:alpha:]]\+[[:digit:]]\+[[:alnum:]]*\|[[:digit:]]\+[[:alpha:]]\+[[:alnum:]]*\) \?//g' inputfile

Example:

 $ echo 'abc def ghi 111 222 ab3 a34 43a a34a 4ab3' | sed 's/\<\([[:alpha:]]\+[[:digit:]]\+[[:alnum:]]*\|[[:digit:]]\+[[:alpha:]]\+[[:alnum:]]*\) \?//g'
 abc def ghi 111 222
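The `\+`, `\|`, `\?` and `\<` operators used above are GNU extensions to basic regular expressions. The following is a sketch of the same substitution written with extended regular expressions, assuming a sed that accepts `-E` (GNU and BSD sed both do). The helper name `strip_mixed` is hypothetical, the `\<` anchor is left out, and the second expression, which trims the trailing space left behind, is an addition not found in the answer.

```shell
# Hypothetical helper: ERE spelling of the substitution, plus trailing-space cleanup.
strip_mixed() {
  sed -E \
    -e 's/([[:alpha:]]+[[:digit:]]+[[:alnum:]]*|[[:digit:]]+[[:alpha:]]+[[:alnum:]]*) ?//g' \
    -e 's/ +$//'
}

# The example from the answer, then the sample words from the question on one line.
demo=$(echo 'abc def ghi 111 222 ab3 a34 43a a34a 4ab3' | strip_mixed)
sample=$(echo 'gr8 2006 sdlfj435ljsa 232asa asld213 ladj2343asda asd!32' | strip_mixed)
printf '%s\n%s\n' "$demo" "$sample"
```

On the question's sample this keeps 2006 and asd!32, because the letters and digits in asd!32 are separated by the `!` and never form a mixed alphanumeric run.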
