
Combining multiple sed commands

I have the following file:

    <tr class="in">
      <th scope="row">In</th>
      <td>1.2 kB/s (0.0%)</td>
      <td>8.3 kB/s (0.0%) </td>
      <td>3.2 kB/s (0.0%) </td>
    </tr>
    <tr class="out">
      <th scope="row">Out</th>
      <td>6.7 kB/s (0.6%) </td>
      <td>4.2 kB/s (0.1%) </td>
      <td>1.5 kB/s (0.6%) </td>
    </tr>

I want to get the value inside every second <td></td> of each row (and save it to a file), as follows:

    8.3
    4.2

My code is:

    # get the lines with <td> tags
    cat tmp.txt | grep '<td>[0-9]*.[0-9]' > tmp2.txt
    # delete whitespaces
    sed -i 's/[\t ]//g' tmp2.txt
    # remove <td> tag
    cat tmp2.txt | sed "s/<td>//g" > tmp3.txt
    # remove "kB/s (0.0%)"
    cat tmp3.txt | sed "s/kB\/s\((.*)\)//g" > tmp4.txt
    # remove </td> tag and save to traffic.txt
    cat tmp4.txt | sed "s/<\/td>//g" > traffic.txt
    #rm -R -f tmp*

How can I do this in a more conventional way? This code is really noobish.

Thanks in advance, Marley

+4
5 answers

Use the -e option; see man sed for details.

So in your case you can do:

    cat tmp.txt | grep '<td>[0-9]*.[0-9]' \
      | sed -e 's/[\t ]//g' \
            -e "s/<td>//g" \
            -e "s/kB\/s\((.*)\)//g" \
            -e "s/<\/td>//g" > traffic.txt

You can also write it in another way:

 grep "<td>.*</td>" tmp.txt | sed 's/<td>\([0-9.]\+\).*/\1/g' 

\+ matches one or more occurrences, but it is a GNU extension and does not work in other sed versions (macOS, for example, ships BSD sed).

Following @tripleee's comment below, this is the most compact version I could come up with; it also works with non-GNU sed:

sed -n 's/<td>\([0-9]*.[0-9]*\).*/\1/p' tmp.txt
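If you do want an explicit "one or more" without relying on GNU's \+, POSIX basic regular expressions offer the interval \{1,\}, or you can simply repeat the bracket expression. A minimal sketch, assuming a single input line shaped like the question's data; the variable names and the leading .* (used here to discard indentation) are illustrative additions, not part of the answer above:

```shell
# One line shaped like the question's data (assumed layout).
line='  <td>8.3 kB/s (0.0%) </td>'

# POSIX interval expression: \{1,\} means "one or more".
portable_interval=$(printf '%s\n' "$line" | sed -n 's/.*<td>\([0-9.]\{1,\}\).*/\1/p')

# Same effect by repeating the bracket expression: one char, then zero or more.
portable_repeat=$(printf '%s\n' "$line" | sed -n 's/.*<td>\([0-9.][0-9.]*\).*/\1/p')

echo "$portable_interval"
echo "$portable_repeat"
```

Both forms should behave the same on GNU and BSD sed, since neither uses a GNU-only escape.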

As a side note, you can also pipe the output of one sed into the next instead of saving each intermediate result to a file, which is what I usually see people do for ad hoc tasks:

    cat tmp.txt | grep '<td>[0-9]*.[0-9]' \
      | sed -e 's/[\t ]//g' \
      | sed "s/<td>//g" \
      | sed "s/kB\/s\((.*)\)//g" \
      | sed "s/<\/td>//g" > traffic.txt

The -e option is more efficient, because only one sed process runs, but I think the pipeline version is more convenient.
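To see that these forms are interchangeable, here is a small sketch that runs the same four substitutions three ways and compares the results. It assumes the input has one tag per line, as in the question; the sample function and variable names are made up for the demonstration, and the expressions are slightly simplified ([[:space:]] instead of [\t ], and no capture group around the parentheses):

```shell
# Hypothetical helper that prints one row of the question's data.
sample() {
cat <<'EOF'
<tr class="in">
  <th scope="row">In</th>
  <td>1.2 kB/s (0.0%)</td>
  <td>8.3 kB/s (0.0%) </td>
  <td>3.2 kB/s (0.0%) </td>
</tr>
EOF
}

# One sed process, several -e expressions applied in order to each line.
with_e=$(sample | grep '<td>' \
  | sed -e 's/[[:space:]]//g' -e 's/<td>//g' -e 's/kB\/s(.*)//' -e 's/<\/td>//g')

# The same commands in a single script, separated by semicolons.
with_semicolons=$(sample | grep '<td>' \
  | sed 's/[[:space:]]//g; s/<td>//g; s/kB\/s(.*)//; s/<\/td>//g')

# One sed process per substitution, chained with pipes.
piped=$(sample | grep '<td>' \
  | sed 's/[[:space:]]//g' | sed 's/<td>//g' | sed 's/kB\/s(.*)//' | sed 's/<\/td>//g')

echo "$with_e"
```

All three variables end up holding the same three values, one per line; the only difference is how many processes were started to get there.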

+10

This may work for you (GNU sed):

  sed '/^<tr/,/^<\/tr>/!d;/<td/H;/^<\/tr/!d;x;s/\n//g;s/<td>/\n/2;s/.*\n\(\S*\).*/\1/' file 

Explanation:

  • Restrict processing to the lines between the opening <tr> and closing </tr> tags. /^<tr/,/^<\/tr>/!d
  • Append the <td> lines to the hold space (HS). /<td/H
  • Delete every line in the range except the last one. /^<\/tr/!d
  • Swap the pattern space with the HS. x
  • Delete all newlines. s/\n//g
  • Replace the second <td> with a newline. s/<td>/\n/2
  • Delete everything except the first non-space field after the inserted newline; the result is then printed. s/.*\n\(\S*\).*/\1/
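The same script can be written one command per line with comments, which makes the steps above easier to follow. This is only a sketch: it assumes GNU sed (for \S and for \n in the replacement) and input with one tag per line, and the sample function and second_cells variable are names invented for the demonstration:

```shell
# Hypothetical helper that prints the question's data, one tag per line.
sample() {
cat <<'EOF'
<tr class="in">
  <th scope="row">In</th>
  <td>1.2 kB/s (0.0%)</td>
  <td>8.3 kB/s (0.0%) </td>
  <td>3.2 kB/s (0.0%) </td>
</tr>
<tr class="out">
  <th scope="row">Out</th>
  <td>6.7 kB/s (0.6%) </td>
  <td>4.2 kB/s (0.1%) </td>
  <td>1.5 kB/s (0.6%) </td>
</tr>
EOF
}

second_cells=$(sample | sed '
  # Delete everything outside a <tr> ... </tr> block.
  /^<tr/,/^<\/tr>/!d
  # Append each <td> line to the hold space.
  /<td/H
  # Drop every line except the closing </tr>.
  /^<\/tr/!d
  # At </tr>: swap in the collected <td> lines.
  x
  # Join them into a single line.
  s/\n//g
  # Mark the second <td> with a newline.
  s/<td>/\n/2
  # Keep only the first run of non-space characters after that newline.
  s/.*\n\(\S*\).*/\1/
')

echo "$second_cells"
```

Note that the swap leaves the previous </tr> in the hold space, so each later row starts with that leftover text; it is discarded by the final substitution along with everything else before the newline.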
+2

You can use curly braces to create a block that is controlled by an address or a set of addresses:

 sed -n '/<td>[0-9]*.[0-9]/ {s/[\t ]//g; s/<td>//g; s/kB\/s\((.*)\)<\/td>//g;p}' tmp.txt 

You can probably do something more elaborate with sed's hold and pattern spaces to pick out the second and fourth lines (I have seen solutions that undo double spacing in files this way).

+2

[Edit] Thanks to Barton for pointing out the error. The corrected version:

    cat tmp.txt | grep td | sed 's/<td>\([0-9]\.[0-9]\).*/\1/g' > newtmp.txt
    sed -n '2,${p;n;n}' newtmp.txt > final.txt; rm newtmp.txt

The first line extracts the digit.digit pattern that follows <td> on each line.

The second line prints every third line, starting from the second line (which actually gives you the second line from each group of three in the file).
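The stepping behaviour of p;n;n is easier to see on plain numbered input. A minimal sketch, using the digits 1 to 9 as stand-in lines (the picked variable is just for the demonstration; the extra semicolon before the closing brace is added so that BSD sed accepts the script too):

```shell
# Feed the lines 1..9; from line 2 onward: print the current line, then skip the next two.
picked=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 | sed -n '2,${p;n;n;}')

echo "$picked"
```

Each n replaces the pattern space with the next input line without printing it (because of -n), so only lines 2, 5 and 8 reach the output.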

+1

Your question about running multiple sed commands seems to have been answered, but sed is the wrong tool for this. Assuming the input format is rigid, with <tr> always at the beginning of a line and the td tags you are looking for always preceded by exactly 2 spaces (the solution is easy to adapt if that is not the case), you can do:

    awk -F'</?td>' '/^<tr/{i=0} /^  <td/{i++} i==2{print $2}' input-file

The first argument tells awk to split each line on <td> or </td>, so the data you are interested in becomes the second field. The first clause of the second argument resets the counter i to zero when <tr appears at the beginning of a line. The next clause increments i each time <td appears after 2 spaces. The last clause prints the second field of the second <td> line. The final argument names your input file.
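To check how the field separator carves up a line, you can feed awk a single <td> line and look at the field count and the second field. A small sketch, assuming a line indented by two spaces as described above (the variable names are illustrative; the brackets are only there to make the trailing space visible):

```shell
# One <td> line as it is assumed to appear in the input file.
line='  <td>8.3 kB/s (0.0%) </td>'

# The regex separator matches both <td> and </td>, giving three fields:
# the indentation, the cell content, and an empty field after </td>.
field_count=$(printf '%s\n' "$line" | awk -F'</?td>' '{print NF}')
second_field=$(printf '%s\n' "$line" | awk -F'</?td>' '{print "[" $2 "]"}')

echo "$field_count"
echo "$second_field"
```

The second field still contains the unit and percentage, which is why the follow-up version below falls back to default whitespace splitting.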

Of course, this gives you everything between the <td> tags, which I see is not what you want. To get just the text between <td> and the first space, try:

    awk '/^<tr/{i=0} /^  <td/{i++} i==2{gsub( "<td>", ""); print $1}' input-file
+1

Source: https://habr.com/ru/post/1415331/

