How can I trim "/" using shell scripts?

I played with a small shell script to get some information from an HTML page loaded with lynx.

My problem is that I get this line: <span class="val3">MPPTN: 0.9384</span></td>

I can crop the first part using:

 trimmed_info=`echo ${info/'<span class="val3">'/}` 

And the line becomes: "MPPTN: 0.9384"

But how can I crop the last part? It seems that the "/" messes up the echo command... I tried:

 echo ${finalt/'</span></td>'/}; 
4 answers

The behavior of ${VARIABLE/PATTERN/REPLACEMENT} depends on which shell you use and, for bash, on which version. Under ksh, or under recent enough versions of bash (I think ≥ 4.0), ${finalt/'</span></td>'/} strips that substring as desired. In older versions of bash the quoting is rather quirky; you need to write ${finalt/<\/span><\/td>/} (which still works in newer versions).
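A minimal sketch of the whole trim, assuming bash and the sample line from the question (the variable names info, finalt and trimmed are only for illustration). It uses the escaped-slash form, which should work in old and new bash alike:

```shell
#!/usr/bin/env bash
# Sketch assuming bash: strip the opening tag, then the closing tags,
# using ${var/pattern/replacement} with an empty replacement.
info='<span class="val3">MPPTN: 0.9384</span></td>'

# Remove the leading tag (this pattern has no "/", so no escaping needed).
finalt=${info/'<span class="val3">'/}

# Remove the trailing tags; each "/" inside the pattern is escaped as "\/"
# so bash does not mistake it for the pattern/replacement separator.
trimmed=${finalt/<\/span><\/td>/}

echo "$trimmed"
```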

Since you are removing a suffix, you can use the ${VARIABLE%PATTERN} or ${VARIABLE%%PATTERN} construct. Here you remove everything from the first </ onward, that is, the longest suffix matching the pattern </*. Similarly, you can strip the leading HTML tags with ${VARIABLE##PATTERN}.

 trimmed=${finalt%%</*}; trimmed=${trimmed##*>} 
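To make the shortest/longest distinction concrete, here is a small sketch run on the full sample line (variable names are illustrative). It only uses ${…%…}, ${…%%…} and ${…##…}, so it should also run under a plain POSIX sh such as dash:

```shell
#!/bin/sh
# Sketch: suffix/prefix removal with POSIX parameter expansion.
info='<span class="val3">MPPTN: 0.9384</span></td>'

# %  removes the SHORTEST suffix matching "</*": only "</td>" goes.
shortest=${info%</*}
# %% removes the LONGEST suffix matching "</*": everything from the first "</".
longest=${info%%</*}
# ## removes the LONGEST prefix matching "*>": everything up to the last ">".
trimmed=${longest##*>}

printf '%s\n' "$shortest" "$longest" "$trimmed"
```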

Added benefit: unlike ${…/…/…}, which is specific to bash, ksh and zsh and works a little differently in each of them, ${…#…} and ${…%…} are fully portable. They don't do as much, but they are enough here.

Side note: although it didn't cause any problem in this particular case, you should always put double quotes around variable substitutions, for example

 echo "${finalt/'</span></td>'/}" 

Otherwise, the shell performs word splitting and wildcard expansion on the result. A simple rule: if you don't have a good reason to leave the double quotes off, put them in.
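A small illustration of what can go wrong, assuming bash and a throwaway directory created with mktemp; the file names and the sample value are made up for the demo:

```shell
#!/usr/bin/env bash
# Sketch: why to double-quote expansions. A temporary directory gives the
# wildcard something to match; a.txt and b.txt are made-up demo files.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt"

value='*.txt    two  spaces'

# Unquoted: word splitting collapses the spaces and *.txt is globbed.
unquoted=$(cd "$demo_dir" && echo $value)
# Quoted: the value is passed through unchanged.
quoted=$(cd "$demo_dir" && echo "$value")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"

rm -rf "$demo_dir"
```

With a.txt and b.txt present, the unquoted version comes out as "a.txt b.txt two spaces", while the quoted one keeps the original text intact.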


Not sure whether using sed is OK, but one way to retrieve the number might be something like this:

  echo '<span class="val3">MPPTN: 0.9384</span></td>' | sed 's/^[^:]*..//' | sed 's/<.*$//' 
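The two sed processes can be merged into one by passing both expressions with -e. This is a minor variant of the command above, not the answerer's own code; the variable names are illustrative:

```shell
#!/usr/bin/env bash
# Sketch: the same two substitutions in a single sed process.
line='<span class="val3">MPPTN: 0.9384</span></td>'

# 1st expression: drop everything before the first ":" plus the next two
#                 characters (": ").
# 2nd expression: drop everything from the first remaining "<" to the end.
number=$(printf '%s\n' "$line" | sed -e 's/^[^:]*..//' -e 's/<.*$//')

echo "$number"
```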

The solution largely depends on what exactly you want to do. If all your lines look like <span class="val3">XXXXX: X.XXXX</span></td>, the simplest solution would be

 echo "$info" | cut -c 20-32
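Where the 20-32 comes from, sketched under the answer's assumption that every line has exactly this layout: the opening tag is 19 characters long and the value is 13. The second extraction swaps in bash's own substring expansion ${var:offset:length}, which is bash-specific and counts from 0:

```shell
#!/usr/bin/env bash
# Sketch: derive the cut columns from the (assumed fixed) prefix length.
info='<span class="val3">MPPTN: 0.9384</span></td>'
prefix='<span class="val3">'

start=$(( ${#prefix} + 1 ))   # cut counts from 1: 19 + 1 = 20
end=$(( start + 13 - 1 ))     # "MPPTN: 0.9384" is 13 characters: 32

via_cut=$(printf '%s\n' "$info" | cut -c "$start-$end")
# Bash-only substring expansion (0-based offset) gives the same slice.
via_substr=${info:${#prefix}:13}

echo "$start-$end: $via_cut / $via_substr"
```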

If they have the form <span class="val3">variable-length text</span></td>, the simplest solution is

 echo "$info" | sed 's/<span class="val3">//' | sed 's/<\/span><\/td>//'
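Since the question is really about the "/", it may help to know that sed's s command accepts other delimiter characters. This sketch compares the escaped form with one that uses | as the delimiter, so the closing tags can be written as-is (variable names are illustrative):

```shell
#!/usr/bin/env bash
# Sketch: sed accepts any character as the s-command delimiter, so with "|"
# the "/" in the closing tags needs no backslash.
info='<span class="val3">MPPTN: 0.9384</span></td>'

escaped=$(printf '%s\n' "$info" | sed 's/<span class="val3">//' | sed 's/<\/span><\/td>//')
alt_delim=$(printf '%s\n' "$info" | sed -e 's/<span class="val3">//' -e 's|</span></td>||')

echo "$escaped"
echo "$alt_delim"
```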

If the format is more general, you can use regular expressions, as in Sai's answer.


I would recommend using the sed command for this kind of thing:

 echo "$string" | sed "s/$regex/$replace/" 
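A sketch with concrete, purely illustrative values for regex and replace. If $regex itself contains a "/", it ends the s command early, which is the same problem the question runs into; switching the delimiter avoids it:

```shell
#!/usr/bin/env bash
# Sketch of the generic form; the regex and replace values are illustrative.
string='<span class="val3">MPPTN: 0.9384</span></td>'
replace=''

# A tag-stripping regex with no "/" works with the usual delimiter.
# The trailing "g" (not in the one-liner above) removes every match.
regex='<[^>]*>'
no_tags=$(echo "$string" | sed "s/$regex/$replace/g")

# A pattern containing "/" would end the s command early with "/" as
# the delimiter, so this one uses "|" instead.
regex='</span></td>'
no_suffix=$(echo "$string" | sed "s|$regex|$replace|")

echo "$no_tags"
echo "$no_suffix"
```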
