AWK, SED, REGEX for renaming files

I am just learning to use regex, awk and sed. I have a group of files that I would like to rename; they all sit in the same directory.

The naming pattern is consistent, but I would like to reorder the parts of the file names. Here is the current format:

    01._HORRIBLE_HISTORIES_S2.mp4
    02._HORRIBLE_HISTORIES_S2.mp4

I would like to rename them to HORRIBLE_HISTORIES_s01e01.mp4, where e01 is taken from the first column. I know that I want to grab "01" from the first column, store it in a variable, and then insert it after S2 in each file name; at the same time I want to delete it from the beginning of the file name along with "._". In addition, I want to change "S2" to "s02".

If someone would be so kind, could you write something for me using awk / sed and explain the procedure so that I can learn from it?

+7
regex awk filenames sed renaming
5 answers
    for f in *.mp4; do
      echo mv "$f" \
        "$(awk -F '[._]' '{ si = sprintf("%02d", substr($5,2)); print $3 "_" $4 "_s" si "e" $1 "." $6 }' <<<"$f")"
    done
  • Iterates over all *.mp4 files.
  • Renames each file to the result of the awk command, supplied via command substitution ( $(...) ).
  • The awk command splits the input file name into tokens on "." or "_" (which makes the first token available as $1 , the second as $2 , ...). Because "." and "_" are adjacent after the episode number, $2 is empty here.
  • First, the number in "_S{number}" is left-padded with 0 to two digits (that is, a 0 is prepended only if the number does not already have 2 digits) and stored in the variable si (season index); if it is acceptable to always prepend a 0 , the awk program can be simplified to: { print $3 "_" $4 "_s0" substr($5,2) "e" $1 "." $6 }
  • The result, along with the rest of the tokens, is then reordered to form the desired file name.

Note the echo before mv , which lets you safely preview the resulting commands - remove it to actually rename the files.
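To make the field splitting described above easier to follow, here is a minimal sketch that applies the same awk program to sample names without touching any files. The function name new_name_awk is a hypothetical helper introduced only for this illustration, and the sketch assumes the names follow the pattern from the question.

```shell
#!/usr/bin/env bash
# new_name_awk is a hypothetical helper (not from the answer above):
# it prints the new name for one file name and renames nothing.
new_name_awk() {
  awk -F '[._]' '{
    # Fields for 01._HORRIBLE_HISTORIES_S2.mp4:
    #   $1=01  $2=(empty)  $3=HORRIBLE  $4=HISTORIES  $5=S2  $6=mp4
    season = sprintf("%02d", substr($5, 2))   # "2" becomes "02"
    print $3 "_" $4 "_s" season "e" $1 "." $6
  }' <<< "$1"
}

new_name_awk '01._HORRIBLE_HISTORIES_S2.mp4'
new_name_awk '02._HORRIBLE_HISTORIES_S2.mp4'
```

Note that the program relies on the title consisting of exactly two "_"-separated words; a title with more or fewer words would shift the field numbers.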

Alternative: a pure bash solution using a regex:

    for f in *.mp4; do
      # skip any file name that does not match the expected pattern
      [[ $f =~ ^([0-9]+)\._([^.]+)_S([^.]+)\.(.+)$ ]] || continue
      echo mv "$f" \
        "${BASH_REMATCH[2]}_s0${BASH_REMATCH[3]}e${BASH_REMATCH[1]}.${BASH_REMATCH[4]}"
    done

  • Uses bash's regular-expression matching operator, =~ , with capture groups (the substrings in (...) ) to match each file name and extract the substrings of interest.
  • The results of the match are stored in the special array variable $BASH_REMATCH , with element 0 containing the entire match, 1 containing what the first capture group matched, 2 the second, etc.
  • The target argument of the mv command then assembles the capture-group matches in the desired order; note that in this case, for simplicity, I made the zero-padding of s{number} unconditional - a 0 is simply prepended.

As above, you need to remove echo before mv in order to do the actual renaming.
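If the season number might already have two digits, the unconditional 0 can be replaced by conditional padding. The following is a sketch under that assumption; new_name_regex is a hypothetical helper name, and the regex is narrowed to a numeric season ( [0-9]+ ) so that it can be used in arithmetic.

```shell
#!/usr/bin/env bash
# new_name_regex is a hypothetical helper: it prints the new name for one
# file name, or returns 1 if the name does not match the expected pattern.
new_name_regex() {
  local re='^([0-9]+)\._([^.]+)_S([0-9]+)\.(.+)$'
  [[ $1 =~ $re ]] || return 1
  local season
  # 10# forces base 10, so a value such as 08 is not rejected as invalid octal.
  printf -v season '%02d' "$((10#${BASH_REMATCH[3]}))"
  printf '%s_s%se%s.%s\n' \
    "${BASH_REMATCH[2]}" "$season" "${BASH_REMATCH[1]}" "${BASH_REMATCH[4]}"
}

new_name_regex '01._HORRIBLE_HISTORIES_S2.mp4'
new_name_regex '03._HORRIBLE_HISTORIES_S12.mp4'
```

Keeping the regex in a variable avoids quoting surprises inside [[ ... ]] and makes the pattern easier to adjust.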

+7

A common way to rename multiple files according to a pattern is to use the Perl rename command. It uses Perl regular expressions and is very efficient. Use -n -v to check the pattern without touching the files:

    $ rename -n -v 's/^(\d+)\._(.+)_S2\.mp4$/$2_s02e$1.mp4/' *.mp4
    01._HORRIBLE_HISTORIES_S2.mp4 renamed as HORRIBLE_HISTORIES_s02e01.mp4
    02._HORRIBLE_HISTORIES_S2.mp4 renamed as HORRIBLE_HISTORIES_s02e02.mp4

Use parentheses to capture strings in the variables $1 (first capture), $2 (second capture), etc.:

  • ^(\d+) captures the digits at the beginning of the file name (in $1 )
  • \._(.+)_S2\.mp4$ captures everything between ._ and _S2.mp4 (in $2 ); the dots are escaped so that they match a literal "." rather than any character
  • $2_s02e$1.mp4 assembles the new file name from the captured data in the order you want.

When you are happy with the result, remove -n from the command and rename all the files for real.

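The Perl-based rename is often available on Linux, although its name and package vary by distribution (for example rename or prename ). Note that the rename shipped in the util-linux package is a different tool with a different syntax and does not accept Perl expressions. There is a similar discussion here on SO, with more details on finding / installing the correct command.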
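Where the Perl rename is not available, a similar substitution can be sketched with sed, which the question also asked about. This is an illustration rather than part of the answer above: new_name_sed is a hypothetical helper, the pattern assumes a single-digit season, and -E (extended regular expressions) is assumed to be supported by the installed sed.

```shell
#!/usr/bin/env bash
# new_name_sed is a hypothetical helper: it prints the new name for one
# file name. Names that do not match the pattern are printed unchanged.
new_name_sed() {
  # \1 = episode, \2 = title, \3 = single-digit season, \4 = extension
  sed -E 's/^([0-9]+)\._(.+)_S([0-9])\.([^.]+)$/\2_s0\3e\1.\4/' <<< "$1"
}

# Dry run: print the mv commands instead of executing them.
for f in *.mp4; do
  new=$(new_name_sed "$f")
  if [ "$new" != "$f" ]; then
    echo mv -- "$f" "$new"
  fi
done
```

As with the other answers, remove echo only once the printed commands look right.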

+9

You can do this with almost pure bash (using substring parameter expansion):

    for f in *mp4 ; do
      # offsets are 0-based: ${f:4:18} is HORRIBLE_HISTORIES, ${f:0:2} is the episode number
      newfilename="${f:4:18}_s01e${f:0:2}.mp4"
      echo mv "$f" "$newfilename"
    done

If the output of this command meets your needs, you can remove echo from the loop or, more simply (if the loop above was the last command you ran), issue: !! | bash
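Fixed offsets only work while every title has the same length. As a sketch of a more flexible variant, the pieces can be cut out with pattern-based parameter expansion instead; new_name_expand is a hypothetical helper name, and unlike the loop above it derives the season from the S{number} part of the name rather than hard-coding it.

```shell
#!/usr/bin/env bash
# new_name_expand is a hypothetical helper: it prints the new name for one
# file name using only bash parameter expansion.
new_name_expand() {
  local f=$1
  local episode=${f%%.*}      # text before the first dot:  01
  local ext=${f##*.}          # text after the last dot:    mp4
  local rest=${f#*._}         # drop the leading "01._":    HORRIBLE_HISTORIES_S2.mp4
  rest=${rest%.*}             # drop the extension:         HORRIBLE_HISTORIES_S2
  local title=${rest%_S*}     # drop the trailing "_S2":    HORRIBLE_HISTORIES
  local season=${rest##*_S}   # keep what follows "_S":     2
  printf '%s_s0%se%s.%s\n' "$title" "$season" "$episode" "$ext"
}

new_name_expand '01._HORRIBLE_HISTORIES_S2.mp4'
```

The sketch does no validation, so a name that lacks "._" or "_S" produces a meaningless result; it is meant for names that already follow the pattern.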

+1

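Put the file names into a text file (one name per line), then use a loop and awk to rename the files.

    while IFS= read -r oldname; do
      # first awk: move the episode number to the end -> HORRIBLE_HISTORIES_S2_e01.mp4
      # second awk: rewrite S2 as s02 and join it to e01 -> HORRIBLE_HISTORIES_s02e01.mp4
      newname=$(awk -F'.' '{ print substr($2, 2) "_e" $1 "." $3 }' <<< "$oldname" |
                awk -F'_' '{ print $1 "_" $2 "_s0" substr($3, 2) $4 }')
      mv -- "$oldname" "$newname"
    done < input.txt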
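The loop above expects input.txt to exist already. One way to create it, sketched here with a hypothetical helper named write_name_list and a throwaway demo directory, is to let the shell expand the glob and print one name per line:

```shell
#!/usr/bin/env bash
# write_name_list is a hypothetical helper, not part of the answer above.
# $1: directory to scan, $2: list file to write (one .mp4 name per line)
write_name_list() {
  ( cd "$1" && printf '%s\n' *.mp4 ) > "$2"
}

# Demo in a temporary directory with two empty sample files.
demo_dir=$(mktemp -d)
touch "$demo_dir/01._HORRIBLE_HISTORIES_S2.mp4" "$demo_dir/02._HORRIBLE_HISTORIES_S2.mp4"
write_name_list "$demo_dir" "$demo_dir/input.txt"
cat "$demo_dir/input.txt"
```

A line-based list like this assumes that no file name contains a newline.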
0

If you want to use gawk , its regex matching really comes in handy. I find this pipe-based solution a little nicer than dealing with loop constructs.

    ls -1 | \
    gawk 'match($0, /.../, a) { printf ... | "sh" } \
    END { close("sh") }'

For readability, I replaced the regex and mv with ellipses.

  • Line 1 lists all the file names in the current directory, one per line, and pipes them to the gawk command.
  • Line 2 performs the regex match, assigning the captured groups to the array variable a . The action turns them into the desired command with printf , whose output is piped to sh for execution.
  • Line 3 closes the shell, which was implicitly opened when we started piping things to it.

So, you just fill in the regular expression and the command (borrowing from mklement0 ). For example ( LIVE CODE WARNING ):

    ls -1 | \
    gawk 'match($0, /^([0-9]+)\._([^.]+)_S([^.]+)\.(.+)$/, a) { printf "mv \"%s\" \"%s_s0%se%s.%s\"\n",a[0],a[2],a[3],a[1],a[4] | "sh" } \
    END { close("sh") }'

To preview the commands first (as you should), simply delete | "sh" from the second line.

0
