I searched for this in forums and on Stack Overflow; the answer should be here somewhere, but I could not find it.
I am on a Mac using a terminal to run a shell script to rename some pdf files based on the contents of the file.
I have a directory full of PDF files that I export to text files using an open source PDF tool (PDFBox). The resulting files have the same name as the PDF file, but end in .txt. I created the text files so that I can find a line inside each file with the format Page xx Question xx; for example, Page 43 Question 2. In this example, I would like to rename the PDF file to pg43_q2.pdf
I think the regex that I want is the following: /Page\s+(\d+)\s+Question\s+(\d+)/
but I'm not sure how to read the two captured numbers and save them to a string that I can use as the file name.
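One way to read the two captured numbers, sketched here under the assumption that the script runs under bash (the stock macOS bash is enough) rather than a strict POSIX sh: bash's `[[ ... =~ ... ]]` operator stores the capture groups in the `BASH_REMATCH` array. The function name `pdf_name_from_text` is hypothetical, and `[[:blank:]]` and `[0-9]` stand in for `\s` and `\d`, which POSIX extended regular expressions do not define.

```shell
#!/bin/bash
# Hypothetical helper: print "pgN_qM.pdf" for the first line of the text
# file "$1" that contains "Page N Question M". Returns 1 if no line matches.
pdf_name_from_text() {
    # Keep the pattern in a variable so bash treats it as a regex, not a literal.
    local re='Page[[:blank:]]+([0-9]+)[[:blank:]]+Question[[:blank:]]+([0-9]+)'
    local line
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [[ $line =~ $re ]]; then
            # BASH_REMATCH[1] and [2] hold the two captured numbers.
            printf 'pg%s_q%s.pdf\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
            return 0
        fi
    done < "$1"
    return 1
}
```

It could be called as `newname=$(pdf_name_from_text "$filename.txt")`, after which an empty `newname` means no matching line was found.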
The script I have so far:
#!/bin/sh
PDF_FILE_PATH=$1
echo "Converting pdfs at $PDF_FILE_PATH"
find "$PDF_FILE_PATH" -name '*.pdf' -print0 | while IFS= read -r -d '' filename; do
    echo "$filename"
    java -jar pdfbox-app-1.6.0.jar ExtractText "$filename" "$filename.txt"
    NEWNAME=$(sed -n -e '/Page/s/Page\s+\(\d+\)\s+Question\s+\(\d+\).*$/pg\1_q\2/p' "$filename.txt")
    echo "Renaming pdf $filename to $NEWNAME"
done
... but the sed command does not put anything in the NEWNAME variable.
I'm not particularly attached to sed, so any suggestions would be appreciated.
Edit: the latest version of the script uses the following sed command:
newname=$(sed -nE -e '/Page/s/^.*Page[[:blank:]]+([0-9]+)[[:blank:]]+Question[[:blank:]]+([0-9]+).*$/pg\1_q\2.pdf/p' "$filename.txt")
This works in about 50% of cases, but the rest of the time the newname variable is empty when I move on to renaming the file.
The third line of a converted file for which this works:
Unit 2 Review Page 257 Question 9 a) 12 (2)(2)(3)
The third line of a converted file for which this does not work:
Unit 2 Review Page 258 Question 16 a) (a – 4)(a + 7) = a(a + 7) – 4(a + 7) = a2 + 7a – 4a – 28 = a2 + 3a – 28 b) (2x + 3)(5x + 2) = 2x(5x + 2) + 3(5x + 2) = 10x2 + 4x + 15x + 6 = 10x2 + 19x + 6 c) (–x + 5)(x + 5) = –x(x + 5) + 5(x + 5) = –x2 – 5x + 5x + 25 = –x2 + 25 d) (3y + 4)2 = (3y + 4)(3y + 4) = 3y(3y + 4) + 4(3y + 4) = 9y2 + 12y + 12y + 16 = 9y2 + 24y + 16 e) (a – 3b)(4a – b) = a(4a – b) – 3b(4a – b) = 4a2 – ab – 12ab + 3b2 = 4a2 – 13ab + 3b2 f) (v – 1)(2v2 – 4v – 9) = v(2v2 – 4v – 9) – 1(2v2 – 4v – 9) = 2v3 – 4v2 – 9v – 2v2 + 4v + 9 = 2v3 – 6v2 – 5v + 9
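One possible cause, offered as a guess rather than a diagnosis: the failing line contains en dashes, and if those bytes are not valid in the current locale, the `.*$` part of the sed pattern may be unable to match through to the end of the line, so nothing is printed. The sketch below avoids matching the rest of the line at all. It assumes `grep` supports `-a`, `-E` and `-o` (both the BSD and GNU versions do), and the function name `newname_for` is hypothetical.

```shell
# Hypothetical helper: derive "pgN_qM.pdf" from the first "Page N Question M"
# found in the text file "$1". Prints nothing if the file has no such text.
newname_for() {
    # LC_ALL=C makes grep work on raw bytes and -a forces text mode, so
    # non-ASCII characters elsewhere on the line cannot break the match.
    # -o prints only the matched "Page N Question M" part, one per line.
    LC_ALL=C grep -aEo 'Page[[:blank:]]+[0-9]+[[:blank:]]+Question[[:blank:]]+[0-9]+' "$1" |
        head -n 1 |
        # The remaining text is plain ASCII, so the substitution is simple.
        LC_ALL=C sed -E 's/^Page[[:blank:]]+([0-9]+)[[:blank:]]+Question[[:blank:]]+([0-9]+)$/pg\1_q\2.pdf/'
}
```

In the loop this would be used as `newname=$(newname_for "$filename.txt")`, with the rename guarded by a check such as `[ -n "$newname" ]` so that a file without a match is skipped instead of being renamed to an empty name.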