As for the performance issue, I don't see anything special about the file size: 180 MB should not cause any problems. What happens to memory usage when you run the script?
I'm not sure, however, that your regular expressions do what you want. Take this one, for example:
/[a]{1}[1234567890xX]{10}\W/
As I read it, it matches:
- exactly one "a". Are you sure you want to match "a"? If so, a plain "a" would be enough; "[a]{1}" is redundant here.
- exactly 10 of (digit or "x" or "X")
- one non-word character, i.e. not a-z, A-Z, 0-9 or underscore
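To make the breakdown above concrete, here is a small sketch that runs the same pattern through Python's `re` module (Python is only a stand-in, since the question's own language isn't shown here). It assumes ASCII semantics via `re.ASCII`, so that `\W` means exactly "not a-z, A-Z, 0-9 or underscore"; the sample strings and the `describe` helper are made up for illustration. It also shows that the simplified spelling behaves identically, and that `\W` needs an actual character after the ten, so a candidate at the very end of a line or string is missed.

```python
import re

# The regex from the question, expressed with Python's re module.
# re.ASCII makes \W mean "not a-z, A-Z, 0-9 or underscore", as described above;
# without it, Python also treats non-ASCII letters such as "é" as word characters.
ORIGINAL = re.compile(r"[a]{1}[1234567890xX]{10}\W", re.ASCII)

# Equivalent, simpler spelling: "[a]{1}" is just "a", and the digit list is 0-9.
SIMPLIFIED = re.compile(r"a[0-9xX]{10}\W", re.ASCII)

# Hypothetical sample inputs, chosen to exercise each part of the pattern.
SAMPLES = [
    "a0123456789 ",   # "a", ten digits, then a space: matches
    "a012345678X.",   # "X" counts as one of the ten characters: matches
    "a0123456789_",   # "_" is a word character, so \W fails: no match
    "a0123456789",    # nothing follows the ten characters, so \W fails: no match
    "a01234-56789 ",  # a hyphen breaks the run of ten: no match
]


def describe(pattern, text):
    """Return the matched substring, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for s in SAMPLES:
        print(repr(s), "->", repr(describe(ORIGINAL, s)), repr(describe(SIMPLIFIED, s)))
```

Note that the trailing `\W` is consumed as part of the match; if you only want to assert that the ten characters are followed by a non-word character (or the end of the text), a word boundary `\b` or a lookahead would be the usual alternatives.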
There are some sample ISBN regexes here and here, although they seem to correspond more closely to the format you see on a book's back cover; I'm assuming your input file has had some of that formatting stripped.
Mike Woodhouse