Assuming the XML documents are encoded in UTF-8, the following strips every character that XML 1.0 does not allow:
perl -CSDA -pe'
s/[^\x9\xA\xD\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+//g;
' file.xml > file_fixed.xml
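If you want to encode the invalid characters as numeric character references instead (note that a strict XML 1.0 parser still rejects character references to these code points):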
perl -CSDA -pe'
s/([^\x9\xA\xD\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}])/
"&#".ord($1).";"
/xeg;
' file.xml > file_fixed.xml
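You can call it in several ways: writing the result to a new file, editing the file in place while keeping a backup as file.xml~, or editing it in place with no backup: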
perl -CSDA -pe'...' file.xml > file_fixed.xml
perl -CSDA -i~ -pe'...' file.xml
perl -CSDA -i -pe'...' file.xml