Convert PDF to clean SVG?

I am trying to convert PDFs to SVG. However, the converter I currently use emits a path for every letter in every piece of text, so if I change the text in the source file, the result looks ugly.

I was wondering what the cleanest PDF-to-SVG converter is, ideally one that does not create paths for text areas where they are simply not needed. PDF and SVG are pretty similar, so I assume there are good converters out there.

+83
pdf svg
Apr 23 2018-12-12T00:
10 answers

Inkscape is used by many people on Wikipedia to convert PDF to SVG.

http://inkscape.org/

They even have a handy guide on how to do this!

http://en.wikipedia.org/wiki/Wikipedia:Graphic_Lab/Resources/PDF_conversion_to_SVG#Conversion_with_Inkscape

+67
Apr 23 2018-12-12T00:

You can also run Inkscape purely from the command line, without opening the graphical interface. Try the following:

    inkscape \
      --without-gui \
      --file=input.pdf \
      --export-plain-svg=output.svg

For a complete list of all command line options, run inkscape --help.
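If the conversion is driven from a script, the same call can be assembled programmatically. The following is a minimal Python sketch, assuming an Inkscape build that accepts exactly the flags shown in this answer (other Inkscape versions may use different options, so check inkscape --help first). The helper names `inkscape_pdf_to_svg_cmd` and `convert` are hypothetical.

```python
import subprocess
from pathlib import Path


def inkscape_pdf_to_svg_cmd(pdf_path, svg_path=None, inkscape="inkscape"):
    """Build the argument list for the Inkscape call shown in this answer.

    Hypothetical helper: it only assembles the flags quoted above and
    does not run anything.
    """
    pdf = Path(pdf_path)
    # Default output: same name as the input, with an .svg extension.
    svg = Path(svg_path) if svg_path is not None else pdf.with_suffix(".svg")
    return [
        inkscape,
        "--without-gui",
        f"--file={pdf}",
        f"--export-plain-svg={svg}",
    ]


def convert(pdf_path, svg_path=None):
    """Run the conversion; requires Inkscape to be available on PATH."""
    subprocess.run(inkscape_pdf_to_svg_cmd(pdf_path, svg_path), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(" ".join(inkscape_pdf_to_svg_cmd("input.pdf", "output.svg")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.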

+67
Apr 24 2018-12-12T00:

I am currently using PDFBox, which has good support for graphics output. It also has good support for extracting vector strokes and for managing fonts. There are several useful utilities for trying it out (for example, PDFReader will display the document as Java Graphics2D). You can intercept the graphics calls with an SVG tool like Batik (I do this and it captures the output well).

There is no easy way to convert all PDFs to SVG; it depends on the strategy and tools used to create the PDF files. Some text is converted to vectors and cannot be easily reconstructed; you have to install vector fonts and look the glyphs up.

UPDATE: I have since developed this into the PDF2SVG package, which no longer uses Batik and which has been tested on a range of PDF files. It produces SVG output consisting of:

  • characters, as one <svg:text> per character
  • paths, as <svg:path>
  • images, as <svg:image>

Later packages will (hopefully) convert the characters to running text and the paths to higher-level graphics objects.
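To make the goal concrete, here is a rough Python sketch of how per-character text elements could be merged back into running text. It is not the PDF2SVG or svg2xml implementation. It assumes each character is a `<text>` element in the SVG namespace with numeric `x` and `y` attributes, and it simply groups characters that share (nearly) the same `y` and joins them left to right. The function name and the rounding tolerance are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SVG_NS = "http://www.w3.org/2000/svg"


def merge_characters(svg_text, y_decimals=1):
    """Return a list of (y, line_text) built from per-character <text> elements."""
    root = ET.fromstring(svg_text)
    lines = defaultdict(list)
    for elem in root.iter(f"{{{SVG_NS}}}text"):
        x = float(elem.get("x", "0"))
        # Round y so that characters on the same baseline end up in one group.
        y = round(float(elem.get("y", "0")), y_decimals)
        lines[y].append((x, elem.text or ""))
    # Lines are ordered top to bottom, characters left to right.
    return [
        (y, "".join(ch for _, ch in sorted(chars, key=lambda c: c[0])))
        for y, chars in sorted(lines.items())
    ]


if __name__ == "__main__":
    sample = (
        f'<svg xmlns="{SVG_NS}">'
        '<text x="20" y="10">i</text>'
        '<text x="10" y="10">H</text>'
        '<text x="10" y="30">!</text>'
        "</svg>"
    )
    for y, text in merge_characters(sample):
        print(y, text)
```

This sketch does not recover word spacing, fonts, or rotated text; a real tool would also need to look at glyph widths and gaps between characters.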

UPDATE: We can now re-create running text from the SVG characters. We will also convert diagrams to domain-specific XML (e.g., chemical spectra). See https://bitbucket.org/petermr/svg2xml-dev . It is still in alpha, but moving at a useful speed. Anyone can join!

UPDATE (@Tim Kelty): We are continuing to work on PDF2SVG, as well as on downstream tools that do (limited) OCR and create higher-level graphics primitives (arrows, rectangles, etc.). See https://bitbucket.org/petermr/imageanalysis , https://bitbucket.org/petermr/diagramanalyzer , https://bitbucket.org/petermr/norma and https://bitbucket.org/petermr/ami-core . This is a funded project to collect 100 million facts from non-fiction literature (contentmine.org), most of which is in PDF.

+18
Apr 27 2018-12-12T00:

This thread is quite old, but here is a convenient solution that I found:

http://www.cityinthesky.co.uk/opensource/pdf2svg/

It offers the pdf2svg tool, which, once installed, does exactly this job on the command line. I have tested it with flawless results so far, including on PDFs containing bitmaps.

EDIT: my mistake, this tool also converts letters to paths, so it does not answer the original question. Nevertheless, it works well and can be useful for anyone who does not intend to edit the contents of the SVG file, so I will leave this post up.
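A quick way to check whether a given converter kept text as text is to count the element types in its output. The Python sketch below is a rough heuristic, not part of pdf2svg: it assumes that editable text shows up as `<text>` elements and outlined text as `<path>` elements. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_svg_elements(svg_text):
    """Count elements by local tag name, ignoring XML namespaces."""
    root = ET.fromstring(svg_text)
    counts = Counter()
    for elem in root.iter():
        # ElementTree tags look like "{namespace}name"; keep only "name".
        counts[elem.tag.rsplit("}", 1)[-1]] += 1
    return counts


def text_was_outlined(svg_text):
    """Heuristic: the file has paths but no <text> elements at all."""
    counts = count_svg_elements(svg_text)
    return counts["text"] == 0 and counts["path"] > 0


if __name__ == "__main__":
    svg = (
        '<svg xmlns="http://www.w3.org/2000/svg">'
        '<path d="M0 0 L1 1"/><path d="M1 1 L2 2"/>'
        "</svg>"
    )
    print(dict(count_svg_elements(svg)))
    print(text_was_outlined(svg))
```

A drawing that genuinely contains no text is flagged as well, so treat the result as a hint rather than a verdict.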

+13
Feb 05 '15 at 22:41

If DVI to SVG is an option, you can also use dvisvgm to convert a DVI file to an SVG file. This works great, for example, for LaTeX formulas (with the --no-fonts option):

 dvisvgm --no-fonts input.dvi -o output.svg 

There is also pdf2svg, which uses poppler and Cairo to convert PDF to SVG. When I tried this, the SVG displayed fine in Inkscape.

+6
Jun 03 '13 at 8:42

Here is the process I used. The main tool was Inkscape, which converted the text without trouble.

  • Used an Adobe Acrobat Pro action with JavaScript to split the PDF into single pages.
  • Ran Inkscape Portable 0.48.5 from Windows Cmd to convert the pages to SVG.
  • Made some manual changes to specific SVG XML attributes that were causing problems, using Windows Cmd and Windows PowerShell.

Single pages: Adobe Acrobat Pro with JavaScript

Using Adobe Acrobat Pro Actions (formerly Batch Processing), create a custom action that splits the PDF pages into separate files. Alternatively, you can split PDF files with GhostScript.

Acrobat JavaScript action to split pages

    /* Extract Pages to Folder */

    var re = /.*\/|\.pdf$/ig;
    var filename = this.path.replace(re,"");

    {
        for ( var i = 0; i < this.numPages; i++ )
            this.extractPages ({
                nStart: i,
                nEnd: i,
                cPath : filename + "_s" + ("000000" + (i+1)).slice (-3) + ".pdf"
            });
    };

Convert PDF to SVG: Inkscape with a Windows Cmd batch file

Using Windows Cmd, I created a batch file that loops over all PDF files in a folder and converts them to SVG.

Batch file for converting PDF to SVG in current folder

    :: ===== SETUP =====
    @echo off
    CLS
    echo Starting SVG conversion...
    echo.

    :: setup working directory (if different)
    REM set "_work_dir=%~dp0"
    set "_work_dir=%CD%"

    :: setup counter
    set "count=1"

    :: setup file search and save string
    set "_work_x1=pdf"
    set "_work_x2=svg"
    set "_work_file_str=*.%_work_x1%"

    :: setup inkscape commands
    set "_inkscape_path=D:\InkscapePortable\App\Inkscape\"
    set "_inkscape_cmd=%_inkscape_path%inkscape.exe"

    :: ===== FIND FILES IN WORKING DIRECTORY =====
    :: Output from DIR last element is single carriage return character.
    :: Carriage return characters are directly removed after percent expansion,
    :: but not with delayed expansion.
    pushd "%_work_dir%"
    FOR /f "tokens=*" %%A IN ('DIR /A:-D /O:N /B %_work_file_str%') DO (
        CALL :subroutine "%%A"
    )
    popd

    :: ===== CONVERT PDF TO SVG WITH INKSCAPE =====
    :subroutine
    echo.
    IF NOT [%1]==[] (
        echo %count%:%1
        set /A count+=1
        start "" /D "%_work_dir%" /W "%_inkscape_cmd%" --without-gui --file="%~n1.%_work_x1%" --export-dpi=300 --export-plain-svg="%~n1.%_work_x2%"
    ) ELSE (
        echo End of output
    )
    echo.
    GOTO :eof

    :: ===== INKSCAPE REFERENCE =====
    :: print inkscape help
    REM "%_inkscape_cmd%" --help > "%~dp0\inkscape_help.txt"
    REM "%_inkscape_cmd%" --verb-list > "%~dp0\inkscape_verb_list.txt"

Cleanup Attributes: Windows Cmd and PowerShell

I understand that it is best practice not to brute-force edit SVG or XML tags and attributes by hand, because they can vary, and to use an XML parser instead. However, I had a simple problem: the stroke width in one drawing was very small, and in another the font family was not identified correctly. So I modified the previous Windows Cmd script to do a simple search and replace. The only changes were the definition of the file search string and the call to a PowerShell command, which searches, replaces, and saves the modified file with a suffix added. I also found some other resources that might be better suited to parsing or modifying the resulting SVG files if you need other minor cleanup.

Manual modifications to find and replace SVG XML data

    :: setup file search and save string
    set "_work_x1=svg"
    set "_work_x2=svg"
    set "_work_s2=_mod"
    set "_work_file_str=*.%_work_x1%"

powershell -Command "(Get-Content '%~n1.%_work_x1%') | ForEach-Object {$_ -replace 'stroke-width:0.06', 'stroke-width:1'} | ForEach-Object {$_ -replace 'font-family:Times Roman','font-family:Times New Roman'} | Set-Content '%~n1%_work_s2%.%_work_x2%'"
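For comparison, here is what the same two replacements might look like with an XML parser, as the paragraph above suggests. This is a minimal Python sketch using only the standard library, not what I actually ran. It assumes the values live in `style="..."` attributes written as `property:value` pairs separated by semicolons, and the function name `fix_styles` is hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Replacements taken from the PowerShell command: (property, old, new).
REPLACEMENTS = [
    ("stroke-width", "0.06", "1"),
    ("font-family", "Times Roman", "Times New Roman"),
]


def fix_styles(svg_text, replacements=REPLACEMENTS):
    """Rewrite matching declarations inside every style attribute."""
    # Keep <svg> rather than <ns0:svg> in the output. Other namespaces
    # (for example Inkscape's own) still get auto-generated prefixes.
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)
    for elem in root.iter():
        style = elem.get("style")
        if not style:
            continue
        decls = []
        for decl in style.split(";"):
            if ":" not in decl:
                if decl.strip():
                    decls.append(decl.strip())
                continue
            prop, value = (part.strip() for part in decl.split(":", 1))
            for name, old, new in replacements:
                # Exact match on the value, unlike a substring replace.
                if prop == name and value == old:
                    value = new
            decls.append(f"{prop}:{value}")
        elem.set("style", ";".join(decls))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    svg = (
        f'<svg xmlns="{SVG_NS}">'
        '<path style="fill:none;stroke-width:0.06" d="M0 0 L1 1"/>'
        "</svg>"
    )
    print(fix_styles(svg))
```

One practical difference: the `-replace` operator matches substrings, so `stroke-width:0.0625` would be rewritten too, whereas this sketch only touches a value that is exactly `0.06`.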

Hope this can help someone

+6
May 29 '15 at 20:18

Bash script to convert each PDF page to its own SVG file.

    #!/bin/bash
    #
    # Make one PDF per page using PDF toolkit (pdftk).
    # Convert each single-page PDF to SVG using Inkscape.
    #

    inputPdf=$1

    pageCnt=$(pdftk "$inputPdf" dump_data | grep NumberOfPages | cut -d " " -f 2)

    for i in $(seq 1 "$pageCnt"); do
        echo "converting page $i..."
        pdftk "$inputPdf" cat "$i" output "${inputPdf%%.*}_${i}.pdf"
        inkscape --without-gui "--file=${inputPdf%%.*}_${i}.pdf" "--export-plain-svg=${inputPdf%%.*}_${i}.svg"
    done

To generate PNG output instead, use --export-png, etc.
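One caveat with the script: `${inputPdf%%.*}` cuts the name at the first dot, so a path such as `./report.pdf` or `report.v2.pdf` yields an unexpected prefix. The Python sketch below shows one way the per-page commands could be derived with only the final extension stripped. It assumes the same pdftk and Inkscape arguments as the script; the function name `per_page_commands` is hypothetical, and the page count is passed in rather than read from pdftk.

```python
from pathlib import Path


def per_page_commands(input_pdf, page_count):
    """Return a list of (pdftk_cmd, inkscape_cmd) argument lists, one per page."""
    pdf = Path(input_pdf)
    commands = []
    for page in range(1, page_count + 1):
        # Path.stem drops only the final extension: "report.v2.pdf" -> "report.v2".
        page_pdf = pdf.with_name(f"{pdf.stem}_{page}.pdf")
        page_svg = page_pdf.with_suffix(".svg")
        commands.append((
            ["pdftk", str(pdf), "cat", str(page), "output", str(page_pdf)],
            ["inkscape", "--without-gui", f"--file={page_pdf}",
             f"--export-plain-svg={page_svg}"],
        ))
    return commands


if __name__ == "__main__":
    for pdftk_cmd, inkscape_cmd in per_page_commands("docs/report.v2.pdf", 2):
        print(" ".join(pdftk_cmd))
        print(" ".join(inkscape_cmd))
```

Each argument list can be handed to `subprocess.run(cmd, check=True)` to perform the actual split and conversion.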

+2
Dec 06 '15 at 16:02

I found that xfig did an excellent job:

    pstoedit -f fig foo.pdf foo.fig
    xfig foo.fig
    # then export to SVG from within xfig

This is much better than Inkscape. Actually, it was perhaps pstoedit that did the real work.

+1
Mar 14 '14 at 14:20

You can use http://image.online-convert.com/convert-to-svg ; it worked well in my experience.

+1
Aug 17 '15 at 20:09

Here is an example NodeJS REST api for two PDF rendering scripts. https://github.com/pumppi/pdf2images

Scripts: pdf2svg and ImageMagick's convert

0
Apr 03 '16 at 8:22