TL;DR: use \newpage and the Lua filter below to get page breaks in many formats.
Pandoc parses all inputs into an internal document format. This format has no dedicated way of representing page breaks, but the information can still be encoded in other ways. One way is to use raw LaTeX \newpage. This works great when outputting LaTeX (or a PDF created through LaTeX). However, problems arise when targeting other formats, such as HTML or docx.
A simple solution when targeting other formats is to use a pandoc filter, which can transform the internal document representation to suit our needs. Pandoc 2.0 and later even lets you write such filters in Lua, using the interpreter built into pandoc.
Suppose we specify page breaks by placing \newpage on a line surrounded by blank lines, for example:

    lorem ipsum

    \newpage

    more text

The \newpage will be parsed as a RawBlock containing raw TeX. The block will only be included in the output if the target format can contain raw TeX (e.g. LaTeX, Markdown, Org).
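To check how pandoc actually parsed the marker, a small debugging filter can help. The following is only a sketch: it reports each raw block on stderr and leaves the document unchanged. The exact format name reported (for example `tex` or `latex`) may depend on the pandoc version and input format.

```lua
-- Debugging filter: report every raw block pandoc found while parsing.
-- Called by pandoc once for each RawBlock element in the document.
function RawBlock (el)
  -- el.format is the raw format name, el.text the raw content
  io.stderr:write(string.format('RawBlock format=%s text=%s\n', el.format, el.text))
  -- returning nil leaves the element unchanged
  return nil
end
```

Saved as `inspect-raw.lua` (a filename chosen here for illustration), it could be run with `pandoc --lua-filter inspect-raw.lua input.md -t native`, where `input.md` stands for your own document.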
We can use a simple Lua filter to translate this when targeting a different format. The following works for docx, LaTeX, EPUB, and lightweight markup.

    --- Return a block element causing a page break in the given format.
    local function newpage(format)
      if format == 'docx' then
        local pagebreak = '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'
        return pandoc.RawBlock('openxml', pagebreak)
      elseif format:match 'html.*' then
        -- CSS page break, as in the EPUB branch below
        return pandoc.RawBlock('html', '<div style="page-break-after: always;"></div>')
      elseif format:match 'tex$' then
        -- Lua patterns have no optional groups, so '(la)?tex' would not
        -- match; 'tex$' covers both 'tex' and 'latex'
        return pandoc.RawBlock('tex', '\\newpage{}')
      elseif format:match 'epub' then
        local pagebreak = '<p style="page-break-after: always;"> </p>'
        return pandoc.RawBlock('html', pagebreak)
      else
        -- fall back to inserting a form feed character
        return pandoc.Para{pandoc.Str '\f'}
      end
    end

    -- Filter function called on each RawBlock element.
    function RawBlock (el)
      -- check that the block is TeX or LaTeX and contains \newpage
      if el.format:match 'tex$' and el.text:match '\\newpage' then
        -- use format-specific pagebreak marker. FORMAT is set by pandoc to
        -- the targeted output format.
        return newpage(FORMAT)
      end
      -- otherwise, leave the block unchanged
      return nil
    end

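The filter above replaces any raw TeX block whose text contains \newpage. If only blocks consisting of exactly \newpage or \newpage{} should be replaced, a stricter check could be used. The following is a minimal sketch in plain Lua; `is_newpage_command` is a hypothetical helper name, not part of pandoc.

```lua
--- Return true if `text` is exactly \newpage or \newpage{}, ignoring
--- surrounding whitespace. Hypothetical helper for illustration.
local function is_newpage_command(text)
  -- In Lua patterns '%' is the escape character, so the backslash and
  -- the braces are matched literally.
  return text:match '^%s*\\newpage%s*$' ~= nil
      or text:match '^%s*\\newpage{}%s*$' ~= nil
end

print(is_newpage_command('\\newpage'))             --> true
print(is_newpage_command('\\newpage{}'))           --> true
print(is_newpage_command('\\newpage\\section{x}')) --> false
```

Inside the filter, the condition `el.text:match '\\newpage'` could then be swapped for `is_newpage_command(el.text)`, at the cost of no longer catching raw blocks that combine \newpage with other commands.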
tarleb, Sep 01 '18 at 19:22