PDF files

Manipulate PDF documents and images

PDFTK for documents

pdftk is a command-line tool for manipulating PDF documents, for example concatenating several documents or extracting pages from one.

General syntax

pdftk INPUT_FILES OPERATION output OUTPUT_FILE

where OPERATION corresponds to the desired manipulation on the INPUT_FILES.

Concatenation

This can be done with cat:

  • pdftk doc_1.pdf doc_2.pdf doc_3.pdf cat output doc_123.pdf
  • pdftk *.pdf cat output all.pdf (files are joined in the order the shell expands *.pdf, i.e. sorted by name)

Extraction

This can also be done with cat:

  • only page 1: pdftk doc.pdf cat 1 output doc_page_1.pdf
  • a range of pages: pdftk doc.pdf cat 2-5 output doc_page_2to5.pdf
  • a combination: pdftk doc.pdf cat 1-10 12 15 output doc_page_1to10_12_15.pdf
  • from multiple documents: pdftk A=doc_1.pdf B=doc_2.pdf cat A1-10 B1-3 output doc_1_page_1to10_doc_2_page_1to3.pdf
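The same cat operation can split a long document into fixed-size parts. The following is only a sketch: page_ranges and split_pdf are hypothetical helper names, the _part_N.pdf naming is an assumption, and the page count is read from the NumberOfPages line that pdftk dump_data prints.

```shell
#!/bin/sh
# page_ranges TOTAL CHUNK: print page ranges covering 1..TOTAL,
# e.g. "1-10 11-20 21-25" for TOTAL=25 and CHUNK=10.
page_ranges() {
    total=$1
    chunk=$2
    start=1
    out=""
    while [ "$start" -le "$total" ]; do
        end=$((start + chunk - 1))
        if [ "$end" -gt "$total" ]; then end=$total; fi
        # A one-page chunk is written as a single page number.
        if [ "$start" -eq "$end" ]; then range=$start; else range="$start-$end"; fi
        out="$out${out:+ }$range"
        start=$((end + 1))
    done
    printf '%s\n' "$out"
}

# split_pdf INPUT.pdf CHUNK: write INPUT_part_1.pdf, INPUT_part_2.pdf, ...
split_pdf() {
    input=$1
    chunk=$2
    # dump_data prints a line such as "NumberOfPages: 25".
    total=$(pdftk "$input" dump_data | awk '/^NumberOfPages/ {print $2}')
    n=1
    for range in $(page_ranges "$total" "$chunk"); do
        pdftk "$input" cat "$range" output "${input%.pdf}_part_$n.pdf"
        n=$((n + 1))
    done
}

# Usage (not run here): split_pdf doc.pdf 10
```

Keeping the range computation in its own function makes it easy to check the ranges before any file is written.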

PDFCROP for images

pdfcrop is another command-line tool; it removes the margins around images saved as PDF files. A typical use is preparing PDF figures for inclusion in LaTeX or Beamer documents. PDF figures have the advantage of being vector graphics, so resizing them does not degrade their quality.

  • automatically remove margins: pdfcrop INPUT.pdf
  • if content is cut off or margins remain: pdfcrop --hires --resolution WIDTHxHEIGHT INPUT.pdf
    use a high resolution such as --resolution 1000x1000 if errors remain.
  • manually set margins: pdfcrop --margins 'LEFT TOP RIGHT BOTTOM' --clip INPUT.pdf
    positive values increase the margins and negative values decrease them. To use the same value for all four margins, pass a single value: --margins VALUE.

Each of these commands writes its result to a new file, INPUT-crop.pdf, in the same location as the input.
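When a whole directory of figures needs cropping, the automatic mode can be wrapped in a loop. This is a minimal sketch: crop_figures and the DRY_RUN variable are names made up for this example, and it relies on the INPUT-crop.pdf naming described above to skip files produced by an earlier run.

```shell
#!/bin/sh
# crop_figures DIR: run pdfcrop on every PDF in DIR (default: current directory).
# Set DRY_RUN=1 to print the commands instead of running them.
crop_figures() {
    dir=${1:-.}
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue          # no match: the glob stays literal
        case $f in
            *-crop.pdf) continue ;;      # skip the output of an earlier run
        esac
        if [ -n "${DRY_RUN:-}" ]; then
            echo "pdfcrop $f"
        else
            pdfcrop "$f"
        fi
    done
}

# Usage (not run here): crop_figures figures/
```

Running it once with DRY_RUN=1 shows which files would be processed before anything is written.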

Convert SVG to PDF

Like PDF, SVG figures are vector graphics. However, embedding SVG directly into LaTeX or Beamer can be tedious. A simple workaround is to convert the SVG figure to PDF, and several command-line tools can do this:

  • ImageMagick (a raster-oriented tool, so the resulting PDF may contain a bitmap rather than vector graphics):
    sudo apt install imagemagick
    convert INPUT.svg OUTPUT.pdf
    
  • rsvg-convert:
    sudo apt install librsvg2-bin
    rsvg-convert -f pdf -o OUTPUT.pdf INPUT.svg
    
  • Inkscape:
    sudo apt install inkscape
    inkscape INPUT.svg --export-filename=OUTPUT.pdf
    (Inkscape 0.92 and earlier: inkscape INPUT.svg --export-pdf=OUTPUT.pdf)
    

Each of these tools installs its own dependencies. In my limited usage, inkscape seemed to be the most convenient solution.
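To convert a directory of figures in one go, any of the commands above can be put in a loop. The sketch below uses rsvg-convert with the options shown earlier; pdf_name and convert_svgs are hypothetical helper names, and writing each PDF next to its SVG is an assumption.

```shell
#!/bin/sh
# pdf_name FILE.svg: print the matching output name, FILE.pdf.
pdf_name() {
    printf '%s\n' "${1%.svg}.pdf"
}

# convert_svgs DIR: convert every SVG in DIR (default: current directory).
convert_svgs() {
    dir=${1:-.}
    for svg in "$dir"/*.svg; do
        [ -e "$svg" ] || continue        # no match: the glob stays literal
        rsvg-convert -f pdf -o "$(pdf_name "$svg")" "$svg"
    done
}

# Usage (not run here): convert_svgs figures/
```

Swapping in another converter only requires changing the single line inside the loop.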

Generate pdf documents using Markdown and pandoc

Basic PDF Generation

Convert a Markdown file (input.md) to PDF:

pandoc input.md -o output.pdf

By default, pandoc generates PDFs through LaTeX, so a LaTeX distribution (e.g. TeX Live) must be installed.

Specify a Template

Pandoc uses a default LaTeX template. To customize it:

  • Get the default template:
    pandoc -D latex > my-template.tex
    
  • Modify it (e.g., change fonts, margins, headers).
  • Pass the modified template to pandoc:
    pandoc input.md -o output.pdf --template=my-template.tex

Set Document Metadata

Use a YAML front matter block in your Markdown:

---
title: "My Document"
author: "Your Name"
date: "2023-10-01"
---

Or pass metadata via CLI:

pandoc input.md -o output.pdf -M title="My Document" -M author="Your Name"

Choose a PDF Engine

Pandoc defaults to pdflatex. For better Unicode/font support, use:

pandoc input.md -o output.pdf --pdf-engine=xelatex

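Other engines: lualatex, wkhtmltopdf, weasyprint. The last two render the document through HTML instead of LaTeX, so LaTeX templates and variables do not apply to them.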

Adjust Page Layout

Use --variable (-V) to set LaTeX options:

pandoc input.md -o output.pdf -V geometry:margin=1in

Common variables:

  • fontsize=12pt (default: 10pt)
  • documentclass=report (default: article)
  • papersize=a4 (default: letter)

Table of Contents

pandoc input.md -o output.pdf --toc --toc-depth=2

Numbered Sections

pandoc input.md -o output.pdf --number-sections
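The options from the preceding sections can be combined in a single call. As a sketch only: build_pdf and DRY_RUN are names invented for this example, and the particular engine, margin, font size and paper size are simply the values used above, not recommendations.

```shell
#!/bin/sh
# build_pdf INPUT.md: render INPUT.pdf with the options discussed above.
# Set DRY_RUN=1 to print the command instead of running pandoc.
build_pdf() {
    input=$1
    output="${input%.md}.pdf"
    # Collect the full command line in the positional parameters.
    set -- pandoc "$input" -o "$output" \
        --pdf-engine=xelatex \
        -V geometry:margin=1in -V fontsize=12pt -V papersize=a4 \
        --toc --toc-depth=2 --number-sections
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage (not run here): build_pdf notes.md
```

Wrapping the call in a function keeps the option set in one place, so every document is built the same way.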

Include Raw LaTeX

Add LaTeX commands directly in Markdown:

\begin{center}
This text is centered.
\end{center}

Bibliography & Citations

Use --citeproc to process citations (pandoc 2.11 or later; older versions use --filter pandoc-citeproc instead):

pandoc input.md --bibliography=refs.bib -o output.pdf --citeproc

Example citation in Markdown:

Blah blah [@smith2020].
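To see how the citation key connects to the bibliography file, the following sketch writes a matching pair of files. The helper name write_citation_demo is hypothetical, and the BibTeX entry is a made-up placeholder, not a real reference; only the key smith2020 comes from the example above.

```shell
#!/bin/sh
# write_citation_demo DIR: create refs.bib and input.md in DIR.
write_citation_demo() {
    dir=$1
    mkdir -p "$dir"
    # Placeholder entry: the key must match the one cited in input.md.
    cat > "$dir/refs.bib" <<'EOF'
@article{smith2020,
  author  = {Smith, A.},
  title   = {Placeholder title},
  journal = {Placeholder Journal},
  year    = {2020}
}
EOF
    # Pandoc appends the generated reference list at the end of the document,
    # so the file ends with a heading for it.
    cat > "$dir/input.md" <<'EOF'
---
title: "Citation demo"
---

Blah blah [@smith2020].

# References
EOF
}

# Usage (not run here):
#   write_citation_demo demo
#   cd demo && pandoc input.md --bibliography=refs.bib --citeproc -o output.pdf
```

If the key in the Markdown file does not match an entry in refs.bib, pandoc warns that the citation was not found.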

More Resources