PDF files
Manipulate PDF document and images
PDFTK for documents
pdftk is a tool in the command line that allows you to manipulate pdf documents, such as concatenating documents or extracting pages.
General syntax
pdftk INPUT_FILES OPERATION output OUTPUT_FILES
where OPERATION corresponds to the desired manipulation on the INPUT_FILES.
Concatenation
This can be done with cat:
pdftk doc_1.pdf doc_2.pdf doc_3.pdf cat output doc_123.pdfpdftk *.pdf cat output all.pdf
Extraction
This can also be done with cat:
- only page 1:
pdftk doc.pdf cat 1 output doc_page_1.pdf - a range of pages:
pdftk doc.pdf cat 2-5 output doc_page_2to5.pdf - a combination:
pdftk doc.pdf cat 1-10 12 15 output doc_page_1to10_12_15.pdf - from multiple documents:
pdftk A=doc_1.pdf B=doc_2.pdf cat A1-10 B1-3 output doc_1_page_1to10_doc_2_page_1to3.pdf
PDFCROP for images
pdfcrop is also a tool in the command line, to remove margins in images saved as pdf files. A concrete example is embedding figures in LaTeX or Beamer from pdf files. Figures in pdf has the advantage of being vectorized, meaning that resizing them will not impact their quality.
- automatically remove margins:
pdfcrop INPUT.pdf - if parts are cropped/removed or margins remains:
pdfcrop --hires --resolution WIDTHxHEIGHT INPUT.pdf
use a high resolution like--resolution 1000x1000if errors remains. - manually set margins:
pdfcrop --margins 'LEFT TOP RIGHT BOTTOM' --clip INPUT.pdf
you can set positive or negative values to respectively increase or decrease the margins. You can also set a single value for all margins--margins VALUE.
This will process and create a new file INPUT-crop.pdf at the same location.
Convert SVG to PDF
Similarly to PDF, SVG figures are vectorized. However, embedding SVG into LaTeX or beamer can be tedious. A simple workaround is to convert the SVG figure to PDF, as there are, fortunately, a number of tools to do this from the command line:
- ImageMagick:
sudo apt install imagemagick convert INPUT.svg OUTPUT.pdf - rsvg-convert:
sudo apt install librsvg2-bin rsvg-convert -f pdf -o OUTPUT.pdf INPUT.svg - Inkscape:
sudo apt install inkscape inkscape INPUT.svg --export-pdf=OUTPUT.pdf
Each of these tools install their own dependencies, but from my limited usage inkscape seemed to be the most convenient solution.
Generate pdf documents using Markdown and pandoc
Basic PDF Generation
Convert a Markdown file (input.md) to PDF:
pandoc input.md -o output.pdf
Pandoc uses LaTeX under the hood to generate PDFs.
Specify a Template
Pandoc uses a default LaTeX template. To customize it:
pandoc input.md -o output.pdf --template=my-template.tex
- Get the default template:
pandoc -D latex > default-template.tex - Modify it (e.g., change fonts, margins, headers).
Set Document Metadata
Use a YAML front matter block in your Markdown:
---
title: "My Document"
author: "Your Name"
date: "2023-10-01"
---
Or pass metadata via CLI:
pandoc input.md -o output.pdf -M title="My Document" -M author="Your Name"
Choose a PDF Engine
Pandoc defaults to pdflatex. For better Unicode/font support, use:
pandoc input.md -o output.pdf --pdf-engine=xelatex
Other engines: lualatex, wkhtmltopdf, weasyprint.
Adjust Page Layout
Use --variable (-V) to set LaTeX options:
pandoc input.md -o output.pdf -V geometry:margin=1in
Common variables:
fontsize=12pt(default: 10pt)documentclass=report(default:article)papersize=a4(default:letter)
Table of Contents
pandoc input.md -o output.pdf --toc --toc-depth=2
Numbered Sections
pandoc input.md -o output.pdf --number-sections
Include Raw LaTeX
Add LaTeX commands directly in Markdown:
\begin{center}
This text is centered.
\end{center}
Bibliography & Citations
Use --citeproc for citations:
pandoc input.md --bibliography=refs.bib -o output.pdf --citeproc
Example citation in Markdown:
Blah blah [@smith2020].
