root bin: pdf

Sunday, 6 January 2019

How to compress scanned PDF?

Scanned PDFs are often very large, typically because every page is stored as an image. A good way to compress such files is this Ghostscript script:

#!/bin/sh

gs -q -dNOPAUSE -dBATCH -dSAFER \
-sDEVICE=pdfwrite \
-dCompatibilityLevel=1.3 \
-dPDFSETTINGS=/screen \
-dEmbedAllFonts=true \
-dSubsetFonts=true \
-dColorImageDownsampleType=/Bicubic \
-dColorImageResolution=120 \
-dGrayImageDownsampleType=/Bicubic \
-dGrayImageResolution=72 \
-dMonoImageDownsampleType=/Bicubic \
-dMonoImageResolution=120 \
-sOutputFile=out.pdf \
"$1"

The compression ratio can be changed by tweaking the resolution values (in dpi): lower values give smaller files. I found that the values above give good compression without sacrificing text quality.
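
To make that tweaking easier, the script can be wrapped in a function that takes the resolution as an argument. This is only a sketch: the function name `compress_pdf`, its argument order, and the use of one shared dpi value for color, gray and mono images are my assumptions (the script above uses 120/72/120).

```shell
#!/bin/sh

# Hypothetical helper, not part of the original script.
# Usage: compress_pdf DPI INPUT.pdf OUTPUT.pdf
compress_pdf() {
    # $1 = target resolution in dpi (assumed to be a plain integer)
    # $2 = input PDF, $3 = output PDF
    gs -q -dNOPAUSE -dBATCH -dSAFER \
        -sDEVICE=pdfwrite \
        -dCompatibilityLevel=1.3 \
        -dPDFSETTINGS=/screen \
        -dEmbedAllFonts=true \
        -dSubsetFonts=true \
        -dColorImageDownsampleType=/Bicubic \
        -dColorImageResolution="$1" \
        -dGrayImageDownsampleType=/Bicubic \
        -dGrayImageResolution="$1" \
        -dMonoImageDownsampleType=/Bicubic \
        -dMonoImageResolution="$1" \
        -sOutputFile="$3" \
        "$2"
}
```

For example, `compress_pdf 150 scan.pdf scan-small.pdf` would trade a somewhat larger file for sharper images; trying a few values and comparing the file sizes is probably the quickest way to find a good setting.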

Thursday, 16 July 2015

Join PDF files and replace strings using pdftk

pdftk *.pdf cat output combined.pdf
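
Two things to watch with the wildcard: the shell expands `*.pdf` in sorted order (so `page10.pdf` comes before `page2.pdf`), and on a second run `combined.pdf` itself matches the pattern. A small sketch that lists the inputs explicitly instead; the function name `join_pdfs` and the file names in the usage line are made up for illustration.

```shell
#!/bin/sh

# Hypothetical helper: join the given PDFs, in the order given, into $1.
# Usage: join_pdfs OUTPUT.pdf INPUT1.pdf INPUT2.pdf ...
join_pdfs() {
    out=$1
    shift                      # the remaining arguments are the input files
    pdftk "$@" cat output "$out"
}
```

Called as `join_pdfs combined.pdf cover.pdf page2.pdf page10.pdf`, the pages end up in exactly the order typed.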

===========

You can try to modify the content of your PDF as follows.

Uncompress the text streams of the PDF:

pdftk file.pdf output uncompressed.pdf uncompress

Use sed to replace your text with another string (this only works if the text is stored as a literal string in the PDF):

sed -e "s/ORIGINALSTRING/NEWSTRING/g" <uncompressed.pdf >modified.pdf

If this attempt was successful, re-compress the PDF with pdftk:
```
pdftk modified.pdf output recompressed.pdf compress
```
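
The three steps above can be chained into one function. This is a minimal sketch, not a robust tool: the name `replace_pdf_text` and its argument order are my own, the search and replacement strings are passed to sed unchanged (so any `/` or regex characters in them must be escaped by the caller), and `mktemp -d` is assumed to be available.

```shell
#!/bin/sh

# Hypothetical helper combining uncompress, sed and compress.
# Usage: replace_pdf_text INPUT.pdf OLDSTRING NEWSTRING OUTPUT.pdf
replace_pdf_text() {
    src=$1; old=$2; new=$3; dst=$4
    tmpdir=$(mktemp -d) || return 1

    # LC_ALL=C makes sed treat the file as plain bytes, which avoids
    # encoding errors on the binary parts of the PDF.
    pdftk "$src" output "$tmpdir/uncompressed.pdf" uncompress &&
        LC_ALL=C sed -e "s/$old/$new/g" \
            <"$tmpdir/uncompressed.pdf" >"$tmpdir/modified.pdf" &&
        pdftk "$tmpdir/modified.pdf" output "$dst" compress
    status=$?

    rm -rf "$tmpdir"           # clean up the intermediate files
    return $status
}
```

For example: `replace_pdf_text file.pdf ORIGINALSTRING NEWSTRING recompressed.pdf`. It is worth opening the result afterwards, since the function cannot tell whether the string was actually found.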

Sources:
https://www.pdflabs.com/docs/pdftk-cli-examples/
http://stackoverflow.com/a/9872494/4151875