Wednesday, December 14, 2011

Converting Image Files to PDFs

Recently I have been processing some image files and needed a way to combine them into a PDF file.  Loose image files can get a little messy, so collating related images into one document tidies things up.  Don't get me wrong, I still hang on to the originals, but PDFs are nicer to actually use.

So here is the situation: I have a folder that contains all the images, named sequentially.  Each image needs to be converted to a PDF, and the PDFs then combined into one file.  Additionally, I want a grayscale version and a colour version.  As usual, the easiest way to do this is with a script in Linux.  If you haven't got them already, you will need to run the following commands to install the packages the script relies on.

sudo apt-get install imagemagick
sudo apt-get install pdftk
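
If you are not sure whether the tools are already there, something along these lines should tell you.  It is only a sketch: have_tool is a helper name I made up for illustration, and it simply checks whether a command can be found on the PATH.

```shell
#!/bin/bash

# have_tool NAME: succeed if NAME can be found on the PATH
# (have_tool is a hypothetical helper, not part of either package)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report on the two commands the script below depends on
for tool in convert pdftk; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```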

ImageMagick is an awesome tool for processing images, particularly in batches, and pdftk is a toolkit that lets you rearrange, remove, and add pages in PDF files.  With these two tools, the following script does the job nicely.

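#!/bin/bash

# Working directories for the per-image PDFs (-p: no error if they exist)
mkdir -p Col
mkdir -p BW
# Clear out anything left over from a previous run
rm -rf ./Col/*
rm -rf ./BW/*

# Colour: strip the .png extension, then convert each image to a PDF
ls *.png | sed -e 's/\.png$//' | xargs -r -I FILE \
convert FILE.png -density 300x300 -compress jpeg \
-quality 60 ./Col/FILE.pdf

# Grayscale: same again, with the colourspace changed
ls *.png | sed -e 's/\.png$//' | xargs -r -I FILE \
convert FILE.png -colorspace gray -density 300x300 \
-compress jpeg -quality 60 ./BW/FILE.pdf

# Combine the single-page PDFs into one file each
pdftk ./BW/*.pdf cat output BW.pdf
pdftk ./Col/*.pdf cat output Col.pdf

# Remove the intermediate files and directories
rm -rf ./Col/*
rm -rf ./BW/*

rmdir ./Col
rmdir ./BW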

The script is run from the directory that contains the images.  It first creates two directories, one for the colour PDFs and one for the grayscale PDFs.  Any files that may already exist in these directories are deleted.  Next, the images are converted to PDFs by the ImageMagick convert command.  All the PNG files in the directory are listed by the ls command and piped to sed, which removes the file extension.  xargs then runs the convert command once per file.  Options control the output: -density sets the resolution (DPI) of the PDF page, -compress sets the compression method, and -quality sets the JPEG compression quality.  -colorspace is used in the second command to set the output to grayscale.
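
For what it's worth, the same conversion step can also be written as a plain loop over a shell glob, which avoids piping ls through sed and copes with spaces in file names.  This is a sketch rather than the script I actually ran: to_pdfs and the CONVERT variable are names invented here, and the convert options are simply copied from the script above.

```shell
#!/bin/bash

# to_pdfs OUTDIR [extra convert options...]
# Convert every PNG in the current directory to a PDF in OUTDIR.
# CONVERT is a hypothetical override for the command name; it
# defaults to ImageMagick's convert.
to_pdfs() {
    local outdir=$1
    shift
    mkdir -p "$outdir"
    local img
    for img in *.png; do
        # With no matching files the glob stays literal, so skip it
        [ -e "$img" ] || continue
        # ${img%.png} strips the extension, as the sed call does
        "${CONVERT:-convert}" "$img" "$@" -density 300x300 \
            -compress jpeg -quality 60 "$outdir/${img%.png}.pdf"
    done
}

# Possible use, matching the two passes in the script:
#   to_pdfs Col
#   to_pdfs BW -colorspace gray
```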

Next, pdftk is used to combine the PDFs that were just created into one file.  The order of the pages in the PDF follows the alphabetical order of the input files, because that is the order in which the shell expands ./BW/*.pdf and ./Col/*.pdf.
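
One thing to watch with alphabetical ordering: if the numbers in the file names are not zero-padded, page10.pdf ends up before page2.pdf.  If renaming the images is not an option, a natural sort is one way around it.  The sketch below assumes GNU sort (for the -V option), and natural_order is just a name I picked for the helper.

```shell
#!/bin/bash

# natural_order DIR: print the PDFs in DIR, one per line, so that
# page2.pdf comes before page10.pdf (relies on GNU sort -V).
# natural_order is a hypothetical helper name.
natural_order() {
    local dir=$1 f
    for f in "$dir"/*.pdf; do
        # Skip the literal pattern when nothing matches
        [ -e "$f" ] && printf '%s\n' "$f"
    done | LC_ALL=C sort -V
}

# Possible use (assumes bash 4+ for mapfile, and no newlines in names):
#   mapfile -t pages < <(natural_order ./BW)
#   pdftk "${pages[@]}" cat output BW.pdf
```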

The intermediate files and folders are then deleted.  Job done, and I could leave the computer unattended for most of the time as well.
