I’ve always wondered how to easily create PDFs, haven’t you? Sure you have, right?
By “easily” I mean quickly and efficiently with code, not via point-n-click. If that puts you off, then exit right now, because this post assumes familiarity with programming.
I am writing this post because I can’t find anywhere online that pulls together all the concepts needed for a project that generates a bunch of PDFs dynamically, like the one I just completed: school reports with end-of-semester grades and comments.
Wouldn’t you know it, you can make pixel-perfect PDFs if you already know the web stack: HTML/CSS. There is a command line tool called wkhtmltopdf (wkhtmltopdf.org) that takes your HTML and CSS, renders it with WebKit, and churns out a PDF file. There are plenty of wrappers around it for your favorite programming environment. For me, that’s Python, which means pdfkit.
Using wrappers can be really perplexing at first, because they usually don’t come with much documentation. But there’s a reason for that: the docs to consult are those of the thing being wrapped, not the pretty wrapping itself.
I always use Linux for my programming environment, with git, Vagrant and VirtualBox, and never my actual Mac host system. So installing pdfkit/wkhtmltopdf took a bit of research, since installing it with a package manager like apt-get only gets you a crippled version of the software.
Crippled software? Boo. Triple boo, skip the second boo.
So here’s how to install the Real Thing:
- Download the right package for your Linux distro from the downloads page.
- Install that, but instead of using the apt-get package manager, use dpkg as shown here.
- Now you should have the full version of wkhtmltopdf, which gives you features like headers and footers, and you can install pdfkit for your programming environment as usual. (Python: “pip install pdfkit”).
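Once that’s done, a quick sanity check from Python doesn’t hurt. This is only a minimal sketch, assuming the binary ended up on your PATH under the name wkhtmltopdf; the helper names find_wkhtmltopdf and html_to_pdf are mine, not part of pdfkit.

```python
import shutil


def find_wkhtmltopdf(binary_name="wkhtmltopdf"):
    """Return the full path of the wkhtmltopdf binary, or None if it is not on PATH."""
    return shutil.which(binary_name)


def html_to_pdf(html, output_path):
    """Render an HTML string to a PDF file with pdfkit (hypothetical helper)."""
    # Imported here so the check above still works before pdfkit is installed.
    import pdfkit  # third-party: pip install pdfkit

    binary = find_wkhtmltopdf()
    if binary is None:
        raise RuntimeError("wkhtmltopdf was not found on PATH")
    # Point pdfkit at the binary we found instead of relying on its own lookup.
    config = pdfkit.configuration(wkhtmltopdf=binary)
    pdfkit.from_string(html, output_path, configuration=config)
    return output_path


if __name__ == "__main__":
    print("wkhtmltopdf binary:", find_wkhtmltopdf())
```

If the path prints as None, the install didn’t land where Python can see it. Otherwise something like html_to_pdf("<h1>Hello</h1>", "hello.pdf") should leave a PDF in the current directory.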
Now, you just have to format things using HTML and CSS, but here are some key tips to make the dev process much faster:
- Use the media attribute on your stylesheet links (“media=screen” and “media=print”) to load two different CSS source files. I like this because that way you get a browser-friendly and PDF-friendly version in one go.
- Debug the print version in your browser by using the emulation mode and selecting the print media. For Chrome, instructions are here.
- In your code, be sure to tell wkhtmltopdf that the print media should be used to render, with the ‘print-media-type’ option. In Python, that means having a key with the name ‘print-media-type’ (an empty string for the value is fine) in the options dictionary.
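Here is roughly what those three tips look like together. Treat it as a sketch under assumptions: the file names screen.css and print.css, the A4 page size, and the helpers build_page and render_pdf are all made up for illustration.

```python
def build_page(body_html, screen_css="screen.css", print_css="print.css"):
    """Wrap body markup in a page that links one stylesheet per media type."""
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<link rel="stylesheet" media="screen" href="{screen_css}">
<link rel="stylesheet" media="print" href="{print_css}">
</head>
<body>
{body_html}
</body>
</html>"""


# Options are passed through to wkhtmltopdf as command line flags.
# A flag that takes no value gets an empty string.
PDF_OPTIONS = {
    "print-media-type": "",  # render with the media=print stylesheet
    "page-size": "A4",       # assumption: pick whatever your reports need
}


def render_pdf(body_html, output_path):
    """Render the page to a PDF using the print stylesheet (hypothetical helper)."""
    import pdfkit  # third-party: pip install pdfkit

    return pdfkit.from_string(build_page(body_html), output_path, options=PDF_OPTIONS)
```

One thing to watch: the stylesheet links here are relative placeholders. When wkhtmltopdf renders from a string rather than a URL, you will most likely need absolute paths or full URLs for it to find the CSS.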
This is enough to get started, but if you are going to be making a whole bunch of PDFs based on any set patterns, then you’ll need a way to get them generated quickly. That’s where a web app comes in handy. I like Pyramid best.
The idea is that you define a GET request with your web app so that it spits out different HTML depending on the data it looks up. This is the basic process:
- Have your database contain the data you need, either a fancy SQL one or maybe just a CSV text file. Whatever.
- Have the GET request carry a key (or other identifying information) that is used to look up the matching record in that database.
- Send that data to your template system of choice.
- The template contains all your HTML/CSS and provides the additional ability to produce certain tags depending on the content. I use Chameleon just because it makes the most sense to my brain.
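To make that process concrete, here is a stripped-down sketch, not my actual project code. It assumes a CSV “database” with made-up columns (student_id, name, grade, comment), a made-up route /reports/{student_id}, and hypothetical helper names. Pyramid and Chameleon are imported inside the functions so the lookup part runs on its own.

```python
import csv
import io

# Stand-in for a real CSV file; the columns are invented for this example.
SAMPLE_CSV = """student_id,name,grade,comment
s001,Ada,A,Excellent work this semester.
s002,Grace,B+,
"""

# Chameleon template: ${...} inserts values, tal:condition drops a tag when falsy.
TEMPLATE = """<html><body>
<h1>Report for ${student['name']}</h1>
<p>Grade: ${student['grade']}</p>
<p tal:condition="student['comment']">Comment: ${student['comment']}</p>
</body></html>"""


def load_students(csv_text):
    """Parse the CSV text into a dict keyed by student_id."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["student_id"]: row for row in reader}


def render_report(student):
    """Render one student's record to HTML with Chameleon."""
    from chameleon import PageTemplate  # third-party: pip install chameleon

    return PageTemplate(TEMPLATE)(student=student)


def make_app(students):
    """Build a Pyramid WSGI app with a single GET route per student report."""
    from pyramid.config import Configurator  # third-party: pip install pyramid
    from pyramid.httpexceptions import HTTPNotFound
    from pyramid.response import Response

    def report_view(request):
        # The key in the URL selects the record to render.
        student = students.get(request.matchdict["student_id"])
        if student is None:
            raise HTTPNotFound()
        html = render_report(student)
        return Response(body=html, content_type="text/html", charset="utf-8")

    config = Configurator()
    config.add_route("report", "/reports/{student_id}")
    config.add_view(report_view, route_name="report", request_method="GET")
    return config.make_wsgi_app()
```

With this in place, a request for /reports/s001 returns Ada’s report, and the comment paragraph simply disappears for students without one.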
Okay, so conceptually, we know we can build the data needed to make the PDF, but then how to actually tie in that stuff with wkhtmltopdf? I did it this way:
- The GET handler doesn’t just spit HTML back to the browser; it renders the PDF, saves it to disk first, and returns that saved file as the response.
- Write a simple program that loops through all the possible unique URLs that will generate the necessary PDFs.
- Collect your PDFs wherever you saved those files.
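The looping program can be tiny. This sketch uses only the standard library and assumes a made-up URL scheme of /reports/<student_id> on a locally running app; report_urls and generate_all are hypothetical names, so adjust them to whatever your web app actually exposes.

```python
import urllib.request
from urllib.parse import quote


def report_urls(base_url, student_ids):
    """Build one report URL per student id."""
    base = base_url.rstrip("/")
    return [f"{base}/reports/{quote(str(sid), safe='')}" for sid in student_ids]


def generate_all(base_url, student_ids, timeout=60):
    """Request every report URL so the web app renders and saves each PDF.

    Returns a list of (url, exception) pairs for the requests that failed.
    """
    failures = []
    for url in report_urls(base_url, student_ids):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                response.read()  # drain the body; saving happens on the server side
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            failures.append((url, exc))
    return failures


if __name__ == "__main__":
    # Assumption: the app is running locally on port 6543.
    for url, exc in generate_all("http://localhost:6543", ["s001", "s002"]):
        print("failed:", url, exc)
```

Collecting the failures instead of stopping at the first one means a single bad record doesn’t hold up the rest of the batch.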
In the end, I was able to format, and re-format to no end, information as complex as student data and their reports, complete with nice-looking tables that explained their grades, school information, per-page headers and footers, the works.