What follows is some of what I will cover in my upcoming talk at the PDF Association conference in Seattle in June.
From 1000 feet, here are your three main alternatives:
- The first is to create the PDF directly, using pdfkit, jsPDF, or the higher-level pdfmake. Pdfkit is like iText in the Java world. Pdfmake, which is based on pdfkit, has its own format for representing rich text, which it converts to PDF.
- The second is to create HTML, then convert that to PDF, these days probably using puppeteer.
- The third is to create a docx, then convert that to PDF.
Put another way, you can either create the PDF directly, or use HTML or docx as an intermediate format.
This post looks at the docx route. For one thing, the content will often already be in Word document format, making your job easy.
More importantly, it’s worth thinking up front about ongoing maintenance (changes to content and formatting). Is that something that you as a developer want to be doing, or is it better to enable the business to do this themselves? If it’s a Word document, then business users can update the document without troubling you.
With docx.js you can programmatically build up your Word document (much like pdfkit and jsPDF allow you to build up a PDF). But this probably isn’t a great idea, because for the final PDF to come out looking right, any feature you care to use (merged cells in a table, say, or a watermark) has to be supported in both the create-docx and docx-to-pdf steps.
What we want is an easy way to create a docx, and then the confidence that our docx will be converted cleanly to PDF.
For this, a “templating” approach is the answer: basically, you create a docx template with the layout you want (in Microsoft Word, LibreOffice, Google Docs, Native Documents or whatever), then use the template engine to replace “variables”.
Step 1: populate docx template
Here we’ll use docxtemplater, in node.js.
Say you want a PDF invoice. Since part of the point of using a Word template is that it is easy for business users to make it pretty, let’s start with one of the invoice templates designed by Microsoft and included in Word.
You can see I’ve added some variables (represented with curly braces, as required by docxtemplater).
You can click the image to see the docx in our Word File Editor. Click invoice-template.docx to download/use it with the code which follows.
Notice the Items array. The table row repeats for each of the Items. You can see docxtemplater’s markup for a repeat/loop at the start and end of that table row.
To try it, install docxtemplater as per its instructions:
npm install docxtemplater
npm install jszip@2
Then it’s just:
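A minimal sketch of this step, assuming the docxtemplater 3.x calls that pair with jszip 2 (`loadZip`, `setData`, `render`). Newer docxtemplater releases may prefer a different zip library and constructor, so check the current instructions. Apart from `Items`, every field name here (`InvoiceNumber`, `Customer`, `Description`, `Total` and so on) is hypothetical and has to match the tags in your own template.

```javascript
const fs = require("fs");

// Build the data object that gets merged into the template.
// Field names other than Items are hypothetical: use the tags from your docx.
function buildInvoiceData(invoiceNumber, customer, items) {
  const rows = items.map((item) => ({
    Description: item.description,
    Quantity: item.quantity,
    UnitPrice: item.unitPrice.toFixed(2),
    LineTotal: (item.quantity * item.unitPrice).toFixed(2),
  }));
  const total = items.reduce(
    (sum, item) => sum + item.quantity * item.unitPrice,
    0
  );
  return {
    InvoiceNumber: invoiceNumber,
    Customer: customer,
    Items: rows, // one table row is produced per entry
    Total: total.toFixed(2),
  };
}

// Populate the template and return the resulting docx as a node.js Buffer.
function populateTemplate(templatePath, data) {
  // Required lazily so buildInvoiceData can be used without the packages.
  const JSZip = require("jszip"); // jszip@2, which has a synchronous API
  const Docxtemplater = require("docxtemplater");

  const zip = new JSZip(fs.readFileSync(templatePath, "binary"));
  const doc = new Docxtemplater();
  doc.loadZip(zip);
  doc.setData(data);
  doc.render(); // throws if the template contains malformed tags
  return doc.getZip().generate({ type: "nodebuffer" });
}

// Usage, from the directory containing invoice-template.docx:
//   const data = buildInvoiceData("INV-001", "Acme Pty Ltd", [
//     { description: "Widget", quantity: 2, unitPrice: 9.5 },
//   ]);
//   fs.writeFileSync(
//     "invoice-instance.docx",
//     populateTemplate("invoice-template.docx", data)
//   );
```

Keeping the data preparation separate from the rendering call makes it easy to check the numbers before they go anywhere near a document.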
And you get a populated invoice instance:
Notice the table row has been repeated, and all variables replaced.
If you run the code yourself, you can verify the results by opening invoice-instance.docx in your favourite docx editor, or in ours: click here then drag/drop your docx.
Step 2: convert the docx to PDF
So far so good. Now we just need to convert the populated invoice instance to PDF.
For that, we’ll use docx-wasm, a node module we at Native Documents released earlier this year. Our bread and butter at Native Documents is the web-based document editing/viewing component we used above to display invoice-template.docx, and this node module generates PDF output using that same Word-compatible page layout code. Put another way, the page layout reproduces what Word does so closely that it can also be used for high-quality PDF output.
First, install it:
npm install @nativedocuments/docx-wasm
Converting the docx in the node.js buffer object to PDF is then just:
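A minimal sketch of the conversion, assuming the module exposes `docx.init`, `docx.engine()`, and an engine object with `load`, `exportPDF` and `close`, and that `init` takes `ENVIRONMENT` and `LAZY_INIT` options. Those names are how I understand the module’s README, so treat them as assumptions and check them against the version you install. The helper names `initOptions` and `docxBufferToPdf` are made up for this example.

```javascript
// Build the options object for docx.init. When no keys are passed, the
// module is expected to read them from environment variables instead.
function initOptions(devId, devSecret) {
  const options = { ENVIRONMENT: "NODE", LAZY_INIT: true };
  if (devId && devSecret) {
    options.ND_DEV_ID = devId;
    options.ND_DEV_SECRET = devSecret;
  }
  return options;
}

let initialised = false; // init only needs to happen once per process

// Convert a docx held in a node.js Buffer to PDF; resolves to a Buffer.
async function docxBufferToPdf(docxBuffer, devId, devSecret) {
  // Required lazily so initOptions can be used without the package.
  const docx = require("@nativedocuments/docx-wasm");
  if (!initialised) {
    await docx.init(initOptions(devId, devSecret));
    initialised = true;
  }
  const api = await docx.engine();
  await api.load(docxBuffer);
  const pdf = await api.exportPDF();
  await api.close();
  return Buffer.from(pdf);
}

// Usage:
//   const fs = require("fs");
//   docxBufferToPdf(fs.readFileSync("invoice-instance.docx"))
//     .then((pdf) => fs.writeFileSync("invoice-instance.pdf", pdf))
//     .catch(console.error);
```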
You’ll need an ND_DEV_ID, ND_DEV_SECRET pair to use this module. You can get free-tier keys at https://developers.nativedocuments.com/
Copy these into the docx.init call (or alternatively, you can set these as environment vars).
I haven’t posted the PDF here, since it just looks the same as the invoice-instance docx.
Putting it all together
To try it, download invoice-template.docx then:
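A sketch of the two steps chained together, under the same assumptions as each step on its own: docxtemplater 3.x with jszip 2 for the templating, and `docx.init` / `docx.engine()` / `load` / `exportPDF` / `close` for the conversion, all of which should be checked against the versions you install. Here the keys are taken from environment variables. `missingKeys` and `templateToPdf` are hypothetical helper names.

```javascript
const fs = require("fs");

// Report which of the required keys are absent from the given environment.
function missingKeys(env) {
  return ["ND_DEV_ID", "ND_DEV_SECRET"].filter((name) => !env[name]);
}

// Populate the docx template with data, convert the result to PDF,
// write it to pdfPath, and resolve to that path.
async function templateToPdf(templatePath, data, pdfPath) {
  const missing = missingKeys(process.env);
  if (missing.length > 0) {
    throw new Error("Set these environment variables: " + missing.join(", "));
  }
  const JSZip = require("jszip"); // jszip@2
  const Docxtemplater = require("docxtemplater");
  const docx = require("@nativedocuments/docx-wasm");

  // Step 1: populate the template, keeping the result in memory.
  const doc = new Docxtemplater();
  doc.loadZip(new JSZip(fs.readFileSync(templatePath, "binary")));
  doc.setData(data);
  doc.render();
  const docxBuffer = doc.getZip().generate({ type: "nodebuffer" });

  // Step 2: convert the populated docx to PDF.
  await docx.init({ ENVIRONMENT: "NODE", LAZY_INIT: true });
  const api = await docx.engine();
  await api.load(docxBuffer);
  const pdf = await api.exportPDF();
  await api.close();

  fs.writeFileSync(pdfPath, Buffer.from(pdf));
  return pdfPath;
}

// Usage (the data object must match the tags in your template):
//   templateToPdf("invoice-template.docx", { Items: [] }, "invoice.pdf")
//     .then((out) => console.log("wrote " + out))
//     .catch(console.error);
```

The populated docx never touches the disk here, which is convenient if you later move this into a function-as-a-service environment.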
A nice way to run this is on AWS Lambda. With Lambda, you get easy scalability, and you aren’t paying for servers when you aren’t using them. More on this in my upcoming talk at the PDF Association conference in Seattle in June! In the meantime, docx-to-pdf-on-AWS-Lambda shows you how to do the docx to PDF part on Lambda. Adding the docx templating piece is straightforward.
It’s also now possible to convert docx to PDF client-side, in-browser, reducing server load and opening the way to offline operation. docx-wasm-client-side shows you how to do the docx to PDF part client-side.