Convert HTML to PDF with Linux

Written by
Date: 2010-10-07 10:36:30 00:00


When you need to convert a complete web page from HTML to a PDF file, Linux can help you.

We will need two tools:

  • wget - to download the complete page, including CSS and the other files it needs
  • wkhtmltopdf - to do the actual conversion from HTML to PDF

You should be able to install both of them using your package manager.
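
As a quick check before starting, a small sketch like the one below reports which of the tools are not yet on your PATH. The function name missing_tools is made up for this illustration, and the install command in the comment assumes a Debian-style system where the packages are named after the tools; other distributions may differ.

```shell
# Print the name of every command passed as an argument that cannot be
# found on the PATH. missing_tools is a hypothetical helper name, not
# something shipped with wget or wkhtmltopdf.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Usage:
#   missing_tools wget wkhtmltopdf
# On a Debian-based system (an assumption), anything it reports could
# then be installed with:
#   sudo apt-get install wget wkhtmltopdf
```

If the function prints nothing, both tools are already installed.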

To convert the HTML to PDF, we will follow a two-step process.

First step: download the web page as HTML

To do that, enter this command:

wget -p [url to download]

Example:

I will first create a folder to store the page:

mkdir /tmp/download-folder

Then download the web page:

cd /tmp/download-folder

wget -p http://www.go2linux.org/mt/linux-ht/2010/10/new-branch-on-debian-1.html

That will create a structure like this:

/tmp/download-folder/www.go2linux.org/mt/linux-ht/2010/10

There you will find the file new-branch-on-debian-1.html.
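
If you are not sure where wget put the page, a sketch like this one lists every .html file below the download folder. The function name list_html_files is hypothetical; it only wraps the standard find command.

```shell
# List every .html file stored anywhere below the given folder.
# list_html_files is a made-up helper name used for illustration.
list_html_files() {
    find "$1" -type f -name '*.html'
}

# Usage:
#   list_html_files /tmp/download-folder
```

For the example above, the output should include the path to new-branch-on-debian-1.html inside the www.go2linux.org folder.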

Second step: convert the HTML file to a PDF file

Change into the folder that contains the HTML file:

cd /tmp/download-folder/www.go2linux.org/mt/linux-ht/2010/10

Convert the file using this format:

wkhtmltopdf [html file] [pdf file]

wkhtmltopdf new-branch-on-debian-1.html new-branch-on-debian-1.pdf

That is it. You have now converted a complete HTML page, including its formatting and CSS, to a PDF file that you can send by email, archive, or use however you want.
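
The two steps can also be put together in a small script. What follows is only a sketch under a few assumptions: the names pdf_name_for and page_to_pdf are invented for this example, the URL is expected to end in a file name (as in the example above) and to contain no query string, and the saved file is expected to sit under a folder named after the host, as shown earlier.

```shell
# Name of the PDF for a given HTML file: same name with .html replaced
# by .pdf. pdf_name_for is a hypothetical helper name.
pdf_name_for() {
    printf '%s\n' "${1%.html}.pdf"
}

# Download a page with the files it needs, then convert the saved copy.
# Arguments: $1 = URL, $2 = work folder (created if it does not exist).
# The body runs in a subshell, so the caller's current folder is unchanged.
page_to_pdf() (
    url=$1
    workdir=$2
    # Assumption: wget stores the page under the URL minus its scheme.
    local_file=${url#*://}

    mkdir -p "$workdir" || exit 1
    cd "$workdir" || exit 1

    # wget may report an error when a single image or stylesheet is
    # missing, so check for the page itself instead of the exit status.
    wget -p "$url"
    [ -f "$local_file" ] || exit 1

    cd "$(dirname "$local_file")" || exit 1
    html=$(basename "$local_file")
    wkhtmltopdf "$html" "$(pdf_name_for "$html")"
)

# Usage:
#   page_to_pdf http://www.go2linux.org/mt/linux-ht/2010/10/new-branch-on-debian-1.html /tmp/download-folder
```

The PDF ends up next to the downloaded HTML file, in the same folder as in the manual steps.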

Note: If the page you download does not have an .html extension, you may get errors. To solve that, just rename the file with mv so that it has an .html extension. Nowadays, most pages do not have .html extensions.
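
The rename can be sketched as a small function. The name ensure_html_ext is made up for this example; it leaves files that already end in .html alone and prints the resulting file name either way.

```shell
# Rename a downloaded file so that it ends in .html, unless it already
# does. Prints the resulting file name.
# ensure_html_ext is a hypothetical helper name.
ensure_html_ext() {
    case $1 in
        *.html) printf '%s\n' "$1" ;;
        *)      mv -- "$1" "$1.html" && printf '%s\n' "$1.html" ;;
    esac
}

# Usage:
#   html=$(ensure_html_ext some-page)
#   wkhtmltopdf "$html" some-page.pdf
```

As an alternative, wget has an -E option that adds the .html extension while saving HTML pages. It is worth checking the manual page of your wget version before relying on it.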