Convert HTML to PDF with Linux
Written by Guillermo Garron
Date: 2010-10-07 10:36:30 00:00
When you need to convert a complete web page from HTML to a PDF file, Linux can help you.
We will need two tools:
- wget - To download the complete page, including CSS and the other files it needs
- wkhtmltopdf - To do the actual conversion from HTML to PDF
You should be able to install both of them using your package manager.
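Before going on, it can help to confirm that both tools are actually on your PATH. The following is only a small sketch: check_tools is a hypothetical helper name, and the install command in the comment assumes a Debian or Ubuntu system, so package names may differ on your distribution.

```shell
# Hypothetical helper: report which of the given commands are missing.
# On Debian or Ubuntu the tools can typically be installed with:
#   sudo apt-get install wget wkhtmltopdf
check_tools() {
    missing=0
    for tool in "$@"; do
        # command -v succeeds only if the shell can find the command
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Calling check_tools wget wkhtmltopdf prints nothing and returns 0 when both tools are installed.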
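To convert the HTML to PDF, we will follow a two-stage process.
First step: download the web page in HTML
To do that, enter this command (the -p option tells wget to also fetch the files the page needs to display properly, such as images and stylesheets):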
wget -p [url to download]
Example:
I will first create a folder to store the page:
mkdir /tmp/download-folder
Then download the web page:
cd /tmp/download-folder
wget -p http://www.go2linux.org/mt/linux-ht/2010/10/new-branch-on-debian-1.html
That will create a structure like this:
/tmp/download-folder/www.go2linux.org/mt/linux-ht/2010/10
There you will find the file new-branch-on-debian-1.html.
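If you are not sure where wget put the page, you can search the download folder instead of walking the directory tree by hand. This is a minimal sketch; find_html is a hypothetical helper name, and the default path is simply the folder used in this example.

```shell
# Hypothetical helper: list every .html file below the download folder.
# Defaults to the folder used in this article when no argument is given.
find_html() {
    find "${1:-/tmp/download-folder}" -type f -name '*.html'
}
```

Running find_html with no arguments after the download above should list the path of new-branch-on-debian-1.html.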
Second step: convert the HTML file to a PDF file
Change into the folder where the HTML file is:
cd /tmp/download-folder/www.go2linux.org/mt/linux-ht/2010/10
Convert the file using this format:
wkhtmltopdf [html file] [pdf file]
For the example page:
wkhtmltopdf new-branch-on-debian-1.html new-branch-on-debian-1.pdf
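The two steps can also be wrapped in one small shell function. The sketch below is not part of either tool: page_to_pdf is a hypothetical name, and it assumes the URL ends in a plain file name with an .html extension, as in the example above, so that wget stores the page under host/path.

```shell
# Hypothetical helper: download one page with its requisites, then convert it.
# Usage: page_to_pdf URL [download-folder]
# The body runs in a subshell, so the caller's working directory is unchanged.
page_to_pdf() (
    url="$1"
    workdir="${2:-/tmp/download-folder}"
    rel="${url#*://}"    # strip the scheme; wget saves the page under host/path

    mkdir -p "$workdir" && cd "$workdir" || exit 1

    # wget's exit status is not checked here: a single missing image or
    # stylesheet also makes it non-zero even when the page itself arrived.
    wget -p "$url"
    [ -f "$rel" ] || { echo "download failed: $rel not found" >&2; exit 1; }

    # Convert from inside the page's own folder, as in the manual steps.
    cd "$(dirname "$rel")" || exit 1
    name=$(basename "$rel" .html)
    wkhtmltopdf "$name.html" "$name.pdf"
)
```

For the example page, page_to_pdf http://www.go2linux.org/mt/linux-ht/2010/10/new-branch-on-debian-1.html should leave new-branch-on-debian-1.pdf next to the downloaded HTML file.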
That is it. You have now converted a complete HTML page, including formatting, CSS, etc., to a PDF file that you can send by email, archive, or use any way you want.
Note: If the page you are downloading does not have an .html extension, you may get errors. To solve that, just rename the file with mv so that it has an .html extension. Nowadays, most pages do not have .html extensions.
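The rename from the note can be sketched as a small function. add_html_ext is a hypothetical helper name; it leaves files that already end in .html or .htm alone and prints the resulting file name either way.

```shell
# Hypothetical helper: give a downloaded page an .html extension if it lacks one.
# Prints the name the file has afterwards.
add_html_ext() {
    file="$1"
    case "$file" in
        *.html|*.htm)
            printf '%s\n' "$file"    # already has a usable extension
            ;;
        *)
            mv -- "$file" "$file.html" && printf '%s\n' "$file.html"
            ;;
    esac
}
```

As an alternative, wget has a -E option (called --adjust-extension in recent versions and --html-extension in older ones) that appends .html to downloaded HTML pages whose names lack it, so wget -p -E [url to download] may save you the manual rename.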