Why and how I migrated from Drupal to Jekyll

Written by
Date: 2011-08-13 16:30:00 00:00


Introduction

I started blogging about Linux in 2007. It all began because I was writing almost daily on several Linux distribution mailing lists, so I thought it would make sense to put all my writing online in a single place where others could use it.

I wrote about my discoveries, how-to tutorials, and other things, mainly as notes to myself, and from time to time I pointed people on the forums to my posts when they were looking for a solution I had already written about.

I started with Joomla. It was OK for a few months, then I stumbled upon Tuxmachines, discovered Drupal, and immediately migrated from Joomla to Drupal. From then on the blog started to grow, both in content and in visitors.

Over time the site grew, and I was getting 7,000 views per day. By then there was no chance of stopping blogging, let alone taking the blog offline. In all that time I learned a lot about Drupal. I'm not a PHP expert, but I never needed to be, as Drupal is really easy to use and powerful at the same time.

My blogging learning curve

In all these years I have learned a few things about blogging platforms, web servers, and scalability.

My blog started on a VMware virtual machine on my office computer, where I have public IPs, but when one of my posts hit Slashdot and brought my server to its knees, I realized that I needed to move it, so I moved it to a shared virtual hosting server.

The next time, it was Digg hitting one of my pages that took my site offline. This time I moved it to a VPS, where I was able to monitor which processes were the resource-hungry ones, and I soon discovered that MySQL was the culprit.

So I learned about the Boost module and installed Squid as a reverse proxy on my server. Later I switched to Nginx + Varnish, with Apache + PHP + MySQL as the backend for the pages not cached by Boost.

Too much maintenance

As you can see, to keep a server ready for the next tsunami of visits, I had to do a lot of work keeping Apache, PHP, MySQL, Nginx, Varnish, and Drupal up to date and secure, while also making sure they all worked together.

I heard about Movable Type and loved the idea of static content. It can be compared to Drupal working together with the Boost module, but with Movable Type, if the backend application dies, you will not be able to post new articles, yet your whole site stays online. With Drupal, if the application somehow stops working, cached pages go offline once their expiration period ends, and anyone requesting a page that has not been cached yet won't be able to open it at all.
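The difference can be made concrete with a simplified model. The sketch below is a hypothetical illustration, not how Boost or Movable Type are actually implemented: `serve_cached` stands for a page cache in front of a dynamic backend, where a cached copy is only usable until its TTL expires, and `serve_static` stands for a directory of pre-generated files. All names and the `ttl` parameter are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class CachedPage:
    html: str
    stored_at: float  # time the page was cached, in seconds


def serve_cached(path, cache, backend_up, ttl, now):
    """Hypothetical cache in front of a dynamic backend (e.g. Drupal + Boost)."""
    entry = cache.get(path)
    if entry is not None and now - entry.stored_at < ttl:
        return entry.html  # fresh cached copy: the backend is not needed
    if not backend_up:
        return None  # expired or never cached, and the backend is dead
    html = f"<html>rendered {path}</html>"  # stand-in for PHP + MySQL rendering
    cache[path] = CachedPage(html, now)
    return html


def serve_static(path, files):
    """Static site: pages are read from pre-generated files, no backend involved."""
    return files.get(path)
```

With `backend_up=False`, a cached page keeps answering until `now - stored_at` reaches `ttl`, after which `serve_cached` returns `None`, and a page that was never cached fails immediately; `serve_static` has no such dependency on the backend.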

Movable Type seemed like a better option, but you can read stories all over the Internet about Movable Type dying and about people switching from it to WordPress or Drupal, so I stayed with Drupal and never actually switched to Movable Type.

The idea of serving static content stayed in my mind for months, until one day, while visiting my favorite site, Hacker News, I read a post about it and decided to give it a try. Visiting Hacker News doesn't make me a hacker, though, and I could not make it work. That same day I also read about Jekyll and decided to try it out. This time I was able to make it work, and I started this site.

Now it was time to migrate Go2linux from Drupal to Jekyll.

Migrating from Drupal to Jekyll

The problems

Jekyll provides excellent migration tools to help you move your blog from almost any platform, and the script to migrate a Drupal site to Jekyll works great if you are using clean URLs but not URL aliases.

So basically it converts URLs with this structure:

http://www.site.com/node/23

to:

http://www.site.com/2009/10/23/title-of-posts.html
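To see why this breaks aliased sites, here is a minimal sketch of a date-based mapping like the one above. It is not the code of Jekyll's migrator: the slug rule and the use of UTC are assumptions. The point is that the new URL is derived only from Drupal's `node.created` (a Unix timestamp) and `node.title`, so any custom alias a post had is simply lost.

```python
import re
from datetime import datetime, timezone


def slugify(title):
    # Assumed slug rule: lowercase, every run of non-alphanumerics becomes "-".
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def default_permalink(created, title):
    """Date-based URL built from node.created (Unix timestamp) and node.title."""
    d = datetime.fromtimestamp(created, tz=timezone.utc)
    return f"/{d:%Y/%m/%d}/{slugify(title)}.html"
```

For a node created on 2009-10-23 (UTC) titled "Title of posts", this yields `/2009/10/23/title-of-posts.html`.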

But that is not my case, and I'm sure most Drupal users rely on URL aliases (provided by the core Path module and stored in the url_alias table), usually together with the Pathauto module. Another problem is that over the years I kept changing the URL scheme of my Drupal site, so I have URLs like these:

  • http://www.go2linux.org/arch-linux-review
  • http://www.go2linux.org/blogs/2011/04/thinking-about-buying-and-ipad-1005
  • http://www.go2linux.org/linux/2011/05/move-window-when-title-bar-not-visible-over-sized-1019
  • http://www.go2linux.org/linux/2011/07/use-varnish-avoid-hot-linking-or-image-leeching-1123.html
  • http://www.go2linux.org/mt/linux-ht/2010/11/slackware-review-1.html

So, as you can see, there is no single rule that can recover my URLs, as I have at least five patterns. What I needed was a script that imports the URLs from the Drupal database, specifically from the url_alias table, and uses them to create a permalink for every post imported from Drupal.
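Since no rule can rebuild these URLs, the alias has to be looked up rather than computed. Below is a minimal sketch of that lookup, assuming you already have the `(pid, src, dst)` rows of the Drupal 6 url_alias table. When a node has several aliases it keeps the one with the highest `pid`, on the assumption that it is the most recent; that choice, the function names, and the fallback to `node/<nid>` are all hypothetical.

```python
def build_alias_map(url_alias_rows):
    """Map node id -> alias from (pid, src, dst) rows of the url_alias table."""
    best = {}
    for pid, src, dst in url_alias_rows:
        if not src.startswith("node/"):
            continue  # skip taxonomy, user, and other non-node aliases
        nid_text = src[len("node/"):]
        if not nid_text.isdigit():
            continue  # skip sub-paths such as node/12/edit
        nid = int(nid_text)
        # Assumption: the alias with the highest pid is the newest one.
        if nid not in best or pid > best[nid][0]:
            best[nid] = (pid, dst)
    return {nid: dst for nid, (_pid, dst) in best.items()}


def permalink_for(nid, alias_map):
    # Use the stored alias verbatim; fall back to Drupal's internal path.
    return "/" + alias_map.get(nid, f"node/{nid}")
```

Because the alias is copied verbatim, all five URL patterns survive unchanged, whatever scheme they were created with.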

The solution

What I did was take the script available on Jekyll's site and modify it so that, along with the posts, it also imports the contents of url_alias and uses them to build the YAML front matter.
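As an illustration of that last step, here is a minimal sketch that writes a Jekyll front matter block with a `permalink` for one imported node. It is not the author's modified script; the function name, the default `post` layout, and the UTC date are assumptions. Strings are quoted with `json.dumps`, since a JSON string is also a valid YAML double-quoted scalar, which keeps titles containing colons or quotes from breaking the YAML.

```python
import json
from datetime import datetime, timezone


def front_matter(title, created, permalink, layout="post"):
    """Hypothetical YAML front matter for one Drupal node imported into Jekyll."""
    d = datetime.fromtimestamp(created, tz=timezone.utc)
    return "\n".join([
        "---",
        f"layout: {layout}",
        f"title: {json.dumps(title, ensure_ascii=False)}",
        f"date: {d:%Y-%m-%d %H:%M:%S} +0000",
        f"permalink: {json.dumps(permalink, ensure_ascii=False)}",
        "---",
        "",
    ])
```

With the `permalink` set, Jekyll publishes the post at its old Drupal address instead of a date-based one.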

That led me to a new problem: how to manage the relationship between a post and its URL alias. The url_alias table has this format:

+----------+------------------+------+-----+---------+----------------+
| Field    | Type             | Null | Key | Default | Extra          |
+----------+------------------+------+-----+---------+----------------+
| pid      | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| src      | varchar(128)     | NO   | MUL |         |                |
| dst      | varchar(128)     | NO   | MUL |         |                |
| language | varchar(12)      | NO   |     |         |                |
+----------+------------------+------+-----+---------+----------------+

And the node table has this structure:

+-----------+------------------+------+-----+---------+----------------+
| Field     | Type             | Null | Key | Default | Extra          |
+-----------+------------------+------+-----+---------+----------------+
| nid       | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| vid       | int(10) unsigned | NO   | UNI | 0       |                |
| type      | varchar(32)      | NO   | MUL |         |                |
| language  | varchar(12)      | NO   |     |         |                |
| title     | varchar(255)     | NO   | MUL |         |                |
| uid       | int(11)          | NO   | MUL | 0       |                |
| status    | int(11)          | NO   | MUL | 1       |                |
| created   | int(11)          | NO   | MUL | 0       |                |
| changed   | int(11)          | NO   | MUL | 0       |                |
| comment   | int(11)          | NO   |     | 0       |                |
| promote   | int(11)          | NO   | MUL | 0       |                |
| moderate  | int(11)          | NO   | MUL | 0       |                |
| sticky    | int(11)          | NO   |     | 0       |                |
| tnid      | int(10) unsigned | NO   | MUL | 0       |                |
| translate | int(11)          | NO   | MUL | 0       |                |
+-----------+------------------+------+-----+---------+----------------+

So node.nid is related to url_alias.src, which holds Drupal's internal path in the form node/<nid>, and I modified the original script to work with that relationship.
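The relationship boils down to a single join condition. The sketch below is not the author's script: it reproduces the two tables in an in-memory SQLite database (only the columns the join needs, with made-up rows) to show the query shape. It keeps only published nodes (`status = 1`), which is an assumption about what you want to migrate. SQLite concatenates strings with `||`; against the real MySQL database the same condition would be written `a.src = CONCAT('node/', n.nid)`.

```python
import sqlite3

# Match each node to its alias: url_alias.src holds the string 'node/<nid>'.
QUERY = """
    SELECT n.nid, n.title, n.created, a.dst
    FROM node AS n
    LEFT JOIN url_alias AS a ON a.src = 'node/' || n.nid
    WHERE n.status = 1
    ORDER BY n.nid
"""


def make_demo_db():
    """In-memory stand-in for the Drupal tables, with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT,
                           created INTEGER, type TEXT, status INTEGER);
        CREATE TABLE url_alias (pid INTEGER PRIMARY KEY, src TEXT,
                                dst TEXT, language TEXT);
        INSERT INTO node VALUES (1, 'Arch Linux review', 1230000000, 'story', 1);
        INSERT INTO node VALUES (2, 'Unpublished draft', 1230000500, 'story', 0);
        INSERT INTO node VALUES (3, 'Post without alias', 1230001000, 'story', 1);
        INSERT INTO url_alias VALUES (10, 'node/1', 'arch-linux-review', '');
    """)
    return conn


def fetch_posts(conn):
    # LEFT JOIN keeps nodes without an alias; their dst comes back as None.
    return conn.execute(QUERY).fetchall()
```

Here `fetch_posts(make_demo_db())` returns the published nodes 1 and 3, with `'arch-linux-review'` and `None` as their aliases. If a node has several aliases, this join returns one row per alias, so you still need to pick one of them.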

Here is the script if you want to use it:

Conclusion

As of today, both Go2linux and this site are generated by Jekyll and served to you as static content, so they should be faster. Another benefit is that there are fewer possible points of failure: in fact, just one, the web server.

Because Jekyll only generates the HTML using your own layouts, you have full control over how your site looks. There is no need for plugins or anything else: if you want to add something like a Facebook fan "widget", you just add the code in the proper place in your layout, and you can make it match your CSS and the colors of your template.

One last thing: if you are concerned about how large a site Jekyll can handle, Go2Linux has over a thousand pages, and it is generated in about 10 seconds on a modern laptop.