How does Scala help when migrating from a self-hosted WordPress blog to

09 Feb

I had hosted this blog on shared hosting for almost a decade, using WordPress. After running into continuous issues with my hosting provider (a very old PHP version that kept me from updating WordPress easily, plus other technical problems), I finally decided to migrate 10 years of content to a hosted blogging platform. I could not have guessed that the Scala programming language would help me in this process.

First things first: fortunately, the old version of WordPress I used had an export function that let me export all of the content into a single, self-contained XML file (ah, the joys of data portability! :)). The next step was to import this file using the administrative section of the new platform, and everything was completed in just a few minutes. After that, I enabled redirection for only the blog-related part of my personal website. A simple .htaccess file that used Apache’s mod_rewrite functionality handled this very well.

I did some ad-hoc testing first by visiting the new address, and then going to the old address to see that it redirected properly. But when I went deeper, I noticed a subtle issue with some of the blog entries: they did not show the source code snippets that were stored at and dynamically pulled from GitHub. When I analyzed the problem a little, I saw that it had to do with the way I had inserted them (using a .js file from GitHub). Apparently, the new platform wanted those gists to be given in a slightly different (and simpler) format. The problem statement became: “search the old content for gists, see which blog entries have them, along with the gists inside them, so that those entries can be edited and the gist addresses can be placed using the new format”.

Scala was the first language that came to my mind for quick and dirty XML processing, so I decided to use it to search for the relevant data in the exported XML file (whose size was 6.3 MB). After a few quick trials at the Scala REPL (console), I arrived at a very short Scala program.
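A program of this kind might look like the following minimal sketch. It is not the original gist: it assumes the export follows the usual WordPress layout, with each post in an `<item>` element whose body sits in `content:encoded`, and it works on a made-up two-post sample with a made-up gist id. The names `GistFinder`, `postsWithGists` and `sample` are hypothetical. On Scala 2.11 and later, `scala.xml` lives in the separate scala-xml module, which the first line requests when the file is run with scala-cli.

```scala
//> using dep org.scala-lang.modules::scala-xml:2.2.0
import scala.xml.{Elem, XML}

object GistFinder {
  // Matches the src attribute of a gist embed, capturing the address ending in ".js".
  private val GistSrc = """src=["'](https?://gist\.github\.com/[^"']+\.js)["']""".r

  /** Returns (post title, gist addresses) for every <item> whose body embeds a gist. */
  def postsWithGists(doc: Elem): List[(String, List[String])] =
    (doc \\ "item").toList.flatMap { item =>
      // "encoded" is the local name of <content:encoded>, which holds the post body.
      val body  = (item \ "encoded").text
      val gists = GistSrc.findAllMatchIn(body).map(_.group(1)).toList
      if (gists.isEmpty) None else Some((item \ "title").text -> gists)
    }

  // Made-up stand-in for the real export file (assumption: two posts, one gist).
  val sample: Elem = XML.loadString("""
    <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
      <channel>
        <item>
          <title>Post with a gist</title>
          <content:encoded><![CDATA[<p>Code:</p><script src="https://gist.github.com/1234567.js"></script>]]></content:encoded>
        </item>
        <item>
          <title>Post without code</title>
          <content:encoded><![CDATA[<p>Just text.</p>]]></content:encoded>
        </item>
      </channel>
    </rss>
  """.trim)

  def main(args: Array[String]): Unit = {
    // For a real export: XML.loadFile("export.xml") (hypothetical file name).
    for ((title, gists) <- postsWithGists(sample)) {
      println(title)
      gists.foreach(g => println("  " + g))
    }
  }
}
```

With a real export, `XML.loadFile` would take the place of the inline sample. Matching the embeds with a regular expression on the post body, rather than as XML nodes, is a deliberate choice here: WordPress stores the body as escaped text, so the script tags are not part of the XML tree.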

My program finished in about 2 seconds on my old ThinkPad laptop and printed the affected blog entries along with the gist script tags found in each of them.

This output was more than enough for me to simply visit the affected blog entries and insert the gists without the .js endings. I could not automate the final editing part, because I had to read each blog entry to see where the missing gist URL should be inserted, but since there were not many such entries, that was not a show-stopper. And I was more than happy that I did not have to do text-editor or grep-based searches, scanning back and forth through a 6.3 MB XML file.

What I have also realized is that I am reaching for Scala more and more these days for this kind of text processing task, unlike the old days when my tool of choice would usually be Perl or Python. Scala’s interactive console (its REPL) and its concise syntax, with DSLs such as the one that allows for very convenient XML processing, make it an ideal candidate for me more and more frequently.

PS: Do not hesitate to comment on the Scala gist above and offer more concise (yet readable) or more performant versions ;-)


Posted on February 9, 2013 in Programlama

