I had hosted this blog on shared hosting for almost a decade using WordPress. After continuous issues with my hosting provider (a very old PHP version that made it hard to update WordPress, among other technical problems), I finally decided to migrate 10 years of content to wordpress.com. I couldn’t have guessed that the Scala programming language would help me in this process.
First things first: fortunately, the old version of WordPress I used had export functionality that let me export all of the content into a single, self-contained XML file (ah, the joys of data portability! :)). The next step was to import this file using the administrative section of http://ileriseviye.wordpress.com, and everything was completed in just a few minutes. After that, I enabled redirection for only the blog-related part of my personal website. A simple .htaccess file that used Apache’s mod_rewrite functionality handled this very well.
I did some ad-hoc testing first by visiting the new address, and then going to the old address to see that it redirected properly. But when I went deeper, I noticed a subtle issue with some of the blog entries: they did not show the source code snippets that were stored at and dynamically pulled from https://gist.github.com/. When I analyzed the problem a little, I saw that it had to do with the way I had inserted them (using a .js file from GitHub). Apparently, the new ileriseviye.wordpress.com wanted those gists to be given in a slightly different (and simpler) format. The problem statement became: “search the old content for gists, see which blog entries have them, along with the gists inside them, so that those entries can be edited and the gist addresses can be placed using the new format”. Scala was the first language that came to my mind for quick and dirty XML processing, and I decided to use it to search for the relevant data in the exported XML file (whose size was 6.3 MB). After a few quick trials at the Scala REPL (console), I arrived at the following very short Scala program:
which resulted (in about 2 seconds on my old ThinkPad laptop) in the following:
This output was more than enough for me to simply visit the blog entries and insert the gists without the .js endings. I could not automate the final editing part, because I had to read each blog entry manually to see where the missing gist URL should be inserted, but since the number of those blog entries was not big, that was not a show-stopper. And I was more than happy because I did not have to do text editor or grep based searches, with back and forth eye scans over a 6.3 MB XML file. I have also realized that I’m reaching for Scala more and more these days for these kinds of text processing tasks, unlike the old days when my tool of choice would usually be Perl or Python. Scala’s interactive console (its REPL), as well as its concise syntax with DSLs such as the one that allows for very convenient XML processing, makes it an ideal candidate more and more frequently for me.
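To give a flavor of that XML DSL, here is a minimal sketch of this kind of search. It is an illustration rather than the exact program referred to earlier, and it rests on a few assumptions: the export follows the usual WordPress layout with `item`, `title`, `link` and `content:encoded` elements, the old embeds look like `https://gist.github.com/<id>.js`, and the scala-xml library is on the classpath. The names `GistFinder` and `PostWithGists`, as well as the sample posts, are made up for the example.

```scala
import scala.xml.{Elem, XML}

// Hypothetical container for one blog entry and the gists found in it.
final case class PostWithGists(title: String, link: String, gists: List[String])

object GistFinder {
  // Assumed shape of the old embeds: https://gist.github.com/<id>.js
  private val GistJs = """https?://gist\.github\.com/([\w/-]+)\.js""".r

  /** Returns every <item> whose body embeds at least one gist,
    * with the gist URLs rewritten to drop the .js ending. */
  def findGists(doc: Elem): List[PostWithGists] =
    (doc \\ "item").toList.flatMap { item =>
      // `\ "encoded"` matches on the local name only, so keep just <content:encoded>.
      val body = (item \ "encoded").filter(_.prefix == "content").text
      val gists = GistJs.findAllMatchIn(body)
        .map(m => "https://gist.github.com/" + m.group(1))
        .toList
        .distinct
      if (gists.isEmpty) None
      else Some(PostWithGists((item \ "title").text, (item \ "link").text, gists))
    }

  def main(args: Array[String]): Unit = {
    // Made-up stand-in for the exported file: one post with a gist, one without.
    val sample = XML.loadString(
      """<rss xmlns:content="http://purl.org/rss/1.0/modules/content/">
        |  <channel>
        |    <item>
        |      <title>Post with a gist</title>
        |      <link>http://example.com/post-with-gist</link>
        |      <content:encoded><![CDATA[<p>Code:</p>
        |        <script src="https://gist.github.com/1234567.js"></script>]]></content:encoded>
        |    </item>
        |    <item>
        |      <title>Post without a gist</title>
        |      <link>http://example.com/plain-post</link>
        |      <content:encoded><![CDATA[<p>Just text.</p>]]></content:encoded>
        |    </item>
        |  </channel>
        |</rss>""".stripMargin)

    // Print each matching entry followed by its gist addresses.
    findGists(sample).foreach { post =>
      println(s"${post.title} (${post.link})")
      post.gists.foreach(gist => println(s"  $gist"))
    }
  }
}
```

For a real export, `XML.loadFile` with the path to the exported file could replace the in-memory sample; the rest of the sketch would stay the same.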
PS: Do not hesitate to comment on the Scala gist above and offer more concise (yet readable) or more performant versions.