Exporting Content from Publii to Jeckyll
In this tutorial, we’ll export Publii data and put it into basic Markdown for use in Jeckyll.
About a year ago, I stopped using my home grown blog engine and switched to using Publii. I’ve been relatively happy with that tool but found limitations that blocked me:
- local install made it difficult using it on multiple machines
- lack of hooks to create my own writing tools, for example, on the go
- no support for mobile
- SFTP just didn’t work when I switched servers and required a ssh key.
I tried Grav but my server’s file permissions didn’t want to work with it. I considered going back to my own software but decided to try Jeckyll. Jeckyll is a flat CMS that uses markdown and renders it to a simple HTML site.
In my three days of playing with it, I really like it. It is very command line based like my old software but has a way better rendering engine than the one I wrote in Python (shocker!). Its clean output and support for templates makes expanding my site something I can do in the future painlessly. The problem was I had data from the last 18 months in Publii and I had to get it out. I started manually copying and pasting content but after two articles, it was too tedious so I wrote a Python script to extract the data from Publii for Jeckyll.
Here’s how I did it.
Step 1: Get the files
To get the data out, I went to Tools & Plugins->Backups->Create backup.
This created a .tar file that contained all of the files for the site. I found this .tar file on my Mac in the Publii file in my Documents folder. Once I found the file, I decompressed it and had a file structure that looked like this:
|-input
| |-config
| |-db.sqlite
| |-languages
| |-media
| |-root-files
| |-themes
|-output
|-preview
We’ll only use the input folder for the next steps.
Step 2: Get Text Data from the SQLite DB
I used a Mac tool called DB Browser for SQLite to view the data but ended up using Python to actually extract it. If you open the file db.sqlite in the Input folder, you can see the tables. The two I was concerned with were posts and posts_images. These two tables held the info I wanted.
I needed several Python packages to make this code work including codecs , sqlite3, and markdownify. I’m using Python3 and used pip to install whatever libraries I didn’t have.
I wrote all of this in a Python script called publiitomd.py that I executed from the command line.
In order to connect to the SQLite DB, I used this code:
connection_obj = sqlite3.connect('/Path/to/files/gregwalshcom-01-12-2023-10-35-57/input/db.sqlite')
cursor_obj = connection_obj.cursor()
Then I created a SELECT statement to get my text data including featured images, captions, and alt tags.
statement = '''SELECT posts.id, posts.title, posts.slug, posts.text, posts.featured_image_id, posts.created_at, posts_images.url, posts_images.additional_data FROM posts INNER JOIN posts_images ON posts.featured_image_id = posts_images.id'''
cursor_obj.execute(statement)
output = cursor_obj.fetchall()
Next, I needed to start a loop for each item being pulled from the database. In this loop, I set a bunch of variables, and begin to form the structure of the markdown file that I’ll move into Jeckyll. One of the hardest parts of this was realizing the timecode in Publii is milliseconds from the epoch, not just seconds. Here is the code that builds a set of variables that we will write to a file in just a minute. I chose to put the featured image, its captions and alt tags in the front matter to use later.
for row in output:
posttitle = row[1]
posttext = markdownify.markdownify(row[3], heading_style="ATX")
date = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(int(row[5])/1000))
shortdate= time.strftime('%Y-%m-%d', time.localtime(int(row[5])/1000))
filename = shortdate + "-" + row[2] + ".markdown"
featuredimage = row[6]
featuredcaption = json.loads(row[7])["caption"]
featuredalt = json.loads(row[7])["alt"]
featuredcredit = json.loads(row[7])["credits"]
headerText = """---
layout: post
title: \"""" + posttitle + """\"
date: """ + date + """
categories: blog
"""
if featuredimage != "":
headerText = headerText + "image: img/" + featuredimage + "\n"
if featuredcaption != "":
headerText = headerText + 'image_caption: "' + featuredcaption + '"\n'
if featuredalt != "":
headerText = headerText + 'image_alt: "' + featuredalt + '"\n'
if featuredcredit != "":
headerText = headerText + 'image_credit: "' + featuredcredit + '"\n'
headerText = headerText +"""---
"""
The “”” makes a block that keeps the line breaks exactly as written. There is a bunch of if-then statements that decide which lines of the front matter to write. If you have other needs for exporting Publii data, this seems the way to go. You may also notice that the caption, alt, and credit are all extracted from JSON…that’s because Publii stores those three in one field of the posts_images table.
Once I had all of that, I was able to start writing the file. I wanted to use UTF-8 so I used the codecs library to make it happen. I don’t know if I needed to. Here is where I open a new file as UTF-8 that conforms to the Jeckyll naming convention, write the front matter (header), write the body, and close the file to begin the loop again. I created a folder in the same location as the script called “md” that I write all the files to.
f = codecs.open("md/" + filename,'w','utf-8')
f.write(headerText)
f.write(posttext)
f.close()
And then I close it all out with this code that runs after the loop.
connection_obj.commit()
# Close the connection
connection_obj.close()
This game me a folder of properly named files in one folder I could easily drag over to the _posts folder in my Jeckyll site.
Step 3: Get Graphics
This was relatively tedious and had I had more than 20 something files to move, I would have written a script to do this for me. Instead, I went into input->media->posts->ID and dragged the full-size image from there into a folder named img in the top-level of my Jekyll site directory. You can use any folder, but, make sure your front matter above lines up with it.
If you have graphics in individual posts in Publii and you’re moving them to Jeckyll, duplicate your Publii generated site’s image folder into the Jeckyll top-level directory. (Or just do what I do and only absolute link to them on Flickr or wikimedia.)
Now that everything is in the Jeckyll site, we need to make our Jeckyll posts display that featured image.
Step 4: Show Featured Image in Jeckyll
This is only necessary if you have images and want them to display on your posts. You can use this to make amazing Bootstrap-esque sites but this just pops it up on the top of the page.
I needed to edit the top part of my post.html in _layouts page to look like this.
---
layout: default
---
<article class="post h-entry" itemscope itemtype="http://schema.org/BlogPosting">
<header class="post-header"><h1 class="post-title p-name" itemprop="name headline">Exporting Content from Publii to Jeckyll</h1>
That is a conditional to see if the page has a featured image in the front matter and display it with its ALT tag if it does (You did use ALT tags, right?)
This then displays the image at the top of the page.
Conclusion
This code is really clunky and I’m sure there was a better way to do. I used code from that original blog engine as the basis but the logic was sound. As mentioned, if I had more than 20 posts to export from Publii, I would have automated a little more of the media movement as well. I’m now free of Publii and look forward to what I can do with Jeckyll.