The forum is the most important place for discussion and collaboration among BotEngine users and contributors. To improve its user interface and simplify operation, I migrated the forum to a new software platform. In this project, I explored a solution to import content into the Discourse discussion platform and implemented a tool to automate the import process.
When I initially set up the forum in 2015, I chose MVCForum as the system to run it on. While MVCForum worked as expected, Discourse became an increasingly interesting alternative over time. Eventually, the better UI and notification features, along with the usual benefits of using a mainstream solution, provided enough value to warrant a switch from MVCForum to Discourse.
While introducing different software, I wanted to make sure that users could continue their conversations without interruption and find things where they left them.
To arrange for a seamless transition, I planned to import the existing content into Discourse.
Since Discourse generates content URLs differently from MVCForum, I also needed to set up corresponding redirects from the old URLs to avoid breaking users' bookmarks and other links.
Trying to figure out how to approach the technical side of such a migration, I found a guide on Migrating to Discourse from another Forum software, mainly written by Discourse team members. While the title suggests a good fit for the task, the process described there seemed overly complex: it includes setting up a development environment, converting the data to be migrated into a MySQL database, and reviewing and modifying Ruby code.
In that same article, I also found a hint at how to avoid this complexity:
[...] Then you will be able to create a backup and import it on your production instance.
The quoted sentence says that at the end of the migration process, once the work in the dedicated development environment is done, the result is transferred from that environment to the live one. This transfer can be done conveniently through the Discourse web UI, with no programming needed.
Looking at the Backup and Restore functions, I saw that the data is transferred in the form of a PostgreSQL database dump, a simple text file containing the forum's posts, topics, users, etc., separated by line breaks.
To save most of the effort necessary to follow the guide mentioned above, I took a shortcut and merged the data to be imported directly into this database dump.
The database also contains a list of so-called permalinks, each of which redirects a client from a specified URL to another URL or to a piece of content. Using the permalinks feature, the redirects from old URLs mentioned earlier can be implemented as part of the same database merge.
To automate the merge, I created a tool that reads two files, the Discourse database dump and the content to be imported, and writes the result to a new file, which can then be imported into Discourse using the Restore function. Documentation of the tool and its source code can be found at https://github.com/Viir/import-to-discourse/
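To give an idea of what merging records into a dump involves, here is a minimal Python sketch. It is not the code of the tool linked above. It assumes the dump is in the plain text format that pg_dump produces by default, where each table's rows sit between a `COPY ... FROM stdin;` header and a terminating `\.` line, with tab-separated fields. The function names and the `posts` table with its two columns in the usage example are hypothetical and chosen only for illustration.

```python
def encode_field(value):
    """Encode one value for PostgreSQL's COPY text format."""
    if value is None:
        return "\\N"  # COPY's marker for NULL
    text = str(value)
    # Escape the backslash first so the later escapes are not doubled.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def encode_row(values):
    """Join already ordered column values into one tab-separated row."""
    return "\t".join(encode_field(v) for v in values)


def append_rows(dump_lines, table, new_rows):
    """Return the dump lines with new_rows added at the end of table's COPY block."""
    result = []
    inside_target = False
    for line in dump_lines:
        if line.startswith("COPY "):
            # Header looks like: COPY public.posts (id, ...) FROM stdin;
            name = line.split()[1].split(".")[-1]
            inside_target = (name == table)
        elif inside_target and line == "\\.":
            # End of the target block: place the new rows before the terminator.
            result.extend(new_rows)
            inside_target = False
        result.append(line)
    return result


# Usage with a tiny, made-up dump fragment (hypothetical columns).
dump = [
    "COPY posts (id, raw) FROM stdin;",
    "1\thello",
    "\\.",
]
merged = append_rows(dump, "posts", [encode_row([2, "imported\ttext"])])
```

The values passed to `encode_row` must already be in the column order given by the `COPY` header, since the dump format carries no per-row column names.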
Having no previous experience with these database dumps, my first attempt to implement the merge was just adding new records below the existing ones. Discourse reported a successful restore from database dumps produced this way, but a problem soon became apparent. When a new post is added to the database, the system assigns it a unique identifier, picked from a sequence of integers. After the database had been restored from the naïvely merged dump, this no longer worked, because the system tried to use an identifier already occupied by an imported post for the new record.
This happens because the database dump from Discourse explicitly specifies the identifier at which each sequence should continue when adding new records. To avoid collisions between the IDs of imported and future records, I expanded the import tool to adjust these sequence statements in the database dump as well.
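The adjustment can be illustrated with a short Python sketch. Again, this is not the code of the import tool. It assumes the dump sets sequences with statements of the form `SELECT pg_catalog.setval('posts_id_seq', 42, true);`, which is what pg_dump writes in its plain text format. The function name `bump_sequences` and the `max_ids` mapping (highest ID in use per sequence after the import) are hypothetical.

```python
import re

# Matches the sequence statements written by pg_dump, with or without a
# schema prefix such as "public." in front of the sequence name.
SETVAL = re.compile(
    r"SELECT pg_catalog\.setval\('(?P<seq>[^']+)', (?P<value>\d+), (?P<called>true|false)\);"
)


def bump_sequences(dump_text, max_ids):
    """Raise each listed sequence so that its next value lies above the imported IDs.

    max_ids maps a sequence name (without schema) to the highest ID in use.
    """
    def fix(match):
        seq = match.group("seq")
        highest = max_ids.get(seq.split(".")[-1])
        if highest is None:
            return match.group(0)  # sequence not affected by the import
        new_value = max(int(match.group("value")), highest)
        # With is_called = true, the next generated ID is new_value + 1.
        return f"SELECT pg_catalog.setval('{seq}', {new_value}, true);"

    return SETVAL.sub(fix, dump_text)


# Usage with a made-up fragment: imported posts occupy IDs up to 250.
fragment = "SELECT pg_catalog.setval('posts_id_seq', 10, true);\n"
adjusted = bump_sequences(fragment, {"posts_id_seq": 250})
```

Taking the maximum of the existing value and the highest imported ID means the sketch never moves a sequence backwards, so it is safe to run even when the imported IDs happen to be lower than the existing ones.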
Apart from the surprise with the ID collisions, the import process was straightforward and produced the expected results.
The Discourse instance went live to replace the old software on 10th December.