Migrating BlogEngine.NET Posts to WordPress with Translations Using Microsoft Access

Migrating Our Site from BlogEngine.NET

We could have started our new blog from scratch but since our existing blog existed for many years, we wanted to migrate it with all the comments from our BlogEngine.NET host to WordPress. That turned out to be a tricky process but we managed to do so. To help others who might be facing the same situation, here are the steps we followed so you don’t have to make the same mistakes we did:

Prepare the Existing Blogs for the Migration

The first step is to make sure your existing BlogEngine.NET blog is working properly and ready for export. One of the tricky and time-consuming parts of this is the reference to graphic files. BlogEngine stores its embedded graphics in its own structure using syntax similar to this (our blog was in the BLOG folder):

src=”/blog/image.axd?picture=banner.jpg”

Note that this only impacts graphics that were uploaded into BlogEngine. If you referenced images that already existing on your website, those references are fine and do not need to be modified.

To fix the image.axd? references and eliminate future dependencies, it’s best to store these graphics in your website explicitly. Once you save the graphic files, you can update your blogs to reference them. Saving the individual pictures is a manual process and you’ll need to decide where to store them on your website. You can then manually update the affected blog topics. Alternatively, you can do a search and replace later after exporting the blog’s XML file. We did a combination of both.

Export the existing BlogEngine.NET data to an XML file

Export the existing BlogEngine.NET data to an XML file. This is available as the last option under Settings from BlogEngine. The default name is BlogML.xml

Unfortunately, even if you fixed the picture image references, you’ll still need to translate the file to a format that WordPress can import. That requires making many changes. We actually exported the XML file, then parsed it to find the references to

src=”/blog/image.axd?

to identify any image references that were still in BlogEngine. That gave us the choice to either fix the original blog and re-export, or to fix it directly in the XML file.

Preparing WordPress

The WordPress import tools are under Tools, Import. To import the BlogML file, you need to install the appropriate WordPress PlugIn. The BlogML plugin that worked for us was BlogML-WordPress-Import.zip which can be found here. You’ll need administrator write rights to your WordPress folders to install this.

Before you modify the BlogML file, you may want to import it to see the problems that need to be addressed in WordPress. You can do so and trash them in WordPress without any harm.

Using Permalinks with Post Names

By default, WordPress saves and displays its posts by ID number in the URL. If you want posts to have more meaningful names which also helps with SEO, you should set the preference under Settings, Permalinks, and choose Post. We set this but the pages triggered a 404, Missing File problem.

We discovered that this translation didn’t work on our WordPress host (Windows using IIS) unless we added a web.config file in the root of the blog with this information:

<?xml version="1.0"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="Main Rule" stopProcessing="true">
          url=".*" />
          <conditions logicalGrouping="MatchAll">
            <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
            <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
          </conditions>
          <action type="Rewrite" url="index.php/{R:0}" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>

Translating the BlogXML file with Microsoft Access

Now that we established the foundation to import the XML file and display the posts with the proper Permalinks, we could see several things still need to be fixed. It was relative easy to do with multiple search and replace terms. We did this in Microsoft Access:

  • Create a table with two text fields. One for the Original value and one for the New value to replace it. We then populated the table with the terms to translate:
    • Hyperlink references. Since we migrated our blog from a subfolder (www.fmsinc.com/blog) to its own subdomain (blog.fmsinc.com), we needed to modify all the hyperlink references that were pointing to our web pages to explicitly point to our www.fmsinc.com web site. That meant, we needed to adjust our href=”/ syntax to “href=”http://www.fmsinc.com/”, so we added these two values to our table.
    • Existing Image references. Similarly, we needed to adjust our image src=”/ references for graphic files to “src=”http://www.fmsinc.com/”, so they were added. Note, we didn’t search for “img src”  because many references included style settings between the “img” and “src”. 
    • New Image references. This is also the time to add any explicit image.axd? references to the new location of the graphics if you didn’t want to manually edit the original posts.
  • Total Visual SourceBookBlogEngine saves category names as GUIDs and references the GUIDs in each post. If you don’t translate these, they’ll be imported into WordPress with the GUID rather than readable category name. We used the CXMLSettings class from Total Visual SourceBook to read the categories section of the XML file so we could pair the GUID and category names.
  • Perform the Search and Replace
    Once the table contains all the terms to translate, we wrote a simple routine to read the XML file into a variable, then go through the table and use the VBA REPLACE function for each record. When we were finished, we wrote the text to a new XML file for WordPress to import.
  • From WordPress, import the new file using the BlogML import plugin.

Because we programmatically perform the translation process, it was easy to test, run, and refined the entire process when things didn’t work correctly. It took us a few iterations but we were pleasantly surprised how well the posts came across.

We found that we needed to manually touch up some of our posts. The HTML in WordPress doesn’t require the use of paragraph styles (<p> </p>) to define each paragraph and automatically strips them out. Unfortunately, it displays the line breaks in paragraphs which is normally ignored in HTML syntax. We had to manually edit and delete those so the posts properly word-wrapped.