Ok, so this was a rather long, drawn out, painful process. I actually blew one entire weekend getting this working. SubText itself is good, Wordpress too. Converting one’s data to the other kind of sucked. Nah, it really sucked. I started this about a month ago, and I'd estimate I put in over 2 full days getting it to work correctly.
So I managed to get the conversion to work and I bet I pissed off a few of my friends while I was at it due to trackbacks. So here is my disjointed story on doing it.
Here is a little background on the two blog services. Wordpress is a well known, PHP / MySQL based blog engine. SubText is a C# / ASP.NET / SQL Server based blog engine. Converting from one to the other is literally fitting a square peg in a round hole.
A few of Clint's rules:
Rule 13492: Never destroy what you have until you get everything working 100%
Rule 29695: Don't leave until everything old works on what is new
Rule 91683: Be sure the new system works before destroying the old system.
Even right now, I actually still have the instance of Wordpress running in the background JUST IN CASE. This move has lots of moving parts and I actually had it fail on me after I did 2 successful runs on my testing domain. Scary, no?
The Data Conversion
BlogML got me about 90% of the way there but, as I outlined earlier, I ran into a little formatting problem. Wordpress post-processes the text when it displays a post and doesn't keep all the HTML in the database per post. DasBlog and SubText don't like this idea. So I had to reverse engineer how Wordpress does this. Reading PHP is a bit of a pain in the ass for me since I lack a proper IDE and, in my opinion, what they were doing seemed crazy. I like to think PHP is like Perl: only the person who wrote it can read it.
I modified BlogML’s Wordpress export PHP script and ran each post through Wordpress's wpautop function on the way out so my posts would format correctly. This took a bit to track down but it worked. There were more issues I’ll outline later on. Getting the BlogML data into SubText is easy. You just say "Create a new blog" after you have "installed" the files, and then there is an option to import data. This is different from how you'd do it with DasBlog. It confused me since I tried DasBlog first, so I figured I'd mention it.
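To give a feel for what that post processing does, here is a rough Python sketch of the core idea behind wpautop: blank lines separate paragraphs, and single line breaks inside a paragraph turn into br tags. This is not the real function and not the PHP I actually changed. The name autop_sketch is made up for illustration, and the real wpautop also handles block-level tags, pre blocks and other edge cases that this ignores.

```python
import re


def autop_sketch(text: str) -> str:
    """Simplified approximation of paragraph auto-formatting.

    Hypothetical helper for illustration only; it skips the block-level
    tag handling that the real wpautop performs.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # One or more blank lines separate paragraphs.
    chunks = [c.strip() for c in re.split(r"\n\s*\n", text) if c.strip()]
    paragraphs = []
    for chunk in chunks:
        # A single newline inside a paragraph becomes a line break tag.
        body = chunk.replace("\n", "<br />\n")
        paragraphs.append(f"<p>{body}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    print(autop_sketch("First line\nsecond line\n\nNew paragraph"))
```

The point is that the stored post text has no paragraph tags at all, so anything exporting straight from the database has to apply this step itself.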
Keeping with backwards compatibility
All the links coming into my site use ?p or ?page_id. This means I have to support them. In my last post, I outlined how to do this. What I forgot to add, and remembered thanks to Scott Hanselman’s bits of advice, was my RSS feed. Since I had already added the ?p support, a quickie HttpHandler addition instantly solved the RSS feed problem. I verified this by checking my feed in reader.google.com.
Here are the lines from the web.config for the RSS feed and for keeping the old crappy permalink support:
<HttpHandler pattern="(?:default\.aspx\?feed=rss2)$" type="Subtext.Framework.Syndication.RssHandler, Subtext.Framework" handlerType="Direct"/>
<HttpHandler pattern="(?:default\.aspx\?(p|page_id)=\d+)$" controls="viewpost.ascx,Comments.ascx,PostComment.ascx"/>
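If you want to sanity check those two patterns before touching web.config, a quick Python sketch like this one works, since the regex syntax used here behaves the same in Python's re module. I'm assuming the request shows up as default.aspx plus the query string and that the pattern is applied as a search anchored at the end of the URL. The classify function and the sample URLs are mine, not anything from SubText.

```python
import re

# Patterns copied from the two HttpHandler entries.
RSS_PATTERN = re.compile(r"(?:default\.aspx\?feed=rss2)$")
POST_PATTERN = re.compile(r"(?:default\.aspx\?(p|page_id)=\d+)$")


def classify(url: str) -> str:
    """Report which legacy pattern, if any, a URL would hit.

    Hypothetical helper: it only mimics the pattern matching, not how
    SubText dispatches the request.
    """
    if RSS_PATTERN.search(url):
        return "rss"
    if POST_PATTERN.search(url):
        return "post"
    return "unmatched"


if __name__ == "__main__":
    for sample in (
        "/default.aspx?feed=rss2",
        "/default.aspx?p=123",
        "/default.aspx?page_id=7",
        "/default.aspx?p=abc",
    ):
        print(sample, "->", classify(sample))
```

Note the trailing $ in both patterns: a link with extra query parameters after the ID would not match, which is worth knowing if other sites linked to you that way.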
So running the straight up BlogML importer wasn’t going to work for me because of how SubText imports data. It does a straight up insert instead of keeping the IDs synced up, which would break every old ?p link. I didn’t want to destroy my ability to update my blog’s software later on down the line, so I minimized the zones of impact. Anywhere I’d have to modify the BlogML importer, I created a new mirrored function instead. This protects me from breaking everything else. Neat, huh? Best of all, it is one-time-use code, more or less. If people want this code, I can post it. After running the importer, I also had to run one additional script.
update subtext_content
set author = 'Clint', email='clint@rutkas.com'
Now the code to do the importing, the SQL below included, can be thrown away. The code for the query string must be added back after each upgrade. Since I've documented what I've done here on my blog, this isn't the end of the world.
Here is the stored procedure: (the big thing is the SET IDENTITY_INSERT [subtext_Content] ON line, which lets me supply the IDs myself instead of having them auto-increment)
USE [better_subtext]
GO
/****** Object: StoredProcedure [dbo].[subtext_InsertEntry2] Script Date: 11/04/2007 17:47:31 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER OFF
GO
CREATE PROC [dbo].[subtext_InsertEntry2]
(
@Title nvarchar(255)
, @Text ntext = NULL
, @PostType int
, @Author nvarchar(50) = NULL
, @Email nvarchar(50) = NULL
, @Description nvarchar(500) = NULL
, @BlogId int
, @DateAdded datetime
, @PostConfig int
, @EntryName nvarchar(150) = NULL
, @DateSyndicated DateTime = NULL
, @ID int output
)
AS
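-- Treat a blank EntryName as NULL.
IF(LEN(RTRIM(LTRIM(@EntryName))) = 0)
    SET @EntryName = NULL
-- Refuse to insert a duplicate EntryName within the same blog.
IF(@EntryName IS NOT NULL)
BEGIN
    IF EXISTS(SELECT EntryName FROM [dbo].[subtext_Content] WHERE BlogId = @BlogId AND EntryName = @EntryName)
    BEGIN
        RAISERROR('The EntryName of your entry is already in use with in this Blog. Please pick a unique EntryName.', 11, 1)
        RETURN 1
    END
END
-- Treat a blank Description as NULL.
IF(LTRIM(RTRIM(@Description)) = '')
    SET @Description = NULL
-- Allow an explicit value in the identity column so the old Wordpress post ID is kept.
SET IDENTITY_INSERT [subtext_Content] ON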
INSERT INTO subtext_Content
(
Title
, [Text]
, PostType
, Author
, Email
, DateAdded
, DateUpdated
, [Description]
, PostConfig
, FeedbackCount
, BlogId
, EntryName
, DateSyndicated
, Id
)
VALUES
(
@Title
, @Text
, @PostType
, @Author
, @Email
, @DateAdded
, @DateAdded
, @Description
, @PostConfig
, 0
, @BlogId
, @EntryName
, @DateSyndicated
, @ID
)
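-- @ID is passed in by the importer, so it is not read back from SCOPE_IDENTITY().
--SELECT @ID = SCOPE_IDENTITY()
-- Put the identity column back to normal auto-increment behavior.
SET IDENTITY_INSERT [subtext_Content] OFF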
EXEC [dbo].[subtext_UpdateConfigUpdateTime] @BlogId, @DateAdded
EXEC [dbo].[subtext_UpdateBlogStats] @BlogId
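To show why the explicit ID matters, here is a small Python sketch using the built-in sqlite3 module as a stand-in for SQL Server. It is only an analogy: SQLite lets you write to an auto-increment key without anything like SET IDENTITY_INSERT, and the table, function name and sample posts below are made up. What it does show is the difference between a straight insert, which renumbers everything from 1, and an insert that carries the old ID along.

```python
import sqlite3


def import_posts(posts, keep_ids):
    """Insert (old_id, title) pairs and return the resulting rows.

    Hypothetical illustration; the real import goes through the
    subtext_InsertEntry2 stored procedure on SQL Server.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT NOT NULL)"
    )
    for old_id, title in posts:
        if keep_ids:
            # Supply the ID explicitly so old ?p=<id> links still resolve.
            conn.execute(
                "INSERT INTO content (id, title) VALUES (?, ?)", (old_id, title)
            )
        else:
            # Straight insert: the database hands out new IDs starting at 1.
            conn.execute("INSERT INTO content (title) VALUES (?)", (title,))
    rows = conn.execute("SELECT id, title FROM content ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    old_posts = [(12, "Hello world"), (57, "Moving to SubText")]  # made-up sample data
    print(import_posts(old_posts, keep_ids=False))
    print(import_posts(old_posts, keep_ids=True))
```

With keep_ids=False the second post ends up as ID 2, so an old link to ?p=57 would point at nothing. That is exactly the problem the modified stored procedure avoids.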