Text Encodings, why must you cause me headaches?

So I'm attempting to get a cool Ajax demo up and working, but I'm fighting one API every step of the way since I'm not actually using their API in a traditional context.

After some tweaks, I got the JSON to C# object conversion working, but now I have a text encoding problem.  Some of the data in the object is Unicode-escaped for HTML reasons (I've been thinking of it as UTF-8): < becomes \u003C and whatnot.  But since that \ needs to be escaped, it becomes \\.  \\u003C is not actually the same as \u003C when you run it through the wonderful line of code below.  Now the real question is how dangerous I want to be with this.  I could use a regular expression to validate the data.

\\\\u[A-Fa-f0-9]{4,4} would properly validate it according to Wikipedia, if I read the standard correctly.  I may just be overcomplicating this, since I really think the greater-than and less-than symbols will be the only ones used.
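
For what it's worth, here is a minimal sketch of that check. It assumes the four backslashes in the pattern above are just the C# string-literal spelling, so the regex itself matches one literal backslash, a u, and four hex digits. The class and method names (EscapeCheck, ContainsEscape) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeCheck
{
    // Verbatim string, so the regex is \\u[A-Fa-f0-9]{4}:
    // one literal backslash, the letter u, then exactly four hex digits.
    // (Assumption: the data holds a single backslash before the u.)
    private static readonly Regex UnicodeEscape =
        new Regex(@"\\u[A-Fa-f0-9]{4}");

    // Hypothetical helper: true if the text contains at least one \uXXXX sequence.
    public static bool ContainsEscape(string text)
    {
        return UnicodeEscape.IsMatch(text);
    }

    public static void Main()
    {
        // Verbatim literals keep the backslash as an ordinary character.
        Console.WriteLine(ContainsEscape(@"a \u003C b")); // True
        Console.WriteLine(ContainsEscape("a < b"));       // False
    }
}
```

This only detects the sequences; it does not decode them.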

Here is the function that is causing me the headaches. 

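// Encodes the string as UTF-8 bytes, transcodes those bytes to UTF-16,
// then decodes them back into a string. \uXXXX sequences in the input
// are treated as plain characters and are not interpreted.
private static string Utf8ToUnicode(string utf8)
{
    return Encoding.Unicode.GetString(
        Encoding.Convert(
            Encoding.UTF8,
            Encoding.Unicode,
            Encoding.UTF8.GetBytes(utf8)));
}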
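
A quick way to see why this function doesn't help: as far as I can tell, it just round-trips the text through two byte encodings and hands back the same string. The sketch below copies the function body into a made-up RoundTripDemo class (made public only so it can be called from outside) and feeds it a string containing literal backslash escapes.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Same body as the function above.
    public static string Utf8ToUnicode(string utf8)
    {
        return Encoding.Unicode.GetString(
            Encoding.Convert(
                Encoding.UTF8,
                Encoding.Unicode,
                Encoding.UTF8.GetBytes(utf8)));
    }

    public static void Main()
    {
        // Verbatim literal: the backslashes stay in the string as characters.
        string escaped = @"\u003Cb\u003E";
        string result = Utf8ToUnicode(escaped);

        Console.WriteLine(result == escaped); // True
        Console.WriteLine(result.Length);     // 13
    }
}
```

The escapes come out exactly as they went in, which matches the behavior I'm seeing.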

Update:

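I don't know if this is even possible, since even after stripping out one of the slashes it still won't programmatically convert the escape into the actual character.  The only way I can get it to read as the real character is if I literally hardcode the string, presumably because the C# compiler resolves \u escapes in string literals at compile time, whereas at run time the backslash, the u, and the four digits are just six ordinary characters.  So right now I have three replacements: <, > and & were the escaped characters I saw.  I'm betting there will be more, but I can't do much until I see them.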
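
One alternative to hardcoding a replacement per character would be to decode every \uXXXX sequence at run time with a regex and a match evaluator. This is only a sketch under a few assumptions: the names EscapeDecoder and DecodeUnicodeEscapes are invented, and the input is assumed to contain a single backslash before each u (with a doubled backslash, one stray backslash would be left behind).

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class EscapeDecoder
{
    // One literal backslash, 'u', and four hex digits captured in group 1.
    private static readonly Regex UnicodeEscape =
        new Regex(@"\\u([A-Fa-f0-9]{4})");

    // Hypothetical helper: replaces each \uXXXX sequence with the
    // UTF-16 code unit whose hex value is XXXX.
    public static string DecodeUnicodeEscapes(string text)
    {
        return UnicodeEscape.Replace(text, m =>
        {
            int codeUnit = int.Parse(
                m.Groups[1].Value,
                NumberStyles.HexNumber,
                CultureInfo.InvariantCulture);
            return ((char)codeUnit).ToString();
        });
    }

    public static void Main()
    {
        string raw = @"\u003Cb\u003Ebold \u0026 loud\u003C/b\u003E";
        Console.WriteLine(DecodeUnicodeEscapes(raw));
        // prints: <b>bold & loud</b>
    }
}
```

That would cover <, > and & along with whatever other escaped characters show up later, without my having to see them first.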

Matt Newman Sep 23, 2007 @ 5:09 PM

I'd gladly loan you mine... but Halo 3 is gonna be here soon... :)

Maybe you'll be able to find one while you are in Minnesota

Matt Newman Sep 23, 2007 @ 7:09 PM

I also use my Xbox as a Media Center extender so my wife wouldn't be happy anyway...

Matt Newman Sep 24, 2007 @ 8:09 AM

Nothing actually, generally speaking the only stuff my wife watches is recorded TV or DVDs... I just watch the rest on my desktop.

Ian Sep 24, 2007 @ 2:09 PM

WAH WAH.
