Posted on Sunday, October 21, 2007 11:10 PM
So I'm attempting to get a cool Ajax demo up and working, but I'm fighting one API every step of the way since I'm not actually using their API in a traditional context.
After some tweaks, I got the JSON to C# object conversion working, but now I have a text encoding problem. Some of the data in the object is escaped for HTML reasons: < becomes \u003C and so on. (Strictly speaking these are JSON Unicode escape sequences rather than UTF-8.) But since that \ needs to be escaped, it becomes \\. \\u003C is not actually the same as \u003C when you run it through this wonderful bit of code. Now the real question is how dangerous I want to be with this. I could use a regular expression to validate the data.
\\\\u[A-Fa-f0-9]{4} (written as a regular C# string literal) would properly validate it according to Wikipedia, if I read the standard correctly. I may just be overcomplicating this, since I really think the greater than and less than symbols will be the only ones used.
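For what it's worth, here is a minimal sketch of that validation check. The class and method names (EscapeCheck, ContainsEscape) are made up for illustration, and it assumes the data holds a single backslash at runtime. It uses a verbatim string so the pattern only needs two backslashes instead of four.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any API mentioned in this post.
public static class EscapeCheck
{
    // Verbatim string: the pattern is \\u[0-9A-Fa-f]{4}, meaning one literal
    // backslash, the letter u, then exactly four hex digits.
    private static readonly Regex UnicodeEscape =
        new Regex(@"\\u[0-9A-Fa-f]{4}");

    // True when the text contains at least one \uXXXX escape sequence.
    public static bool ContainsEscape(string text)
    {
        return UnicodeEscape.IsMatch(text);
    }
}
```

Calling EscapeCheck.ContainsEscape on a string that holds the six characters \u003C should come back true, while a string that already contains a real < should come back false.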
Here is the function that is causing me the headaches.
private static string Utf8ToUnicode(string utf8)
{
    return Encoding.Unicode.GetString(
        Encoding.Convert(
            Encoding.UTF8,
            Encoding.Unicode,
            Encoding.UTF8.GetBytes(utf8)));
}
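A likely reason for the headache, sketched under the assumption that the incoming text really contains the six characters \u003C: the function above only re-encodes the same characters (string to UTF-8 bytes, to UTF-16 bytes, back to string), so the text comes out unchanged. Nothing in it interprets \uXXXX sequences. The class name RoundTripDemo and the IsUnchanged helper are made up for the demonstration.

```csharp
using System;
using System.Text;

// Hypothetical demo class wrapping the same conversion as above.
public static class RoundTripDemo
{
    public static string Utf8ToUnicode(string utf8)
    {
        return Encoding.Unicode.GetString(
            Encoding.Convert(
                Encoding.UTF8,
                Encoding.Unicode,
                Encoding.UTF8.GetBytes(utf8)));
    }

    // True when the conversion hands back exactly what went in.
    public static bool IsUnchanged(string text)
    {
        return Utf8ToUnicode(text) == text;
    }
}
```

With a verbatim string such as @"\u003Cb\u003E" (the escapes as they arrive at runtime), IsUnchanged should return true. A hardcoded regular literal "\u003Cb\u003E" behaves differently only because the C# compiler has already turned it into "<b>" before the code ever runs.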
Update:
I don't know if this is even possible: even after stripping out one of the slashes, the string still won't programmatically convert from the \u form to the real character. The only way I can get it to read correctly is to literally hardcode the string. So right now I have 3 replacements: <, > and & were the escaped characters I saw. I'm betting there will be more, but I can't do much until I see them.
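One possible alternative to the per-character replacements is to decode every \uXXXX sequence in one pass. This is only a sketch: the JsonEscapes class and its Decode method are hypothetical names, and it assumes the text holds a single backslash before each u at runtime. It does not handle the other JSON escapes, and it would wrongly decode an escaped backslash that happens to be followed by u and four hex digits, so a proper JSON parser remains the safer route.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, shown as an illustration only.
public static class JsonEscapes
{
    // One literal backslash, the letter u, then four hex digits (captured).
    private static readonly Regex UnicodeEscape =
        new Regex(@"\\u([0-9A-Fa-f]{4})");

    // Replaces each \uXXXX sequence with the UTF-16 code unit it names.
    public static string Decode(string text)
    {
        return UnicodeEscape.Replace(text, m =>
        {
            int code = int.Parse(m.Groups[1].Value, NumberStyles.HexNumber);
            return ((char)code).ToString();
        });
    }
}
```

Under those assumptions, JsonEscapes.Decode(@"\u003Cb\u003E") should give back <b>, and text without any escapes passes through untouched, so new characters would not need their own replacement each time one shows up.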