In Firefox 3.0.0 there is a “strange” regression in how XMLHttpRequest requests are encoded. It's not a bug per se, just different behavior, but we ran into it (and no other browser does it this way).
What we basically do on the client side in JavaScript:
this.data = new XMLHttpRequest();
this.data.open('POST', dataURI);
this.data.send(xml);
where “xml” is a DOMDocument Object.
In Firefox 2.0 this request came with a
Content-Type: application/xml
and the XML in the POST body was encoded in UTF-8 (no encoding information in the XML declaration).
IE7 sends:
Content-Type: text/xml; charset=UTF-8
But Firefox 3.0.0 sends this as
Content-Type: application/xml; charset=ISO-8859-1
and the XML in the body really is ISO-8859-1 encoded, but there is no encoding information in the XML declaration (e.g. no <?xml version="1.0" encoding="ISO-8859-1"?>), and of course our XML loader fell flat on its face when the body had non-ASCII characters in it…
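To make the failure concrete, here is a small sketch (not our actual loader, and the helper names are made up for illustration) of what happens when ISO-8859-1 bytes get treated as UTF-8. It assumes a runtime with a global TextDecoder, such as a current browser or Node.

```javascript
// "é" is the single byte 0xE9 in ISO-8859-1, but the two bytes 0xC3 0xA9 in UTF-8.
// These are the ISO-8859-1 bytes of <a>é</a>, as Firefox 3.0.0 would put them in the body.
const latin1Body = Uint8Array.from([0x3c, 0x61, 0x3e, 0xe9, 0x3c, 0x2f, 0x61, 0x3e]);

// Hypothetical helper: true if the bytes are well-formed UTF-8.
function decodesAsUtf8(bytes) {
  try {
    // fatal: true makes the decoder throw on malformed input instead of inserting U+FFFD
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical helper: ISO-8859-1 maps every byte directly to the code point of the same value.
function decodeLatin1(bytes) {
  return String.fromCharCode(...bytes);
}

console.log(decodesAsUtf8(latin1Body)); // a lone 0xE9 is not valid UTF-8
console.log(decodeLatin1(latin1Body));  // decoding with the right charset works
```

A strict parser that assumes UTF-8 hits the same wall at the first non-ASCII byte.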
While having the encoding information only in the HTTP header and not also in the XML declaration is (as far as I can remember, I didn't look up any specs) correct from a technical point of view, it was pretty annoying to find this “bug”. And now I have to check on the backend how each request is encoded, and not just rely on “it's UTF-8 nowadays anyway, or at least written in the XML declaration, so the XML parser can take care of it” (which was maybe naive from the beginning :))
Here's the code-snippet for the PHP server side:
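function transformFromContentTypeToUTF8($str) {
    // Pull the charset parameter out of the Content-Type request header, if there is one
    if (isset($_SERVER['CONTENT_TYPE']) && preg_match('#charset="?([^\s;"]+)#i', $_SERVER['CONTENT_TYPE'], $matches)) {
        // Charset names are case-insensitive, so normalize before comparing
        $charset = strtoupper($matches[1]);
        if ($charset == 'UTF-8') {
            return $str;
        }
        if ($charset == 'ISO-8859-1') {
            // utf8_encode() is deprecated as of PHP 8.2; the iconv() call below handles this case too
            return utf8_encode($str);
        }
        return iconv($charset, 'UTF-8', $str);
    }
    // if no charset, then return as it came
    return $str;
}
function fixXMLEncodingFromHTTP($xml) {
    // Only convert if the XML declaration does not name an encoding itself
    if (!preg_match('#<\?xml[^>]+encoding=#', $xml)) {
        return transformFromContentTypeToUTF8($xml);
    }
    return $xml;
}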
$rawpost = fixXMLEncodingFromHTTP(file_get_contents('php://input'));
// create a new DOM document out of the posted string
$xmlData = new DOMDocument();
$xmlData->loadXML($rawpost);
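For comparison, here is roughly the same idea sketched in JavaScript, in case the backend is not PHP. This is an illustration under assumptions, not code we run: the function names are made up, the body is assumed to arrive as a Uint8Array (or a Node Buffer), and the runtime is assumed to ship a TextDecoder that knows the charset label in question.

```javascript
// Hypothetical helper: extract the charset parameter from a Content-Type value, or null.
function charsetFromContentType(contentType) {
  const match = /charset\s*=\s*"?([^\s;"]+)"?/i.exec(contentType || '');
  return match ? match[1].toLowerCase() : null;
}

// Hypothetical helper: decode a raw XML POST body into a JavaScript string.
function decodeXmlBody(bytes, contentType) {
  // Peek at the first bytes one-to-one as characters; an XML declaration is plain ASCII.
  const head = String.fromCharCode(...bytes.subarray(0, 100));
  const declared = /^\s*<\?xml[^>]+encoding\s*=\s*["']([^"']+)["']/.exec(head);

  // Prefer the encoding named in the XML declaration, then the HTTP header, then UTF-8.
  const label = declared ? declared[1] : (charsetFromContentType(contentType) || 'utf-8');

  // Note: the Encoding standard treats the label "iso-8859-1" as windows-1252,
  // which agrees with ISO-8859-1 for the printable range 0xA0-0xFF.
  return new TextDecoder(label).decode(bytes);
}

// The body as Firefox 3.0.0 would send it: <a>é</a> in ISO-8859-1
const body = Uint8Array.from([0x3c, 0x61, 0x3e, 0xe9, 0x3c, 0x2f, 0x61, 0x3e]);
console.log(decodeXmlBody(body, 'application/xml; charset=ISO-8859-1'));
```

The order of precedence (declaration first, header second) mirrors the PHP snippet, which leaves the string alone whenever the declaration names an encoding.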
BTW, for non-ISO-8859-1 characters, FF 3 does transform them to numeric entities, welcome web 1.0 :)
And there's already a report of that issue on Bugzilla, of course. But I have no idea whether they'll change that back soon.
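One thing that might sidestep the whole issue on the client is to serialize the document yourself and send a string with an explicit Content-Type, since the XMLHttpRequest spec says string bodies go out as UTF-8. I have not verified this against Firefox 3.0.0, so treat it as an assumption. The helper name is made up, and the XHR and serializer are passed in only so the sketch can be exercised outside a browser.

```javascript
// Hypothetical helper: POST a DOM document as a UTF-8 string instead of handing
// the document object to send() and letting the browser pick the charset.
function postXmlAsUtf8(xhr, serializer, uri, doc) {
  xhr.open('POST', uri);
  // State the charset explicitly so the server does not have to guess
  xhr.setRequestHeader('Content-Type', 'application/xml; charset=UTF-8');
  // serializeToString() turns the DOM into a string before it is sent
  xhr.send(serializer.serializeToString(doc));
}

// In a browser, with dataURI and xml as in the snippet at the top:
// postXmlAsUtf8(new XMLHttpRequest(), new XMLSerializer(), dataURI, xml);
```

The server-side check is still worth keeping, since you can't control what every client sends.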