I am currently writing a bootstrapped extension for Firefox 4.
Here is my story:
When I use @mozilla.org/xmlextras/xmlhttprequest;1, nsIXMLHttpRequest, the content of the target URL loads successfully into req.responseText.
I parse the responseText into a DOM by creating a BODY element with the createElement method and assigning the text to its innerHTML property.
Everything seems successful.
However, there is a problem with character encoding (charset).
Since I need the extension to detect the encoding of the target documents, calling overrideMimeType with text/html; charset=blahblah does not seem to fit my need.
I tried @mozilla.org/intl/utf8converterservice;1, nsIUTF8ConverterService, but it looks like XMLHttpRequest does not expose a ScriptableInputStream, or any InputStream or readable stream at all.
I don't know how to read the content of the target document in a suitable, automatically detected encoding, whether by using the character-encoding auto-detection available in the GUI or by reading the charset declared in the meta tag in the document's head.
EDIT: Would it be practical to parse the entire document, including the HTML, HEAD, and BODY tags, into a DOM object, but without loading external resources such as js, css, and ico files?
EDIT: The method in the MDC article "HTML to DOM" that uses @mozilla.org/feed-unescapehtml;1, nsIScriptableUnescapeHTML is not acceptable, because it parses with a lot of errors and the baseURI cannot be set for type text/html. All href attributes in A elements are dropped when they contain a relative path.
EDIT #2: It would be nice if there were a method that could convert the incoming responseText into a readable UTF-8 string. :-)
Any ideas or workarounds for solving the encoding problem are welcome. :-)
PS. The target documents are arbitrary, so the charset is not fixed (or known in advance), and of course it is not only UTF-8, which is already handled by default.
SUPP:
So far, I have two brief basic ideas for solving this problem.
Can someone help me with the names of the XPCOM components and methods to use?
1. Specify the encoding when parsing the content into the DOM.
We first need to find out the document's encoding (by extracting the meta tag in the head, or the header). Then:
- Find a method that lets us specify the encoding when parsing the body content.
- Find a method that can parse both the head and the body.
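The "find out the encoding first" step of this idea could be sketched in plain JavaScript as below. This is only a minimal sketch: the function names charsetFromContentType, charsetFromMeta, and detectCharset are made up for illustration and are not part of any Mozilla API, and the 4096-character window and the "utf-8" fallback are assumptions. It checks the Content-Type header first, then the meta tags near the top of the markup.

```javascript
// Hypothetical helpers (not a Mozilla API): guess a document's charset
// from its Content-Type header or from a <meta> tag in the markup.

// Extract "charset=..." from a value like "text/html; charset=Shift_JIS".
function charsetFromContentType(contentType) {
  if (!contentType) return null;
  var m = /charset\s*=\s*["']?([^\s;"']+)/i.exec(contentType);
  return m ? m[1].toLowerCase() : null;
}

// Look for <meta charset="..."> or
// <meta http-equiv="Content-Type" content="text/html; charset=...">.
// Charset names are ASCII, so this also works on text that was decoded
// with a wrong but ASCII-compatible encoding.
function charsetFromMeta(markup) {
  var head = markup.slice(0, 4096); // assumption: the declaration is near the top
  var metaTags = head.match(/<meta\b[^>]*>/gi) || [];
  for (var i = 0; i < metaTags.length; i++) {
    var m = /charset\s*=\s*["']?([^\s;"'\/>]+)/i.exec(metaTags[i]);
    if (m) return m[1].toLowerCase();
  }
  return null;
}

// Header first, then meta tag, then a caller-supplied fallback.
function detectCharset(contentTypeHeader, markup, fallback) {
  return charsetFromContentType(contentTypeHeader) ||
         charsetFromMeta(markup) ||
         fallback || "utf-8";
}
```

In an extension, the first argument would probably come from req.getResponseHeader("Content-Type") and the second from the response text. Documents that declare no charset anywhere would still need real auto-detection, which this sketch does not attempt.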
2. Convert the incoming responseText to UTF-8, so that parsing into a DOM element with the default UTF-8 encoding still works.
This seems less practical and less reasonable: calling overrideMimeType with a charset is one implementation of this idea, but we cannot predict the encoding before sending the request.
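One possible way around the "cannot predict the encoding" problem is to request the raw bytes first and decode them only after the response has arrived. The sketch below is untested against Firefox 4 and rests on several assumptions: the names toByteString, fetchDecoded, pickCharset, and onDone are hypothetical, pickCharset is a caller-supplied function that returns a charset name, and it is assumed that getResponseHeader still reports the Content-Type the server sent after overrideMimeType has been called. It uses the x-user-defined charset so that each byte survives in responseText, then converts with nsIScriptableUnicodeConverter.

```javascript
// Pure JavaScript: with "charset=x-user-defined", each response byte ends up
// in the low 8 bits of a character, so masking recovers a byte string.
function toByteString(userDefinedText) {
  var out = "";
  for (var i = 0; i < userDefinedText.length; i++) {
    out += String.fromCharCode(userDefinedText.charCodeAt(i) & 0xff);
  }
  return out;
}

// Firefox chrome/extension code only (uses XPCOM through Components).
// pickCharset(contentTypeHeader, byteString) is a hypothetical callback that
// must return a charset name the converter understands.
function fetchDecoded(url, pickCharset, onDone) {
  var Cc = Components.classes, Ci = Components.interfaces;
  var req = Cc["@mozilla.org/xmlextras/xmlhttprequest;1"]
              .createInstance(Ci.nsIXMLHttpRequest);
  req.open("GET", url, true);
  // Ask for the bytes untouched instead of guessing a charset up front.
  req.overrideMimeType("text/plain; charset=x-user-defined");
  req.onload = function () {
    var bytes = toByteString(req.responseText);
    // Assumption: this header still reflects what the server sent.
    var header = req.getResponseHeader("Content-Type");
    var conv = Cc["@mozilla.org/intl/scriptableunicodeconverter"]
                 .createInstance(Ci.nsIScriptableUnicodeConverter);
    conv.charset = pickCharset(header, bytes);
    onDone(conv.ConvertToUnicode(bytes)); // a normal JavaScript (Unicode) string
  };
  req.send(null);
}
```

The decoded string could then be assigned to innerHTML as before. Setting conv.charset presumably fails for a name the converter does not recognize, so a real implementation would want a try/catch with a fallback charset.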