Updated September 28, 1999
KeyLabs Tests Confirm...
Internet Explorer 5 Rendering Engine Alters HTML Attribute Tags
HERE'S A PROBLEM that hasn't come over the consumer horizon yet, but is already a serious issue for Web developers.
The problem is that Microsoft Internet Explorer 5 (IE5) and Microsoft's underlying MSHTML editing and rendering engine can corrupt some HTML tags.
Specifically, when users save a document with IE5's default setting of "Web Page, complete," or when developers instruct the MSHTML engine to render a document on the fly, the resulting document no longer retains quotation marks around HTML element attributes.
For example, this markup in the original page:
<P ALIGN="left" ID="para1"></P>
comes out of IE5 looking like this:
<P ALIGN=left ID=para1></P>
Both IE5 and Netscape 4.61 will display a page with the altered code just fine. HTML 4.0 itself recommends quoting attribute values but allows the quotation marks to be omitted when a value contains only letters, digits, hyphens, periods, underscores, and colons, so neither browser has any reason to reject it. Even so, the unquoted code is already causing problems for some Web developers.
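The browsers' tolerance is easy to reproduce with any forgiving HTML parser. The sketch below is not part of the original report: it uses Python's standard html.parser module as a stand-in for a lenient browser parser, and the names AttributeCollector and collect_attributes are invented for illustration.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Record every start tag and its attributes as the parser sees them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names before this call.
        self.seen.append((tag, dict(attrs)))


def collect_attributes(markup):
    """Return a list of (tag, attributes) pairs found in the markup."""
    parser = AttributeCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen


ORIGINAL = '<P ALIGN="left" ID="para1"></P>'
SAVED_BY_IE5 = '<P ALIGN=left ID=para1></P>'

print(collect_attributes(ORIGINAL))
print(collect_attributes(SAVED_BY_IE5))
```

Both calls report the same tag with the same two attribute values, which is why a page saved this way still looks right on screen.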
The pinch is that without the quotation marks, the HTML code produced by IE5 and company is not compliant with eXtensible Markup Language (XML) standards, and therefore Web forms filled out by people using these Microsoft products can't be parsed for use in custom database applications.
That's what happened to Web development consultant Harvey Mushman. Using Microsoft Visual FoxPro and West Wind Technologies' XML toolkit, Mushman created a Web application that dynamically generated HTML via Microsoft's MSHTML engine. To his surprise, when these documents were passed through an XML parser, they failed. "After creating the entire page within the IE engine, when we brought it up in IE with the XML tags, and it just quit working," he explained. "XML says the code must be tight. It simply will not work without quotes."
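Mushman's point can be checked with any strict XML parser. The following is a minimal sketch, not his FoxPro code: it feeds the two sample tags from this report to Python's standard xml.etree.ElementTree module, and the helper name xml_attributes is made up for illustration.

```python
import xml.etree.ElementTree as ET


def xml_attributes(markup):
    """Parse markup as XML and return the root element's attributes.

    Returns None when the markup is not well-formed XML.
    """
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # XML requires every attribute value to be quoted.
        return None
    return dict(root.attrib)


ORIGINAL = '<P ALIGN="left" ID="para1"></P>'
SAVED_BY_IE5 = '<P ALIGN=left ID=para1></P>'

print(xml_attributes(ORIGINAL))
print(xml_attributes(SAVED_BY_IE5))
```

The quoted tag yields its two attributes, while the unquoted version is rejected outright, so nothing downstream of the parser ever sees the data.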
For users, any IE5 Web page saved in this format is no better than a document created with a word processor. An XML parser, which could normally extract document information (like headlines, story text, dates, etc.), will not be able to extract any document-specific information from these pages. "Six months from now, you may want to take all these pages and put them in XML and redisplay them without having to work with each one separately," added Mushman. "But you can't do that; once the underlying HTML is invalid, you can't go back and make an invalid page valid again."
MICROSOFT HAS ACKNOWLEDGED the problem. According to Ray Sun, product manager for IE5 at Microsoft, the problem stems from the way the MSHTML engine writes data to disk in order to make images available for offline viewing. With the "Web Page, complete" command, "We have to rewrite the target of the source, so the images will load relative URLs. Then we persist that memory to disk," explained Sun. "The issue at hand here is our view of it is different from the original page, because we have saved it as an efficient format."
Although Sun did not indicate that a direct solution is forthcoming for developers, he suggested that users who want to save HTML in its pristine format can simply use the "Web Page, HTML only" option rather than "Web Page, complete."
Unfortunately, with this method the resulting document will not retain local copies of images, sounds, or other non-textual elements, which is clearly not an acceptable solution for some developers.
-- Bradley F. Shimmin
© BugNet material copyright 1994-1999 by BugNet.