This document deals with both James Clark's SP and the OpenSP project, which represents the ongoing development of SP. Developed for SGML analysis and processing, it is also a powerful toolkit for working with XML. It is widely used in online tools, including Site Valet (Code Valet and Page Valet) and the W3C and WDG validation services for HTML (and, more recently, other markup).
The SP parser generates a range of diagnostic messages, which are (among other things) the basis for the validators' output. However, the SP output loses much of the internal structure of the SP messages. An application, such as a validator, that parses the messages must seek to rebuild the structure, and is faced with two serious problems:
Taking these considerations together with several calls for the validators to generate XML output, I have developed a new class for SP, XMLMessageReporter, which generates SP's messages in an XML format that preserves their full internal structure.
The basic structure of a message is:
The Message Location comprises three fields: the filename or URL of the entity giving rise to the message, and the line and column within it that triggered the message.
ospam:http://localhost/bad.html:4:9:E:character "$" not allowed in attribute specification list
ospam is complaining of an error in http://localhost/bad.html, detected at the ninth character of the fourth line, of severity Error (E). This is a typical HTML validation error. Note the colon in the URL: this could equally have been a local filesystem path containing an arbitrary number of colons, making this output hard to parse.
ospam:http://localhost/bad:W:URL Redirected to "http://fenris.webthing.com/bad/"
ospam is warning that the URL was redirected by the server. Note that this message has no line or column fields, and that the message text itself contains an extra colon.
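The two examples above can be used to illustrate how a consumer of the plain-text output has to guess at field boundaries. The following is a minimal sketch in Python, not part of SP or OpenSP. It assumes that the program name contains no colon, that the severity is a single uppercase letter (only E and W appear in this document), and that every message carries at least a line:column pair or a severity flag. The names SPMessage and parse_message are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SPMessage(NamedTuple):
    program: str
    location: str
    line: Optional[int]
    column: Optional[int]
    severity: Optional[str]
    text: str


# Heuristic pattern (an assumption, not SP's own grammar).
_MESSAGE = re.compile(
    r"^(?P<program>[^:]+):"   # program name, assumed to be colon-free
    r"(?P<location>.+?)"      # shortest location that lets the rest match
    r"(?:"
    r":(?P<line>\d+):(?P<column>\d+)(?::(?P<sev_a>[A-Z]))?"  # line:col[:severity]
    r"|:(?P<sev_b>[A-Z])"     # or a severity flag on its own
    r")"
    r":\s*(?P<text>.*)$"
)


def parse_message(raw: str) -> Optional[SPMessage]:
    """Split one line of SP output into fields, or return None if it does not fit."""
    match = _MESSAGE.match(raw.strip())
    if match is None:
        return None
    line = match.group("line")
    column = match.group("column")
    return SPMessage(
        program=match.group("program"),
        location=match.group("location"),
        line=int(line) if line is not None else None,
        column=int(column) if column is not None else None,
        severity=match.group("sev_a") or match.group("sev_b"),
        text=match.group("text"),
    )


if __name__ == "__main__":
    print(parse_message(
        'ospam:http://localhost/bad.html:4:9:E:character "$" not allowed '
        "in attribute specification list"
    ))
    print(parse_message(
        'ospam:http://localhost/bad:W:URL Redirected to '
        '"http://fenris.webthing.com/bad/"'
    ))
```

The sketch copes with the colon in "http:" only because nothing that looks like a line:column pair or a severity flag follows it. A path that itself contains such a sequence would be split in the wrong place, which is the kind of ambiguity a structured output format avoids.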
These basic SP messages may be accompanied by associated messages giving further information:
A message may refer back to an earlier event: for example, when complaining of duplicate IDs, SP will reference the point where the ID was first declared.
ospam:test.html:10:6:E:end tag for "FORM" omitted, but its declaration does not permit this
ospam:test.html:7:0:start tag was here
ospam is complaining of an element that is unclosed at the point where its containing element is closed. The secondary message tells us where the offending element was opened. It is easy for a human reader, but much harder for a parser, to see that these messages are related.
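To make the difficulty concrete, here is a minimal Python sketch of how a consumer might re-associate a secondary message with its primary message. It is not part of SP. It rests on an assumption drawn only from the examples in this document: a primary message carries a single-letter severity flag between colons, while a secondary message that follows it does not. The name group_messages is hypothetical, and context lines that precede their message are not handled.

```python
import re
from typing import Dict, List

# Assumption: primary messages contain a severity flag such as ":E:" or ":W:".
_SEVERITY_FLAG = re.compile(r":[A-Z]:")


def group_messages(lines: List[str]) -> List[Dict]:
    """Attach each flag-less line to the most recent line that has a severity flag."""
    groups: List[Dict] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if _SEVERITY_FLAG.search(line) or not groups:
            # A severity flag (or the very first line) starts a new group.
            groups.append({"primary": line, "secondary": []})
        else:
            groups[-1]["secondary"].append(line)
    return groups


if __name__ == "__main__":
    output = [
        'ospam:test.html:10:6:E:end tag for "FORM" omitted, '
        "but its declaration does not permit this",
        "ospam:test.html:7:0:start tag was here",
    ]
    for group in group_messages(output):
        print(group["primary"])
        for extra in group["secondary"]:
            print("    " + extra)
```

The grouping depends entirely on the order of the lines and on a pattern match against free text, so any change in message wording or ordering can break it.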
Where an entity included from a document generates a message, a subsidiary message describing the inclusion may be generated (the -e option):
In Entity HTML included from test.html:1:48
ospam:HTML4-strict.dtd:364:0:W: unused parameter entity "MultiLengths"
The -g option will cause SP to describe open elements for parser messages:
ospam:test.html:4:9:E: element "WIBBLE" undefined
ospam:test.html:4:9: open elements: HTML BODY (H1)
The error occurs within the HTML and BODY elements. The previous element to be closed, and the last valid element completed at this point, is H1.
Someone please confirm this or correct me on this!
The -x option will cause SP to generate additional messages referencing specific clauses in the SGML specifications.
ospam:foobar.sgml:5:20: relevant clauses: ISO/IEC 10744:1997 A3.5.2
Outdated - this is no longer complete
As we see from the above discussion, there is a good deal of structure in the SP messages. We have encapsulated this in a DTD fragment. We will use this fragment with XHTML to generate XML reports in the Site Valet tools (see implementation below).
The basic message is sp:message. The sp:message represents the basic message line described above in its attributes. Any of the additional messages are represented by the sp:reference, sp:context, sp:clause, sp:openelement, and sp:prevelement elements within the sp:message element to which they refer.
<!ELEMENT sp:reference (#PCDATA) >
<!ELEMENT sp:context EMPTY >
<!ELEMENT sp:clause (#PCDATA) >
<!ELEMENT sp:openelement (#PCDATA) >
<!ELEMENT sp:prevelement (#PCDATA) >
<!ELEMENT sp:message (#PCDATA| sp:clause| sp:context| sp:reference| sp:openelement| sp:prevelement)* >
<!ATTLIST sp:message
    sp:id        ID       #REQUIRED
    sp:location  CDATA    #IMPLIED
    sp:line      NMTOKEN  #IMPLIED
    sp:column    NMTOKEN  #IMPLIED
    sp:severity  NMTOKEN  #IMPLIED
>
<!ATTLIST sp:reference
    sp:location  CDATA    #REQUIRED
    sp:line      NMTOKEN  #REQUIRED
    sp:column    NMTOKEN  #REQUIRED
>
<!ATTLIST sp:context
    sp:location  CDATA    #REQUIRED
    sp:line      NMTOKEN  #REQUIRED
    sp:column    NMTOKEN  #REQUIRED
    sp:entity    NMTOKEN  #IMPLIED
>
<!ATTLIST sp:openelement
    sp:matchindex NMTOKEN #IMPLIED
>
<!ATTLIST sp:prevelement
    sp:matchindex NMTOKEN #IMPLIED
>
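As an illustration of the shape of data this DTD fragment describes, the following Python sketch serialises one message and an optional sp:reference child. It is not the XMLMessageReporter implementation and makes no claim about OpenSP's actual output: the function name message_to_xml, the id value "m1" and the argument layout are all hypothetical. The only things taken from the fragment are the element and attribute names.

```python
from typing import Optional, Tuple
from xml.sax.saxutils import escape, quoteattr


def message_to_xml(
    msg_id: str,
    text: str,
    location: Optional[str] = None,
    line: Optional[int] = None,
    column: Optional[int] = None,
    severity: Optional[str] = None,
    reference: Optional[Tuple[str, int, int, str]] = None,
) -> str:
    """Serialise one message using the sp:message element of the DTD fragment.

    msg_id must be a valid XML ID (so it cannot start with a digit);
    reference is (location, line, column, text) for an sp:reference child.
    """
    attributes = [
        ("sp:id", msg_id),
        ("sp:location", location),
        ("sp:line", line),
        ("sp:column", column),
        ("sp:severity", severity),
    ]
    # Omit the #IMPLIED attributes that were not supplied.
    attribute_text = "".join(
        f" {name}={quoteattr(str(value))}"
        for name, value in attributes
        if value is not None
    )
    body = escape(text)
    if reference is not None:
        ref_location, ref_line, ref_column, ref_text = reference
        body += (
            f"<sp:reference sp:location={quoteattr(ref_location)}"
            f" sp:line={quoteattr(str(ref_line))}"
            f" sp:column={quoteattr(str(ref_column))}>"
            f"{escape(ref_text)}</sp:reference>"
        )
    return f"<sp:message{attribute_text}>{body}</sp:message>"


if __name__ == "__main__":
    print(message_to_xml(
        "m1",
        'end tag for "FORM" omitted, but its declaration does not permit this',
        location="test.html",
        line=10,
        column=6,
        severity="E",
        reference=("test.html", 7, 0, "start tag was here"),
    ))
```

Because the reference is nested inside the message it belongs to, a consumer no longer has to infer the relationship from line order. The sp prefix would still need a namespace declaration in the enclosing document for a namespace-aware XML parser to accept the result.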
(now removed; no need for dummy demos when real applications are becoming widespread)
The source code is now incorporated into OpenSP as standard, and is no longer available separately here.