A discussion document, Nick Kew, 2001-07-19. Copyright © WebThing Ltd, 2001.

Contents

  1. Introduction
  2. The SP Message Structure
  3. DTD fragment for SP messages
  4. Implementation
  5. Source Code

Introduction

This document deals with both James Clark's SP and the OpenSP project, which represents the ongoing development of SP. Although SP was developed for SGML analysis and processing, it is also a powerful toolkit for working with XML. It is widely used in online tools, including the Site Valet tools (Code Valet and Page Valet) and the validation services for HTML (and more recently other markup) of the W3C and WDG.

The SP parser generates a range of diagnostic messages, which are (among other things) the basis for the validators' output. However, the SP output loses much of the internal structure of the SP messages. An application such as a validator that parses the messages must try to rebuild that structure, and faces two serious problems:

  1. The output is generated in colon-separated lines. But individual components of a line may themselves contain colons, which are neither escaped nor quoted. So parsing the lines is problematic.
  2. An SP message may have several related components, which will be printed as separate lines of output. To reconstruct the relationship between the lines is a major processing task, which none of the validators has attempted.

Taking these considerations together with several calls for the validators to generate XML output, I have developed a new class for SP, XMLMessageReporter, that generates SP's messages in an XML format, preserving their full internal structure.

SP/OpenSP Messages

Basic Message Format

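The basic structure of a message is a single colon-separated line consisting of the program name, the Message Location, the severity, and the message text.

The Message Location comprises three fields: the filename or URL of the entity giving rise to the message, and the line and column within it that triggered the message.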

ospam:http://localhost/bad.html:4:9:E:character "$" not allowed in attribute specification list

ospam is complaining of an error in bad.html detected at the ninth character of the fourth line, of severity Error (E). This is a typical HTML validation error. Note the colon in the URL: this could equally have been a local filesystem path containing an arbitrary number of colons, making this output hard to parse.

ospam:http://localhost/bad:W:URL Redirected to "http://fenris.webthing.com/bad/"

ospam is warning that the URL was redirected by the server. Note that this message lacks the Line and Column fields of the Location, but has an extra colon in the message text.
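The parsing difficulty shown by these two examples can be illustrated with a minimal Python sketch. It is a heuristic only: it assumes that the severity is a single uppercase letter between colons and that line and column are numeric. The name parse_basic and the pattern are hypothetical and are not part of SP.

```python
import re

# Hypothetical pattern: program name, a location that may itself contain
# colons, an optional :line:column pair, a one-letter severity, then the text.
BASIC = re.compile(
    r"^(?P<program>[^:]+):"
    r"(?P<location>.+?)"
    r"(?::(?P<line>\d+):(?P<column>\d+))?"
    r":(?P<severity>[A-Z]):"
    r"(?P<text>.*)$"
)

def parse_basic(raw):
    """Return the fields of a basic message line as a dict, or None."""
    m = BASIC.match(raw)
    return m.groupdict() if m else None

sample = ('ospam:http://localhost/bad.html:4:9:E:'
          'character "$" not allowed in attribute specification list')

# A naive split breaks the URL into two pieces at its own colon.
print(sample.split(":"))
# The anchored pattern keeps the URL together.
print(parse_basic(sample))
```

Even this pattern can be misled, for example by a path that happens to contain a single uppercase letter between two colons, which is exactly why the unescaped colon-separated format is fragile.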

Additional messages

These basic SP messages may be accompanied by associated messages giving further information:

References

A message may refer back to an earlier event: for example, when complaining of duplicate IDs, SP will reference the point where the ID was first declared.

ospam:test.html:10:6:E:end tag for "FORM" omitted, but its declaration does not permit this
ospam:test.html:7:0:start tag was here

ospam is complaining of an element that is unclosed at the point where its containing element is closed. The secondary message tells us where the offending element was opened. It is easy for a human reader, but much harder for a parser, to see that these messages are related.
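One way a parser might recover this relationship is sketched below in Python. It assumes that a primary message always carries a one-letter severity field and that a related line without one belongs to the most recent primary message. The names PRIMARY and group_messages are hypothetical, and the sketch does not handle subsidiary lines that are printed before the message they describe.

```python
import re

# Hypothetical test for a primary message: a one-letter severity field,
# optionally preceded by a numeric :line:column pair.
PRIMARY = re.compile(r"^[^:]+:.+?(?::\d+:\d+)?:[A-Z]:")

def group_messages(lines):
    """Attach each line lacking a severity field to the preceding primary."""
    groups = []
    for raw in lines:
        if PRIMARY.match(raw) or not groups:
            groups.append({"primary": raw, "related": []})
        else:
            groups[-1]["related"].append(raw)
    return groups

output = [
    'ospam:test.html:10:6:E:end tag for "FORM" omitted, '
    'but its declaration does not permit this',
    'ospam:test.html:7:0:start tag was here',
]
for group in group_messages(output):
    print(group["primary"])
    for related in group["related"]:
        print("    related:", related)
```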

Context

Where an entity included from a document generates a message, a subsidiary message describing the inclusion may be generated (the -e option):

In Entity HTML included from test.html:1:48
ospam:HTML4-strict.dtd:364:0:W: unused parameter entity "MultiLengths"
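A context line of this form could be picked apart as in the following Python sketch. The pattern is an assumption based only on the example above; the exact wording and capitalisation produced by a given SP or OpenSP build may differ, and the names CONTEXT and parse_context are hypothetical.

```python
import re

# Hypothetical pattern for the subsidiary line shown above. The location is
# matched greedily so that colons inside it stay with the location, and only
# the final two numeric fields are taken as line and column.
CONTEXT = re.compile(
    r"^In [Ee]ntity (?P<entity>\S+) included from "
    r"(?P<location>.+):(?P<line>\d+):(?P<column>\d+)$"
)

def parse_context(raw):
    """Return entity, location, line and column, or None if no match."""
    m = CONTEXT.match(raw)
    return m.groupdict() if m else None

print(parse_context("In Entity HTML included from test.html:1:48"))
```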

Open Elements

The -g option will cause SP to describe open elements for parser messages:

ospam:test.html:4:9:E: element "WIBBLE" undefined
ospam:test.html:4:9: open elements: HTML BODY[1] (H1[1])

The error occurs within the HTML and BODY elements. The previous element to be closed, and the last valid element completed at this point, is H1.
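The open elements line can be tokenised as in the Python sketch below. It assumes, from the example alone, that a bracketed number is an index attached to the element name and that a parenthesised token denotes the previous element. The names TOKEN and parse_open_elements are hypothetical.

```python
import re

# Hypothetical token pattern: optional "(", an element name, an optional
# [index], and an optional ")".
TOKEN = re.compile(r"(\()?([^\s\[\]()]+)(?:\[(\d+)\])?\)?")

def parse_open_elements(raw):
    """Return one dict per element listed after the 'open elements:' marker."""
    _, _, tail = raw.partition("open elements:")
    elements = []
    for m in TOKEN.finditer(tail):
        elements.append({
            "name": m.group(2),
            "index": int(m.group(3)) if m.group(3) else None,
            "previous": m.group(1) is not None,
        })
    return elements

line = "ospam:test.html:4:9: open elements: HTML BODY[1] (H1[1])"
for element in parse_open_elements(line):
    print(element)
```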

Clauses

Someone please confirm this or correct me on this!

The -x option will cause SP to generate additional messages referencing specific clauses in the SGML specifications.

ospam:foobar.sgml:5:20: relevant clauses: ISO/IEC 10744:1997 A3.5.2

DTD for SP messages

Outdated - this is no longer complete

As we see from the above discussion, there is a good deal of structure in the SP messages. We have encapsulated this in a DTD fragment. We will use this fragment with XHTML to generate XML reports in the Site Valet tools (see implementation below).

The basic message is sp:message, whose attributes represent the basic message line described above. Any additional messages are represented by sp:reference, sp:context, sp:clause, sp:openelement, and sp:prevelement elements nested within the sp:message element to which they refer.


	<!ELEMENT sp:reference		(#PCDATA)	>
	<!ELEMENT sp:context		EMPTY		>
	<!ELEMENT sp:clause		(#PCDATA)	>
	<!ELEMENT sp:openelement	(#PCDATA)	>
	<!ELEMENT sp:prevelement	(#PCDATA)	>
	<!ELEMENT sp:message
		(#PCDATA|
		sp:clause|
		sp:context|
		sp:reference|
		sp:openelement|
		sp:prevelement)*
	>
	<!ATTLIST sp:message
		sp:id		ID	#REQUIRED
		sp:location	CDATA	#IMPLIED
		sp:line		NMTOKEN	#IMPLIED
		sp:column	NMTOKEN	#IMPLIED
		sp:severity	NMTOKEN	#IMPLIED
	>
	<!ATTLIST sp:reference
		sp:location	CDATA	#REQUIRED
		sp:line		NMTOKEN	#REQUIRED
		sp:column	NMTOKEN	#REQUIRED
	>
	<!ATTLIST sp:context
		sp:location	CDATA	#REQUIRED
		sp:line		NMTOKEN	#REQUIRED
		sp:column	NMTOKEN	#REQUIRED
		sp:entity	NMTOKEN	#IMPLIED
	>
	<!ATTLIST sp:openelement
		sp:matchindex	NMTOKEN	#IMPLIED
	>
	<!ATTLIST sp:prevelement
		sp:matchindex	NMTOKEN	#IMPLIED
	>
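
As an illustration of how a message and its reference might be serialised against this fragment, here is a minimal Python sketch. It is not the output of XMLMessageReporter, whose actual format may differ; the function name message_xml, its parameters, and the id value "m1" are hypothetical. No namespace declaration is emitted, because the fragment treats sp: purely as part of the element and attribute names.

```python
from xml.sax.saxutils import escape, quoteattr

def message_xml(msg_id, text, location=None, line=None, column=None,
                severity=None, references=()):
    """Build one sp:message element as a string.

    references is a sequence of (location, line, column, text) tuples,
    each emitted as a nested sp:reference element.
    """
    attrs = [
        ("sp:id", msg_id),          # declared as ID, so must be an XML name
        ("sp:location", location),
        ("sp:line", line),
        ("sp:column", column),
        ("sp:severity", severity),
    ]
    # Optional (#IMPLIED) attributes are simply omitted when not supplied.
    attr_text = "".join(
        f" {name}={quoteattr(str(value))}"
        for name, value in attrs if value is not None
    )
    children = "".join(
        f"<sp:reference sp:location={quoteattr(loc)}"
        f" sp:line={quoteattr(str(ln))} sp:column={quoteattr(str(col))}>"
        f"{escape(ref_text)}</sp:reference>"
        for loc, ln, col, ref_text in references
    )
    return f"<sp:message{attr_text}>{escape(text)}{children}</sp:message>"

print(message_xml(
    "m1",
    'end tag for "FORM" omitted, but its declaration does not permit this',
    location="test.html", line=10, column=6, severity="E",
    references=[("test.html", 7, 0, "start tag was here")],
))
```

Nesting the reference inside the message is what preserves the relationship that the colon-separated output loses.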

Implementation

(now removed; no need for dummy demos when real applications are becoming widespread)

Source Code

The source code is now incorporated into OpenSP as standard, and is no longer available separately here.