Document Tree Model

 

DocTree Overview

This chapter is a reference guide to the XML document tree model. The document tree model contains all the information in the XML string, but compiled into a structured form. All of the basic building blocks of XML are contained in the document tree model.

The document tree model serves as the default output from the xml function, and may be used as a foundation for the implementation of a DOM-compliant model. Because it is the default compiler output, completeness and easy addressability are its key design goals.

The XML document tree model is a recursive Structure in which the data inside the XML document is addressable by attribute name as well as by numeric index. For example, the following XML input string:

<?xml version = '1.0' standalone='yes' encoding = 'hello' ?>
			<!-- This is a dynamic name and address example -->
			<Address FirstName = "yes" LastName = 'yes'> 
			   This is just general content for the Address element.
			   <FirstName>Michael</FirstName>
			   <LastName>Korns</LastName>
			   <Street>214 Shorebreaker</Street>
			   <City>Laguna Niguel</City>
			   <State>California</State>
			   This is more content for the Address element.
			</Address>

Returns the following XML document tree model:

#{
			  __attlist: #{version: '1.0' standalone: 'yes' encoding: 'hello'}
			  Address: #{
			        __attlist: #{FirstName: "yes" LastName: 'yes'}
			        __content: "This is just general content for the Address element."
			        FirstName: "Michael"
			        LastName:  "Korns"
			        Street: "214 Shorebreaker"
			        City: "Laguna Niguel"
			        State: "California"
			        __content: "This is more content for the Address element."
			         }
			}

Notice how the terminal nodes of the document tree model are all singletons, while the intermediate nodes of the document tree model are recursive element Structures with attributes and values. Finally, notice how the various parts of the document tree model can be referenced either by attribute name or by element index number:

document.Address.FirstName == "Michael"
and
document[1][2] == "Michael"

The XML document tree model also handles multiple elements with the same tag, using a simple rule: only the first element with a given tag can be referenced by its tag name; all others must be referenced by numeric index. For example, the following XML input string:

<?xml version="1.0"?>
			<poem>
			<line>Roses are red,</line>
			<line>Violets are blue.</line>
			<line>Sugar is sweet,</line>
			<line>and I love you.</line>
			</poem>

Returns the following XML document tree model:

#{
			  __attlist: #{version: "1.0"}
			  poem: #{
			      line: "Roses are red,"
			      line: "Violets are blue."
			      line: "Sugar is sweet,"
			      line: "and I love you."
			      }
			 }

As before, the terminal nodes of the document tree model are all singletons, while the intermediate nodes are recursive element Structures with attributes and values. Notice how the first "line" element can be referenced either by name or by index, while all other "line" elements can be referenced only by index number:

document.poem.line == "Roses are red,"
and
document.poem[0] == "Roses are red,"
and
document.poem[2] == "Sugar is sweet,"
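
The lookup rule above can be sketched in Python. This is only an illustrative model, not the xml function's own implementation: the Structure class below is a hypothetical stand-in that keeps an ordered list of bindings, returns the first match for a name, and uses zero-based positions for numeric indexes.

```python
class Structure:
    """Hypothetical stand-in: an ordered list of (name, value) bindings."""

    def __init__(self, *bindings):
        self.bindings = list(bindings)

    def __getitem__(self, key):
        # An integer key selects a binding by zero-based position.
        if isinstance(key, int):
            return self.bindings[key][1]
        # A name selects the first binding with that name.
        for name, value in self.bindings:
            if name == key:
                return value
        raise KeyError(key)


poem = Structure(
    ("line", "Roses are red,"),
    ("line", "Violets are blue."),
    ("line", "Sugar is sweet,"),
    ("line", "and I love you."),
)
document = Structure(
    ("__attlist", Structure(("version", "1.0"))),
    ("poem", poem),
)

print(document["poem"]["line"])  # first "line" binding, by name
print(document["poem"][2])       # third "line" binding, by index
```

Keeping the bindings in a list rather than a dictionary is what allows repeated names to coexist while still preserving document order.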

Elements

The xml document tree model supports elements as the basic node of the model. Each element is a binding in the parent Structure. The Name of the binding is the Tag name of the element. The value of the binding may be a singleton or a recursive Structure, depending upon which interpretation preserves all of the information in the least space.

Example1

XML Input: <?xml?>
Tree Model: #{xml: true}

Note: This is the most ergonomic tree model for the document shown.

Example2

XML Input: <Name/>
Tree Model: #{Name: true}

Note: This is the most ergonomic tree model for the document shown.

Example3

XML Input: <?xml?><Name>John</Name>
Tree Model: #{xml: true Name: "John"}

Note: An embedded element is compiled as a simple attribute binding of the parent Structure. The value is always a singleton, iff it can be compiled as such without loss of information.

Example4

XML Input: <?xml?><Name><First>John</First><Last>Doe</Last></Name>
Tree Model: #{xml: true Name: #{First: "John" Last: "Doe"}}

Note: An embedded element is compiled as a simple attribute binding of the parent Structure. The value is always a child Structure, iff it must be compiled as such to avoid loss of information.
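
The singleton-versus-Structure rule in these examples can be approximated in Python. The sketch below is an assumption-laden illustration, not the xml function itself: compile_element and compile_document are hypothetical names, bindings are modeled as lists of (name, value) pairs, attributes and mixed content are ignored, and the `<?xml?>` prefix is omitted because Python's parser requires a version in the XML declaration.

```python
import xml.etree.ElementTree as ET


def compile_element(elem):
    """Return the value that would be bound to the element's tag name."""
    children = list(elem)
    text = (elem.text or "").strip()
    if not children and not text:
        return True   # empty element, as in <Name/>
    if not children:
        return text   # text only: a singleton loses no information
    # Child elements require a nested structure to avoid losing data.
    return [(child.tag, compile_element(child)) for child in children]


def compile_document(xml_text):
    """Compile a document into a one-binding list for its root element."""
    root = ET.fromstring(xml_text)
    return [(root.tag, compile_element(root))]


print(compile_document("<Name/>"))
print(compile_document("<Name>John</Name>"))
print(compile_document("<Name><First>John</First><Last>Doe</Last></Name>"))
```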

Attribute Lists

The xml document tree model supports element attribute lists in the basic nodes of the model. Each element attribute list is a binding in the element Structure. The Name of the binding is "__attlist". The value of the binding is always a Structure containing the name-value bindings for the attribute list. If there is no attribute list, the binding is omitted to save space.

Example1

XML Input: <?xml version='1' standalone="yes"?>
Tree Model: #{xml: true __attlist: #{version: 1 standalone: "yes"}}

Note: This is the most ergonomic tree model for the document shown.

Example2

XML Input: <Name/>
Tree Model: #{Name: true}

Note: There are no attributes, so the binding is not present.

Example3

XML Input: <?xml?><Name firstonly="yes">John</Name>
Tree Model: #{xml: true __attlist: #{firstonly: "yes"} Name: "John"}

Note: An embedded element is compiled as a simple attribute binding of the parent Structure. The value is always a singleton, if it can be compiled as such without loss of information.

Document Type Definitions

The xml document tree model supports document type definitions in the basic node of the model. Each document type definition is a binding in the parent Structure. The Name of the binding is "__dtd". The value of the binding is always a Structure containing the name-value bindings for the document type definition list. For each binding within the __dtd Structure, the Name of the binding is the tag name of the document type definition (DOCTYPE, ELEMENT, ENTITY, ATTLIST, or NOTATION). The value of the binding is the character data, ready for processing. If there are no document type definitions, the binding is omitted to save space.

Example1

XML Input: <?xml?><!DOCTYPE MyDoc SYSTEM "MYDOC.DTD">
Tree Model: #{xml: true __dtd: #{DOCTYPE: {MyDoc SYSTEM "MYDOC.DTD"}}}

Note: A document type definition is compiled as a simple attribute binding of the parent Structure. The value is always a singleton character data stream. The name is always the document type definition tag name.

Example2

XML Input:
<?xml?>
<!DOCTYPE list [
<!ELEMENT list (item+)>
<!ELEMENT item (#PCDATA)>
<!ATTLIST item topic CDATA #IMPLIED>
]>
Tree Model:
#{xml: true __dtd: #{DOCTYPE: "list"
ELEMENT: "list (item+)"
ELEMENT: "item (#PCDATA)"
ATTLIST: "item topic CDATA #IMPLIED"}
}

Note: A document type definition is compiled as a simple attribute binding of the parent Structure. The value is always a singleton character data stream. The name is always the document type definition tag name. Nested document type definitions appear at the top level of the __dtd Structure.
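
The flattening of nested declarations into one list of bindings can be illustrated with a short Python sketch. The compile_dtd function and its regular expression are hypothetical and deliberately simple: they assume declarations contain no `<`, `>`, or `[` inside quoted values, which a real parser would have to handle.

```python
import re

# Matches <!TAG data ...> and stops the data at '>', '<', or the '[' that
# opens an internal subset, so nested declarations are matched separately.
DECLARATION = re.compile(
    r"<!(DOCTYPE|ELEMENT|ENTITY|ATTLIST|NOTATION)\s+([^<>\[]*)"
)


def compile_dtd(xml_text):
    """Return (tag name, character data) bindings in document order."""
    return [(tag, data.strip()) for tag, data in DECLARATION.findall(xml_text)]


source = """<!DOCTYPE list [
<!ELEMENT list (item+)>
<!ELEMENT item (#PCDATA)>
<!ATTLIST item topic CDATA #IMPLIED>
]>"""

for binding in compile_dtd(source):
    print(binding)
```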

Processing Instructions

The xml document tree model supports processing instructions in the basic node of the model. Each processing instruction is a binding in the parent Structure. The Name of the binding is "__pi". The value of the binding is always a Structure containing the name-value bindings for the processing instruction list. For each binding within the __pi Structure, the Name of the binding is the Target name of the processing instruction. The value of the binding is the character data, ready for processing. If there are no processing instructions, the binding is omitted to save space.

A processing instruction is executed only if a piStructure is passed as an argument to the xml function and the piStructure contains a binding for the target name. Each processing instruction target name in the piStructure must be bound to a lambda value expecting four arguments: (piStructure) the piStructure itself, so that scripts can add to the available processing instruction targets; (document) the xml document in progress; (this) the current xml element; and (source) the processing instruction content. For the examples in this section, assume that the piStructure passed as an argument to the xml function is defined as follows.

; Each target name is bound to a lambda taking four arguments.
(define PiStructure
			(new Structure:
			   javaScript: (lambda(piStructure document this source)
			               (eval (compile (morph (javaScript source))
			                     (new Lambda:
			                          Pv: (new Structure:
			                               document: document
			                               this: this
			                               piStructure: piStructure)) ; end new Lambda
			                ))))) ; end define

Note: The binding for javaScript expects the required four parameters. The javaScript is compiled so that the key words document, this, and piStructure will be bound properly.

Example1

XML Input: <?xml?><?doNothing writeln('Hello World');?>
Tree Model: #{xml: true __pi: #{doNothing: "writeln('Hello World');"}}

Note: A processing instruction is compiled as a simple attribute binding of the parent Structure. The value is always a singleton character data stream. The name is always the target name. No processing takes place because there is no piStructure binding for the target name doNothing.

Example2

XML Input: <?xml?><?javaScript writeln('Hello World');?>
Tree Model: #{xml: true __pi: #{javaScript: "writeln('Hello World');"}} Console: "Hello World"

Note: A processing instruction is compiled as a simple attribute binding of the parent Structure. Processing takes place because there is a piStructure binding for the target name javaScript.

Example3

XML Input: <?xml?> <?noScript x+1?> <?noScript y-1?>
Tree Model: #{xml: true __pi: #{noScript: "x+1" noScript: "y-1"}}

Note: A processing instruction is compiled as a simple attribute binding of the parent Structure. No processing takes place because there is no piStructure binding for the target name noScript.
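
The dispatch behavior described in this section can be modeled in Python. This is a minimal sketch under stated assumptions, not the xml function's implementation: compile_pis is a hypothetical name, the piStructure is modeled as a dictionary of four-argument callables, the document is modeled as a list of (name, value) bindings, and at the top level the document itself is passed as the current element.

```python
import re

# Matches <?target content?>; the content may be empty, as in <?xml?>.
PI = re.compile(r"<\?(\w+)\s*(.*?)\?>", re.DOTALL)


def compile_pis(xml_text, pi_structure=None):
    """Record each processing instruction and run its handler, if any."""
    handlers = pi_structure or {}
    document = []
    pi_bindings = []
    for target, source in PI.findall(xml_text):
        if target == "xml":
            document.append(("xml", True))
            continue
        # The instruction is always recorded, whether or not it is run.
        pi_bindings.append((target, source))
        handler = handlers.get(target)
        if handler is not None:
            # Arguments: piStructure, document, current element, source.
            handler(pi_structure, document, document, source)
    if pi_bindings:
        document.append(("__pi", pi_bindings))
    return document


seen = []
pi_structure = {"javaScript": lambda pis, doc, this, source: seen.append(source)}

print(compile_pis("<?xml?><?doNothing writeln('Hello World');?>", pi_structure))
print(compile_pis("<?xml?><?javaScript writeln('Hello World');?>", pi_structure))
print(seen)  # only the javaScript instruction reached a handler
```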

Character Data

The xml document tree model supports character data binding values in the basic nodes of the model. Each character data field is a binding in the parent Structure. The Name of the binding is either the Tag name of the enclosing element, the Target name of the enclosing processing instruction, or the special Name "__content", depending upon which is the most ergonomic form with no loss of data. The value of the binding is the character data.

Example1

XML Input: <?xml?>These are just characters
Tree Model: #{xml: true __content: "These are just characters"}

Note: Character data is compiled as a simple attribute binding of the parent Structure. The value is always a singleton character data stream.

Example2

XML Input: <?xml?><Name>John Doe</Name>
Tree Model: #{xml: true Name: "John Doe"}

Note: Character data is compiled as a simple attribute binding of the parent Structure. The value is always a singleton character data stream. The name may be the Tag name of the enclosing element.

Example3

XML Input: <?xml?>My first name<First>John</First>and my last name<Last>Doe</Last>
Tree Model: #{xml: true __content: "My first name" First: "John" __content: "and my last name" Last: "Doe"}

Note: Character data is compiled as a simple attribute binding of the parent Structure. The value is always a singleton character data stream. The name may be the special name "__content" iff required to prevent loss of data.
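
The interleaving of "__content" bindings with element bindings can be sketched in Python. The function name compile_mixed and the wrapping Doc element are assumptions made for illustration: Python's parser does not accept text outside a root element, so the mixed content from Example3 is placed inside one, and only one level of nesting is handled.

```python
import xml.etree.ElementTree as ET


def compile_mixed(elem):
    """Return (name, value) bindings for an element with mixed content."""
    bindings = []

    def add_text(text):
        # Loose character data becomes a "__content" binding.
        text = (text or "").strip()
        if text:
            bindings.append(("__content", text))

    add_text(elem.text)          # text before the first child element
    for child in elem:
        bindings.append((child.tag, (child.text or "").strip()))
        add_text(child.tail)     # text following this child element
    return bindings


doc = ET.fromstring(
    "<Doc>My first name<First>John</First>"
    "and my last name<Last>Doe</Last></Doc>"
)
for binding in compile_mixed(doc):
    print(binding)
```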

CDATA Section

Data inside an XML document is normally parsed by the XML parser. Some characters are illegal in XML character data and must be replaced by entity references; these include < and & among others. A CDATA Section avoids these replacements: its contents may contain such characters literally, which is shorter and more convenient.

Data inside a CDATA Section is not parsed as markup. The text inside the CDATA Section is treated as character data. The character data values are bound in the basic nodes of the model. Each character data field is a binding in the parent Structure. The Name of the binding is the special Name "__content". The value of the binding is the unparsed character data.

Example1

XML Input: <?xml?><![CDATA[ any data at all can be in here ]]>
Tree Model: #{xml: true __content: " any data at all can be in here "}

Note: CDATA is compiled like character data, that is, as a simple attribute binding of the parent Structure. The value is always a singleton character data stream.

Example2

XML Input: <?xml?><![CDATA[<Name>John Doe</Name>]]>
Tree Model: #{__pi: #{xml: true} __content: "<Name>John Doe</Name>"}

Note: The characters enclosed in the CDATA Section are not parsed by the XML parser. They appear unchanged in the Document Tree Model.

Compare this example with the example of parsed data below.

Example3

XML Input: <?xml?> <Name>John Doe</Name>
Tree Model: #{__pi: #{xml: true} Name: "John Doe"}
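
The same contrast between CDATA and parsed data can be observed with Python's standard parser, shown here only as an analogy to the xml function. The wrapping Doc element is an assumption, added because Python requires a root element.

```python
import xml.etree.ElementTree as ET

# The same characters, once inside a CDATA Section and once as markup.
cdata = ET.fromstring("<Doc><![CDATA[<Name>John Doe</Name>]]></Doc>")
parsed = ET.fromstring("<Doc><Name>John Doe</Name></Doc>")

# CDATA: the markup characters survive as plain text, with no child element.
print(repr(cdata.text), len(cdata))

# Parsed: the text becomes the content of a Name child element.
print(repr(parsed.find("Name").text), len(parsed))
```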

Comment Data

The xml document tree model supports comment data binding values in the basic nodes of the model. Each comment data field is a binding in the parent Structure. The Name of the binding is the special Name "__comment". The value of the binding will be the comment data.

Example1

XML Input: <?xml?><!--This is a comment -->
Tree Model: #{xml: true __comment: "This is a comment"}

Note: Comment data is compiled as a simple attribute binding of the parent Structure. The value is always a singleton character data stream.

Example2

XML Input: <?xml?><Name>John Doe</Name><!--This is a comment -->
Tree Model: #{xml: true Name: "John Doe" __comment: "This is a comment"}

Note: Comment data is compiled as a simple attribute binding of the parent Structure. The value is always a singleton character data stream.
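
A comparable "__comment" binding can be produced in Python with the standard minidom parser, which keeps comment nodes. This is a hedged sketch rather than the xml function's behavior: compile_with_comments is a hypothetical name, the Doc wrapper element is an assumption, and trimming the surrounding whitespace of the comment is a choice made to match the examples above.

```python
from xml.dom import minidom


def compile_with_comments(xml_text):
    """Return (name, value) bindings for the root's children."""
    root = minidom.parseString(xml_text).documentElement
    bindings = []
    for node in root.childNodes:
        if node.nodeType == node.COMMENT_NODE:
            # Comment text is bound under the special name "__comment".
            bindings.append(("__comment", node.data.strip()))
        elif node.nodeType == node.ELEMENT_NODE:
            text = "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
            bindings.append((node.tagName, text))
    return bindings


print(compile_with_comments(
    "<Doc><Name>John Doe</Name><!--This is a comment --></Doc>"
))
```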