parseLib Syntax Rules

 

Overview

This section contains initial working notes on the design of the syntax rules and features definition language. The syntax definition sublanguage is a combination of the lexical analysis ideas in [1.3.3] and the feature based grammar ideas in [2.7].

Syntax Features

Reference [2.7] uses the concept of attributed lexical tokens. The compiler definition language supports the definition of syntax features (token attributes for classes of lexical tokens) as follows:

d1 : [t11 t12 ... t1n] [v11 v12 ... v1n]

d2 : [t21 t22 ... t2n] [v21 v22 ... v2n]

. . .

dn : [tn1 tn2 ... tnn] [vn1 vn2 ... vnn]

Where each di is a distinct attribute name, and each [ti1 ti2 ... tin] is a unique list of individual lexical tokens, and each [vi1 vi2 ... vin] is an optional unique list of individual attribute values (if the optional value list is missing, values of true are assumed). While each list contains unique tokens, different lists may contain the same token. In effect, this allows a single lexical token to be associated with multiple attributes. The syntax is as follows:

#SyntaxFeatures#

RelationalOperator: [== < > != <= >=]

Lisp: [== < > != <= >=] [= < > <> <= >=]

ArithmeticOperator: [+ - * / %]

Operator: [+ - * / % == < > <> <= >= = += -= *= /=]

AssignmentOperator: [= += -= *= /=]

Lisp: [= += -= *= /=] [setq += -= *= /=]

IFStatement: [if]

LeftParen: [|(|]

RightParen: [|)|]

LeftBrace: [|{|]

RightBrace: [|}|]

LeftBracket: [|[|]

RightBracket: [|]|]

Semicolon: [|;|]

#End#

For instance, the syntax feature definitions, shown above, generate the following attributes for the = token:

= #{AssignmentOperator: true Operator: true Lisp: setq Value: =}

We can assign attributes to individual tokens or to whole classes of tokens using the attribute assignment syntax.

Note: All attribute names must begin with an uppercase character and must contain at least one non uppercase character.

Note: All lexical token, if a symbol, are surrounded by vertical bar symbols. All attribute values are left unaltered. For instance,

Boolean: [true false] [true false]

generates Lisp code as follows:

(_setSyntaxFeature Boolean: #( |true| |false| ) #( true false) )

Clearly this sets the symbol |true| with a Boolean value of true in the compiler's token directory.

 

For natural language definition, groups of words can be given multiple common features as follows:

WORDS: [...word list...] [...common features list...]

Where each word, in the word list, may be a single word or a collection. The word list starts with the root word and proceeds with all of its synonyms. For instance:

[give gives gave given giving]

The root word is `give'. The words `gives', `gave', `given', and `giving', are all synonyms for the root word `give'. Each word may be a single word or a word group, enclosed in square brackets, starting with the word and proceeding with all of its special features. For instance:

[give gives gave given [giving Noun]]

Once again, the root word is `give', and the words `gives', `gave', `given', and `giving', are all synonyms for the root word `give'. However, in this case the word `giving' has been given a special feature of Noun.

The common features list describes the features which will be common to all words in the word list. Each feature may be a single word or a word group, enclosed in square brackets, starting with the feature and proceeding with the value of the feature. A singleton feature is assumed to have a value of true. For instance:

WORDS: [sky] [Noun [Color blue]]

In effect, this defines the word sky with a feature set of Noun = true and Color = blue. The word features definition, shown above, generates compiler initialization Lisp code as follows:

(_setWordFeatures #( |sky| ) #( Noun #(Color blue)) )

which, during compiler initialization, generates the following attributes for the sky word:

= #{Value: "sky" Sky: true Noun: true Color: blue }

Some examples of the word definition syntax is as follows:

#SyntaxFeatures#

WORDS: [Alice] [Noun Name Female]

WORDS: [computer [computers Plural] [computing Verb]] [Machine Noun]

#End#

Note: All feature names must begin with an uppercase character and must contain at least one non uppercase character.

Syntax Rule Definitions

Reference [1.3.3] uses syntax definition from regular definitions of the form:

d1 : r1 || c1 || :: a1 ::

d2 : r2 || c2 || :: a2 ::

. . .

dn : rn || cn || :: a1 ::

Where each di is a rule name, each ri is a rule expression, each ci is a Lisp conditional expression, and each ai is a Lisp action expression. The syntax for rule definition is as follows:

#SyntaxRules#

 

REXPR: + Term || (= $2.Term true) || :: $2 ::

REXPR: - Term || (= $2.Number true) || :: (setq $2.Value (- 0 $2.Value))

REXPR=: - Term :: (setq $2.Value (list |-|: 0 $2.Value))

REXPR=: Term :: $1

REXPR=: REXPR RelationalOperator REXPR:: (setq $0.Value (list $2.Lisp $1 $3))

REXPR=: LeftParen REXPR RightParen :: $2

 

SEXPR: + Term || (= $2.Term true) || :: $2 ::

SEXPR: - Term || (= $2.Number true) || :: (setq $2.Value (- 0 $2.Value))

SEXPR: - Term :: (setq $2.Value (list |-|: 0 $2.Value))

SEXPR: Term :: $1

SEXPR: SEXPR Operator SEXPR:: (setq $0.Value (list $2.Lisp $1 $3))

SEXPR: LeftParen SEXPR RightParen :: $2

 

#End#

The Lisp condition rule is optional. If present, it must be enclosed by the || symbol. The Lisp action rule is mandatory. It must be enclosed by the :: symbol. The rule variable $0 is the default structure initialized by the rule. The $0 variable always has the attribute of the named rule set to true. The rule variables $1 through $9 correspond to the respective token expressions in the rule body.

Note1: All rule names must contain only uppercase characters and must contain no non uppercase characters, numerals, or underscores.

Note2: The $ symbol must not be used in an argument phrase, action, or condition rule anywhere except as a rule variable identifier $0 through $9. If the condition or action rule requires a $ symbol, for instance inside a string constant, place the $ symbol in a user defined function which is called by the argument phrase, action, or condition rule.

BNF Notation

Syntax rule names, syntax feature names, but not constants, may have trailing BNF operators of "*" or "+" or "?". For example:

SEQUENCE: Number+ :: (setq $0.Value $1) ::

Any syntax rule name and any syntax feature name (other than the special Eof and Nop features) may have trailing BNF operators. The user is required to make sure that the resulting rule does not cause the new compiler to loop endlessly on the input string. The BNF operators have the following meanings:

??        The "*" operator signifies none or more (may cause endless looping if specified inappropriately).

??        The "+" operator signifies one or more.

??        The "?" operator signifies none or one.

Note1: For syntax features, the BNF operators a vector of each repetition result.

Note2: For syntax rules, the BNF operators return a vector of each repetition result.

$this

The $this variable contains the current input character, at each invocation of a syntax rule with the + and * BNF command modifiers, during syntax analysis. The $this variable can be used in connection with user defined condition rules, for example:

MAIN: Any{(isNumber $this.Value)}*

Argument Passing

User defined rules may be passed arguments. A Lisp argument phrase, enclosed with the ( ) symbol pair, will cause the user defined rule to receive the specified argument. Within a user defined rule definition, the %0 thru %9 variables represent any arguments which may have been passed to the rule as follows:

QUALIFY: DotOperator Name QUALIFY( (setq $0.Value (append |ref|: %0.Value $2.Value)) )

:: $3.Value ::

QUALIFY: DotOperator Name :: (setq $.0.Value (append |ref|: %0.Value $2.Value)) ::

TERM: Name QUALIFY($1) :: $2 ::

TERM: Name :: $1 ::

The TERM rule will recognize all syntax of the form Name.Name.Name ... The rule returns when a Dot Operator no longer qualifies the name. The result is a structure with the attribute TERM = true, and the Value attribute containing the complete expression already reformed into a nested ref notation list.

Note: The % symbol must not be used in an argument phrase, action, or condition rule anywhere except as a rule variable identifier %0 through %9. If the argument phrase, action, or condition rule requires a % symbol, for instance inside a string constant, place the % symbol in a user defined function which is called by the argument phrase, action, or condition rule.

Iterative Rules

User defined rules may be repeated iteratively. Two forms of iteration are provided. Enclose an action rule with << >> to cause the user defined rule to repeat. Enclose an action rule with && && to cause the user defined rule to substitute and repeat.

Rule repetition using the << >> rule form

The contents of the $0 variable remain intact. The builtin Eof attribute name allows a rule to test for End Of File in the following rule:

SEXPR: Term Operator Term

<< (setq $.0.Value (appendList $0.Value (list (list $2.Value $1.Value $3.Value)))) >>

SEXPR: Operator Term << (setq $.0.Value (list $1.Value $0.Value $2.Value)) >>

SEXPR: Eof :: $0 ::

The SEXPR rule will recognize all syntax of the form Term Operator Term Operator Term Operator ... The rule returns when the End Of File is reached. The result is a structure with the attribute SEXPR = true, and the Value attribute containing the complete expression already reformed into a prefix notation list.

Note: The $n symbol contains the repetition count for the rule. During the first iteration through the rule, the $n variable is set to 1.

 

Rule repetition using the && && rule form for substitution

Use the && && action rule form to match and substitute tokens in the input. Like the << >> action rule syntax, && && causes the rule to repeat until no more matches are found. Using the && && rule form allows the input stream to be treated like a working set.

Consider the following example.

			

#SyntaxRules#

MAIN: MOVES :: $1.Value ::

MOVES: Direction Number && (makemove $1.Value $2.Value) &&

MOVES: Moves Direction Number && (makemove $1 $2.Value $3.Value) &&

MOVES: Moves :: $1 ::

#end#

#UserDefinedFunctions#

(defchild moveCompiler:makemove (...)

vars(moves)

(if (= (argCount) 2) ; Create new Moves token structure

(setq moves (new Structure: Moves: true

Value: (new Vector: 2 (argFetch 0) (argFetch 1)))

else

(begin ; Add to existing Moves token structure

(setq moves (argFetch 0))

(setq moves.Value[(length moves.Value)] (argFetch 1))

(setq moves.Value[(length moves.Value)] (argFetch 2))

))

moves)

#end#

The && && rule form causes the replacement of the matched tokens with the result of the action expression. The result of the action expression is normally a structure as shown in the example above.

From the console, calling the moveCompiler would give the following result:

(writeln (moveCompiler "West 2 East 15 West 3 North 5")) ??

#("West" 2 "East" 15 "West" 3 "North" 5)

It is important to understand that the && && action rule form modifies the input stream in place and parses again from the token position substituted into. Consider the following illustration of this process give the source input in the example above:

Lexical output:

n Token Structure

0                 #{Value: "West" Direction: true Charpos: 0}

1                 #{Value: 2 Number: true Charpos: 5}

2                 #{Value: "East" Direction: true Charpos: 7}

3                 #{Value: 15 Number: true Charpos: 12 }

4                 #{Value: "West" Direction: true Charpos: 15}

5                 #{Value: 3 Number: true Charpos: 20}

6                 #{Value: "North" Direction: true Charpos: 22}

7                 #{Value: 5 Number: true Charpos: 28}

After the first firing of the MOVES rule:

n Token Structure

0                 #{Value: #("West" 2) Moves: true }

1                 #{Value: "East" Direction: true Charpos: 7}

2                 #{Value: 15 Number: true Charpos: 12 }

3                 #{Value: "West" Direction: true Charpos: 15}

4                 #{Value: 3 Number: true Charpos: 20}

5                 #{Value: "North" Direction: true Charpos: 22}

6                 #{Value: 5 Number: true Charpos: 28}

After the second firing of the MOVES rule:

n Token Structure

0                 #{Value: #("West" 2 "East" 15) Moves: true }

1                 #{Value: "West" Direction: true Charpos: 15}

2                 #{Value: 3 Number: true Charpos: 20}

3                 #{Value: "North" Direction: true Charpos: 22}

4                 #{Value: 5 Number: true Charpos: 28}

 

After the third firing of the MOVES rule:

n Token Structure

0                 #{Value: #("West" 2 "East" 15 "West" 3) Moves: true }

1                 #{Value: "North" Direction: true Charpos: 22}

2                 #{Value: 5 Number: true Charpos: 28}

 

After the fourth firing of the MOVES rule:

n Token Structure

0                 #{Value: #("West" 2 "East" 15 "West" 3 "North" 5) Moves: true }

Term Conditions

User defined rules may be also have user defined conditions attached. A Lisp condition phrase, enclosed with the { } symbol pair, will cause the user defined rule to receive the specified condition. Within a user defined rule condition, the %0 thru %9 variables represent any arguments which may have been passed to the rule, while the $0 thru $9 variables represent any terms which may have been recognized by the rule.

STRING: Quote{(= $n 1)} << true >>

STRING: Any << (setq $0.Value (appendList $0.Value $1.Value) >>

STRING: Quote{(> $n 1)} :: $0 ::

The STRING rule will recognize all syntax tokens inclosed within two quotes The rule returns only when the second quote is recognized. User defined rules may have both argument passing and user defined conditions attached. The suer defined condition is always last, as follows.

TERM: NAME(%0){(= $1.Term true)} :: $1 ::

Note: The % symbol must not be used in an argument phrase, action, or condition rule anywhere except as a rule variable identifier %0 through %9. If the argument phrase, action, or condition rule requires a % symbol, for instance inside a string constant, place the % symbol in a user defined function which is called by the argument phrase, action, or condition rule.

MAIN Rule

The user must define a MAIN rule in the compiler definition. The MAIN rule is the rule which the new compiler will invoke to start the syntax analysis phase. If there is no MAIN rule defined, no syntaxtical analysis will result.

MAIN: STATEMENT Semicolon << (setq $.0.Value (appendList $0.Value $1.Value)) >>

MAIN: Eof :: $0 ::

This sample MAIN rule will recognize all syntax of the form statement; statement; statement; ... The rule returns when the End Of File is reached.

Note: Remember to avoid excessive recursion errors by making strategic rules iterative instead of right recursive.

Special Rule Syntax

Any

In direct mode, if a rule is to accept any token, use the Any attribute. This special test works because Any tests the token directly and does not assume that it is feature based. For example:

;; This rule recognizes a plus sign between anything

RULE: Any Any{(= $2 "+")} Any :: (setq $0.Value (list '+ $1 $3)) ::

Eof

If a rule is to test for end of file, use the Eof attribute. For example:

;; This rule recognizes an end of file condition

RULE: Eof :: $0 ::

Nop

The special Nop attribute always returns a constant token of #void. The Nop attribute is designed to provide a test which always is true, but does not promote the token pointer (i.e. a no-operation rule).

MAIN: STATEMENT Semicolon << (setq $.0.Value (appendList $0.Value $1.Value)) >>

MAIN: Eof :: $LIST ::

MAIN: Nop :: (error "If we get here we have an invalid token") ::

This sample MAIN rule will recognize all syntax of the form statement; statement; statement; ... However, if the MAIN rule encounters anything else (other than statement;), then an error message will be returned. The rule returns when the End Of File is reached, or an error is generated.

Note: If the Nop test is used, user ordering of the specified rule is almost always required.

$N

We may use a previously recognized parser variable to indicate a test for equality. For example:

;; These two rules are equivalent

RULE: Any $1 :: $0 ::

RULE: Any Any{(= $2 $1)} :: $0 ::

%N

We may use a previously passed parser argument variable to indicate a test for equality. For example:

;; These two rules are equivalent

RULE: Any %0 :: $0 ::

RULE: Any Any{(= $2 %0)} :: $0 ::

Append List Function

The builtin appendList function allows multiple arguments to be append together to form a list as follows:

(define X '(5 6 7))

(define Y '(10 20))

(appendList '+ X Y) ==> (+ 5 6 7 10 20)

The appendList function is builtin and may be used in any rule's action expression.

Rule Precedence

This Lambda supports multiple rule definitions up to the limits of available memory. The rule precedence, within a rule, is determined by the parseLib, to maximize search speed, and is unpredictable by the user. However any RULE may have automatic rule ordering turned off by specifying the following special rule statement as the first statement of the rule.

RULENAME: user ordering :: true ::

If automatic rule ordering is turned off, the parseLib will attempt to use the rule ordering supplied by the programmer in the DEFINITIONS file. If at all possible, the rule ordering, specified by the developer, will be closely followed by the parseLib in generating the compiler code.

Note: If the Eof and/or Nop special feature test are used in a rule, user ordering is advisable, because these tend to make it hard for parseLib to gues the correct ordering on its own.