Copyright (c) 2000 Thomas Yip, Inc. All Rights Reserved.
LEAFSOFT MAKES NO REPRESENTATIONS OR WARRANTIES ABOUT THE
SUITABILITY OF THE SOFTWARE AND/OR THE DOCUMENT, EITHER EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR
NON-INFRINGEMENT. LEAFSOFT SHALL NOT BE LIABLE FOR ANY DAMAGES
SUFFERED BY LICENSEE AS A RESULT OF USING, MODIFYING OR DISTRIBUTING
THIS SOFTWARE AND/OR THE DOCUMENT OR ITS DERIVATIVES.
Introduction
Leafsoft's document fragment declaration
language, Leato, was originally designed to describe XML fragments
within Leafsoft Larix. Larix raises some specific requirements that
current declaration specifications, for example DTD and Schema, do
not address. Larix requires document declarations to be intuitive,
compact and extensible.
Leato is very compact, allows character data to be
declared with regular expressions, and supports non-deterministic
sub-elements. Leato syntax is also compatible with the XML format, so a
declaration can be passed to the Leato validator using any XML parser,
without modification to the parser. In addition, multiple declarations can
be stored as, or embedded in, a single XML document.
Leato isn't meant to be a replacement for DTD. It
doesn't scale well to very complicated declarations. It is, however,
handy for declaring small fragments and suitable for a
variety of tasks with which DTD and Schema do not help.
Body
1. Declaration of a fragment
1.1. Define an element
Defining an element in Leato is straightforward. The
element to be declared simply appears as a regular XML element.
For example, if we want to allow <ElementName> as
the only element in our XML fragment structure, the declaration
would be:
<Leato>
<ElementName/>
</Leato>
1.2. Define sub-elements
Compact Mode
Defining sub-elements is just as simple. Consider the
following example in compact mode:
<Leato>
<Name>
<First>{#PCDATA}</First>
<Last>{#PCDATA}</Last>
</Name>
</Leato>
It defines an element <Name> that allows a sub-element
<First> followed by a <Last>. Also, {#PCDATA} allows
parsed character data as the content of <First> and <Last>.
Expanded Mode
The example above can be rewritten in expanded mode
as the following:
<Leato>
<Name>
<First/>
<Last/>
</Name>
<First>{#PCDATA}</First>!
<Last>{#PCDATA}</Last>!
</Leato>
It represents the same declaration as the
previous example.
The exclamation mark, "!", specifies that the
preceding tag is not a sub-element of the element that encloses it
(see section Modifier for details). So, in our example, element
<First> is not a valid element after <Name>. In
fact, Leato allows only one root element, which is <Name> in
the above example. Any element other than the first one will be
processed as if it were modified by "!".
"!" is a very convenient device in Leato.
Developers can move declarations around to maximize readability.
It allows, for example, deeply nested elements to be moved out,
which makes the declaration easier to read.
Among elements having the same name, at most one
occurrence may be non-empty in the declaration. In the actual fragment,
all empty elements having the same name carry the
sub-elements and attributes of the non-empty one. The following
declaration is considered invalid, because <Name> is non-empty twice:
<Leato>
<People>
<Employee>
<Name>
<First/><Middle/><Last/>
</Name>
</Employee>
<Customer>
<Name>
<First/><Last/>
</Name>
</Customer>
</People>
<First>{#PCDATA}</First>!
<Middle>{#PCDATA}</Middle>!
<Last>{#PCDATA}</Last>!
</Leato>
In fact, even if all the sub-elements of
<Name> were the same, the declaration would still be invalid.
To correct the error, the declaration may be rewritten as
follows:
<Leato>
<People>
<Employee><Name/></Employee>
<Customer><Name/></Customer>
</People>
<Name>
<First>{#PCDATA}</First>
<Last>{#PCDATA}</Last>
</Name>!
</Leato>
There is no constraint on which element of the same
name should contain the sub-element declarations. The above example
can be rewritten as follows:
<Leato>
<People>
<Employee>
<Name>
<First>{#PCDATA}</First>
<Last>{#PCDATA}</Last>
</Name>
</Employee>
<Customer><Name/></Customer>
</People>
</Leato>
However, it generally enhances readability if the
developer moves out sub-elements that occur inside more than one
element.
Compact mode and expanded mode can be mixed at
the developer's discretion to maximize readability.
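The rule that at most one element of a given name may be non-empty can be checked mechanically, because a Leato declaration is itself well-formed XML. The following Java sketch is illustrative only. The class DuplicateDeclarationSketch and its methods are hypothetical and are not part of the Leato API; it uses the JDK's DOM parser, and it assumes that an element counts as non-empty when it has a child element or non-whitespace text (attributes are ignored).

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the Leato API.
class DuplicateDeclarationSketch {

    /** Returns the element names that are declared non-empty more than once. */
    static Set<String> duplicates(String declaration) {
        try {
            // The outer <Leato> wrapper is only a container, so it is not counted.
            Element wrapper = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(declaration)))
                    .getDocumentElement();
            Set<String> seen = new HashSet<String>();
            Set<String> repeated = new TreeSet<String>();
            for (Node n = wrapper.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element) {
                    collect((Element) n, seen, repeated);
                }
            }
            return repeated;
        } catch (Exception e) {
            throw new IllegalArgumentException("declaration is not well-formed XML", e);
        }
    }

    // Walks the tree and records every name whose non-empty form appears a second time.
    private static void collect(Element e, Set<String> seen, Set<String> repeated) {
        if (isNonEmpty(e) && !seen.add(e.getTagName())) {
            repeated.add(e.getTagName());
        }
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                collect((Element) n, seen, repeated);
            }
        }
    }

    // Assumption: non-empty means "has a child element or non-whitespace text".
    private static boolean isNonEmpty(Element e) {
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return true;
            }
            if (n.getNodeType() == Node.TEXT_NODE && !n.getNodeValue().trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String invalid = "<Leato><People>"
                + "<Employee><Name><First/><Last/></Name></Employee>"
                + "<Customer><Name><First/><Last/></Name></Customer>"
                + "</People></Leato>";
        System.out.println(duplicates(invalid));
    }
}
```

A declaration in which the returned set is empty satisfies this particular rule; the sketch does not check anything else about the declaration.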
Unnamed Root Element
Leato allows the root element to be unnamed. An unnamed
root element is declared as <_>. Note that an unnamed root element
can have attribute declarations like any other root element. Consider the
following example:
<Leato>
<_>
<Name>
<First>{#PCDATA}</First>
<Last>{#PCDATA}</Last>
</Name>
</_>
</Leato>
The above declaration allows differently named root elements with
the same sub-elements. Both of the fragments below are valid for the
above declaration:
example 1/
<Employee>
<Name>
<First>Thomas</First>
<Last>Yip</Last>
</Name>
</Employee>
example 2/
<Customer>
<Name>
<First>Bob</First>
<Last>Smith</Last>
</Name>
</Customer>
Modifier
Besides !, the modifiers ?, * and + can be used to
specify the occurrence of an element. They carry their usual meaning
in regular expressions.
"?" specifies that the preceding tag is
optional, that is, it occurs zero times or once,
"*" specifies that the preceding tag
can occur zero or more times, and
"+" specifies that the preceding tag
can occur one or more times.
And again,
"!" specifies that the preceding tag
is not a sub-element of the enclosing element.
There is one more
modifier:
"~" specifies that the preceding tag
matches any element, but having the same name.
A modifier is always placed after the modified
element, and after the end tag if the element is not a singular tag.
For example,
<Leato>
<Employee>
<Name>
<First>{#PCDATA}</First>
<Middle>{#PCDATA}</Middle>?
<Last>{#PCDATA}</Last>
</Name>
<Remark/>*
</Employee>
</Leato>
The question mark, ?, specifies that the sub-element
<Middle> is optional in element <Name>. And, the
asterisk, *, specifies that there can be zero or more
<Remark>.
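The occurrence modifiers can be read as lower and upper bounds on how often an element may appear. The following Java sketch only illustrates that reading. The class OccurrenceSketch is hypothetical and not part of the Leato API, and the empty string is used here, as an assumption, to stand for an element without a modifier.

```java
// Hypothetical helper, not part of the Leato API.
class OccurrenceSketch {

    /** Smallest number of occurrences permitted by the modifier ("" means no modifier). */
    static int min(String modifier) {
        switch (modifier) {
            case "?":
            case "*":
                return 0;
            case "+":
            case "":
                return 1;
            default:
                throw new IllegalArgumentException("not an occurrence modifier: " + modifier);
        }
    }

    /** Largest number of occurrences permitted by the modifier ("" means no modifier). */
    static int max(String modifier) {
        switch (modifier) {
            case "?":
            case "":
                return 1;
            case "*":
            case "+":
                return Integer.MAX_VALUE;
            default:
                throw new IllegalArgumentException("not an occurrence modifier: " + modifier);
        }
    }

    /** True if an element carrying this modifier may occur the given number of times. */
    static boolean allows(String modifier, int count) {
        return count >= min(modifier) && count <= max(modifier);
    }

    public static void main(String[] args) {
        // <Middle>...</Middle>? may be absent, <Remark/>* may repeat.
        System.out.println(allows("?", 0));
        System.out.println(allows("*", 3));
        System.out.println(allows("+", 0));
    }
}
```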
Sequence and Choice
There are two grouping patterns in Leato. Unless
overridden, all sub-elements inside an element are treated as a
sequence. The sequence <A/><B/><C/> matches an
<A> followed by a <B> followed by a <C>. The
default can be overridden with a pair of square brackets, [ and ],
around the sub-elements. For example,
<Leato>
<body>
[<p/><h1/><h2/><h3/>]
</body>
</Leato>
specifies that element <body> can have exactly
one of <p>, <h1>, <h2>, or <h3>. Choice
works as "exclusive or".
If sub-elements are enclosed in a pair of round
brackets, ( and ), they form a sequence. Unlike in
regular expressions, round brackets denote neither a back reference
nor a memory group.
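The difference between the two grouping patterns can be illustrated with a small Java sketch. It is not Leato's implementation. The class GroupSketch is hypothetical, it works on plain lists of element names, and it assumes that no modifiers or nested groups are involved.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Leato API.
class GroupSketch {

    /** Sequence: the children must be exactly the declared names, in the declared order. */
    static boolean matchesSequence(List<String> declared, List<String> children) {
        return declared.equals(children);
    }

    /** Choice ("exclusive or"): exactly one child, and it must be one of the declared names. */
    static boolean matchesChoice(List<String> declared, List<String> children) {
        return children.size() == 1 && declared.contains(children.get(0));
    }

    public static void main(String[] args) {
        List<String> abc = Arrays.asList("A", "B", "C");
        List<String> bodyChoice = Arrays.asList("p", "h1", "h2", "h3");
        System.out.println(matchesSequence(abc, Arrays.asList("A", "B", "C")));
        System.out.println(matchesChoice(bodyChoice, Arrays.asList("h2")));
        System.out.println(matchesChoice(bodyChoice, Arrays.asList("p", "h1")));
    }
}
```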
Sequence and choice groups can be
nested inside each other. A group can be modified by the
modifiers ?, *, + and !. However, a group must not be empty, and it must
not consist solely of elements or groups that are all modified by !.
A group must be opened and closed at the
same level. For example, the following is invalid:
<Leato>
<Element>
[<Subelement>
<Subsubelement>]
</Subelement>
</Element>
</Leato>
Deterministic vs. non-deterministic
Leato only supports non-deterministic declaration of
sub-element occurrence. However, developers should be aware that,
according to the XML specification, "XML processors built using
SGML systems may flag non-deterministic content models as
errors."
Leato also allows an element to have itself as a sub-element,
or as a sub-element of one of its sub-elements.
Reference
An element can be declared to have the same set of
attributes and sub-elements as another element. The declaration is made with a
reference tag. A reference tag is similar to a regular tag, but has
curly brackets instead of angle brackets. A reference must be the
only sub-element of its parent element, and it cannot contain
sub-elements or attributes. For example, we can rewrite our
employee and customer example as follows:
<Leato>
<People>
<Employee>{Person/}</Employee>
<Customer>{Person/}</Customer>
</People>
<Person>
<Name>
<First>{#PCDATA}</First>
<Middle>{#PCDATA}</Middle>?
<Last>{#PCDATA}</Last>
</Name>
</Person>!
</Leato>
An element must not have any sub-element that is a
reference to itself.
Special element
All special elements are enclosed in a pair of curly
brackets but, unlike references, they start with a # sign or a ~ sign.
Some special elements ignore modifiers. In the current
implementation, character data matched by a special element should not contain entities or
notations that would break the character data into pieces. This
limitation may be removed in the release version. Special elements
with a # sign can be used together with other sub-element declarations, but
a special element with a ~ sign must be the only sub-element of its
parent element.
Consider the special element {#PCDATA} in the
following example,
<Leato>
<OL>
(<Li/>{#PCDATA})*
</OL>
</Leato>
Notice that {#PCDATA} may be used together with other
sub-elements under <OL>.
{~ALL}
~ALL matches one or more
elements, whether declared or undeclared, characters, or nothing. In
fact, if an element has this special element, its sub-elements, if any, are not
checked for validity.
{#PCDATA}
The best-known special element is {#PCDATA}, which
stands for Parsed Character Data. It has the same meaning as PCDATA in
the XML specification.
{()}
Syntax:
{('a' | 'b' | 'c' | 'd')}
{#Re}
Matches PCDATA that matches a regular expression.
Syntax:
{#Re regular expression}
matches character data against the regular expression given after
"#Re".
{#EMPTY}
Using {#EMPTY} as the sub-element is like declaring
a singular element, or an element with nothing between its start and end tags.
In addition, it helps the developer make sure that the element is
not declared to have sub-elements somewhere else.
{#ANY}
#ANY matches #PCDATA, one or more declared elements
or no element at all
{#ELEM}
#ELEM matches exactly one declared element
{#ELEM?}
#ELEM? matches zero or one declared element
{#ELEM*}
#ELEM* matches any number of declared elements
{#ELEM+}
#ELEM+ matches one or more declared elements
{#Leato}
matches Leato declaration
{#Integer}
Syntax:
{#INTEGER '[0, 100)'}
[ ... ] -- inclusive
( ... ) -- exclusive
Matches an integer, as defined in Java. The range is
optional.
An open-ended range such as "[,200)" is also allowed; it
matches any integer from Integer.MIN_VALUE up to, but not including, 200.
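The range notation can be illustrated with a short Java sketch. This is not Leato's own parser. The class IntRangeSketch is hypothetical, and it assumes that a missing upper bound means Integer.MAX_VALUE, by analogy with the missing lower bound described above.

```java
// Hypothetical helper, not part of the Leato API.
class IntRangeSketch {
    private final long min; // smallest accepted value, inclusive
    private final long max; // largest accepted value, inclusive

    /** Parses a range such as "[0, 100)" or "[,200)". */
    IntRangeSketch(String range) {
        String r = range.trim();
        if (r.length() < 3) {
            throw new IllegalArgumentException("bad range: " + range);
        }
        char open = r.charAt(0);
        char close = r.charAt(r.length() - 1);
        int comma = r.indexOf(',');
        if ((open != '[' && open != '(') || (close != ']' && close != ')') || comma < 0) {
            throw new IllegalArgumentException("bad range: " + range);
        }
        String lowText = r.substring(1, comma).trim();
        String highText = r.substring(comma + 1, r.length() - 1).trim();
        // A missing bound falls back to the limits of a Java int (upper bound is an assumption).
        long low = lowText.isEmpty() ? Integer.MIN_VALUE : Integer.parseInt(lowText);
        long high = highText.isEmpty() ? Integer.MAX_VALUE : Integer.parseInt(highText);
        // "(" and ")" exclude the stated bound, so move it one step inward.
        if (open == '(' && !lowText.isEmpty()) {
            low++;
        }
        if (close == ')' && !highText.isEmpty()) {
            high--;
        }
        this.min = low;
        this.max = high;
    }

    /** True if the text is a Java int that lies inside the range. */
    boolean accepts(String text) {
        try {
            long value = Integer.parseInt(text.trim());
            return value >= min && value <= max;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        IntRangeSketch percent = new IntRangeSketch("[0, 100)");
        System.out.println(percent.accepts("0"));
        System.out.println(percent.accepts("100"));
    }
}
```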
{#Long}
Syntax:
{#Long '[0, 100)'}
[ ... ] -- inclusive
( ... ) -- exclusive
Matches a long integer, as defined in Java. The range is
optional.
{#Float}
Syntax:
{#Float '[0, 100)'}
{#Double}
Syntax:
{#Double '[0, 100)'}
Similar to #Integer. If the minimum or maximum of the range is not
specified, it is replaced by -Float.MAX_VALUE or
Float.MAX_VALUE respectively.
{#Anon}
annotation holder
Syntax:
{#anon 'CDATA'}
Has no effect on the declaration.
Attribute declarations
An attribute declaration consists of
4 parts. The first part is the attribute name. The second part is an
equal sign. The third part is the type of the attribute, and the last
part is the default. Unsurprisingly, the attribute declaration is placed
after the declared element name, inside the element's angle brackets.
Consider the following examples:
Attribute type
CDATA
<Leato>
<AnElement attr="#CDATA value"/>
</Leato>
The above declaration allows
the root element <AnElement> to have an attribute named "attr"
whose value type is CDATA. The default value is
"value".
()
Besides CDATA, the attribute type
can be an enumeration. Consider the following example:
<Leato>
<AnElement attr="(valueA|valueB|valueC) #REQUIRED"/>
</Leato>
In the above example,
attribute "attr" can have either "valueA",
"valueB" or "valueC" as its value. And, the
attribute is required.
Re
An attribute may also be defined
using a regular expression. Consider the following example:
<Leato>
<AnElement email="Re [._-]+(\@[._-]+) #IMPLIES"/>
</Leato>
Integer, Long, Float and Double
<Leato>
<AnElement attr="Float (-1,1) #FILL 0"/>
</Leato>
Numerical types are declared
starting with the keyword Integer, Long, Float or Double. The range is
optional.
#FILL is a default type that does not
exist in DTD. If the attribute does not exist in the XML, Leato will add
the attribute with the specified value.
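The effect of #FILL can be pictured with a small Java sketch. It only illustrates the behaviour described above and is not Leato's implementation. The class FillSketch and its method applyFill are hypothetical, and an element's attributes are modelled, as an assumption, by a plain map from name to value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the Leato API.
class FillSketch {

    /**
     * Returns a copy of the attributes in which the named attribute is present.
     * An existing value is kept; a missing attribute receives the fill value.
     */
    static Map<String, String> applyFill(Map<String, String> attributes,
                                         String name, String fillValue) {
        Map<String, String> result = new LinkedHashMap<String, String>(attributes);
        if (!result.containsKey(name)) {
            result.put(name, fillValue);
        }
        return result;
    }

    public static void main(String[] args) {
        // Mirrors attr="Float (-1,1) #FILL 0": a missing "attr" becomes "0".
        Map<String, String> none = new LinkedHashMap<String, String>();
        System.out.println(applyFill(none, "attr", "0"));
    }
}
```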
Validating XML with Leato
Leato can work with a SAX parser. It acts as a middle
layer between a non-validating parser and a SAX DocumentHandler. The
following code fragment shows how to get Leato to work in such a configuration.
public static void main( String[] args ) throws Exception {
    // set up the validation rules and get a verifier
    Tree leatoRule = Tree.createTree( new FileReader(args[0]) );
    Leato l = new Leato( leatoRule );
    Verifier v = l.getVerifier();
    // SAX parser of your choice; syntax may vary with different parsers
    // set the parser's documentHandler and errorHandler to the Leato verifier
    Parser p = new Parser( false /* do not do validating check */ );
    p.setDocumentHandler( v );
    p.setErrorHandler( v );
    // set up the verifier
    v.setDocumentHandler( xmlTarget );
    v.setErrorHandler( errorTarget );
    // start parsing
    p.parse();
}
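The same "middle layer" arrangement can be reproduced with the SAX2 classes that ship with the JDK. The sketch below is only an analogy and does not use Leato. The class MiddleLayerSketch is hypothetical; it extends org.xml.sax.helpers.XMLFilterImpl, sits between a non-validating parser and a downstream handler, and its only "rule" is an assumed set of permitted element names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical stand-in for a verifier; it is not the Leato Verifier class.
class MiddleLayerSketch extends XMLFilterImpl {
    private final Set<String> allowedNames;

    MiddleLayerSketch(XMLReader parser, Set<String> allowedNames) {
        super(parser); // the filter receives the parser's events first
        this.allowedNames = allowedNames;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Reject the event before it reaches the downstream handler.
        if (!allowedNames.contains(qName)) {
            throw new SAXException("unexpected element <" + qName + ">");
        }
        super.startElement(uri, localName, qName, atts);
    }

    /** Parses the XML through the filter; false if an element is rejected or the XML is malformed. */
    static boolean isValid(String xml, String... allowedNames) {
        XMLReader parser;
        try {
            parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
        MiddleLayerSketch filter =
                new MiddleLayerSketch(parser, new HashSet<String>(Arrays.asList(allowedNames)));
        filter.setContentHandler(new DefaultHandler()); // downstream target, does nothing here
        try {
            filter.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Name><First>Thomas</First><Last>Yip</Last></Name>";
        System.out.println(isValid(xml, "Name", "First", "Last"));
    }
}
```

The downstream handler sees only the events that the filter forwards, which is the role the Leato verifier plays between the parser and the DocumentHandler in the fragment above.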
Possible extensions of Leato
Query
The following example combines XML XPath and Leato to
do a query. A list of the <item> elements that satisfy the Leato
declaration will be returned.
<Query>
<Select xpath="/descendant::olist/child::item ">
<item code="#re \bca061">
<color>{(white|yellow)}</color>
<size>{#PCDATA}</size>
<price cnd="#Float (0,2000] #REQUIRED"/>
</item>
</Select>
</Query>
Validating an XML fragment in a Java Program
public class AnXMLApp extends HttpServlet {
private DFD oldApplet = new DFD(
"<Transaction>" +
"<Param name=\"#CDATA #REQUIRED\">{#PCDATA}</Param>+" +
"</Transaction>" );
private DFD newApplet = new DFD(
"<Transaction>" +
"<Sales>{#PCDATA}</Sales>" +
"<Date>{#PCDATA}</Date>" +
"<Customer id=\"{#PCDATA}\"/>" +
"<Detail>{#PCDATA}</Detail>" +
"</Transaction>" );
public void doPost (HttpServletRequest request,
HttpServletResponse response)
throws ServletException, IOException
{
// set content type and other response header fields first
response.setContentType("text/html");
// get the communication channel with the requesting client
Tree input =
Tree.createTree(new InputStreamReader(request.getInputStream()) );
if (
newApplet.getVerifier().isValid( input ) ) {
// do the transaction
.....
} else if (
oldApplet.getVerifier().isValid( input ) ) {
// notify the client to reload the applet....
.....
} else {
// transaction invalid, ignore it
.....
}
}
}
Current Implementation of Leato validator
[...later........]
Add your own data type
Leato is implemented in Java, and special elements can be
easily added using Java. For example, suppose a developer wants to create a
special element called "{#Email}", which allows
either an email address string or an element <Email>. The class DFD is extended
to add such a special element.
class MorePowerfulDFD extends com.leafsoft.leato.NDFA {
    public MorePowerfulDFD( Tree t ) {
        super( t );
    }
    // create the new special element when its name is recognized
    protected EmailNode createSpecial( String special ) {
        if ( special.startsWith("#Email") ) {
            return new EmailNode( special );
        }
        return super.createSpecial(special);
    }
}
A new class, EmailNode, is also
needed:
public class EmailNode extends AbstractDFDNode {
NDFA next;
int modifier;
public EmailNode( DFD root ) {
}
public void doEmpty( DFDNodeSet resultSet ) {
resultSet.add(this);
}
public void doElement( DFDNodeSet resultSet, String name ) {
if ( name.equals("Email") ) {
resultSet.add(this);
}
}
public void doCharacters( DFDNodeSet resultSet,
char[] ch, int start, int length ) {
String email = new String( ch, start, length );
// one or more non-@ characters, an @, then dot-separated domain labels
if ( email.matches("[^@]+@[^@.]+(\\.[^@.]+)+") ) {
resultSet.add( next );
}
}
public void doIgnorableWhitespace( DFDNodeSet
resultSet, String chars ) {
resultSet.add(this);
}
public void doProcessingInstruction( DFDNodeSet
resultSet, String target, String data ) {
resultSet.add(this);
}
public String getSymbol() {
return
"{#Email}";
}
public boolean isFinalState() {
return false;
}
void addNode( AbstractDFDNode toBeAdded ) {
}
void setModifier( int mod ) {
modifier = mod;
}
void close() {
}
void setNext( NDFA next ) {
this.next = next;
}
}
Please refer to the API documentation for details.
Appendix
EBNF grammar of Leato
EBNF Grammar for Document Fragment Declaration
(Leato)
[still
working...............]
Leato ::= Elem
Elems ::= ( '[' Elems ']' | '(' Elems ')' ) ( '?' | '*' | '+' | '#'
)?
Elems ::= Elem +
Elem ::= ( SpecElem | RefElem | '<' ElemName '/>' | '<'
ElemName Attrs '>' Elems '</' ElemName '>' )
SpecElem ::= '{' ( '#ANY' | '#Leato' | '#ELEM' | '#ELEMS' | '#CDATA'
| '#ALLSCOPE' | '#RE' reg ) '/'? '}'
RefElem ::= '{' NMTOKEN '/'? '}'
ElemName ::= NMTOKEN
Attrs ::= Attr *
Attr ::= AttrName '=' '"' attype S Default '"'
Default ::= 'default' | 'implies'
AttrName ::= NMTOKEN
attype ::= 'CDATA' | 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' |
'ENTITIES' | 'NMTOKEN' | 'NMTOKENS' | Notation | enum | re
Notation ::= 'NOTATION' enum
enum ::= '(' NMTOKEN ( '|' NMTOKEN )* ')'
re ::= 'RE' rexp
Leato in Leato
[Later.............]
XHTML in Leato
[Later.............]