Ml4 language

Ml4 (Meta-Language of Depot4) is based on EBNF. In fact,
it is a true extension of one of its variants (introduced by N. Wirth).
There is no Ml4 program as such; instead there is a set of
Ml4 productions, which can be translated independently of each other. Thus, Ml4
features production-based (i.e., rule-based) modularization. Translators are configured dynamically by selecting
one of the rules as the root production, i.e., the nonterminal
on the left-hand side of that production is declared the start symbol of
the grammar. This yields a usable language processor.
The choice of the language root can be changed dynamically at any time.
Together with the dynamic loading of the modules this enables the testing
of parts of the language processor before finishing the implementation of
all the productions.
The formal description of EBNF in section 3.3 is
already a set of valid Ml4 productions,
which can be translated by the Depot4 metalanguage translator into executable
code. By choosing Rule as the start symbol we get
an acceptor for EBNF productions.
An Ml4 production has the general structure:
identifier = sourceExpression -> targetExpression .
where the part starting with -> is called the target production and may occur
repeatedly.
Ml4's unique features.
All the structure operators of EBNF are available on both the source and the target side. There are
further extensions such as declarations, assignment and call statements, etc.
Elements of Ml4 that are not part of basic EBNF (extensions) are separated from each other and from the basic elements by semicolons. (In fact, semicolons may be used within the EBNF
parts, too.)
Comments are enclosed in (* and *) and may be nested.
DO or do are very likely to clash and thus should be avoided.)
Keywords of the Ml4 language:
ARR DCL END FLEX GLOBVAR IMPORTS INIT MODULE REC TYPE TYPEND USE VAR
Literals are enclosed in single quotes, e.g. ':=' or 'BEGIN'.
If the character ' itself is needed in a literal, it has to be written
twice, e.g. '''Hallo!'''.
Escape sequences:
  \n   newline, chosen according to the actual operating system
  \c   carriage return
  \l   line feed
  \f   form feed
  \t   horizontal tabulator
  \v   vertical tabulator
  \B   bell
  \b   backspace
  \\   \
  \0   null byte
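The quote-doubling rule can be illustrated with a small Python sketch. It is not part of Depot4; the function name decode_literal is hypothetical, and the sketch deliberately ignores escape sequences and the $ notation.

```python
def decode_literal(source: str) -> str:
    """Return the text denoted by a single-quoted literal.

    Only the quote-doubling rule is modelled here; escape sequences
    and the $ notation are left untouched.
    """
    if len(source) < 2 or source[0] != "'" or source[-1] != "'":
        raise ValueError("a literal must be enclosed in single quotes")
    body = source[1:-1]
    result = []
    i = 0
    while i < len(body):
        if body[i] == "'":
            # A quote inside the literal must be followed by a second quote.
            if i + 1 >= len(body) or body[i + 1] != "'":
                raise ValueError("a quote inside a literal must be doubled")
            i += 1  # skip the first quote of the pair
        result.append(body[i])
        i += 1
    return "".join(result)
```

For example, decode_literal("'''Hallo!'''") yields the seven characters 'Hallo!' including the surrounding quotes.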
A literal may be abbreviated if it starts with $i, where i is one of the digits 1...9 giving the
minimum number of characters required. So '$3INTEGER' accepts the strings
INT, INTE,... INTEGER, but not
INTEGERS. If a literal starts with the character '$',
then it has to be written twice, e.g. '$$a$' accepts the string $a$.
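As a rough illustration of the abbreviation rule, the following Python sketch checks a complete word against a literal specification. The function name literal_accepts is hypothetical, and the sketch compares whole words only; it does not model how Depot4 matches a literal against the beginning of a longer input.

```python
def literal_accepts(spec: str, word: str) -> bool:
    """Check a word against a literal body such as '$3INTEGER' (quotes removed)."""
    if spec.startswith("$$"):
        # A doubled $ stands for a literal that really starts with '$'.
        return word == spec[1:]
    if len(spec) >= 2 and spec[0] == "$" and spec[1] in "123456789":
        minimum = int(spec[1])   # least number of characters required
        full = spec[2:]          # the unabbreviated literal
        return len(word) >= minimum and full.startswith(word)
    return word == spec
```

With the specification $3INTEGER this accepts INT, INTE and INTEGER, and rejects IN and INTEGERS.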
To guarantee the separation of a literal from the succeeding text,
the separating symbol $ can be used after the string.
For instance, 'REAL' also accepts the beginning of the string
REALUM, but 'REAL' $ does not accept it.

identifier = expression.
identifier is the name of a nonterminal.
The dot marks the end of the production. expression is the collection of all right-hand sides
of the productions with identifier on the left-hand side.
Sequence:
B = A1 A2 ... An.
Alternative:
B = A1 | A2 | ... | An.
Option:
B = [ A ].
Because this notation overlaps with indexing in the enhanced language (Ml4), an option following an
identifier must be separated from it by at least one space (or other delimiter).
Iteration may be directly represented (without using recursion) by curly braces. Iteration is useful
to express left association when left recursion is forbidden (as in Depot4).
B = { A }.
Parentheses are used for grouping; for example,
B = 'a' ('b'|'c') 'd'.
describes the language {'abd', 'acd'}.
Rule = ident '=' Expr '.'.
Expr = Term { '|' Term }.
Term = { Factor }.
Factor = string | ident | '(' Expr ')' | '[' Expr ']' | '{' Expr '}'.
Ml4 allows empty productions, i.e. empty = . is valid.
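To make the four productions Rule, Expr, Term and Factor more tangible, here is a hand-written recursive-descent acceptor in Python. It is only a sketch of what such an acceptor does, not the code Depot4 generates. The token patterns for ident and string are assumptions, and the root parameter merely imitates the idea of selecting a start symbol.

```python
import re

# Assumed token shapes: ident = letter {letter|digit}, string = quoted text.
TOKEN = re.compile(
    r"\s*(?:(?P<ident>[A-Za-z][A-Za-z0-9]*)"
    r"|(?P<string>'(?:[^']|'')*')"
    r"|(?P<sym>[=.|()\[\]{}]))"
)

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

class Acceptor:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def expect(self, kind, value=None):
        k, v = self.peek()
        if k != kind or (value is not None and v != value):
            raise SyntaxError(f"expected {value or kind} at token {self.pos}")
        self.pos += 1

    def rule(self):    # Rule = ident '=' Expr '.'.
        self.expect("ident")
        self.expect("sym", "=")
        self.expr()
        self.expect("sym", ".")

    def expr(self):    # Expr = Term { '|' Term }.
        self.term()
        while self.peek() == ("sym", "|"):
            self.pos += 1
            self.term()

    def term(self):    # Term = { Factor }.
        while self.starts_factor():
            self.factor()

    def starts_factor(self):
        k, v = self.peek()
        return k in ("ident", "string") or (k == "sym" and v in "([{")

    def factor(self):  # Factor = string | ident | '(' Expr ')' | '[' Expr ']' | '{' Expr '}'.
        k, v = self.peek()
        if k in ("ident", "string"):
            self.pos += 1
            return
        closing = {"(": ")", "[": "]", "{": "}"}[v]
        self.pos += 1
        self.expr()
        self.expect("sym", closing)

def accepts(text, root="rule"):
    """Accept text with the chosen production as start symbol."""
    try:
        parser = Acceptor(tokenize(text))
        getattr(parser, root)()
    except SyntaxError:
        return False
    return parser.pos == len(parser.tokens)
```

Calling accepts("Expr = Term { '|' Term }.") checks a complete production, while accepts("Term { Factor }", root="term") tests only a part of the grammar, in the spirit of changing the root production.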
There are three kinds of types in Ml4: primitive types, structured types, and
opaque types. The latter are of interest only in connection with the import feature and allow a simple
handling (declaration, parameter passing) of foreign data.
Ml4 production.
- INT - actually $3INTEGER
- Integer type is mapped on the respective type of the host language.
- REAL
- Floating point type, mapped on a real type, too. There is only a limited support for this type, e.g., no conversions are available.
- BOOL - actually $4BOOLEAN
- The boolean type, whose values are TRUE and FALSE.
- SYM
- A type, whose values are symbols, i.e., possibly limited strings of characters. They can be, at least, concatenated and compared.
- TXT
- This is the basic target type. Values of this type can only be concatenated.
- RECORD - actually $3RECORD
- The syntax of a record definition follows that of Pascal/Modula (without variants).
- ARRAY - actually $3ARRAY
- An array is a fixed-size vector of elements (which may in turn be of array type). Only the number of elements is given; indices start at zero.
- FLEX - also FLEX1, resp. FLEX2
- Flexible arrays (FLEXes) are suited to storing information in connection with EBNF's iteration construct. They have no upper limit on the number of their elements. Accessing a non-existing element f[i] will create it.
The index range of a FLEX starts with one.
The use of this data type requires runtime management of the associated data structures and is thus expected to be less efficient than ordinary arrays in most host languages.
Flexible arrays may be of dimension one (FLEX/FLEX1) or two (FLEX2).
ARR 20 OF INT
REC name, town: SYM; age: INT; gen: BOOLEAN END
FLEX OF SYM
FLEX2 OF RECORD F: FLEX OF INTEGER; AAR: ARRAY 10 OF ARRAY 5 OF REAL END
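The behaviour described for one-dimensional FLEXes (indices starting at one, elements created on first access) might be modelled as follows. This Python class is purely illustrative: the name Flex, the default-value factory and the dictionary-based storage are assumptions and say nothing about how Depot4 implements the type in a host language.

```python
class Flex:
    """Illustrative model of a one-dimensional flexible array."""

    def __init__(self, make_default):
        # make_default is a hypothetical factory for newly created elements.
        self._make_default = make_default
        self._items = {}

    @staticmethod
    def _check(index):
        if index < 1:
            raise IndexError("FLEX indices start at one")

    def __getitem__(self, index):
        self._check(index)
        if index not in self._items:
            # Accessing a non-existing element creates it.
            self._items[index] = self._make_default()
        return self._items[index]

    def __setitem__(self, index, value):
        self._check(index)
        self._items[index] = value

    def __contains__(self, index):
        return index in self._items
```

Because elements are created on demand, such a structure fits an iteration like { Dcl[i] } where the number of repetitions is not known in advance.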
Ml4.
For efficiency reasons, Ml4 allows several productions to be combined into a module.
This is restricted to groups of nonterminals of which only one is called from
outside, while the others are needed only locally. The name of the
module has to be the name of the nonterminal called from outside.
Productions (or modules) are translated separately. A nonterminal used in a rule (i.e.,
a nonterminal on its right-hand side) need not be defined yet.
The nonterminal's identifier (i.e., the left-hand side of the rule) becomes the identifier of all the
generated entities (host language source file, object file, etc.). This means, if there are two
productions with the same left-hand side, translating one of them will possibly overwrite the
implementation of the other.
There exists just one global name space for all productions. Thus, it is useful to follow a naming
convention when defining new rules.
Depot4 supports prefixing, i.e., if an identifier contains a lowercase letter or digit followed by a capital
letter, the part before the first such capital is regarded as a common prefix. (E.g., Dp4
is the prefix of Dp4ExAmPlE1.) This avoids name collisions and is also applied for
automatic structuring (into subsystems/packages) if the host system offers this.
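The prefix rule can be written down as a short Python sketch. The function name common_prefix is hypothetical, and returning an empty string for identifiers without a prefix is an assumption made for this example.

```python
import re

# A lowercase letter or digit immediately followed by a capital letter.
_BOUNDARY = re.compile(r"[a-z0-9][A-Z]")

def common_prefix(identifier: str) -> str:
    """Return the part before the first capital that follows a lowercase letter or digit."""
    match = _BOUNDARY.search(identifier)
    if match is None:
        return ""  # no prefix in this identifier
    return identifier[: match.start() + 1]
```

For Dp4ExAmPlE1 the first such boundary lies between 4 and E, so the prefix is Dp4.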
Ml4 production.
The basic structure of this part is given by EBNF.
id, ident     letter {letter|digit} (see also Treatment of Keywords)
str, string
integer       digit{digit} | digit{hexdigit}'H'
num           digit{digit}
number        digit{hexdigit}'H' | digit{digit}['.'{digit}[('E'|'D')['+'|'-']digit{digit}]]
filename
line
ident4root
any
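The EBNF definitions of num, integer and number translate directly into regular expressions. The following Python sketch does this for illustration only; the definition of hexdigit as 0-9 and A-F is an assumption, since it is not spelled out here.

```python
import re

DIGIT = "[0-9]"
HEXDIGIT = "[0-9A-F]"  # assumption: uppercase hexadecimal digits

# num     = digit{digit}
NUM = re.compile(rf"{DIGIT}+")
# integer = digit{digit} | digit{hexdigit}'H'
INTEGER = re.compile(rf"{DIGIT}+|{DIGIT}{HEXDIGIT}*H")
# number  = digit{hexdigit}'H'
#         | digit{digit}['.'{digit}[('E'|'D')['+'|'-']digit{digit}]]
NUMBER = re.compile(
    rf"{DIGIT}{HEXDIGIT}*H"
    rf"|{DIGIT}+(?:\.{DIGIT}*(?:[ED][+-]?{DIGIT}+)?)?"
)

def is_token(pattern, text):
    """True if the whole text is one token of the given class."""
    return pattern.fullmatch(text) is not None
```

Note that a hexadecimal constant must start with a decimal digit, and that an exponent is only allowed after a decimal point, so 1E5 is not a number in this syntax.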
A problem arises with a construct such as [ident] 'END', because the closing END will be
accepted as an identifier. There are at least two ways to overcome this. First, one can change the
grammar, e.g. into (ident 'END'|'END'), which solves the problem.
Depot4 now has a more convenient solution: one can write all the words that
are not identifiers into a file. By default, Depot4 looks in the current directory
for a file NoIdent.lst (this can be changed in module Dp4Config) and
excludes all the words that it contains from being recognized as identifiers.
The list can be changed by calling NoIdents from module Dp4Stdlex. The argument is the filename string.
This call discards the previous list and installs a new one, which will be empty if no file was found. pushNoIdents(filenameString) additionally saves the old settings, while
popNoIdents() restores the saved status.
The syntax of an exclusion file is simple: just list the words, separated by spaces or newlines.
IMPORTS Dp4Stdlex;
lextst = Dp4Stdlex.NoIdents('PascalNoIdent.lst');
{ ident } 'END' Dp4Stdlex.pushNoIdents('CNoIdent.lst')
{ ident } 'end' Dp4Stdlex.popNoIdents(); { ident } 'UNTIL'
.
with file PascalNoIdent.lst containing at least END and file CNoIdent.lst
containing end will accept
alfa beta END END ELSE end end UNTIL
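The interplay of NoIdents, pushNoIdents and popNoIdents in the lextst example can be traced with a small Python simulation. Everything in it is an assumption made for illustration: the class NoIdentTable stands in for the Dp4Stdlex procedures, word lists are passed as sets instead of file names, the iteration is taken to be greedy, and the Pascal list is assumed to contain UNTIL as well as END so that the last iteration stops in front of 'UNTIL'.

```python
import re

IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")

class NoIdentTable:
    """Hypothetical stand-in for the exclusion list kept by Dp4Stdlex."""

    def __init__(self):
        self.current = set()
        self.saved = []

    def no_idents(self, words):       # discard the old list, install a new one
        self.current = set(words)

    def push_no_idents(self, words):  # save the old list, then install a new one
        self.saved.append(self.current)
        self.current = set(words)

    def pop_no_idents(self):          # restore the saved list
        self.current = self.saved.pop()

    def is_ident(self, word):
        return IDENT.fullmatch(word) is not None and word not in self.current

def skip_idents(tokens, pos, table):  # { ident }
    while pos < len(tokens) and table.is_ident(tokens[pos]):
        pos += 1
    return pos

def expect(tokens, pos, literal):     # a literal such as 'END'
    if pos >= len(tokens) or tokens[pos] != literal:
        raise SyntaxError(f"expected {literal}")
    return pos + 1

def lextst(text):
    tokens = text.split()
    table = NoIdentTable()
    try:
        table.no_idents({"END", "UNTIL"})   # assumed content of PascalNoIdent.lst
        pos = expect(tokens, skip_idents(tokens, 0, table), "END")
        table.push_no_idents({"end"})       # assumed content of CNoIdent.lst
        pos = expect(tokens, skip_idents(tokens, pos, table), "end")
        table.pop_no_idents()
        pos = expect(tokens, skip_idents(tokens, pos, table), "UNTIL")
    except SyntaxError:
        return False
    return pos == len(tokens)
```

In the second iteration END and ELSE count as identifiers because only end is excluded; after popNoIdents the Pascal list applies again, so the following end is read as an identifier.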
There are two possibilities to modify nonterminals in the description of the source:
Renaming: with Name:NT the nonterminal NT gets the new designation
Name. Renaming is usually used if a nonterminal occurs in
several positions in a production:
Prod = F1:Fact [ Op F2:Fact]
-> F1_ [Op_ F2_].
Renaming can also be used the other way round:
it is possible to give different nonterminals in different branches of an
alternative the same name if they are to be treated equally:
Stat = S:IfStat | S:AssStat | S:ForStat
-> S_.
Indexing: with NT[index] it is possible to provide nonterminals with indices.
This is usually used in connection with iterations:
DclSeq = { Dcl[i] }
-> { Dcl_[i] }.
Every nonterminal can get at most two indices.
To distinguish between the brackets for indices and those for options, the following rule has to be
obeyed: there must not be a space, newline or comment between the nonterminal and the
opening index bracket. In contrast, there has to be a delimiter between a nonterminal
and an opening option bracket.
Both modifications can be combined:
Seq = { D:ConstDef[i] | D:TypeDef[i] }
-> { D_[i] }.
Skipping of delimiters can be suppressed by enclosing part of an expression in < and >. In
this way class terminals can easily be implemented, too.
Integer = digit < { digit } >.
By the enclosure in < ... >, delimiters inside the number are prohibited.
An exception is the first digit, so that delimiters in front of the number can be ignored.
<digit [digit>] will not work correctly if only
one digit was accepted.
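The effect of < ... > on the Integer production can be pictured with two small Python scanners. This is a sketch under stated assumptions: delimiters are taken to be plain whitespace, the function names are hypothetical, and the second function only illustrates what an iteration over digit would presumably accept if delimiters were skipped before every digit.

```python
DIGITS = "0123456789"

def accept_integer(text, pos=0):
    """Model of: Integer = digit < { digit } >.

    Delimiters are skipped in front of the first digit only.
    Returns (lexeme, new_position) or None.
    """
    while pos < len(text) and text[pos].isspace():
        pos += 1
    start = pos
    while pos < len(text) and text[pos] in DIGITS:
        pos += 1
    return (text[start:pos], pos) if pos > start else None

def accept_spaced_digits(text, pos=0):
    """Model of: digit { digit } with delimiters skipped before every digit."""
    digits = []
    while True:
        probe = pos
        while probe < len(text) and text[probe].isspace():
            probe += 1
        if probe < len(text) and text[probe] in DIGITS:
            digits.append(text[probe])
            pos = probe + 1
        else:
            break
    return ("".join(digits), pos) if digits else None
```

For the input "12 34" the first scanner stops after 12, whereas the second one would run on through the blank.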
Although Ml4 aims at translation descriptions that are highly
independent of the system's actual host language, it does not take a purist's view and
offers an interface to the basic system features. The interface is defined by procedures
(or routines or methods) encapsulated in a unit called a module, e.g.
a class in Java or
an Ada package. Calls to such procedures may be embedded in the source text of the parsing part.
The import of modules is described in 3.14.1.
Ml4 code position.
Intrinsic procedures are described in 3.7.2, regardless of whether they are proper
procedures (i.e. have no return value) or not.
3.6.5 Assignments
Any variable can be assigned a value of its type. There are some automatic conversions into type
SYM. Be aware that the translator does not know anything about the type of imported entities. Thus
it cannot insert any conversion or check compatibility.
The result of an assignment is not reverted during back-tracking.
3.7. Expressions
Expressions can be built similarly to the rules of Pascal, i.e., with three levels of priority. Unary
operators (sign, NOT) are of the highest level.
3.7.1 Operators
additive operators: +, -, OR
multiplicative operators: *, DIV, MOD, &
relational operators: =, # (not equal), <=, >=, <, >
ABS(IntegerExpression):IntegerValue
ABS(RealExpression):RealValue
IntegerExpression MOD 2 = 0
ODD(IntegerExpression):BoolValue
Len(SymExpression):IntValue
Len(TxtExpression):IntValue
Str2Int(SymExpression):IntValue
Str2Int(TxtExpression):IntValue
Int2Str(IntExpression):SymValue
LogPos and Dp4OP.ERROR, thus avoiding the need to import
Dp4OP for this reason only
The following variables are predeclared in every Ml4 production and, thus,
must not be explicitly redeclared. They serve as default control variables (see there), but can
- with some care - be applied elsewhere, too.
Integer: N, O, i, c
Variables with special function (all of type SYM):
curChr: current character to parse (-)
nxtChr: the character following curChr (if it exists) (-)
Ml4Date: the current date in a default format (as defined in procedure Date)