
Copy comments from source to target

Keeping comments is a much more demanding task than it may seem at first sight.
First, some questions cannot, in principle, be resolved at the syntactic level, such as whether a given comment belongs to its predecessor or its successor. One author may, for example, write "END(*LOOP*);" while another prefers "END;(*LOOP*)".
Second, comments are, like spaces, usually skipped before parsing. Even if they are saved elsewhere, the question of which syntactic entity they should be attached to remains.
In general, comments are much more closely related to natural languages than any other feature of formal languages. Handling them optimally would require understanding them, i.e. a semantic analysis that is far beyond the scope of a syntax-controlled translator. Thus there is no fully satisfying solution.

However, with Depot4 there are two ways to tackle this problem:

  1. Fully parse comments
    This will probably produce the best results, but it is also by far the most expensive approach. Comments are no longer skipped; instead they become real elements of the language, i.e. your language in fact no longer has comments in the ordinary sense.
    Instead, one has to define a production for them, e.g.
     PascalComment=
       ('(*'c:= 2; { !c#1;('*)'|any)}
       |'{' c:= 2; {!c#1;('}'|any)})
     -> SOURCETEXT.
    and (one comment may be followed by another)
     Comment= {PascalComment[i]} ->  {PascalComment_[i]}
    To suppress the default comment definition, call DefCom('$-$', '');.
    Finally, calls to Comment must be inserted (and handled!) in the grammar at every point where a comment is allowed. This will easily double the size of the grammar, at least.

  2. Collect them during skipping
    This is more convenient, but it has some drawbacks too. It relies on the possibility of defining a production that handles the text inside a certain comment format. This production is used to store the skipped text in a global variable. At various points in the grammar, the saved comments can then be handled (inserted) and the global cleared.
    The advantage is that the trade-off between the effort spent on handling comments close to their origin and the resulting blow-up of the grammar can be chosen individually. The more often the global is inspected, the closer the results come to those of the first approach. (Of course, the size of the grammar will come closer to that of the first approach as well.)
    The main drawback stems from the parsing strategy, which tries branches sequentially. In some situations comments are skipped several times and are therefore saved several times as well. To a certain degree, this can be circumvented by inserting calls to Skip(); in front of alternatives and options and before the end of iterations. (Note that this cannot be done in general because it prevents the correct operation of skipping suppression <...>.)

    The following example code illustrates this approach. You can observe the problem mentioned above by deleting the Skip(); in the next-to-last line of the example.

    ExampleKeepCom=
      GLOBVAR USE coms: FLEX OF TXT;  USE nrOfComs: INT;
      DefCom('[*', '*] ');                   (* accept nested comments but do not
          call ExampleKeepCom recursively *)
     {ExampleKeepC1; INC(nrOfComs); coms[nrOfComs]:= ExampleKeepC1_; }
     DefCom('$1-:ExampleKeepCom$[*', '*] ');  (* re-activate ExampleKeepCom *).
    
    ExampleKeepC1= {('*]' | any)!c#1; } !N>0;
    -> '(* ' SOURCETEXT ' *)\n'.
    
    ExampleKeepInsert=  (* call this whenever you want to insert cumulated comments *)
      GLOBVAR USE coms: FLEX OF TXT;  USE nrOfComs: INT;
      VAR nr: INT;
      nr:= nrOfComs; nrOfComs:= 0;
    ->
      {/..nr/ coms[i] }.
    (*-------------------------------- DEMO ---------------------------------------*)
    ExampleKeepComDemo= (* --- root of the demo --- *)
        GLOBVAR DCL coms: FLEX OF TXT;  DCL nrOfComs: INT;
        nrOfComs:= 0;
        DefCom('$1-:ExampleKeepCom$[*', '*] ');
        '(*' { ExampleKeepComDemoElems[i] ExampleKeepInsert[i]}'*)'
    -> { ExampleKeepComDemoElems_[i] ExampleKeepInsert_[i]}.
    
    ExampleKeepComDemoElems=
      Skip();  (* this is important to avoid multiplied comment texts - try it *)
      (//e:id | e:str | e:num) -> e_ ':'.
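The duplication effect described above does not depend on Depot4 itself. The following is a minimal sketch in Python, not Ml4, of a toy scanner that saves comments as a side effect of skipping and tries the alternatives id, str and num one after another. All names in it (Scanner, try_pattern, element, run) are hypothetical and chosen for illustration only; the [* ... *] delimiters are borrowed from the example above.

```python
import re

# Hypothetical toy scanner, not Depot4 code: comments look like [* ... *].
COMMENT = re.compile(r"\[\*(.*?)\*\]", re.S)

class Scanner:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.saved = []          # plays the role of the global 'coms'

    def skip(self):
        """Skip blanks and comments, saving every comment that is skipped."""
        while self.pos < len(self.text):
            if self.text[self.pos].isspace():
                self.pos += 1
                continue
            m = COMMENT.match(self.text, self.pos)
            if m is None:
                break
            self.saved.append(m.group(1).strip())
            self.pos = m.end()

    def try_pattern(self, pattern):
        """One alternative: skip, then match; restore the position on failure."""
        start = self.pos
        self.skip()
        m = re.compile(pattern).match(self.text, self.pos)
        if m is None:
            self.pos = start     # backtrack, but 'saved' is not rolled back
            return None
        self.pos = m.end()
        return m.group(0)

ALTERNATIVES = [r"[A-Za-z_]\w*", r'"[^"]*"', r"\d+"]   # id | str | num

def element(scanner, skip_first):
    if skip_first:
        scanner.skip()           # consume comments once, before branching
    for pattern in ALTERNATIVES:
        token = scanner.try_pattern(pattern)
        if token is not None:
            return token
    return None

def run(skip_first):
    s = Scanner("[* note *] 42")
    token = element(s, skip_first)
    return token, s.saved
```

In this sketch, run(False) saves the comment once per attempted alternative, because each failed branch restores the input position but not the list of saved comments. run(True) skips once in front of the alternatives, so the later branches find nothing left to skip and the comment is stored a single time.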
Combined solution

If, perhaps owing to coding conventions, comments regularly appear at fixed points of the grammar (e.g. the end of a construct is marked as "END (*IF*)" or "} /*while*/"), a mixed approach may be useful:

In general, comments are handled by the second method, but at these fixed points the first one is applied.
Example:

 statementWithComment=  statement <{' '} Comment>
 -> statement_ '--' Comment_ '\n'.
Insert a call to statementWithComment wherever you expect a comment of this type.
(Note: This appends the Ada-like comment regardless of whether there is a comment in the source. This can be avoided by adding the new comment delimiters within Comment itself.)



Copy and process comments

Keeping comments may cause additional problems if the source and target languages embody different commenting principles, e.g. free-format comments vs. line-bound comments.
In that case, comment texts must not only be saved but also processed. With Ml4 this can be achieved by a nested layer of processing.
Example:
IMPORTS Dp4StrBuf, Dp4Streams;
KeepCom=
  GLOBVAR USE coms: FLEX OF TXT;  USE nrOfComs: INT;
  VAR bs: Dp4Streams.Stream;
  DefCom('(*', '*) ');               (* as above *)
 KeepC1;
   bs:= Dp4StrBuf.New('tmpstr');     (* make input stream  *)
   Dp4Streams.Tar2Strm(KeepC1_, bs); (* from accepted      *)
   From(Dp4Streams.StrmSrc(bs));     (* comment text       *)
   KeepC2 Back();                    (* and process it     *)
   INC(nrOfComs); coms[nrOfComs]:= KeepC2_;
 DefCom('$1-:KeepCom$(*', '*) ');    (* re-activate KeepCom *).

KeepC1= {('*)' | any)!c#1; } !N>0; 
-> SOURCETEXT .

KeepC2=  {line[i]}                (* prefix each line *)
-> '\n' {'//' line_[i] }.
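The text transformation performed by the nested layer can be sketched independently of Depot4. The following Python fragment is only an illustration of the idea, under the assumption that the comment arrives as one string with the Pascal-style delimiters used above; the function names to_line_comments and convert are hypothetical.

```python
def to_line_comments(comment_text, prefix="//"):
    """Turn the body of a free-format comment into line-bound comments."""
    lines = comment_text.strip().splitlines()
    # Each source line becomes one line comment of its own.
    return "\n".join(f"{prefix} {line.strip()}" for line in lines)

def convert(block_comment):
    # Assumes the Pascal-style delimiters '(*' and '*)' of the example.
    if not (block_comment.startswith("(*") and block_comment.endswith("*)")):
        raise ValueError("not a (* ... *) comment")
    return to_line_comments(block_comment[2:-2])
```

A two-line source comment thus yields two target lines, each carrying its own prefix, which is the same per-line treatment that KeepC2 applies to the text accepted by KeepC1.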

Remarks:





© J. Lampe 1998-2002   juergen_lampe@firemail.de               (21-Mar-2002)