Jutta Degener
yacc2html - convert a YACC grammar to HTML
A visitor of my C subtree asked me in email how I'd done the markup for the
yacc and lex grammars. Had I used a tool that I could lend him to annotate his
own grammars? No, but it's a neat idea; so I wrote one.
Here's the output of yacc2html invoked on its own grammar.
If you want to test it yourself, you're welcome to a
gzipped tarfile of the beta release.
The version was updated Jan 5th, 1999. Thanks to
everybody who reported bugs with the source and the site.
Synopsis
yacc2html [ -b basename ] [ -c ] [ -D name[=url] ] [ -h title ] [ -H ]
[ -i ] [ -l lexfile ] [ -L lexsedfile ] [ -N nonterminalformat ]
[ -p ] [ -r ] [ -t ] [ -T terminalformat ] [ -u ] [ -y yaccfile ]
[ -Y yaccsedfile ] [ filename ]
Description
yacc2html converts input specifications for yacc(1) (and
some others) into HTML format. The physical appearance of
the grammar is kept through HTML <pre> </pre> preformatted
quoting, but references to nonterminals are linked to their
definition, references to tokens are linked (by default) to
an outside lex file, references to token types (usually
enclosed in angle brackets) can be linked to their %union
definition, and the whole grammar acquires an HTML header and
footer.
Further command line options make it possible to strip all C code from
the output, to link arbitrary tokens and types to arbitrary
URLs (uniform resource locators), and to write two input
files suitable for processing with sed(1). One of them turns
a lex specification into an HTML file; the other adds
links to the yacc hypertext to arbitrary other HTML documents.
The yacc specification is read from standard input, or from
a file given as a command line argument; the HTML result is
written to standard output.
Input Format
In addition to the input format expected by yacc, yacc2html
accepts:
- Literal strings enclosed in double quotes (not just
single characters enclosed in single quotes, as yacc
accepts).
- Grammars that lack the first of yacc's three parts, as
well as the %% that separates it from the second, main
part (the rules).
These two extensions make it possible to use yacc2html even on ``informal'' grammars, such as
list: foo? bar+ (baz|quux) ';'
even though yacc itself would not understand, or
would misinterpret, them.
Nonterminals and Terminals
Symbols that appear in front of a colon (:) are considered
to be nonterminals. Nonterminals are tagged with an
<a name="nonterminal"> anchor. The generated nonterminal names
are affected by the -N option; see below. There can be multiple
defining rules for a nonterminal; only the first is
anchored. If a nonterminal occurs outside of a rule that
defines it, it is tagged with an <a href="#nonterminal">
reference to the first defining rule. Nonterminals in rules
that define them are not tagged by default; the -r
(recursive nonterminals) option turns this linking on.
Symbols that appear after a yacc %token declaration are
considered to be terminals. By default, terminals are
tagged with a reference to their first defining rule in an
external lex file: <a href="lexfile.html#terminal">.
The -t option makes yacc2html link terminals to their %token
declaration rather than to an external file.
The -i option suppresses linking of terminals altogether.
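The tagging rules above can be summarized in a short Python sketch. This is not yacc2html's source; the function names, the default lex file name, and in particular the form of the link produced for -t (a local #name reference) are assumptions made for illustration.

```python
def tag_definition(name, anchored):
    """Left-hand side of a rule: only the first defining rule is anchored."""
    if name in anchored:
        return name
    anchored.add(name)
    return '<a name="%s">%s</a>' % (name, name)


def tag_symbol(name, nonterminals, terminals, defining=None,
               lexfile="lexfile.html", recursive=False,
               token_links=False, ignore_terminals=False):
    """Right-hand side of a rule: decide how one symbol occurrence is tagged."""
    if name in nonterminals:
        # Inside its own defining rule a nonterminal stays plain unless -r.
        if name == defining and not recursive:
            return name
        return '<a href="#%s">%s</a>' % (name, name)
    if name in terminals:
        if ignore_terminals:      # -i: no links for terminals at all
            return name
        if token_links:           # -t: link to the %token declaration (assumed form)
            return '<a href="#%s">%s</a>' % (name, name)
        # Default: link into the external lex HTML file.
        return '<a href="%s#%s">%s</a>' % (lexfile, name, name)
    return name                   # unknown symbols are left untouched
```

For example, with nonterminals {"expr", "term"} and terminals {"NUM"}, an occurrence of expr inside a rule for expr stays plain, while term and NUM become links.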
Sed scripts
The scripts, appropriate for handing to sed with the
-f filename option, can be written using the -L output
and -Y output options.
The first of these turns an input file for lex into HTML
with anchors suitable for
referencing from the converted yacc input file. The second
turns occurrences of nonterminals in arbitrary input text
into references to the corresponding anchor in the converted
yacc input file.
Thus, given lex and yacc input files grammar.l and
grammar.y, the two steps for turning them into HTML files
are
% yacc2html -L script.sed grammar.y > grammar-y.html
% sed -f script.sed grammar.l > grammar-l.html
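To make the effect of the lex script concrete, here is a rough Python sketch of the transformation it is described as performing: quoting &, <, > and ", and anchoring the first appearance of each terminal. It only mimics the documented behavior and is not the generated sed script; the function names are hypothetical, and the header and footer that the real script adds are left out.

```python
import re


def quote_html(text):
    """Quote the four characters the lex script is described as quoting."""
    # "&" has to be handled first so the other entities are not re-quoted.
    for char, entity in (("&", "&amp;"), ("<", "&lt;"),
                         (">", "&gt;"), ('"', "&quot;")):
        text = text.replace(char, entity)
    return text


def anchor_terminals(text, terminals):
    """Quote text and turn the first appearance of each terminal into an anchor."""
    if not terminals:
        return quote_html(text)
    pattern = re.compile(
        r"\b(?:%s)\b" % "|".join(re.escape(t) for t in sorted(terminals)))
    out, pos, seen = [], 0, set()
    for match in pattern.finditer(text):
        out.append(quote_html(text[pos:match.start()]))
        word = match.group(0)
        if word in seen:
            out.append(word)          # later appearances stay plain
        else:
            seen.add(word)
            out.append('<a name="%s">%s</a>' % (word, word))
        pos = match.end()
    out.append(quote_html(text[pos:]))
    return "".join(out)
```

Matching is done on the unquoted text and the pieces in between are quoted afterwards, so a terminal name can never match inside an entity or inside an anchor that was already inserted.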
Options
- -H
- (help) Print a short online help message, and exit.
- -bbasename
- Derive the default names of the yacc and lex
HTML files from basename, rather than from the name of the input file (or
"stdin", if yacc2html is used as a filter).
- -c
- (C code suppression) Strip
%{ ... %} sections, { ... }
action statements, and trailing code after the second
%% from the HTML file. (The C-like comments within the
yacc code are still written; I consider them part of
the specification, not of the C implementation.)
- -Dsymbol[=URL]
- Without a second argument, pretend that symbol
is neither a token nor a nonterminal (but defined elsewhere);
create neither an anchor nor links for it.
When a second argument is present, it specifies a URL that all occurrences
of the symbol should be linked to. Regardless
of whether or not a second argument is present, no
symbol defined with -D ever shows up in the sed files.
- -htitle
- (header title)
Let the document title (as for the
<title>...</title> HTML element)
be title.
If this option is not present, yacc2html defaults to the input
file name, or *standard input* if input is read from
standard input.
- -i
- (ignore terminals) Create neither anchors nor links
to terminal tokens.
- -Llexsedfile
- Write to lexsedfile a sed script (suitable for use
with sed's -f filename option) to turn a lex file into
HTML code. The script adds a header and footer to the
text that is passed through it, quotes
&, <, >, and ",
and turns the first appearance of every word that
yacc2html recognized as a terminal into an anchor.
- -Nnonterminalnameformat
- When generating a reference to, or a definition of, a
nonterminal, use nonterminalnameformat to derive the
local tag from the name of the nonterminal. In the
format string, the following sequences are recognized:
- %b
- The argument of a -b option if specified, or
the input file name without a suffix;
- %l
- The argument of a -l option if specified, or
the defaulted HTML lex file name ("%b-l.html");
- %s
- The name of the nonterminal;
- %y
- The argument of a -y option if specified, or
the defaulted HTML yacc file name ("%b-y.html").
Thus, to make all nonterminal references refer to a
nonterminal NAME as "yacc-name" rather than just
"name", use -Nyacc-%s.
- -r
- (recursive links) Link nonterminal references to nonterminal
definitions, even when the nonterminal occurs
in the defining rule. Normally, such `recursive'
nonterminals are left unlinked.
- -t
- (token terminals) Link terminals to their %token
declaration, rather than to the lex file.
- -Tterminalurlformat
- When generating a reference to a terminal symbol, use
terminalurlformat to derive the URL from the name of
the terminal, rather than the default. In the format
string, the following sequences are recognized:
- %b
- The argument of a -b option if specified, or
the input file name without a suffix;
- %l
- The argument of a -l option if specified, or
the defaulted HTML lex file name ("%b-l.html");
- %s
- The name of the terminal;
- %y
- The argument of a -y option if specified, or
the defaulted HTML yacc file name ("%b-y.html").
Thus, to make all token references refer to a token
NAME as "token-NAME" rather than just "NAME", use
-T%l#token-%s.
- -u
- (%union links) Link type references (outside of C code)
to the %union definition at the start of the grammar,
if such a definition exists. Normally, token type
references remain unlinked.
- -Yyaccsedfile
- Write to yaccsedfile a sed script (suitable for use
with sed's -f filename option) to turn all occurrences of
words that yacc2html recognized as nonterminals into
links to the nonterminals' definitions in the HTML
output file. The name of the HTML output file can be
specified explicitly using -yyaccfile or defaults
implicitly to %b-y.html.
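The %b, %l, %s and %y sequences accepted by -N and -T can be illustrated with a small Python sketch. It follows the defaults described in the option list, but it is only an approximation: the function name and its parameters are hypothetical, and passing unrecognized % sequences through unchanged is an assumption, since the text above does not say how yacc2html treats them.

```python
import os


def expand_format(fmt, symbol, input_name="stdin", basename=None,
                  lexfile=None, yaccfile=None):
    """Expand %b, %l, %s and %y in a -N or -T style format string."""
    # %b: the -b argument, or the input file name without its suffix.
    base = basename if basename is not None else os.path.splitext(input_name)[0]
    # %l and %y: the -l / -y arguments, or the defaults built from %b.
    lex = lexfile if lexfile is not None else base + "-l.html"
    yacc = yaccfile if yaccfile is not None else base + "-y.html"
    table = {"b": base, "l": lex, "s": symbol, "y": yacc}

    out, i = [], 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt) and fmt[i + 1] in table:
            out.append(table[fmt[i + 1]])
            i += 2
        else:
            # Assumption: anything else, including unknown % sequences,
            # is copied through literally.
            out.append(fmt[i])
            i += 1
    return "".join(out)
```

With the input file grammar.y, the -T example from the option list, %l#token-%s, expands for a token NAME to grammar-l.html#token-NAME.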
See also
lex(1), sed(1), yacc(1)
Bugs
The anchors in the lex file shouldn't be created at the
token name itself, but at the start of the paragraph that
contains it. The parser's error messages are not very
helpful. Please forward other bug reports to the author,
Jutta Degener <jutta@cs.tu-berlin.de>. Thanks.