Creating and editing SGML/XML documents is usually only half the battle. After you've composed your document, you'll want to publish it. Publishing, for our purposes, means either print or web publishing. For SGML and XML documents, this is usually accomplished with some kind of stylesheet. In the (not too distant) future, you may be able to publish an XML document on the Web by simply putting it online with a stylesheet, but for now you'll probably have to translate your document into HTML.
There are many ways, using both free and commercial tools, to publish SGML documents. In this chapter, we're going to survey a number of possibilities, and then look at just one solution in detail: Jade and the Modular DocBook Stylesheets. We used Jade to produce this book and to produce the online versions on the CD-ROM; it is also being deployed in other projects such as SGMLtools, which originated with the Linux Documentation Project.
For a brief survey of other tools, see Appendix D, Resources.
Over the years, a number of attempts have been made to produce a standard stylesheet language and, failing that, a large number of proprietary languages have been developed.
First, the U.S. Department of Defense, in an attempt to standardize stylesheets across military branches, created the Output Specification, which is defined in MIL-PRF-28001C, Markup Requirements and Generic Style Specification for Electronic Printed Output and Exchange of Text.[1]
Commonly called FOSIs (for Formatting Output Specification Instances), they are supported by a few products including ADEPT Publisher by Arbortext and DL Composer by Datalogics.
Next, the International Organization for Standardization (ISO) created DSSSL, the Document Style Semantics and Specification Language. Subsets of DSSSL are supported by Jade and a few other tools, but it never achieved widespread support.
The W3C CSS Working Group created CSS as a style attachment language for HTML, and, more recently, XML.
Most recently, the XML effort has identified a standard Extensible Style Language (XSL) as a requirement. The W3C XSL Working Group is currently pursuing that effort.
By way of comparison, here's an example of each of the standard style languages. In each case, the stylesheet fragment shown contains the rules that would reasonably format the following paragraph:
<para>
This is an example paragraph. It should be presented in a
reasonable body font. <emphasis>Emphasized</emphasis> words
should be printed in italics. A single level of
<emphasis>Nested <emphasis>emphasis</emphasis> should also
be supported.</emphasis>
</para>
FOSIs are SGML documents. The element in the FOSI that controls the presentation of specific elements is the e-i-c (element in context) element. A sample FOSI fragment is shown in Example 4-1.
Example 4-1. A Fragment of a FOSI Stylesheet
<e-i-c gi="para">
  <charlist>
    <textbrk startln="1" endln="1">
  </charlist>
</e-i-c>
<e-i-c gi="emphasis">
  <charlist inherit="1">
    <font posture="italic">
  </charlist>
</e-i-c>
<e-i-c gi="emphasis" context="emphasis">
  <charlist inherit="1">
    <font posture="upright">
  </charlist>
</e-i-c>
DSSSL stylesheets are written in a Scheme-like language (see "Scheme" later in this chapter). It is the element function that controls the presentation of individual elements. See the example in Example 4-2.
CSS stylesheets consist of selectors and formatting properties, as shown in Example 4-3.
XSL stylesheets are XML documents, as shown in Example 4-4. The element in the XSL stylesheet that controls the presentation of specific elements is the xsl:template element.
Example 4-4. A Fragment of an XSL Stylesheet
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format">

<xsl:template match="para">
  <fo:block>
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>

<xsl:template match="emphasis">
  <fo:sequence font-style="italic">
    <xsl:apply-templates/>
  </fo:sequence>
</xsl:template>

<xsl:template match="emphasis/emphasis">
  <fo:sequence font-style="upright">
    <xsl:apply-templates/>
  </fo:sequence>
</xsl:template>

</xsl:stylesheet>
Jade is a free tool that applies DSSSL stylesheets to SGML and XML documents. As distributed, Jade can output RTF, TeX, MIF, and SGML. The SGML backend can be used for SGML to SGML transformations (for example, DocBook to HTML).
A complete set of DSSSL stylesheets for creating print and HTML output from DocBook is included on the CD-ROM. More information about obtaining and installing Jade appears in Appendix A, Installation.
DSSSL is a stylesheet language for both print and online rendering. The acronym stands for Document Style Semantics and Specification Language. It is defined by ISO/IEC 10179:1996. For more general information about DSSSL, see the DSSSL Page.
The DSSSL expression language is Scheme, a variant of Lisp. Lisp is a functional programming language with a remarkably regular syntax. Every expression looks like this:
(operator [arg1] [arg2] ... [argn])
This is called "prefix" syntax because the operator comes before its arguments.
In Scheme, the expression that subtracts 2 from 3 is (- 3 2), and (+ (- 3 2) (* 2 4)) is 9. While the prefix syntax and the parentheses may take a bit of getting used to, Scheme is not hard to learn, in part because there are no exceptions to the syntax.
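To make the prefix rule concrete, here is a minimal sketch, written in Python rather than Scheme, of how such an expression is evaluated: the operator comes first, and every argument is evaluated before the operator is applied. The tuple encoding and the names OPERATORS and evaluate are illustrative assumptions; they are not part of DSSSL or Jade.

```python
from functools import reduce
import operator

# Illustrative only: a prefix expression is modeled as a nested tuple whose
# first item names the operator and whose remaining items are its arguments.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}

def evaluate(expr):
    """Evaluate a nested prefix expression such as ("-", 3, 2)."""
    if not isinstance(expr, tuple):
        return expr  # a bare number evaluates to itself
    name, *args = expr
    # Evaluate every argument first, then apply the operator left to right.
    values = [evaluate(arg) for arg in args]
    return reduce(OPERATORS[name], values)

print(evaluate(("-", 3, 2)))                      # models (- 3 2)
print(evaluate(("+", ("-", 3, 2), ("*", 2, 4))))  # models (+ (- 3 2) (* 2 4))
```

The two calls print 1 and 9, matching the values given for the Scheme expressions.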
A complete DSSSL stylesheet is shown in Example 4-5. After only a brief examination of the stylesheet, you'll probably begin to have a feel for how it works. For each element in the document, there is an element rule that describes how you should format that element. The goal of the rest of this chapter is to make it possible for you to read, understand, and even write stylesheets at this level of complexity.
Example 4-5. A Complete DSSSL Stylesheet
<!DOCTYPE style-sheet PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN">

<style-sheet>
<style-specification>
<style-specification-body>

(element chapter
  (make simple-page-sequence
    top-margin: 1in
    bottom-margin: 1in
    left-margin: 1in
    right-margin: 1in
    font-size: 12pt
    line-spacing: 14pt
    min-leading: 0pt
    (process-children)))

(element title
  (make paragraph
    font-weight: 'bold
    font-size: 18pt
    (process-children)))

(element para
  (make paragraph
    space-before: 8pt
    (process-children)))

(element emphasis
  (if (equal? (attribute-string "role") "strong")
      (make sequence
        font-weight: 'bold
        (process-children))
      (make sequence
        font-posture: 'italic
        (process-children))))

(element (emphasis emphasis)
  (make sequence
    font-posture: 'upright
    (process-children)))

(define (super-sub-script plus-or-minus
                          #!optional (sosofo (process-children)))
  (make sequence
    font-size: (* (inherited-font-size) 0.8)
    position-point-shift: (plus-or-minus (* (inherited-font-size) 0.4))
    sosofo))

(element superscript (super-sub-script +))
(element subscript (super-sub-script -))

</style-specification-body>
</style-specification>
</style-sheet>
This stylesheet is capable of formatting simple DocBook documents like the one shown in Example 4-6.
Example 4-6. A Simple DocBook Document
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<chapter><title>Test Chapter</title>
<para>
This is a paragraph in the test chapter. It is unremarkable in
every regard. This is a paragraph in the test chapter. It is
unremarkable in every regard. This is a paragraph in the test
chapter. It is unremarkable in every regard.
</para>
<para>
<emphasis role=strong>This</emphasis> paragraph contains
<emphasis>some <emphasis>emphasized</emphasis> text</emphasis>
and a <superscript>super</superscript>script
and a <subscript>sub</subscript>script.
</para>
<para>
This is a paragraph in the test chapter. It is unremarkable in
every regard. This is a paragraph in the test chapter. It is
unremarkable in every regard. This is a paragraph in the test
chapter. It is unremarkable in every regard.
</para>
</chapter>
The result of formatting a simple document with this stylesheet can be seen in Figure 4-1.
We'll take a closer look at this stylesheet after you've learned a little more DSSSL.
One of the first things that may strike you about DSSSL stylesheets (aside from all the parentheses), is the fact that the stylesheet itself is an SGML document! This means that you have all the power of SGML documents at your disposal in DSSSL stylesheets. In particular, you can use entities and marked sections to build a modular stylesheet.
In fact, DSSSL stylesheets are defined so that they correspond to a particular architecture. This means that you can change the DTD used by stylesheets within the bounds of the architecture. A complete discussion of document architectures is beyond the scope of this book, but we'll show you one way to take advantage of them in your DSSSL stylesheets in Section 4.6, later in this chapter.
A DSSSL processor builds a tree out of the source document. Each element in the source document becomes a node in the tree (processing instructions and other constructs become nodes as well). Processing the source tree begins with the root rule and continues until there are no more nodes to process.
There aren't any global variables or side effects. It can be difficult to come to grips with this, especially if you're just starting out.
It is possible to define constants and functions and to create local variables with let expressions, but you can't create any global variables or change anything after you've defined it.
DSSSL has a rich vocabulary of expressions for dealing with all of the intricacies of formatting. Many, but by no means all of them, are supported by Jade. In this introduction, we'll cover only a few of the most common.
Element expressions, which define the rules for formatting particular elements, make up the bulk of most DSSSL stylesheets. A simple element rule can be seen in Example 4-7. This rule says that a para element should be formatted by making a paragraph (see Section 4.3.6.2).
Example 4-7. A Simple DSSSL Rule
(element para (make paragraph space-before: 8pt (process-children)))
An element expression can be made more specific by specifying an element and its ancestors instead of just specifying an element. The rule (element title ...) applies to all Title elements, but a rule that begins (element (figure title) ...) applies only to Title elements that are immediate children of Figure elements.
If several rules apply, the most specific rule is used.
When a rule is used, the node in the source tree that was matched becomes the "current node" while that element expression is being processed.
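The idea that the most specific rule wins can be sketched in a few lines of Python. This is a deliberately simplified model, not the matching algorithm defined by DSSSL or implemented by Jade: it assumes that a rule is identified by a tuple of element names (ancestors first, the element itself last) and that "most specific" simply means "longest matching tuple." The names RULES and find_rule are hypothetical.

```python
# Hypothetical model: each rule is keyed by a tuple of element names, the
# last being the element itself and the earlier ones its ancestors.
RULES = {
    ("title",): "generic title rule",            # like (element title ...)
    ("figure", "title"): "figure title rule",    # like (element (figure title) ...)
}

def find_rule(path, rules=RULES):
    """Return the most specific rule for the node at the end of path.

    path lists element names from the root down to the current node.
    A rule matches when its key equals the tail of the path; the longest
    matching key is treated as the most specific one.
    """
    matches = [key for key in rules if tuple(path[-len(key):]) == key]
    if not matches:
        return None
    return rules[max(matches, key=len)]

print(find_rule(["book", "chapter", "title"]))
print(find_rule(["book", "chapter", "figure", "title"]))
```

With these two rules, a chapter title falls back to the generic rule, while a title directly inside a figure picks up the more specific one.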
A make expression specifies the characteristics of a "flow object." Flow objects are abstract representations of content (paragraphs, rules, tables, and so on). The expression:
(make paragraph font-size: 12pt line-spacing: 14pt ...)
specifies that the content that goes "here" is to be placed into a paragraph flow object with a font-size of 12pt and a line-spacing of 14pt (all of the unspecified characteristics of the flow object are defaulted in the appropriate way).
They're called flow objects because DSSSL, in its full generality, allows you to specify the characteristics of a sequence of flow objects and a set of areas on the physical page where you can place content. The content of the flow objects is then "poured onto" (or flows into) the areas on the page(s).
In most cases, it's sufficient to think of the make expressions as constructing the flow objects, but they really only specify the characteristics of the flow objects. This detail is apparent in one of the most common and initially confusing pieces of DSSSL jargon: the sosofo. Sosofo stands for a "specification of a sequence of flow objects." All this means is that processing a document may result in a nested set of make expressions (in other words, the paragraph may contain a table that contains rows that contain cells that contain paragraphs, and so on).
The general form of a make expression is:
(make flow-object-name keyword1: value1 keyword2: value2 ... keywordn: valuen (content-expression))
Keyword arguments specify the characteristics of the flow object. The specific characteristics you use depend on the flow object. The content-expression can vary; it is usually another make expression or one of the processing expressions.
Some common flow objects in the print stylesheet are:
A simple-page-sequence contains a sequence of pages. The keyword arguments of this flow object let you specify margins, headers and footers, and other page-related characteristics. Print stylesheets should always produce one or more simple-page-sequence flow objects.
Nesting simple-page-sequence does not work. Characteristics on the inner sequences are ignored.
A paragraph is used for any block of text. This may include not only paragraphs in the source document, but also titles, the terms in a definition list, glossary entries, and so on. Paragraphs in DSSSL can be nested.
A sequence is a wrapper. It is most frequently used to change inherited characteristics (like font style) of a set of flow objects without introducing other semantics (such as line breaks).
A score flow object creates underlining, strike-throughs, or overlining.
A table flow object creates a table of rows and cells.
The HTML stylesheet uses the SGML backend, which has a different selection of flow objects.
An element flow object creates an element. The content of this make expression will appear between the start and end tags. The expression:
(make element gi: "H1" (literal "Title"))
produces <H1>Title</H1>.
An empty-element flow object creates an empty element that may not have content. The expression:
(make empty-element gi: "BR" attributes: '(("CLEAR" "ALL")))
produces <BR CLEAR="ALL">.
A sequence produces no output in and of itself; it is only a wrapper. It is still required in DSSSL contexts in which you want to output several flow objects but only one top-level object may be returned.
An entity-ref flow object inserts an entity reference. The expression:
(make entity-ref name: "nbsp")
produces &nbsp;.
In both stylesheets, a completely empty flow object is constructed with (empty-sosofo).
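One way to picture the SGML backend flow objects described above is as functions that return markup strings. The following Python sketch is only a mental model under that assumption; the function names mirror the flow object names but are hypothetical, and Jade itself builds flow objects rather than strings.

```python
def _attributes(attributes):
    """Format (name, value) pairs as SGML attribute specifications."""
    return "".join(f' {name}="{value}"' for name, value in attributes)

def make_element(gi, content="", attributes=()):
    """Model of (make element gi: ... content): start tag, content, end tag."""
    return f"<{gi}{_attributes(attributes)}>{content}</{gi}>"

def make_empty_element(gi, attributes=()):
    """Model of (make empty-element gi: ...): a start tag with no content."""
    return f"<{gi}{_attributes(attributes)}>"

def make_entity_ref(name):
    """Model of (make entity-ref name: ...): an entity reference."""
    return f"&{name};"

def make_sequence(*parts):
    """Model of (make sequence ...): adds no markup, only groups its parts."""
    return "".join(parts)

print(make_element("H1", "Title"))
print(make_empty_element("BR", [("CLEAR", "ALL")]))
print(make_sequence(make_element("B", "bold"), make_entity_ref("nbsp")))
```

The sequence model shows why a wrapper is useful: it lets a single expression return several pieces of output without adding any markup of its own.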
Extracting parts of the source document can be accomplished with these functions:
(data nd) returns all of the character data from nd as a string.
(attribute-string attr nd) returns the value of the attr attribute of nd.
(inherited-attribute-string attr nd) returns the value of the attr attribute of nd. If that attribute is not specified on nd, it searches up the hierarchy for the first ancestor element that does set the attribute, and returns its value.
A common requirement of formatting is the ability to reorder content. In order to do this, you must be able to select other elements in the tree for processing. DSSSL provides a number of functions that select other elements. These functions all return a list of nodes.
(current-node) returns the current node.
(children nd) returns the children of nd.
(descendants nd) returns the descendants of nd (the children of nd and all their children's children, and so on).
(parent nd) returns the parent of nd.
(ancestor name nd) returns the first ancestor of nd named name.
(element-with-id id) returns the element in the document with the ID id, if such an element exists.
(select-elements node-list name) returns all of the elements of the node-list that have the name name. For example, (select-elements (descendants (current-node)) "para") returns a list of all the paragraphs that are descendants of the current node.
(empty-node-list) returns a node list that contains no nodes.
Other functions allow you to manipulate node lists.
(node-list-empty? nl) returns true if (and only if) nl is an empty node list.
(node-list-length nl) returns the number of nodes in nl.
(node-list-first nl) returns a node list that consists of the single node that is the first node in nl.
(node-list-rest nl) returns a node list that contains all of the nodes in nl except the first node.
There are many other expressions for manipulating nodes and node lists.
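The selection and node-list functions can be imitated with Python's standard xml.etree.ElementTree module. This is a rough analogy rather than an implementation of the DSSSL functions: node lists are modeled as plain Python lists, the helper names are hypothetical, and the sample document is made up for the illustration.

```python
import xml.etree.ElementTree as ET

def descendants(node):
    """All elements below node, in document order (node itself excluded)."""
    return [element for element in node.iter() if element is not node]

def select_elements(node_list, name):
    """Keep only the elements in node_list that are named name."""
    return [element for element in node_list if element.tag == name]

def node_list_empty(node_list):
    """True if (and only if) the node list contains no nodes."""
    return len(node_list) == 0

def node_list_first(node_list):
    """A node list holding just the first node (empty if there is none)."""
    return node_list[:1]

def node_list_rest(node_list):
    """A node list holding every node except the first."""
    return node_list[1:]

# A made-up document used only for this illustration.
chapter = ET.fromstring(
    "<chapter><title>T</title><para>one</para>"
    "<sect1><para>two</para></sect1></chapter>"
)

paras = select_elements(descendants(chapter), "para")
print([para.text for para in paras])
print(node_list_first(paras)[0].text)
```

Note that the "first" and "rest" helpers return lists, not single nodes, which mirrors the way the DSSSL functions return node lists.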
Processing expressions control which elements in the document will be processed and in what order. Processing an element is performed by finding a matching element rule and using that rule.
(process-children) processes all of the children of the current node. In most cases, if no process expression is given, processing the children is the default behavior.
(process-node-list nl) processes each of the elements in nl.
You can declare your own functions and constants in DSSSL. The general form of a function declaration is:
(define (function args) function-body)
A constant declaration is:
(define constant constant-function-body)
The distinction between constants and functions is that the body of a constant is evaluated when the definition occurs, while functions are evaluated when they are used.
In DSSSL, the constant #t represents true and #f false. There are several ways to test conditions and take action in DSSSL.
The form of an if expression is:
(if condition true-expression false-expression)
If the condition is true, the true-expression is evaluated; otherwise, the false-expression is evaluated. You must always provide an expression to be evaluated when the condition is not met. If you want to produce nothing, use (empty-sosofo).
case selects from among several alternatives:
(case expression ((constant1) (expression1)) ((constant2) (expression2)) ((constant3) (expression3)) (else else-expression))
The value of the expression is compared against each of the constants in turn and the expression associated with the first matching constant is evaluated.
cond also selects from among several alternatives, but the selection is performed by evaluating each expression:
(cond ((condition1) (expression1)) ((condition2) (expression2)) ((condition3) (expression3)) (else else-expression))
The value of each conditional is calculated in turn. The expression associated with the first condition that is true is evaluated.
Any expression that returns #f is false; all other expressions are true. This can be somewhat counterintuitive. In many programming languages, it's common to assume that "empty" things are false (0 is false, a null pointer is false, and an empty set is false, for example). In DSSSL, this isn't the case; note, for example, that an empty node list is not #f and is therefore true. To avoid these difficulties, always use functions that return true or false in conditionals. To test for an empty node list, use (node-list-empty?).
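Python happens to be one of the languages in which "empty" things are false, which makes it a convenient place to see the difference. In this sketch the hypothetical function dsssl_true models the DSSSL rule, with Python's False standing in for #f; it is an illustration of the rule, not code from any DSSSL processor.

```python
def dsssl_true(value):
    """Model the DSSSL rule: only #f (Python's False here) counts as false."""
    return value is not False

empty_node_list = []  # stands in for an empty node list

# Python's own truth test treats the empty list as false...
print(bool(empty_node_list))
# ...but under the DSSSL rule it is true, because it is not #f.
print(dsssl_true(empty_node_list))
# An explicit test, like (node-list-empty?), avoids the ambiguity.
print(len(empty_node_list) == 0)
```

The same reasoning applies to 0 and to the empty string: under the DSSSL rule both are true.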
The way to create local variables in DSSSL is with (let). The general form of a let expression is:
(let ((var1 expression1) (var2 expression2) ... (varn expressionn)) let-body)
In a let expression, all of the variables are defined "simultaneously." The expression that defines var2 cannot contain any references to any other variables defined in the same let expression. A let* expression defines its variables in order, so each expression can refer to the variables defined before it, but it runs slightly slower.
Variables are available only within the let-body. A common use of let is within a define expression:
(define (cals-rule-default nd)
  (let* ((table (ancestor "table" nd))
         (frame (if (attribute-string "frame" table)
                    (attribute-string "frame" table)
                    "all")))
    (equal? frame "all")))
This function creates two local variables, table and frame. A let* expression, like let, returns the value of the last expression in its body, so this function returns true if the frame attribute on the table is all or if no frame attribute is present.
DSSSL doesn't have any construct that resembles the "for loop" that occurs in most imperative languages like C and Java. Instead, DSSSL employs a common trick in functional languages for implementing a loop: tail recursion.
Loops in DSSSL use a special form of let. This loop counts from 1 to 10:
(let loopvar ((count 1)) (if (> count 10) #t (loopvar (+ count 1))))
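The same loop can be written as a recursive function in Python, which may make the shape of the named let easier to see: the function plays the role of loopvar, its parameter is the loop variable, and the recursive call is the "next iteration." The extra seen parameter is an addition made only so that the loop produces something observable. Unlike Scheme, Python does not optimize tail calls, so this is a sketch of the idea and not a recommended way to loop in Python.

```python
def loopvar(count, seen=()):
    """Count from count up to 10, collecting each value that is visited."""
    if count > 10:
        return seen  # the loop ends when the condition is met
    # The tail call: nothing remains to be done after it returns.
    return loopvar(count + 1, seen + (count,))

print(loopvar(1))
```

Each call receives new values instead of modifying old ones, which is how a loop can exist in a language without assignment.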
Example 4-5 is a stylesheet that contains a style specification. Stylesheets may consist of multiple specifications, as we'll see in Section 4.4.3.
The actual DSSSL code goes in the style specification body, within the style specification. Each construction rule processes different elements from the source document.
Chapters are processed by the chapter construction rule. Each Chapter is formatted as a simple-page-sequence. Every print stylesheet should format a document as one or more simple page sequences. Characteristics on the simple page sequence can specify headers and footers as well as margins and other page parameters.
One important note about simple page sequences: they cannot nest. This means that you cannot blindly process divisions (Parts, Reference) and the elements they contain (Chapters, RefEntrys) as simple page sequences. This sometimes involves a little creativity.
The make expression in the title element rule ensures that Titles are formatted in large, bold print.
This construction rule applies equally to Chapter titles, Figure titles, and Book titles. It's unlikely that you'd want all of these titles to be presented in the same way, so a more robust stylesheet would have to arrange the processing of titles with more context. This might be achieved in the way that nested Emphasis elements are handled in Section 4.3.7.4.
Para elements are simply formatted as paragraphs.
Processing Emphasis elements is made a little more interesting because we want to consider an attribute value and the possibility that Emphasis elements can be nested.
In the simple case, in which we're processing an Emphasis element that is not nested, we begin by testing the value of the role attribute. If the content of that attribute is the string strong, it is formatted in bold; otherwise, it is formatted in italic.
The nested case is handled by the (emphasis emphasis) rule. This rule simply formats the content using an upright (nonitalic) font. This rule, like the rule for Titles, is not robust. Emphasis nested inside strong Emphasis won't be distinguished, for example, and nestings more than two elements deep will be handled just as nestings that are two deep.
Processing Subscript and Superscript elements is really handled by the super-sub-script function. There are several interesting things about this function:
You might ordinarily think of passing a keyword or boolean argument to the super-sub-script function to indicate whether subscripts or superscripts are desired. But with Scheme, it's possible to pass the actual function as an argument!
Note that in the element construction rules for Superscript and Subscript, we pass the actual functions + and -. In the body of super-sub-script, we use the plus-or-minus argument as a function name (it appears immediately after an open parenthesis).
Optional arguments are indicated by #!optional in the function declaration. Any number of optional arguments may be given, but each must specify a default value. This is accomplished by listing each argument and default value (an expression) as a pair.
In super-sub-script, the optional argument sosofo is initialized to process-children. This means that at the point where the function is called, process-children is evaluated and the resulting sosofo is passed to the function.
It is possible to use the "current" value of an inherited characteristic to calculate a new value. Using this technique, superscripts and subscripts will be presented at 80 percent of the current font size.
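The three points above can be mirrored in Python, where functions are also values that can be passed as arguments. In this sketch operator.pos and operator.neg stand in for the one-argument uses of + and -, a dictionary stands in for the flow object, and process_children is a hypothetical stand-in for the real expression. Because Python evaluates default values once, when the function is defined, None is used as a marker so that the default is computed at the call, as it is in DSSSL.

```python
import math
import operator

def process_children():
    """Hypothetical stand-in for (process-children)."""
    return "children"

def super_sub_script(plus_or_minus, inherited_font_size, sosofo=None):
    """Model of super-sub-script: shift and shrink the content."""
    if sosofo is None:
        # Evaluate the default at call time, as the DSSSL function does.
        sosofo = process_children()
    return {
        "font-size": inherited_font_size * 0.8,
        # plus_or_minus is a function, applied here to a single argument.
        "position-point-shift": plus_or_minus(inherited_font_size * 0.4),
        "content": sosofo,
    }

superscript = super_sub_script(operator.pos, 10.0)
subscript = super_sub_script(operator.neg, 10.0)
print(superscript["position-point-shift"], subscript["position-point-shift"])
```

Passing the function directly removes the need for a flag and a conditional inside super_sub_script; the caller decides the direction of the shift.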
The best way to customize the stylesheets is to write your own "driver" file; this is a stylesheet that contains your local modifications and then includes the appropriate stylesheet from the standard distribution by reference. This allows you to make local changes and extensions without modifying the distributed files, which makes upgrading to the next release much simpler.
A basic driver file looks like this:
<!DOCTYPE style-sheet PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN" [
<!ENTITY dbstyle PUBLIC "-//Norman Walsh//DOCUMENT DocBook Print Stylesheet//EN" CDATA DSSSL>
]>

<style-sheet>
<style-specification use="docbook">
<style-specification-body>

;; your changes go here...

</style-specification-body>
</style-specification>
<external-specification id="docbook" document="dbstyle">
</style-sheet>
There are two public identifiers associated with the Modular DocBook Stylesheets:
-//Norman Walsh//DOCUMENT DocBook Print Stylesheet//EN
-//Norman Walsh//DOCUMENT DocBook HTML Stylesheet//EN
You can add your own definitions, or redefinitions, of stylesheet rules and parameters at the point where the comment ;; your changes go here... occurs in the previous example.
For a concrete example of a driver file, see plain.dsl in the docbook/print directory in the stylesheet distribution (or on the CD-ROM). This is a customization of the print stylesheet, which turns off title page and TOC generation.
As distributed, the stylesheets use English for all generated text, but other localization files are also provided. At the time of this writing, the stylesheets support Dutch, French, German, Italian, Norwegian, Polish, Portuguese, Russian, and Spanish. (If you can write a localization for another language, please contribute it.)
There are two ways to switch languages: by specifying a lang attribute, or by changing the default language in a customization.
One of the DocBook common attributes is lang. If you specify a language on an element, the DocBook stylesheets will use that language for generated text within that element and all of its descendants, unless a descendant specifies another language.
Table 4-1 summarizes the language codes for the supported languages.[2] The following chapter uses text generated in French:
<chapter lang="fr"><title>Bêtises</title>
<para>Pierre qui roule n'amasse pas mousse.</para>
</chapter>
If no lang attribute is specified, the default language is used. You can change the default language with a driver.
In the driver, define the default language. Table 4-1 summarizes the language codes for the supported languages. The following driver makes German the default language:
<!DOCTYPE style-sheet PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN" [
<!ENTITY dbstyle PUBLIC "-//Norman Walsh//DOCUMENT DocBook Print Stylesheet//EN" CDATA DSSSL>
]>

<style-sheet>
<style-specification use="docbook">
<style-specification-body>

(define %default-language% "dege")

</style-specification-body>
</style-specification>
<external-specification id="docbook" document="dbstyle">
</style-sheet>
There are two other settings that can be changed only in a driver. Both of these settings are turned off in the distributed stylesheet:
If a language code is specified in %gentext-language%, then that language will be used for all generated text, regardless of any lang attribute settings in the document.
If %gentext-use-xref-language% is turned on (defined as #t), then the stylesheets will generate the text associated with a cross reference using the language of the target, not the current language. Consider the following book:
<book><title>A Test Book</title>
<preface>
<para>There are three chapters in this book: <xref linkend=c1>,
<xref linkend=c2>, and <xref linkend=c3>.
</para>
</preface>
<chapter lang=usen><title>English</title> ... </chapter>
<chapter lang=fr><title>French</title> ... </chapter>
<chapter lang=dege><title>Deutsch</title> ... </chapter>
</book>
The standard stylesheets render the Preface as something like this:
There are three chapters in this book: Chapter 1, Chapter 2, and Chapter 3.
With %gentext-use-xref-language% turned on, it would render like this:
There are three chapters in this book: Chapter 1, Chapitre 2, and Kapitel 3.
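The way these settings interact can be summarized in a short Python sketch. It is a model of the behavior described in this section, not the stylesheets' actual code: the function names, the lang_chain representation (the lang values from an element up to the root, with None where the attribute is absent), and the use of usen as the default code are all assumptions made for the illustration.

```python
# Generated text for "chapter" in the three languages of the example book.
CHAPTER_WORD = {"usen": "Chapter", "fr": "Chapitre", "dege": "Kapitel"}

def gentext_language(lang_chain, default_language="usen", forced_language=None):
    """Pick the language for generated text.

    A forced language (like %gentext-language%) wins outright; otherwise the
    nearest lang attribute is used, and the default language is the fallback.
    """
    if forced_language:
        return forced_language
    for lang in lang_chain:
        if lang:
            return lang
    return default_language

def xref_text(number, source_chain, target_chain, use_xref_language=False):
    """Generate cross-reference text for a chapter.

    With use_xref_language (like %gentext-use-xref-language%) turned on, the
    language of the target is used instead of the language of the reference.
    """
    chain = target_chain if use_xref_language else source_chain
    return f"{CHAPTER_WORD[gentext_language(chain)]} {number}"

preface = [None, None]   # the preface and the book set no lang attribute
french = ["fr", None]    # the second chapter sets lang=fr

print(xref_text(2, preface, french))
print(xref_text(2, preface, french, use_xref_language=True))
```

The first call yields the English text because the reference sits in the preface; the second yields the French text because the target chapter is in French.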
A DSSSL stylesheet consists of one or more "style specifications." Using more than one style specification allows you to build a single stylesheet file that can format with either the print or SGML backends. Example 4-8 shows a stylesheet with two style specifications.
Example 4-8. both.dsl: A Stylesheet with Two Style Specifications
<!DOCTYPE style-sheet PUBLIC "-//James Clark//DTD DSSSL Style Sheet//EN" [
<!ENTITY html-ss PUBLIC "-//Norman Walsh//DOCUMENT DocBook HTML Stylesheet//EN" CDATA dsssl>
<!ENTITY print-ss PUBLIC "-//Norman Walsh//DOCUMENT DocBook Print Stylesheet//EN" CDATA dsssl>
]>

<style-sheet>
<style-specification id="print" use="print-stylesheet">
<style-specification-body>

;; customize the print stylesheet

</style-specification-body>
</style-specification>
<style-specification id="html" use="html-stylesheet">
<style-specification-body>

;; customize the html stylesheet

</style-specification-body>
</style-specification>
<external-specification id="print-stylesheet" document="print-ss">
<external-specification id="html-stylesheet" document="html-ss">
</style-sheet>
Once you have stylesheets with more than one style specification, you have to be able to indicate which style specification you want to use. In Jade, you indicate this by providing the ID of the style specification after the stylesheet filename, separated with a hash mark: #.
Using the code from Example 4-8, you can format a document using the print stylesheet by running:
jade -t rtf -d both.dsl#print file.sgm
and using the HTML stylesheet by running:
jade -t sgml -d both.dsl#html file.sgm
The DocBook SGML DTD and the DocBook DSSSL Stylesheets happen to use the same SGML declaration. This makes it very easy to run Jade with DocBook. However, you may sometimes wish to use Jade with other document types, for example the DocBook XML DTD, which has a different declaration. There are a couple of ways to do this.
If your stylesheets parse fine with the default declaration, but you want to use an alternate declaration with a particular document, just pass the declaration on the command line:
jade options the-declaration the-document
Note that there's no option required before the declaration; it simply occurs before the first filename. Jade concatenates all of the files that you give it, and parses them as if they were one document.
The other way to fix this is with a little catalog trickery.
First, note that Jade always looks in the file called catalog in the same directory as the document that it is loading, and uses settings in that file in preference to settings in other catalogs.
With this fact, we can employ the following trick:
Put a catalog file that contains an SGMLDECL directive in the directory that contains your stylesheets. Jade understands this directive, which points to the SGML declaration that it should use when parsing the stylesheets. For the DocBook stylesheets, the DocBook declaration works fine.
In the directory that contains the document you want to process, create a catalog file that contains an SGMLDECL directive that points to the SGML declaration that should be used when parsing the document.
There's no easy way to have both the stylesheet and the document in the same directory if they must be processed with different declarations. But this is usually not too inconvenient.
The concept of an architecture was promoted by HyTime. In some ways, it takes the standard SGML/XML notions of the role of elements and attributes and inverts them. Instead of relying on the name of an element to assign its primary semantics, it uses the values of a small set of fixed attributes.
While this may be counterintuitive initially, it has an interesting benefit. An architecture-aware processor can work transparently with many different DTDs. A small example will help illustrate this point.
The following example demonstrates the concept behind architectures, but for the sake of simplicity, it does not properly implement an architecture as defined in HyTime.
Imagine that you wrote an application that can read an SGML/XML document containing a letter (conforming to some letter DTD), and automatically print an envelope for the letter. It's easy to envision how this works. The application reads the content of the letter, extracts the address and return address elements from the source, and uses them to generate an envelope:
<?xml version='1.0'?>
<!DOCTYPE letter SYSTEM "/share/sgml/letter/letter.dtd" [
<!ENTITY myaddress SYSTEM "/share/sgml/entities/myaddress.xml">
]>
<letter>
<returnaddress>&myaddress;</returnaddress>
<address>
<name>Leonard Muellner</name>
<company>O'Reilly &amp; Associates</company>
<street>90 Sherman Street</street>
<city>Cambridge</city><state>MA</state><zip>02140</zip>
</address>
<body>
<salutation>Hi Lenny</salutation>
...
</body>
</letter>
The processor extracts the Returnaddress and Address elements and their children and prints the envelope accordingly.
Now suppose that a colleague from payroll comes by and asks you to adapt the application to print envelopes for mailing checks, using the information in the payroll database, which has a different DTD. And a week later, someone from sales comes by and asks if you can modify the application to use the contact information DTD. After a while, you would have 11 versions of this program to maintain.
Suppose that instead of using the actual element names to locate the addresses in the documents, you asked each person to add a few attributes to their DTD. By forcing the attributes to have fixed values, they'd automatically be present in each document, but authors would never have to worry about them.
For example, the address part of the letter DTD might look like this:
<!ELEMENT address (name, company?, street*, city, state, zip)>
<!ATTLIST address
        ADDRESS CDATA   #FIXED "START"
>
<!ELEMENT name (#PCDATA)*>
<!ATTLIST name
        ADDRESS CDATA   #FIXED "NAME"
>
<!ELEMENT company (#PCDATA)*>
<!ATTLIST company
        ADDRESS CDATA   #FIXED "COMPANY"
>
<!ELEMENT street (#PCDATA)*>
<!ATTLIST street
        ADDRESS CDATA   #FIXED "STREET"
>
<!ELEMENT city (#PCDATA)*>
<!ATTLIST city
        ADDRESS CDATA   #FIXED "CITY"
>
<!ELEMENT state (#PCDATA)*>
<!ATTLIST state
        ADDRESS CDATA   #FIXED "STATE"
>
<!ELEMENT zip (#PCDATA)*>
<!ATTLIST zip
        ADDRESS CDATA   #FIXED "ZIP"
>
Effectively, each address in a letter would look like this:
<address ADDRESS="START">
  <name ADDRESS="NAME">Leonard Muellner</name>
  <company ADDRESS="COMPANY">O'Reilly &amp; Associates</company>
  <street ADDRESS="STREET">90 Sherman Street</street>
  <city ADDRESS="CITY">Cambridge</city><state ADDRESS="STATE">MA</state>
  <zip ADDRESS="ZIP">02140</zip>
</address>
In practice, the author would not include the ADDRESS attributes; they are automatically provided by the DTD because they are #FIXED.[3]
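The defaulting behavior can be observed with a parser that reads the declarations. The following is a minimal sketch in Python, not part of the original toolchain. It assumes the declarations sit in the document's internal subset, because the standard xml.etree.ElementTree parser does not load an external DTD such as the letter DTD, and it uses a trimmed-down content model purely for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the ATTLIST declarations are in the internal subset, since
# ElementTree's underlying parser does not fetch an external DTD.
# The two-element content model is a simplification of the letter DTD.
DOC = """<!DOCTYPE address [
<!ELEMENT address (name, city)>
<!ATTLIST address ADDRESS CDATA #FIXED "START">
<!ELEMENT name (#PCDATA)*>
<!ATTLIST name ADDRESS CDATA #FIXED "NAME">
<!ELEMENT city (#PCDATA)*>
<!ATTLIST city ADDRESS CDATA #FIXED "CITY">
]>
<address>
  <name>Leonard Muellner</name>
  <city>Cambridge</city>
</address>"""

root = ET.fromstring(DOC)

# The author typed no attributes; the values come from the declarations.
roles = {elem.tag: elem.get("ADDRESS") for elem in root.iter()}
print(roles)
```

The document instance contains no ADDRESS attributes at all, yet each element reports the role that its declaration fixes.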
Now the address portion of the payroll DTD might look like this:
<!ELEMENT employee (name, mailingaddress)>
<!ELEMENT name (#PCDATA)*>
<!ATTLIST name
        ADDRESS CDATA   #FIXED "NAME"
>
<!ELEMENT mailingaddress (addrline1, addrline2?, city, state.or.province, postcode)>
<!ATTLIST mailingaddress
        ADDRESS CDATA   #FIXED "START"
>
<!ELEMENT addrline1 (#PCDATA)*>
<!ATTLIST addrline1
        ADDRESS CDATA   #FIXED "STREET"
>
<!ELEMENT addrline2 (#PCDATA)*>
<!ATTLIST addrline2
        ADDRESS CDATA   #FIXED "STREET"
>
<!ELEMENT city (#PCDATA)*>
<!ATTLIST city
        ADDRESS CDATA   #FIXED "CITY"
>
<!ELEMENT state.or.province (#PCDATA)*>
<!ATTLIST state.or.province
        ADDRESS CDATA   #FIXED "STATE"
>
<!ELEMENT postcode (#PCDATA)*>
<!ATTLIST postcode
        ADDRESS CDATA   #FIXED "ZIP"
>
The employee records will look like this:
<employee><name ADDRESS="NAME">Leonard Muellner</name>
  <mailingaddress ADDRESS="START">
    <addrline1 ADDRESS="STREET">90 Sherman Street</addrline1>
    <city ADDRESS="CITY">Cambridge</city>
    <state.or.province ADDRESS="STATE">MA</state.or.province>
    <postcode ADDRESS="ZIP">02140</postcode>
  </mailingaddress>
</employee>
Your application no longer cares about the actual element names. It simply looks for the elements with the correct attributes and uses them. This is the power of an architecture: it provides a level of abstraction that processing applications can use to their advantage. In practice, architectural forms are a bit more complex to set up because they have facilities for dealing with attribute name conflicts, among other things.
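To make the idea concrete, here is a minimal sketch of such an envelope application in Python. It is illustrative only: the function names extract_address and envelope_lines and the envelope layout are hypothetical, and the documents carry the ADDRESS attributes explicitly (the "effective" form shown above) rather than relying on a DTD-aware parser to supply them.

```python
import xml.etree.ElementTree as ET

LETTER_ADDRESS = """\
<address ADDRESS="START">
  <name ADDRESS="NAME">Leonard Muellner</name>
  <company ADDRESS="COMPANY">O'Reilly &amp; Associates</company>
  <street ADDRESS="STREET">90 Sherman Street</street>
  <city ADDRESS="CITY">Cambridge</city><state ADDRESS="STATE">MA</state>
  <zip ADDRESS="ZIP">02140</zip>
</address>"""

EMPLOYEE_RECORD = """\
<employee><name ADDRESS="NAME">Leonard Muellner</name>
  <mailingaddress ADDRESS="START">
    <addrline1 ADDRESS="STREET">90 Sherman Street</addrline1>
    <city ADDRESS="CITY">Cambridge</city>
    <state.or.province ADDRESS="STATE">MA</state.or.province>
    <postcode ADDRESS="ZIP">02140</postcode>
  </mailingaddress>
</employee>"""


def extract_address(xml_text):
    """Collect address fields by ADDRESS role, ignoring element names."""
    fields = {}
    for elem in ET.fromstring(xml_text).iter():
        role = elem.get("ADDRESS")
        if role is None or role == "START":
            continue  # not an address field, or only the container marker
        fields.setdefault(role, []).append((elem.text or "").strip())
    return fields


def envelope_lines(fields):
    """Arrange the fields as envelope lines (hypothetical layout)."""
    lines = (fields.get("NAME", []) + fields.get("COMPANY", [])
             + fields.get("STREET", []))
    city_line = "{}, {} {}".format(
        " ".join(fields.get("CITY", [])),
        " ".join(fields.get("STATE", [])),
        " ".join(fields.get("ZIP", [])),
    )
    return lines + [city_line]


# The same code handles both DTDs without knowing either one.
for source in (LETTER_ADDRESS, EMPLOYEE_RECORD):
    print("\n".join(envelope_lines(extract_address(source))))
    print()
```

Neither function mentions address, mailingaddress, zip, or postcode; only the fixed attribute values matter, so a new DTD needs new declarations rather than a new version of the program.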
Why have we told you all this? Because DSSSL is an architecture. This means you can modify the stylesheet DTD and still run your stylesheets through Jade.
Consider the case presented earlier in Example 4-8. In order to use this stylesheet, you must specify three things: the backend you want to use, the stylesheet you want to use, and the style specification you want to use. If you mismatch any of the parameters, you'll get the wrong results. In practice, the problem is compounded further:
Some stylesheets support several backends (RTF, TeX, and SGML).
Some stylesheets support only some backends (RTF and SGML, but not TeX or MIF).
Some stylesheets support multiple outputs using the same backend (several kinds of HTML output, for example, using the SGML backend: HTML, HTMLHelp, JavaHelp, and so on).
If you have complex stylesheets, some backends may require additional options to define parameter entities or stylesheet options.
None of this complexity is really necessary. After all, the options don't change; you just have to use the correct combinations. The mental model is really something like this: "I want a certain kind of output, TeX say, so I have to use this combination of parameters."
You can summarize this information in a table to help keep track of it:
Desired Output | Backend | Style specification | Options | Supported? |
---|---|---|---|---|
rtf | rtf | print | -V rtf-backend | yes |
tex | tex | print | -V tex-backend -i tex | yes |
html | sgml | htmlweb | -i html | yes |
javahelp | sgml | help | -i help | yes |
htmlhelp | | | | no |
Putting this information in a table will help you keep track of it, but it's not the best solution. The ideal solution is to keep this information on your system, and let the software figure it all out. You'd like to be able to run a command, tell it what output you want from what stylesheet, what file you want to process, and then let it figure everything else out. For example:
format html mybook.dsl myfile.sgm
One way to do this is to put the configuration data in a separate file and have the format command read it from there. The disadvantage of this solution is that it introduces another file that you have to maintain, and because that file is independent of the stylesheet, it isn't easy to keep it up to date.
In the DSSSL case, a better alternative is to modify the stylesheet DTD so you can store the configuration data in the stylesheet. Using this alternate DTD, your mybook.dsl stylesheet might look like this:
<!DOCTYPE style-sheet PUBLIC "-//Norman Walsh//DTD Annotated DSSSL Style Sheet V1.2//EN" [
<!-- perhaps additional declarations here -->
]>
<style-sheet>
<title>DocBook Stylesheet</title>
<doctype pubid="-//OASIS//DTD DocBook V3.1//EN">
<doctype pubid="-//Davenport//DTD DocBook V3.0//EN">
<doctype pubid="-//Norman Walsh//DTD Website V1.4//EN">
<backend name="rtf" backend="rtf" fragid="print" options="-V rtf-backend" default="true">
<backend name="tex" backend="tex" fragid="print" options="-V tex-backend -i tex">
<backend name="html" backend="sgml" fragid="htmlweb" options="-i html">
<backend name="javahelp" backend="sgml" fragid="help" options="-i help">
<backend name="htmlhelp" supported="no">
<style-specification id="print" use="docbook">
<style-specification-body>
. . .
In this example, the stylesheet has been annotated with a title, a list of the public IDs to which it is applicable, and a table that provides information about the output formats that it supports.
Using this information, the format command can get all the information it needs to construct the appropriate call to Jade. To make HTML from myfile.sgm, format would run the following:
jade -t sgml -d mybook.dsl#htmlweb -i html myfile.sgm
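The lookup that format performs can be sketched in a few lines of Python. This is not the script shipped on the CD-ROM; it is a hypothetical illustration in which read_backends and jade_command are invented names, and a regular expression stands in for a real SGML parser, which is adequate only for simple start tags like the ones shown above.

```python
import re
import shlex

# The backend annotations from the example stylesheet above.
STYLESHEET = """\
<backend name="rtf" backend="rtf" fragid="print" options="-V rtf-backend" default="true">
<backend name="tex" backend="tex" fragid="print" options="-V tex-backend -i tex">
<backend name="html" backend="sgml" fragid="htmlweb" options="-i html">
<backend name="javahelp" backend="sgml" fragid="help" options="-i help">
<backend name="htmlhelp" supported="no">
"""


def read_backends(stylesheet_text):
    """Return {output name: attributes} for every <backend> start tag.

    Assumption: attribute values are double-quoted and contain no '>'.
    """
    backends = {}
    for tag in re.findall(r"<backend\b([^>]*)>", stylesheet_text):
        attrs = dict(re.findall(r'([\w-]+)="([^"]*)"', tag))
        backends[attrs["name"]] = attrs
    return backends


def jade_command(output, stylesheet, document, stylesheet_text):
    """Build the Jade argument list for the requested output format."""
    entry = read_backends(stylesheet_text).get(output)
    if entry is None or entry.get("supported") == "no":
        raise ValueError("output %r is not supported by %s" % (output, stylesheet))
    return (["jade", "-t", entry["backend"],
             "-d", "%s#%s" % (stylesheet, entry["fragid"])]
            + shlex.split(entry.get("options", ""))
            + [document])


print(" ".join(jade_command("html", "mybook.dsl", "myfile.sgm", STYLESHEET)))
```

Asking for htmlhelp raises an error instead of producing a command line, which mirrors the supported="no" annotation in the stylesheet.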
The additional information, titles and public IDs, can be used in a graphical interface to simplify the selection of stylesheets for an author.
The complete annotated stylesheet DTD, and an example of the format command script, are provided on the CD-ROM.
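As a rough sketch of how a selection tool might use those annotations, the following Python fragment lists the titles of the stylesheets that declare support for a given public ID. The names describe and applicable are hypothetical, the second stylesheet is invented for the example, and the regular expressions are a stand-in for real SGML parsing.

```python
import re


def describe(stylesheet_text):
    """Pull the title and the declared public IDs out of a stylesheet."""
    title = re.search(r"<title>(.*?)</title>", stylesheet_text, re.S)
    pubids = re.findall(r'<doctype\s+pubid="([^"]*)"', stylesheet_text)
    return {"title": title.group(1).strip() if title else None,
            "pubids": pubids}


def applicable(stylesheets, pubid):
    """Titles of the stylesheets that list pubid among their doctypes."""
    return [info["title"]
            for info in map(describe, stylesheets)
            if pubid in info["pubids"]]


DOCBOOK_STYLE = """\
<style-sheet>
<title>DocBook Stylesheet</title>
<doctype pubid="-//OASIS//DTD DocBook V3.1//EN">
<doctype pubid="-//Davenport//DTD DocBook V3.0//EN">
"""

# Invented second stylesheet, present only to show the filtering.
WEBSITE_STYLE = """\
<style-sheet>
<title>Website Stylesheet</title>
<doctype pubid="-//Norman Walsh//DTD Website V1.4//EN">
"""

print(applicable([DOCBOOK_STYLE, WEBSITE_STYLE],
                 "-//OASIS//DTD DocBook V3.1//EN"))
```

A front end could call applicable with the public ID from the author's document and offer only the matching titles.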
[1] See Formally Published CALS Standards for more information.
[2] Language codes should conform to IETF RFC 1766.
[3] The use of uppercase names here is intentional. These are not attributes that an author is ever expected to type. In XML, which is case-sensitive, using uppercase for things like this reduces the likelihood of collision with "real" attribute names in the DTD.
Copyright © 1999 O'Reilly & Associates, Inc. All rights reserved.