1 <?xml version="1.0"?>
2 <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://docbook.org/xml/4.1.2/docbookx.dtd" [
4 <!ENTITY howto "http://tldp.org/HOWTO/">
5 <!ENTITY mini-howto "http://tldp.org/HOWTO/mini/">
6 ]>
7
8 <article>
9 <articleinfo>
10 <title>DocBook Demystification HOWTO</title>
11
12 <author>
13 <firstname>Eric</firstname>
14 <surname>Raymond</surname>
15 <affiliation>
16 <address>
17 <email>esr@thyrsus.com</email>
18 </address>
19 </affiliation>
20 </author>
21
22 <revhistory>
23 <revision>
24 <revnumber>v1.1</revnumber>
25 <date>2002-10-01</date>
26 <authorinitials>esr</authorinitials>
27 <revremark>
28 Correct inadvertent misrepresentation of FSF's position.
29 Added pointer to the DocBook FAQ.
30 </revremark>
31 </revision>
32 <revision>
33 <revnumber>v1.0</revnumber>
34 <date>2002-09-20</date>
35 <authorinitials>esr</authorinitials>
36 <revremark>
37 Initial version.
38 </revremark>
39 </revision>
40 </revhistory>
41
42 <abstract><para>
43 This HOWTO attempts to clear the fog and mystery surrounding the
44 DocBook markup system and the tools that go with it. It is aimed at
45 authors of technical documentation for open-source projects hosted
46 on Linux, but should be useful for people composing other kinds on
47 other Unixes as well.
48 </para></abstract>
49
50 </articleinfo>
51
52 <sect1 id="intro"><title>Introduction</title>
53
54 <para>A great many major open-source projects are converging on
55 DocBook as a standard format for their documentation — projects
56 including the Linux kernel, GNOME, KDE, Samba, and the Linux
57 Documentation Project. The advocates of XML-based "structural markup"
58 (as opposed to the older style of "presentation markup" exemplified by
59 troff, TeX, and Texinfo) seem to have won the theoretical
60 battle.</para>
61
62 <para>Nevertheless, a lot of confusion surrounds DocBook and the
63 programs that support it. Its devotees speak an argot that is dense
64 and forbidding even by computer-science standards, slinging around
65 acronyms that have no obvious relationship to the things you need to
66 do to write markup and make HTML or Postscript from it. XML standards
67 and technical papers are notoriously obscure. Most DocBook-related
68 tools are very poorly documented, and their documentation is
69 especially prone to assume way too much prior knowledge on the
70 reader's part.</para>
71
72 <para>This HOWTO will attempt to clear up the major mysteries
73 surrounding DocBook and its application to open-source documentation
74 — both the technical and political ones. Our objective is to equip
75 you to understand not just what you need to do to make documents, but
76 why the process is as complex as it is — and how it can be
77 expected to change as newer DocBook-related tools become
78 available.</para>
79
80 </sect1>
81 <sect1><title>Why care about DocBook at all?</title>
82
83 <para>There are two possibilities that make DocBook really
84 interesting. One is <emphasis>multi-mode rendering</emphasis> and the
85 other is <emphasis>searchable documentation
86 databases</emphasis>.</para>
87
88 <para>Multi-mode rendering is the easier, nearer-term possibility; it's
89 the ability to write a document in a single master format that can be
90 rendered in many different display modes (in particular, as both HTML
91 for on-line viewing and as Postscript for high-quality printed
92 output). This capability is pretty well implemented now.</para>
93
94 <para><emphasis>Searchable documentation databases</emphasis> is
95 shorthand for the possibility that DocBook might help get us to a
96 world in which all the documentation on your open-source operating
97 system is one rich, searchable, cross-indexed and hyperlinked
98 database (rather than being scattered across several different formats
99 in multiple locations as it is now).</para>
100
101 <para>Ideally, whenever you install a software package on your machine
102 it would register its DocBook documentation into your system's
103 catalog. HTML, properly indexed and cross-linked to the HTML in the
104 rest of your catalog, would be generated. The new package's
105 documentation would then be available through your browser. All
106 your documentation would be searchable through an interface
107 resembling a good Web search engine.</para>
108
109 <para>HTML itself is not quite rich enough a format to get us to that
110 world. To name just one lack, you can't explicitly declare index
111 entries in HTML. DocBook <emphasis>does</emphasis> have the semantic
112 richness to support structured documentation databases. Fundamentally
113 that's why so many projects are adopting it.</para>
114
115 <para>DocBook has the vices that go with its virtues. Some people
116 find it unpleasantly heavyweight, and too verbose to be really
117 comfortable as a composition format. That's OK; as long as the markup
118 tools they like (things like Perl POD or GNU Texinfo) can generate
119 DocBook out of their back ends, we can all still get what we want. It doesn't
120 matter whether or not everybody writes in DocBook — as long as
121 it becomes the common document interchange format that everyone uses,
122 we'll still get unified searchable documentation databases.</para>
123
124 </sect1>
125 <sect1><title>Structural markup: a primer</title>
126
127 <para>Older formatting languages like TeX, Texinfo, and troff
128 supported <firstterm>presentation
129 markup</firstterm><indexterm><primary>presentation
130 markup</primary></indexterm>. In these systems, the instructions you
131 gave were about the appearance and physical layout of the text (font
132 changes, indentation changes, that sort of thing).</para>
133
134 <para>Presentation markup was adequate as long as your objective was
135 to print to a single medium or type of display device. You run into
136 its limits, however, when you want to mark up a document so that (a)
137 it can be formatted for very different display media (such as printing
138 vs. Web display), or (b) you want to support searching and indexing the
139 document by its logical structure (as you are likely to want to do,
140 for example, if you are incorporating it into a hypertext system).</para>
141
142 <para>To support these capabilities properly, you need a system of
143 <firstterm>structural markup</firstterm><indexterm><primary>structural
144 markup</primary></indexterm>. In structural markup, you describe not
145 the physical appearance of the document but the logical properties of
146 its parts.</para>
147
148 <para>As an example: In a presentation-markup language, if you want to
149 emphasize a word, you might instruct the formatter to set it in
150 boldface. In
151 <citerefentry><refentrytitle>troff</refentrytitle><manvolnum>1</manvolnum></citerefentry>
152 this would look like so:</para>
153
154 <programlisting>
155 All your base
156 .B are
157 belong to us!
158 </programlisting>
159
160 <para>In a structural-markup language, you would tell the formatter to
161 emphasize the word:</para>
162
163 <programlisting>
164 All your base &lt;emphasis&gt;are&lt;/emphasis&gt; belong to us!
165 </programlisting>
166
167 <para>The "&lt;emphasis&gt;" and "&lt;/emphasis&gt;" in the line above
168 are called <firstterm>markup
169 tags</firstterm><indexterm><primary>markup tags</primary></indexterm>,
170 or just <firstterm>tags</firstterm> for short. They are the
171 instructions to your formatter.</para>
172
173 <para>In a structural-markup language, the physical appearance of the
174 final document would be controlled by a <firstterm>stylesheet</firstterm>
175 <indexterm><primary>stylesheet</primary></indexterm>. It is the
176 stylesheet that would tell the formatter "render emphasis as a font
177 change to boldface". One advantage of structural-markup languages
178 is that by changing a stylesheet you can globally change the
179 presentation of the document (to use different fonts, for example)
180 without having to hack all the individual instances of (say)
181 <markup>.B</markup> in the document itself.</para>
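
<para>As a rough illustration, here is what such a stylesheet rule might
look like in XSLT, the stylesheet language used by the XML toolchain
described later in this document. This is a minimal sketch written for
this explanation, not an excerpt from the real DocBook stylesheets, and
the choice of an HTML b element as the output is an assumption.</para>

<programlisting><![CDATA[
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Illustrative rule only: render every emphasis element in boldface -->
  <xsl:template match="emphasis">
    <b><xsl:apply-templates/></b>
  </xsl:template>
</xsl:stylesheet>
]]></programlisting>

<para>Switching every emphasized word from boldface to italics would then
mean editing this one rule rather than touching the document.</para>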
182
183 </sect1>
184 <sect1><title>Document Type Definitions</title>
185
186 <para>(Note: to keep the explanation simple, most of this
187 section is going to tell some lies, mainly by omitting a lot of
188 history. Truthfulness will be fully restored in a following
189 section.)</para>
190
191 <para>DocBook is a structural-level markup language. Specifically, it
192 is a dialect of XML. A DocBook document is a hunk of XML that uses
193 XML tags for structural markup.</para>
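
<para>To make that concrete, here is a minimal sketch of a complete
DocBook article. The document type declaration is the one this HOWTO
itself uses; the title, the section id, and the text are placeholders
made up for illustration.</para>

<programlisting><![CDATA[
<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
    "http://docbook.org/xml/4.1.2/docbookx.dtd">
<article>
  <articleinfo>
    <!-- Placeholder title, not a real document -->
    <title>A Hypothetical Example</title>
  </articleinfo>
  <sect1 id="first"><title>First section</title>
    <para>Text goes here, marked up by <emphasis>role</emphasis>
    rather than by appearance.</para>
  </sect1>
</article>
]]></programlisting>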
194
195 <para>In order for a document formatter to apply a stylesheet to your
196 document and make it look good, it needs to know things about the
197 overall structure of your document. For example, it needs to know
198 that a book manuscript normally consists of front matter, a sequence
199 of chapters, and back matter in order to physically format chapter
200 headers properly. In order for it to know this sort of thing, you
201 need to give it a <firstterm>Document Type
202 Definition</firstterm><indexterm><primary>Document Type
203 Definition</primary><secondary>DTD</secondary></indexterm> or DTD. The
204 DTD tells your formatter what sorts of elements can be in the document
205 structure, and in what orders they can appear.</para>
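
<para>A sketch may help here. The declarations below are a hypothetical,
drastically simplified DTD fragment written for this explanation; they
are not the real DocBook declarations, which are far more elaborate.
They say that a book is a title, an optional preface, one or more
chapters, and any number of appendices, in that order.</para>

<programlisting><![CDATA[
<!-- Hypothetical, much-simplified content models; NOT the real DocBook DTD -->
<!ELEMENT book     (title, preface?, chapter+, appendix*)>
<!ELEMENT preface  (title, para+)>
<!ELEMENT chapter  (title, para+)>
<!ELEMENT appendix (title, para+)>
<!ELEMENT title    (#PCDATA)>
<!ELEMENT para     (#PCDATA)>
]]></programlisting>

<para>Under these rules a book with no chapters, or with an appendix
placed before the first chapter, would be rejected as structurally
invalid.</para>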
206
207 <para>What we mean by calling DocBook a `dialect' of XML (the usual
208 jargon is an `application' of XML) is actually that DocBook is a DTD:
209 a rather large DTD, with somewhere around 400 tags in it.</para>
210
211 <para>Lurking behind DocBook is a kind of program called a
212 <firstterm>validating parser</firstterm><indexterm><primary>validating
213 parser</primary></indexterm>. When you format a DocBook document, the
214 first step is to pass it through a validating parser (the front end of
215 the DocBook formatter). This program checks your document against the
216 DocBook DTD to make sure you aren't breaking any of the DTD's
217 structural rules (otherwise the back end of the formatter, the part
218 that applies your style sheet, might become quite confused).</para>
219
220 <para>The validating parser will either bomb out, giving you error
221 messages about places where the document structure is broken, or translate
222 the document into a stream of <firstterm>formatting events</firstterm>
223 which the formatter back end combines with the information in your stylesheet
224 to produce formatted output.</para>
225
226 <para>Here is a diagram of the whole process:</para>
227
228 <mediaobject>
229 <imageobject> <imagedata fileref="figure1.png" format="PNG"/></imageobject>
230 </mediaobject>
231
232 <para>The part of the diagram inside the dotted box is your formatting
233 software, or <firstterm>toolchain</firstterm>. Besides the obvious and
234 visible input to the formatter (the document source) you'll need to
235 keep the two `hidden' inputs of the formatter (DTD and stylesheet) in
236 mind to understand what follows.</para>
237 </sect1>
238 <sect1><title>Other DTDs</title>
239
240 <para>A brief digression into other DTDs may help make clear what parts of
241 the previous section were specific to DocBook and what parts are general to
242 all structural-markup languages.</para>
243
244 <para><ulink url="http://www.tei-c.org/">TEI</ulink> (Text Encoding
245 Initiative) is a large, elaborate DTD used primarily in academia for
246 computer transcription of literary texts. TEI's Unix-based toolchains
247 use many of the same tools that are involved with DocBook, but with
248 different stylesheets and (of course) a different DTD.</para>
249
250 <para>XHTML, the latest version of HTML, is also an XML application
251 described by a DTD, which explains the family resemblance between
252 XHTML and DocBook tags. The XHTML toolchain consists of web browsers
253 and a number of ad-hoc HTML-to-print utilities.</para>
254
255 <para>Many other XML DTDs are maintained to help people exchange
256 structured information in fields as diverse as bioinformatics and
257 banking. You can look at a <ulink
258 url="http://www.xml.com/pub/rg/DTD_Repositories"> list of
259 repositories</ulink> to get some idea of the variety out
260 there.</para>
261
262 </sect1>
263 <sect1><title>The DocBook toolchain</title>
264
265 <para>Normally, what you'll do to make XHTML from your
266 DocBook sources will look like this:</para>
267
268 <screen>
269 bash$ xmlto xhtml foo.xml
270 Convert to XHTML
271 bash$ ls *.html
272 ar01s02.html ar01s03.html ar01s04.html index.html
273 </screen>
274
275 <para>In this example, you converted an XML-DocBook document named
276 <filename>foo.xml</filename> with four top-level sections into an
277 index page and three parts. Making one big page is just as easy:</para>
278
279 <screen>
280 bash$ xmlto xhtml-nochunks foo.xml
281 Convert to XHTML
282 bash$ ls *.html
283 foo.html
284 </screen>
285
286 <para>Finally, here is how you make Postscript for printing:</para>
287
288 <screen>
289 bash$ xmlto ps foo.xml # To make Postscript
290 Convert to XSL-FO
291 Making portrait pages on A4 paper (210mmx297mm)
292 Post-process XSL-FO to DVI
293 Post-process DVI to PS
294 bash$ ls *.ps
295 foo.ps
296 </screen>
297
298 <para>To turn your documents into HTML or Postscript, you need an
299 engine that can apply the combination of DocBook DTD and
300 a suitable stylesheet to your document. Here is how the
301 open-source tools for doing this fit together:</para>
302
303 <mediaobject>
304 <imageobject> <imagedata fileref="figure2.png" format="PNG"/></imageobject>
305 </mediaobject>
306
307 <para>Parsing your document and applying the stylesheet transformation
308 will be handled by one of three programs. The most likely one is
309 <application>xsltproc</application><indexterm><primary>xsltproc</primary></indexterm>,
310 the parser that ships with Red Hat 7.3. The other possibilities are
311 two Java programs,
312 <application>Saxon</application><indexterm><primary>Saxon</primary></indexterm>
313 and
314 <application>Xalan</application><indexterm><primary>Xalan</primary></indexterm>.</para>
315
316 <para>It is relatively easy to generate high-quality XHTML from
317 DocBook; the fact that XHTML is simply another XML DTD helps a lot.
318 Translation to HTML is done by applying a rather simple stylesheet,
319 and that's the end of the story. RTF is also simple to generate in
320 this way, and from XHTML or RTF it's easy to generate a flat ASCII
321 text approximation in a pinch.</para>
322
323 <para>The awkward case is print. Generating high-quality printed
324 output (which means, in practice, Adobe's
325 PDF<indexterm><primary>PDF</primary></indexterm>,
326 or Portable Document Format) is difficult. Doing it right requires
327 algorithmically duplicating the delicate judgments of a human
328 typesetter moving from content to presentation level.</para>
329
330 <para>So, first, a stylesheet translates DocBook's structural markup
331 into another dialect of XML —
332 FO<indexterm><primary>FO</primary></indexterm>
333 (Formatting Objects). FO markup is very much presentation-level; you
334 can think of it as a sort of XML functional equivalent of troff. It
335 has to be translated to Postscript for packaging in a PDF.</para>
336
337 <para>In the toolchain shipped with Red Hat, this job is handled by a
338 TeX macro package called
339 <application>PassiveTeX</application><indexterm><primary>PassiveTeX</primary></indexterm>. It
340 translates the formatting objects generated by
341 <command>xsltproc</command> into Donald Knuth's TeX language. TeX was
342 one of the earliest open-source projects, an old but powerful
343 presentation-level formatting language much beloved of mathematicians
344 (to whom it provides particularly elaborate facilities for describing
345 mathematical notation). TeX is also famously good at basic
346 typesetting tasks like kerning, line filling, and hyphenating. TeX's
347 output, in what's called DVI<indexterm><primary>DVI</primary></indexterm>
348 (DeVice Independent) format, is then massaged into PDF.</para>
349
350 <para>If you think this bucket chain of XML to TeX macros to DVI to
351 PDF sounds like an awkward kludge, you're right. It clanks, it
352 wheezes, and it has ugly warts. Fonts are a significant problem,
353 since XML and TeX and PDF have very different models of how fonts
354 work; also, handling internationalization and localization is a
355 nightmare. About the only thing this code path has going for it is
356 that it works.</para>
357
358 <para>The elegant way will be
359 FOP<indexterm><primary>FOP</primary></indexterm>, a direct
360 FO-to-Postscript translator being developed by the Apache project.
361 With FOP, the internationalization problem is, if not solved, at least
362 well confined; XML tools handle Unicode all the way through to FOP.
363 Glyph to font mapping is also strictly FOP's problem. The only
364 trouble with this approach is that it doesn't work — yet. As of
365 August 2002 FOP is in an unfinished alpha state — usable, but
366 with rough edges and missing features.</para>
367
368 <para>Here is what the FOP toolchain looks like:</para>
369
370 <mediaobject>
371 <imageobject> <imagedata fileref="figure3.png" format="PNG"/></imageobject>
372 </mediaobject>
373
374 <para>FOP has competition. There is another project called
375 <application>xsl-fo-proc</application><indexterm><primary>xsl-fo-proc</primary></indexterm>
376 which aims to do the same things as FOP, but in C++ (and therefore
377 both faster than Java and not relying on the Java environment). As of
378 August 2002 xsl-fo-proc is in an unfinished alpha state, not as far along as
379 FOP.</para>
380
381 </sect1>
382 <sect1><title>Who are the projects and the players?</title>
383
384 <para>The DocBook DTD itself is maintained by the DocBook Technical
385 Committee, headed by Norman Walsh. Norm is the principal author of
386 the DocBook stylesheets, a man who has focused remarkable energy and
387 talent over many years on the extremely complex problems DocBook
388 addresses. He is as universally respected in the DocBook/SGML/XML
389 community as Linus Torvalds is in the Linux world.</para>
390
391 <para>The <ulink url="http://sources.redhat.com/docbook-tools/">
392 docbook-tools</ulink> project provides open-source tools for
393 converting SGML DocBook to HTML, Postscript, and other formats. This
394 package is shipped with Red Hat and other Linux distributions. It is
395 maintained by Mark Galassi.</para>
396
397 <para><ulink url="http://www.jclark.com/jade/">Jade</ulink> is an
398 engine used to apply DSSSL stylesheets to SGML documents. It is
399 maintained by James Clark.</para>
400
401 <para><ulink url="http://openjade.sourceforge.net/">OpenJade</ulink>
402 is a community project undertaken because the founders thought James
403 Clark's maintenance of Jade was spotty. The docbook-tools programs
404 use OpenJade.</para>
405
406 <para><ulink url="http://xmlsoft.org/XSLT/">libxslt</ulink> is a C
407 library that interprets XSLT, applying stylesheets to XML documents.
408 It includes a wrapper program, <command>xsltproc</command>, that can be
409 used as an XML formatter. The code was written by Daniel Veillard
410 under the auspices of the GNOME project, but does not require any
411 GNOME code to run. I hear it's blazingly fast compared to the
412 Java alternatives, not a surprising claim.</para>
413
414 <para><ulink url="http://cyberelk.net/tim/xmlto/">xmlto</ulink> is the
415 user interface of the XML toolchain that Red Hat ships. It's written
416 and maintained by Tim Waugh.</para>
417
418 <para><ulink url="http://users.iclway.co.uk/mhkay/saxon/">Saxon</ulink>
419 and <ulink url="http://xml.apache.org/xalan-j/">Xalan</ulink> are Java
420 programs that interpret XSLT. Saxon seems to be designed to work
421 under Windows. Xalan is part of the XML Apache project and native to
422 Linux and BSD; it's designed to work with FOP.</para>
423
424 <para><ulink
425 url="http://users.ox.ac.uk/~rahtz/passivetex/">PassiveTeX</ulink> is the
426 package of LaTeX macros that <application>xmlto</application> uses for
427 producing DVI from XML-DocBook. <ulink
428 url="http://jadetex.sourceforge.net/">JadeTeX</ulink> is the package
429 of LaTeX macros that OpenJade uses for producing DVI from
430 SGML-DocBook.</para>
431
432 <para><ulink url="http://xml.apache.org/fop/">FOP</ulink> translates
433 XML Formatting Objects to PDF. It is part of the Apache XML project
434 and is designed to work with Xalan.</para>
435
436 </sect1>
437 <sect1><title>Migration tools</title>
438
439 <para>The second biggest problem with DocBook is the effort needed to
440 convert old-style presentation markup to DocBook markup. Human beings
441 can usually parse the presentation of a document into logical
442 structure automatically, because (for example) they can tell from
443 context when an italic font means `emphasis' and when it means
444 something else such as `this is a foreign phrase'.</para>
445
446 <para>Somehow, in converting documents to DocBook, those
447 sorts of distinctions need to be made explicit. Sometimes
448 they're present in the old markup; often they are not, and the
449 missing structural information has to be either deduced by
450 clever heuristics or added by a human.</para>
451
452 <para>Here is a summary of the state of conversion tools from
453 various other formats:</para>
454
455 <variablelist>
456 <varlistentry>
457 <term>GNU Texinfo</term>
458 <listitem>
459 <para>The Free Software Foundation has made a policy decision to
460 support DocBook as an interchange format. Texinfo has enough
461 structure to make reasonably good automatic conversion possible, and
462 the 4.x versions of <command>makeinfo</command> feature a
463 <option>--docbook</option> switch that generates DocBook. More at the
464 <ulink url="http://www.gnu.org/directory/texinfo.html">makeinfo
465 project page</ulink>.</para>
466 </listitem>
467 </varlistentry>
468
469 <varlistentry>
470 <term>POD</term>
471 <listitem>
472 <para>There is a <ulink
473 url="http://www.cpan.org/modules/by-module/Pod/">POD::DocBook</ulink>
474 module that translates Plain Old Documentation markup to DocBook. It
475 claims to support every POD tag except the L&lt;&gt; link tag.
476 The man page also says "Nested =over/=back lists are not supported
477 within DocBook." but notes that the module has been heavily
478 tested.</para>
479 </listitem>
480 </varlistentry>
481
482 <varlistentry>
483 <term>LaTeX</term>
484 <listitem>
485 <para>LaTeX is a (mostly) structural markup macro language built on
486 top of the TeX formatter. There is a project called <ulink
487 url="http://www.lrz-muenchen.de/services/software/sonstiges/tex4ht/mn.html">
488 TeX4ht</ulink> that (according to the author of PassiveTeX) can
489 generate DocBook from LaTeX.</para>
490 </listitem>
491 </varlistentry>
492
493 <varlistentry>
494 <term>man pages and other troff-based markups</term>
495 <listitem>
496 <para>This is generally considered the biggest and nastiest conversion
497 problem. And indeed, the basic
498 <citerefentry><refentrytitle>troff</refentrytitle>
499 <manvolnum>1</manvolnum></citerefentry> markup is at too low a presentation
500 level for automatic conversion tools to do much of any good. However,
501 the gloom in the picture lightens significantly if we consider
502 translation from sources of documents written in macro packages like
503 <citerefentry><refentrytitle>man</refentrytitle>
504 <manvolnum>7</manvolnum></citerefentry>. These have enough structural
505 features for automatic translation to get some traction.</para>
506
507 <para>I wrote a tool to do this myself, because I couldn't find
508 anything else that did a half-decent job of it (and the problem is
509 interesting). It's called <ulink
510 url="http://www.tuxedo.org/~esr/doclifter/">doclifter</ulink>. It will
511 translate to either SGML or XML DocBook from
512 <citerefentry><refentrytitle>man</refentrytitle>
513 <manvolnum>7</manvolnum></citerefentry>,
514 <citerefentry><refentrytitle>mdoc</refentrytitle>
515 <manvolnum>7</manvolnum></citerefentry>,
516 <citerefentry><refentrytitle>ms</refentrytitle>
517 <manvolnum>7</manvolnum></citerefentry>, or
518 <citerefentry><refentrytitle>me</refentrytitle>
519 <manvolnum>7</manvolnum></citerefentry> macros. See the documentation
520 for details.</para>
521 </listitem>
522 </varlistentry>
523 </variablelist>
524
525 </sect1>
526 <sect1><title>Editing tools</title>
527
528 <para>One thing we presently do not have is a good open-source
529 structure editor for SGML/XML documents.</para>
530
531 <para><ulink url="http://www.lyx.org/">LyX</ulink> is a GUI word processor
532 that uses LaTeX for printing and supports structural editing of LaTeX
533 markup. There is a LaTeX package that generates DocBook, and a
534 <ulink url="http://bgu.chez.tiscali.fr/doc/db4lyx/">how-to document</ulink>
535 describing how to write SGML and XML in the LyX GUI.</para>
536
537 <para><ulink url="http://idx-getox.idealx.org/">GeTox</ulink>, the
538 GNOME XML Editor, aims at nontechnical users. But the software is
539 still (as of August 2002) alpha, more a proof of concept than anything
540 useful, and the project group seems not to be very active; there have
541 been no updates of the website between May 2001 and August 2002 (time of
542 writing).</para>
543
544 <para><ulink
545 url="http://www.math.u-psud.fr/~anh/TeXmacs/TeXmacs.html"> GNU
546 TeXMacs</ulink> is a project aimed at producing an editor that is good
547 for technical and mathematical material, including displayed formulas.
548 1.0 was released in April 2002. The developers plan XML support in
549 the future, but it's not there yet.</para>
550
551 <para><ulink url="http://www.freesoftware.fsf.org/thotbook/">ThotBook</ulink>
552 is a project to put together a GUI editor for DocBook based on
553 the Thot toolkit. It may be moribund; the web page was not updated
554 from November 2001 to August 2002 (time of writing).</para>
555
556 <para>Most people still hack the tags by hand using either vi or Emacs, using
557 psgml to validate the results.</para>
558
559 </sect1>
560 <sect1><title>Related standards and practices</title>
561
562 <para>The tools are coming together, if slowly, to edit and format
563 DocBook markup. But DocBook itself is a means, not an end. We'll need
564 other standards besides DocBook itself to accomplish the
565 searchable-documentation-database objective I laid out at the
566 beginning of this document. There are two big issues: document
567 cataloguing and metadata.</para>
568
569 <para>The <ulink
570 url="http://scrollkeeper.sourceforge.net/">Scrollkeeper</ulink>
571 project aims directly to meet this need. It provides a simple set of
572 script hooks that can be used by package install and uninstall
573 productions to register and unregister their documentation.</para>
574
575 <para>Scrollkeeper uses the <ulink
576 url="http://www.ibiblio.org/osrt/omf/"> Open Metadata Format</ulink>.
577 This is a standard for indexing open-source documentation analogous to
578 a library card-catalog system. The idea is to support rich search
579 facilities that use the card-catalog metadata as well as the source
580 text of the documentation itself.</para>
581
582 </sect1>
583
584 <sect1><title>SGML and SGML-Tools</title>
585
586 <para>In previous sections, I have thrown away a lot of DocBook's
587 history. XML has an older brother,
588 SGML<indexterm><primary>SGML</primary></indexterm> or Standard Generalized
589 Markup Language.</para>
590
591 <para>Until mid-2002, no discussion of DocBook would have been
592 complete without a long excursion into SGML, the differences between
593 SGML and XML, and detailed descriptions of the SGML DocBook toolchain.
594 Life can be simpler now; an XML DocBook toolchain is available in open
595 source, works as well as the SGML toolchain ever did, and is easier to
596 use. If you don't think you'll ever have to deal with old SGML-DocBook
597 documents, you can skip the remainder of this section.</para>
598
599 <sect2><title>DocBook SGML</title>
600
601 <para>DocBook was originally an SGML application, and there was an
602 SGML-based DocBook toolchain that is now moribund. There are minor
603 differences between the DocBook SGML DTD and the DocBook XML DTD, but
604 for an introductory discussion we can ignore them. The only one that's
605 normally user-visible is that in SGML contentless tags did not need to
606 have a trailing slash added to them before the closing &gt;.
607 (Requiring the trailing / means XML parsers can be a lot simpler,
608 because they don't have to know about the DTD to know which opening
609 tags need closers.)</para>
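
<para>The difference is easiest to see side by side. The fragment below
is an illustrative sketch reusing the image tag from the diagrams in
this document; the SGML form is shown as it would have been written
under the rule just described.</para>

<programlisting><![CDATA[
<!-- SGML DocBook: only the DTD tells the parser that imagedata has no content -->
<imagedata fileref="figure1.png" format="PNG">

<!-- XML DocBook: the trailing slash marks the tag as empty, no DTD needed -->
<imagedata fileref="figure1.png" format="PNG"/>
]]></programlisting>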
610
611 <para>Versions of HTML up to 4.01 (before XHTML) were SGML
612 applications. TEI was originally an SGML application, too. The
613 groups managing all three DTDs jumped to XML for the same reason
614 DocBook's developers did — it's drastically simpler. SGML was
615 extremely complex; unmanageably so, as it turns out. The
616 specification was a dense 150 pages and it is not reliably reported
617 that any software ever fully implemented it.</para>
618
619 <para>The toolchain diagram I gave earlier was simplified; it
620 only showed the XML toolchain. Here is the historically
621 correct version:</para>
622
623 <mediaobject>
624 <imageobject><imagedata fileref="figure4.png" format="PNG"/></imageobject>
625 </mediaobject>
626
627 <para>The DSSSL toolchain is what processed DocBook SGML.
628 Under it, a document goes from DocBook format through one of two
629 closely-related stylesheet engines called Jade and OpenJade. These
630 turn it into a TeX-macro markup, which is processed by a package called
631 JadeTeX into DVIs, which then get turned into Postscript.</para>
632 </sect2>
633
634 <sect2><title>Why SGML DocBook is dead</title>
635
636 <para>The DSSSL toolchain is, as far as new development goes,
637 effectively dead. The XSLT toolchain has just reached production
638 status as I write in August 2002; a working version shipped in Red Hat
639 7.3. It's where DocBook developers are putting almost all of their
640 effort.</para>
641
642 <para>The reason for the change to XML was threefold. First,
643 SGML turned out to be too complicated to use; then, DSSSL turned out
644 to be too complicated to live with; then, significant parts of the
645 DSSSL toolchain turned out to be weak and irredeemably messy.</para>
646
647 <para>Relative to SGML, XML has a reduced feature set that is
648 sufficient for almost all purposes but much easier to understand and
649 build parsers for. SGML-processing tools (such as validating parsers) have
650 to carry around support for a lot of features that DocBook and other
651 text markup systems never actually used. Removing these features
652 made XML simpler and XML-processing tools faster.</para>
653
654 <para>The language used to describe SGML DTDs is sufficiently spiky
655 and forbidding that composing SGML DTDs was something of a black art.
656 XML DTDs, on the other hand, can be described in a dialect of XML
657 itself; there does not need to be a separate DTD language. An XML
658 description of an XML DTD is called a
659 <firstterm>schema</firstterm><indexterm><primary>schema</primary></indexterm>;
660 the term DTD itself will probably pass out of use as the standards for
661 schemas firm up.</para>
662
663 <para>But mostly the DSSSL toolchain is dead because DSSSL itself, the
664 SGML stylesheet description language in that toolchain, proved just too
665 arcane for most human beings, and made stylesheets too difficult to
666 write and modify. (It was a dialect of Scheme. Your humble editor, a
667 LISP-head from way back, shakes his head in sad bemusement that
668 this should drive people away.)</para>
669
670 <para>XML fans like to sum up all these changes with "XML: tastes great, less
671 filling."</para>
672 </sect2>
673
674 <sect2><title>SGML-Tools</title>
675
676 <para>SGML-Tools was the name of a DTD used by the <ulink
677 url="http://www.linuxdoc.org">Linux Documentation Project</ulink>,
678 developed a few years ago when today's DocBook toolchains didn't exist.
679 SGML-Tools markup was simpler, but also much less flexible than
680 DocBook. The original SGML-Tools formatter/DTD/stylesheet(s)
681 toolchain has been dead for some time now, but a successor called <ulink
682 url="http://sourceforge.net/projects/sgmltools-lite/">SGML-tools
683 Lite</ulink> is still maintained.</para>
684
685 <para>The LDP has been phasing out SGML-Tools in favor of DocBook, but
686 it is still possible you might take over an old HOWTO. These can be
687 recognized by the identifying header "&lt;!doctype linuxdoc
688 system&gt;". If this happens to you, convert the thing to XML DocBook
689 and give the old version a quick burial.</para>
690 </sect2>
691 </sect1>
692
693 <sect1><title>References</title>
694
695 <para>One of the things that makes learning DocBook difficult is that
696 the sites related to it tend to overwhelm the newbie with long lists
697 of W3C standards, massive exercises in SGML theology, and dense
698 thickets of abstract terminology. We're going to try to avoid that
699 here by giving you just a few selected references to look at.</para>
700
701 <para>Michael Smith's <ulink
702 url="http://xml.oreilly.com/news/dontlearn_0701.html">
703 Take My Advice: Don't Learn XML</ulink> surveys the XML world from
704 an angle similar to that of this document.</para>
705
706 <para>Norman Walsh's <citetitle>DocBook: The Definitive
707 Guide</citetitle> is available <ulink
708 url="http://www.oreilly.com/catalog/docbook/">in print</ulink> and
709 <ulink url="http://www.docbook.org/tdg/en/html/docbook.html">on the
710 web</ulink>. This is indeed the definitive reference, but as an
711 introduction or tutorial it's a disaster. Instead, read this:</para>
712
713 <para><ulink url="http://www.bureau-cornavin.com/opensource/crash-course/index.html">Writing
714 Documentation Using DocBook: A Crash Course</ulink>. This is an excellent
715 tutorial.</para>
716
717 <para>There is an excellent <ulink
718 url="http://www.dpawson.co.uk/docbook/">DocBook FAQ</ulink> with a lot
719 of material on styling HTML output. There is also a DocBook <ulink
720 url="http://docbook.org/wiki/moin.cgi">wiki</ulink>.</para>
721
722 <para>If you're writing for the Linux Documentation Project, read the
723 <ulink url="http://www.linuxdoc.org/LDP/LDP-Author-Guide/index.html">
724 LDP Author Guide</ulink>.</para>
725
726 <para>The best general introduction to SGML and XML that I've
727 personally read all the way through is David Megginson's <ulink
728 url="http://vig.pearsoned.com/store/product/0,,store-562_banner-0_isbn-0136422993,00.html">Structuring
729 XML Documents</ulink> (Prentice-Hall, ISBN: 0-13-642299-3).</para>
730
731 <para>For XML only, <ulink
732 url="http://www.oreilly.com/catalog/xmlnut2/">XML In A Nutshell</ulink>
733 by W. Scott Means and Elliotte "Rusty" Harold is very good.</para>
734
735 <para><ulink url="http://www.ibiblio.org/xml/books/bible/">The XML
736 Bible</ulink> looks like a pretty comprehensive reference on XML and
737 related standards (including Formatting Objects).</para>
738
739 <para>Finally, <ulink url="http://xml.coverpages.org/">The XML
740 Cover Pages</ulink> will take you into the jungle of XML standards
741 if you really want to go there.</para>
742
743 </sect1>
744 </article>
745
746 <!-- Keep this comment at the end of the file
747 Local variables:
748 mode: sgml
749 sgml-omittag:t
750 sgml-shorttag:t
751 sgml-namecase-general:t
752 sgml-general-insert-case:lower
753 sgml-minimize-attributes:nil
754 sgml-always-quote-attributes:t
755 sgml-indent-step:1
756 sgml-indent-data:nil
757 sgml-parent-document:nil
758 sgml-exposed-tags:nil
759 sgml-local-catalogs:nil
760 sgml-local-ecat-files:nil
761 End:
762 -->