This file is indexed.

/usr/share/doc/libsaxon-java/dtdgen.html is in libsaxon-java-doc 1:6.5.5-12.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
<html>

<head>

<title>DTDGenerator - A tool to generate XML DTDs</title>
</head>

<body leftmargin="150" bgcolor="#ddeeff"><font face="Arial, Helvetica, sans-serif">
<div align=right><a href="index.html">SAXON home page</a></div>

<h1><big><font color="#FF0080"><big>SAXON DTDGenerator</big></font></big></h1>

<h1>A tool to generate XML DTDs</h1>

<p>&nbsp;</p>

<hr>

<h2>Purpose</h2>

<p>DTDGenerator is a program that takes an XML document as input and produces a Document
Type Definition (DTD) as output.</p>

<p>The aim of the program is to give you a quick start in writing a DTD. The DTD is one of
the many possible DTDs to which the input document conforms. Typically you will want to
examine the DTD and edit it to describe your intended documents more precisely.</p>

<p><strong><font color="#FF0080">The program is issued as part of the
 <a HREF="http://users.iclway.co.uk/mhkay/saxon/index.html">SAXON</a> product.
See the <a HREF="http://users.iclway.co.uk/mhkay/saxon/index.html">SAXON</a> 
home page for download instructions.</font></strong></p>

<p>This version of DTDGenerator runs as a SAXON application, though in fact it exploits very few
 SAXON features. A more recent version of DTDGenerator, which is considerably faster, has been
 rewritten as a free-standing SAX application: this is available at
 <a href="http://saxon.sourceforge.net/dtdgen.html">http://saxon.sourceforge.net/dtdgen.html</a>.</p>


<hr>

<h2>Usage</h2>

<p>First install SAXON. Make sure that SAXON, and the directory containing the DTDGenerator
class are all on the class path.</p>

<p>From the command line, enter:</p>

<p><code><b>java DTDGenerator</b> <i>inputfile</i> &gt;<i>outputfile</i></p>

<p></code>The input file must be an XML document; typically it will have no DTD. If it
does have a DTD, the DTD may be used by the parser but it will be ignored by the
DTDGenerator utility.</p>

<p>The output file will be an XML external document type definition.</p>

<p>The input file is not modified; if you want to edit it to refer to the generated DTD,
you must do this yourself.</p>

<hr>

<h2>What it does</h2>

<p>The program makes a list of all the elements and attributes that appear in your
document, noting how they are nested, and noting which elements contain character data.</p>

<p>When the document has been completely processed, the DTD is generated according to the
following rules:</p>

<ul>
  <li>If an element contains both non-space character data and child elements, then it is
    declared with mixed element content, permitting all child elements that are actually
    encountered within instances of that parent element.</li>
  <li>If no significant character data is found in an element, it is assumed that the element
    cannot contain character data.</li>
  <li>If an element contains child elements but no significant character data, then it is
    declared as having element content. If the same child elements occur in every instance
    of the parent and in a consistent sequence, then this sequence is reflected in the element
    declaration: where child elements are repeated or trailing children (only) are omitted
    in some instances of the parent element, this will result in a declaration that shows
    the child element as being repeatable or optional or both. If no such consistency of
    sequence can be detected, then a more general form of element
    declaration is used in which all child elements may appear any number of times in any 
    order.
  <li>If neither character data nor subordinate elements are found in an element, it is
    assumed the element must always be empty.</li>
  <li>An attribute appearing in an element is assumed to be REQUIRED if it appears in every
    occurrence of the element.</li>
  <li>An attribute that has a distinct value every time it appears is assumed to be an
    identifying (ID) attribute, provided that there are at least 10 instances of the element
    in the input document.</li>
  <li>An attribute is assumed to be an enumeration attribute if it has less than ten distinct
    values, provided that the number of instances of the attribute is at least three times the
    number of distinct values and at least ten. </li>
</ul>

<p>The resulting DTD will often contain rules that are either too restrictive or too
liberal. The DTD may be too restrictive if it prohibits constructs that do not appear in
this document, but might legitimately appear in others. It may be too liberal if it fails
to detect patterns that are inherent to the structure: for example, the order of elements
within a parent element. These limitations are inherent in any attempt to infer general
rules from a particular example document.</p>

<p>In general, therefore, you will need to iterate the process. You have a choice: </p>

<ul>
  <li>Either edit the generated DTD to reflect your knowledge of the document type.</li>
  <li>Or edit the input document to provide a more representative sample of features that will
    be encountered in other document instances, and run the utility again.</li>
</ul>

<p>In a few unusual cases DTDGenerator will create a DTD which is invalid, or one to which
 the document does not conform.
You will then have to edit the DTD before you can use it. The known cases are:</p>

<ul>
  <li>An attribute can only be declared as an ID if all its values are valid XML names,
   and as an enumeration type if all its values are valid XML NMTOKENs. DTDGenerator only makes a
   partial check against this condition: in particular, it only rejects values that contain
   ASCII characters that cannot appear in names (e.g. space).</li>
  <li>DTDGenerator will decide that an attribute is an ID attribute if all its values are
    distinct, without checking whether the set of values overlaps with those of another
    ID attribute. In some cases this results in an IDREF attribute being incorrectly classified
    as an ID.</li>
</ul>



<hr>
<i>

<p align="center">Michael H. Kay<br>
10 August 2003</i> </p>
</body>
</html>