/usr/share/doc/clang-4.0-doc/html/PCHInternals.html is in clang-4.0-doc 1:4.0.1-10.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>Precompiled Header and Modules Internals &#8212; Clang 4 documentation</title>
    <link rel="stylesheet" href="_static/haiku.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    './',
        VERSION:     '4',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true,
        SOURCELINK_SUFFIX: '.txt'
      };
    </script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
    <link rel="index" title="Index" href="genindex.html" />
    <link rel="search" title="Search" href="search.html" />
    <link rel="next" title="ABI tags" href="ItaniumMangleAbiTags.html" />
    <link rel="prev" title="Pretokenized Headers (PTH)" href="PTHInternals.html" /> 
  </head>
  <body>
      <div class="header" role="banner"><h1 class="heading"><a href="index.html">
          <span>Clang 4 documentation</span></a></h1>
        <h2 class="heading"><span>Precompiled Header and Modules Internals</span></h2>
      </div>
      <div class="content">
        
        
  <div class="section" id="precompiled-header-and-modules-internals">
<h1>Precompiled Header and Modules Internals<a class="headerlink" href="#precompiled-header-and-modules-internals" title="Permalink to this headline"></a></h1>
<div class="contents local topic" id="contents">
<ul class="simple">
<li><a class="reference internal" href="#using-precompiled-headers-with-clang" id="id1">Using Precompiled Headers with <code class="docutils literal"><span class="pre">clang</span></code></a></li>
<li><a class="reference internal" href="#design-philosophy" id="id2">Design Philosophy</a></li>
<li><a class="reference internal" href="#ast-file-contents" id="id3">AST File Contents</a><ul>
<li><a class="reference internal" href="#metadata-block" id="id4">Metadata Block</a></li>
<li><a class="reference internal" href="#source-manager-block" id="id5">Source Manager Block</a></li>
<li><a class="reference internal" href="#preprocessor-block" id="id6">Preprocessor Block</a></li>
<li><a class="reference internal" href="#types-block" id="id7">Types Block</a></li>
<li><a class="reference internal" href="#declarations-block" id="id8">Declarations Block</a></li>
<li><a class="reference internal" href="#statements-and-expressions" id="id9">Statements and Expressions</a></li>
<li><a class="reference internal" href="#pchinternals-ident-table" id="id10">Identifier Table Block</a></li>
<li><a class="reference internal" href="#method-pool-block" id="id11">Method Pool Block</a></li>
</ul>
</li>
<li><a class="reference internal" href="#ast-reader-integration-points" id="id12">AST Reader Integration Points</a></li>
<li><a class="reference internal" href="#chained-precompiled-headers" id="id13">Chained precompiled headers</a></li>
<li><a class="reference internal" href="#modules" id="id14">Modules</a></li>
</ul>
</div>
<p>This document describes the design and implementation of Clang’s precompiled
headers (PCH) and modules.  If you are interested in the end-user view, please
see the <a class="reference internal" href="UsersManual.html#usersmanual-precompiled-headers"><span class="std std-ref">User’s Manual</span></a>.</p>
<div class="section" id="using-precompiled-headers-with-clang">
<h2><a class="toc-backref" href="#id1">Using Precompiled Headers with <code class="docutils literal"><span class="pre">clang</span></code></a><a class="headerlink" href="#using-precompiled-headers-with-clang" title="Permalink to this headline"></a></h2>
<p>The Clang compiler frontend, <code class="docutils literal"><span class="pre">clang</span> <span class="pre">-cc1</span></code>, supports two command-line options
for generating and using PCH files.</p>
<p>To generate PCH files using <code class="docutils literal"><span class="pre">clang</span> <span class="pre">-cc1</span></code>, use the option <cite>-emit-pch</cite>:</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span>$ clang -cc1 test.h -emit-pch -o test.h.pch
</pre></div>
</div>
<p>This option is transparently used by <code class="docutils literal"><span class="pre">clang</span></code> when generating PCH files.  The
resulting PCH file contains the serialized form of the compiler’s internal
representation after it has completed parsing and semantic analysis.  The PCH
file can then be used as a prefix header with the <cite>-include-pch</cite>
option:</p>
<div class="highlight-bash"><div class="highlight"><pre><span></span>$ clang -cc1 -include-pch test.h.pch test.c -o test.s
</pre></div>
</div>
</div>
<div class="section" id="design-philosophy">
<h2><a class="toc-backref" href="#id2">Design Philosophy</a><a class="headerlink" href="#design-philosophy" title="Permalink to this headline"></a></h2>
<p>Precompiled headers are meant to improve overall compile times for projects, so
the design of precompiled headers is entirely driven by performance concerns.
The use case for precompiled headers is relatively simple: when there is a
common set of headers that is included in nearly every source file in the
project, we <em>precompile</em> that bundle of headers into a single precompiled
header (PCH file).  Then, when compiling the source files in the project, we
load the PCH file first (as a prefix header), which acts as a stand-in for that
bundle of headers.</p>
<p>A precompiled header implementation improves performance when:</p>
<ul class="simple">
<li>Loading the PCH file is significantly faster than re-parsing the bundle of
headers stored within the PCH file.  Thus, a precompiled header design
attempts to minimize the cost of reading the PCH file.  Ideally, this cost
should not vary with the size of the precompiled header file.</li>
<li>The cost of generating the PCH file initially is not so large that it
counters the per-source-file performance improvement due to eliminating the
need to parse the bundled headers in the first place.  This is particularly
important on multi-core systems, because PCH file generation serializes the
build when all compilations require the PCH file to be up-to-date.</li>
</ul>
<p>Modules, as implemented in Clang, use the same mechanisms as precompiled
headers to save a serialized AST file (one per module) and use those AST
modules.  From an implementation standpoint, modules are a generalization of
precompiled headers, lifting a number of restrictions placed on precompiled
headers.  In particular, a translation unit can use only one precompiled header,
which must be included at its beginning.  The extensions to the
AST file format required for modules are discussed in the section on
<a class="reference internal" href="#pchinternals-modules"><span class="std std-ref">modules</span></a>.</p>
<p>Clang’s AST files are designed with a compact on-disk representation, which
minimizes both creation time and the time required to initially load the AST
file.  The AST file itself contains a serialized representation of Clang’s
abstract syntax trees and supporting data structures, stored using the same
compressed bitstream as <a class="reference external" href="http://llvm.org/docs/BitCodeFormat.html">LLVM’s bitcode file format</a>.</p>
<p>Clang’s AST files are loaded “lazily” from disk.  When an AST file is initially
loaded, Clang reads only a small amount of data from the AST file to establish
where certain important data structures are stored.  The amount of data read in
this initial load is independent of the size of the AST file, such that a
larger AST file does not lead to longer AST load times.  The actual header data
in the AST file — macros, functions, variables, types, etc. — is loaded
only when it is referenced from the user’s code, at which point only that
entity (along with the entities it depends on) is deserialized from the AST file.
With this approach, the cost of using an AST file for a translation unit is
proportional to the amount of code actually used from the AST file, rather than
being proportional to the size of the AST file itself.</p>
<p>When given the <cite>-print-stats</cite> option, Clang produces statistics
describing how much of the AST file was actually loaded from disk.  For a
simple “Hello, World!” program that includes the Apple <code class="docutils literal"><span class="pre">Cocoa.h</span></code> header
(which is built as a precompiled header), this option illustrates how little of
the actual precompiled header is required:</p>
<div class="highlight-none"><div class="highlight"><pre><span></span>*** AST File Statistics:
  895/39981 source location entries read (2.238563%)
  19/15315 types read (0.124061%)
  20/82685 declarations read (0.024188%)
  154/58070 identifiers read (0.265197%)
  0/7260 selectors read (0.000000%)
  0/30842 statements read (0.000000%)
  4/8400 macros read (0.047619%)
  1/4995 lexical declcontexts read (0.020020%)
  0/4413 visible declcontexts read (0.000000%)
  0/7230 method pool entries read (0.000000%)
  0 method pool misses
</pre></div>
</div>
<p>For this small program, only a tiny fraction of the source locations, types,
declarations, identifiers, and macros were actually deserialized from the
precompiled header.  These statistics can be useful to determine whether the
AST file implementation can be improved by making more of the implementation
lazy.</p>
<p>Precompiled headers can be chained.  When you create a PCH while including an
existing PCH, Clang can create the new PCH by referencing the original file and
only writing the new data to the new file.  For example, you could create a PCH
out of all the headers that are very commonly used throughout your project, and
then create a PCH for every single source file in the project that includes the
code that is specific to that file, so that recompiling the file itself is very
fast, without duplicating the data from the common headers for every file.  The
mechanisms behind chained precompiled headers are discussed in a <a class="reference internal" href="#pchinternals-chained"><span class="std std-ref">later
section</span></a>.</p>
</div>
<div class="section" id="ast-file-contents">
<h2><a class="toc-backref" href="#id3">AST File Contents</a><a class="headerlink" href="#ast-file-contents" title="Permalink to this headline"></a></h2>
<p>An AST file produced by clang is an object file container with a <code class="docutils literal"><span class="pre">clangast</span></code>
(COFF) or <code class="docutils literal"><span class="pre">__clangast</span></code> (ELF and Mach-O) section containing the serialized AST.
Other target-specific sections in the object file container are used to hold
debug information for the data types defined in the AST.  Tools built on top of
libclang that do not need debug information may also produce raw AST files that
only contain the serialized AST.</p>
<p>The <code class="docutils literal"><span class="pre">clangast</span></code> section is organized into several different blocks, each of
which contains the serialized representation of a part of Clang’s internal
representation.  Each of the blocks corresponds to either a block or a record
within <a class="reference external" href="http://llvm.org/docs/BitCodeFormat.html">LLVM’s bitstream format</a>.
The contents of each of these logical blocks are described below.</p>
<img alt="_images/PCHLayout.png" src="_images/PCHLayout.png" />
<p>The <code class="docutils literal"><span class="pre">llvm-objdump</span></code> utility provides a <code class="docutils literal"><span class="pre">-raw-clang-ast</span></code> option to extract the
binary contents of the AST section from an object file container.</p>
<p>The <a class="reference external" href="http://llvm.org/docs/CommandGuide/llvm-bcanalyzer.html">llvm-bcanalyzer</a>
utility can be used to examine the actual structure of the bitstream for the AST
section.  This information can be used both to help understand the structure of
the AST section and to isolate areas where the AST representation can still be
optimized, e.g., through the introduction of abbreviations.</p>
<div class="section" id="metadata-block">
<h3><a class="toc-backref" href="#id4">Metadata Block</a><a class="headerlink" href="#metadata-block" title="Permalink to this headline"></a></h3>
<p>The metadata block contains several records that provide information about how
the AST file was built.  This metadata is primarily used to validate the use of
an AST file.  For example, a precompiled header built for a 32-bit x86 target
cannot be used when compiling for a 64-bit x86 target.  The metadata block
contains information about:</p>
<dl class="docutils">
<dt>Language options</dt>
<dd>Describes the particular language dialect used to compile the AST file,
including major options (e.g., Objective-C support) and more minor options
(e.g., support for “<code class="docutils literal"><span class="pre">//</span></code>” comments).  The contents of this record correspond to
the <code class="docutils literal"><span class="pre">LangOptions</span></code> class.</dd>
<dt>Target architecture</dt>
<dd>The target triple that describes the architecture, platform, and ABI for
which the AST file was generated, e.g., <code class="docutils literal"><span class="pre">i386-apple-darwin9</span></code>.</dd>
<dt>AST version</dt>
<dd>The major and minor version numbers of the AST file format.  Changes in the
minor version number should not affect backward compatibility, while changes
in the major version number imply that a newer compiler cannot read an older
precompiled header (and vice-versa).</dd>
<dt>Original file name</dt>
<dd>The full path of the header that was used to generate the AST file.</dd>
<dt>Predefines buffer</dt>
<dd>Although not explicitly stored as part of the metadata, the predefines buffer
is used in the validation of the AST file.  The predefines buffer itself
contains code generated by the compiler to initialize the preprocessor state
according to the current target, platform, and command-line options.  For
example, the predefines buffer will contain “<code class="docutils literal"><span class="pre">#define</span> <span class="pre">__STDC__</span> <span class="pre">1</span></code>” when we
are compiling C without Microsoft extensions.  The predefines buffer itself
is stored within the <a class="reference internal" href="#pchinternals-sourcemgr"><span class="std std-ref">Source Manager Block</span></a>, but its contents are
verified along with the rest of the metadata.</dd>
</dl>
<p>A chained PCH file (that is, one that references another PCH) and a module
(which may import other modules) have additional metadata containing the list
of all AST files that this AST file depends on.  Each of those files will be
loaded along with this AST file.</p>
<p>For chained precompiled headers, the language options, target architecture and
predefines buffer data are taken from the end of the chain, since they have to
match anyway.</p>
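<p>The validation described in this section can be pictured with a small Python
sketch.  It is only an illustration: the <code class="docutils literal"><span class="pre">AstMetadata</span></code> class, its fields, and
the <code class="docutils literal"><span class="pre">validation_errors</span></code> function are names invented here, and the real
checks also involve state, such as the predefines buffer, that is omitted.</p>
<div class="highlight-python"><div class="highlight"><pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AstMetadata:
    """A few of the fields the metadata block is described as recording."""
    target_triple: str
    lang_options: frozenset
    version_major: int
    version_minor: int


def validation_errors(stored: AstMetadata, current: AstMetadata) -> list[str]:
    """Return the reasons (if any) a stored AST file cannot be reused."""
    errors = []
    if stored.target_triple != current.target_triple:
        errors.append("target triple differs")
    if stored.lang_options != current.lang_options:
        errors.append("language options differ")
    # Only the major version is compared; minor changes are meant to
    # stay backward compatible.
    if stored.version_major != current.version_major:
        errors.append("AST major version differs")
    return errors


# A header precompiled for 32-bit x86, reused in a 64-bit x86 compilation.
pch = AstMetadata("i386-apple-darwin9", frozenset({"objc"}), 1, 0)
now = AstMetadata("x86_64-apple-darwin9", frozenset({"objc"}), 1, 2)
problems = validation_errors(pch, now)
```
</pre></div>
</div>
<p>Here <code class="docutils literal"><span class="pre">problems</span></code> holds a single entry for the target mismatch; the differing
minor version alone does not cause a rejection in this sketch.</p>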
</div>
<div class="section" id="source-manager-block">
<span id="pchinternals-sourcemgr"></span><h3><a class="toc-backref" href="#id5">Source Manager Block</a><a class="headerlink" href="#source-manager-block" title="Permalink to this headline"></a></h3>
<p>The source manager block contains the serialized representation of Clang’s
<a class="reference internal" href="InternalsManual.html#sourcemanager"><span class="std std-ref">SourceManager</span></a> class, which handles the mapping from
source locations (as represented in Clang’s abstract syntax tree) into actual
column/line positions within a source file or macro instantiation.  The AST
file’s representation of the source manager also includes information about all
of the headers that were (transitively) included when building the AST file.</p>
<p>The bulk of the source manager block is dedicated to information about the
various files, buffers, and macro instantiations into which a source location
can refer.  Each of these is referenced by a numeric “file ID”, which is a
unique number (allocated starting at 1) stored in the source location.  Clang
serializes the information for each kind of file ID, along with an index that
maps file IDs to the position within the AST file where the information about
that file ID is stored.  The data associated with a file ID is loaded only when
required by the front end, e.g., to emit a diagnostic that includes a macro
instantiation history inside the header itself.</p>
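<p>The idea of an index that maps file IDs to positions in the AST file can be
sketched as follows.  The byte layout (one JSON object per line) and the function
names <code class="docutils literal"><span class="pre">write_entries</span></code> and <code class="docutils literal"><span class="pre">read_entry</span></code> are assumptions chosen for brevity;
the real block uses bitstream records, not JSON.</p>
<div class="highlight-python"><div class="highlight"><pre>
```python
import io
import json


def write_entries(entries):
    """Serialize entries back to back and return (blob, offset index)."""
    blob = io.BytesIO()
    index = {}
    for file_id, info in enumerate(entries, start=1):  # file IDs start at 1
        index[file_id] = blob.tell()
        blob.write(json.dumps(info).encode("utf-8") + b"\n")
    return blob.getvalue(), index


def read_entry(blob, index, file_id):
    """Seek straight to one entry instead of scanning the whole blob."""
    stream = io.BytesIO(blob)
    stream.seek(index[file_id])
    return json.loads(stream.readline())


blob, index = write_entries([
    {"kind": "file", "name": "a.h"},
    {"kind": "buffer", "name": "predefines"},
    {"kind": "macro", "name": "MAX"},
])
entry = read_entry(blob, index, 2)
```
</pre></div>
</div>
<p>Only the entry for file ID 2 is decoded; the other entries stay untouched in
the serialized data until something asks for them.</p>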
<p>The source manager block also contains information about all of the headers
that were included when building the AST file.  This includes information about
the controlling macro for the header (e.g., when the preprocessor identified
that the contents of the header depend on a macro like
<code class="docutils literal"><span class="pre">LLVM_CLANG_SOURCEMANAGER_H</span></code>).</p>
</div>
<div class="section" id="preprocessor-block">
<span id="pchinternals-preprocessor"></span><h3><a class="toc-backref" href="#id6">Preprocessor Block</a><a class="headerlink" href="#preprocessor-block" title="Permalink to this headline"></a></h3>
<p>The preprocessor block contains the serialized representation of the
preprocessor.  Specifically, it contains all of the macros that have been
defined by the end of the header used to build the AST file, along with the
token sequences that comprise each macro.  The macro definitions are only read
from the AST file when the name of the macro first occurs in the program.  This
lazy loading of macro definitions is triggered by lookups into the
<a class="reference internal" href="#pchinternals-ident-table"><span class="std std-ref">identifier table</span></a>.</p>
</div>
<div class="section" id="types-block">
<span id="pchinternals-types"></span><h3><a class="toc-backref" href="#id7">Types Block</a><a class="headerlink" href="#types-block" title="Permalink to this headline"></a></h3>
<p>The types block contains the serialized representation of all of the types
referenced in the translation unit.  Each Clang type node (<code class="docutils literal"><span class="pre">PointerType</span></code>,
<code class="docutils literal"><span class="pre">FunctionProtoType</span></code>, etc.) has a corresponding record type in the AST file.
When types are deserialized from the AST file, the data within the record is
used to reconstruct the appropriate type node using the AST context.</p>
<p>Each type has a unique type ID, which is an integer that uniquely identifies
that type.  Type ID 0 represents the NULL type, type IDs less than
<code class="docutils literal"><span class="pre">NUM_PREDEF_TYPE_IDS</span></code> represent predefined types (<code class="docutils literal"><span class="pre">void</span></code>, <code class="docutils literal"><span class="pre">float</span></code>, etc.),
while other “user-defined” type IDs are assigned consecutively from
<code class="docutils literal"><span class="pre">NUM_PREDEF_TYPE_IDS</span></code> upward as the types are encountered.  The AST file has
an associated mapping from the user-defined type IDs to the location within
the types block where the serialized representation of that type resides,
enabling lazy deserialization of types.  When a type is referenced from within
the AST file, that reference is encoded using the type ID shifted left by 3
bits.  The lower three bits are used to represent the <code class="docutils literal"><span class="pre">const</span></code>, <code class="docutils literal"><span class="pre">volatile</span></code>,
and <code class="docutils literal"><span class="pre">restrict</span></code> qualifiers, as in Clang’s <a class="reference internal" href="InternalsManual.html#qualtype"><span class="std std-ref">QualType</span></a> class.</p>
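<p>As a rough illustration of this encoding, the Python sketch below packs a type
ID and its qualifiers into one integer and unpacks it again.  The function names
and the bit assigned to each qualifier are assumptions made for the example and
should not be read as Clang’s definitions.</p>
<div class="highlight-python"><div class="highlight"><pre>
```python
# Hypothetical qualifier bit assignments, chosen for this example.
CONST = 1
RESTRICT = 2
VOLATILE = 4
QUAL_BITS = 3  # the three low bits hold the qualifiers


def encode_type_ref(type_id: int, quals: int = 0) -> int:
    """Pack a type ID and qualifier bits into a single integer reference."""
    if quals not in range(2 ** QUAL_BITS):
        raise ValueError("qualifiers must fit in three bits")
    # Multiplying by 2**3 is the same as shifting left by 3 bits.
    return type_id * 2 ** QUAL_BITS + quals


def decode_type_ref(ref: int) -> tuple[int, int]:
    """Split a reference back into (type ID, qualifier bits)."""
    return divmod(ref, 2 ** QUAL_BITS)


ref = encode_type_ref(42, CONST | VOLATILE)
type_id, quals = decode_type_ref(ref)
```
</pre></div>
</div>
<p>With these assumed bit values, a <code class="docutils literal"><span class="pre">const</span> <span class="pre">volatile</span></code> reference to type ID 42 is
stored as 42 * 8 + 5 = 341, and decoding it recovers the pair (42, 5).</p>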
</div>
<div class="section" id="declarations-block">
<span id="pchinternals-decls"></span><h3><a class="toc-backref" href="#id8">Declarations Block</a><a class="headerlink" href="#declarations-block" title="Permalink to this headline"></a></h3>
<p>The declarations block contains the serialized representation of all of the
declarations referenced in the translation unit.  Each Clang declaration node
(<code class="docutils literal"><span class="pre">VarDecl</span></code>, <code class="docutils literal"><span class="pre">FunctionDecl</span></code>, etc.) has a corresponding record type in the
AST file.  When declarations are deserialized from the AST file, the data
within the record is used to build and populate a new instance of the
corresponding <code class="docutils literal"><span class="pre">Decl</span></code> node.  As with types, each declaration node has a
numeric ID that is used to refer to that declaration within the AST file.  In
addition, a lookup table provides a mapping from that numeric ID to the offset
within the precompiled header where that declaration is described.</p>
<p>Declarations in Clang’s abstract syntax trees are stored hierarchically.  At
the top of the hierarchy is the translation unit (<code class="docutils literal"><span class="pre">TranslationUnitDecl</span></code>),
which contains all of the declarations in the translation unit but is not
actually written as a specific declaration node.  Its child declarations (such
as functions or struct types) may also contain other declarations inside them,
and so on.  Within Clang, each declaration is stored within a <a class="reference internal" href="InternalsManual.html#declcontext"><span class="std std-ref">declaration
context</span></a>, as represented by the <code class="docutils literal"><span class="pre">DeclContext</span></code> class.
Declaration contexts provide the mechanism to perform name lookup within a
given declaration (e.g., find the member named <code class="docutils literal"><span class="pre">x</span></code> in a structure) and
iterate over the declarations stored within a context (e.g., iterate over all
of the fields of a structure for structure layout).</p>
<p>In Clang’s AST file format, deserializing a declaration that is a
<code class="docutils literal"><span class="pre">DeclContext</span></code> is a separate operation from deserializing all of the
declarations stored within that declaration context.  Therefore, Clang will
deserialize the translation unit declaration without deserializing the
declarations within that translation unit.  When required, the declarations
stored within a declaration context will be deserialized.  There are two
representations of the declarations within a declaration context, which
correspond to the name-lookup and iteration behavior described above:</p>
<ul class="simple">
<li>When the front end performs name lookup to find a name <code class="docutils literal"><span class="pre">x</span></code> within a given
declaration context (for example, during semantic analysis of the expression
<code class="docutils literal"><span class="pre">p-&gt;x</span></code>, where <code class="docutils literal"><span class="pre">p</span></code>’s type is defined in the precompiled header), Clang
refers to an on-disk hash table that maps from the names within that
declaration context to the declaration IDs that represent each visible
declaration with that name.  The actual declarations will then be
deserialized to provide the results of name lookup.</li>
<li>When the front end performs iteration over all of the declarations within a
declaration context, all of those declarations are immediately
de-serialized.  For large declaration contexts (e.g., the translation unit),
this operation is expensive; however, large declaration contexts are not
traversed in normal compilation, since such a traversal is unnecessary.
However, it is common for the code generator and semantic analysis to
traverse declaration contexts for structs, classes, unions, and
enumerations, although those contexts contain relatively few declarations in
the common case.</li>
</ul>
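<p>The two access patterns above can be contrasted in a toy model.  The class
<code class="docutils literal"><span class="pre">LazyDeclContext</span></code>, its attributes, and the <code class="docutils literal"><span class="pre">load_decl</span></code> callback are
hypothetical stand-ins for illustration; an ordinary dictionary plays the role
of the on-disk hash table.</p>
<div class="highlight-python"><div class="highlight"><pre>
```python
class LazyDeclContext:
    """Toy declaration context whose members are loaded only on demand."""

    def __init__(self, name_table, lexical_ids, load_decl):
        self.name_table = name_table    # name -> list of declaration IDs
        self.lexical_ids = lexical_ids  # every declaration ID, in order
        self.load_decl = load_decl      # stand-in for reading one record
        self.loaded = {}                # declaration ID -> deserialized decl

    def _get(self, decl_id):
        if decl_id not in self.loaded:
            self.loaded[decl_id] = self.load_decl(decl_id)
        return self.loaded[decl_id]

    def lookup(self, name):
        """Name lookup: deserialize only the declarations with this name."""
        return [self._get(i) for i in self.name_table.get(name, [])]

    def decls(self):
        """Iteration: deserialize every declaration in the context."""
        return [self._get(i) for i in self.lexical_ids]


ctx = LazyDeclContext(
    name_table={"x": [11], "y": [12], "f": [13, 14]},
    lexical_ids=[11, 12, 13, 14],
    load_decl=lambda decl_id: f"decl#{decl_id}",
)
found = ctx.lookup("x")
loaded_after_lookup = len(ctx.loaded)   # one declaration so far
everything = ctx.decls()                # now all four are loaded
```
</pre></div>
</div>
<p>Looking up <code class="docutils literal"><span class="pre">x</span></code> touches a single declaration, whereas iterating the context
forces all of them to be read, which is why iteration over a large context is the
expensive case.</p>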
</div>
<div class="section" id="statements-and-expressions">
<h3><a class="toc-backref" href="#id9">Statements and Expressions</a><a class="headerlink" href="#statements-and-expressions" title="Permalink to this headline"></a></h3>
<p>Statements and expressions are stored in the AST file in both the <a class="reference internal" href="#pchinternals-types"><span class="std std-ref">types</span></a> and the <a class="reference internal" href="#pchinternals-decls"><span class="std std-ref">declarations</span></a> blocks,
because every statement or expression will be associated with either a type or
declaration.  The actual statement and expression records are stored
immediately following the declaration or type that owns the statement or
expression.  For example, the statement representing the body of a function
will be stored directly following the declaration of the function.</p>
<p>As with types and declarations, each statement and expression kind in Clang’s
abstract syntax tree (<code class="docutils literal"><span class="pre">ForStmt</span></code>, <code class="docutils literal"><span class="pre">CallExpr</span></code>, etc.) has a corresponding
record type in the AST file, which contains the serialized representation of
that statement or expression.  Each substatement or subexpression within an
expression is stored as a separate record (which keeps most records to a fixed
size).  Within the AST file, the subexpressions of an expression are stored, in
reverse order, prior to the expression that owns those subexpressions, using a form
of <a class="reference external" href="http://en.wikipedia.org/wiki/Reverse_Polish_notation">Reverse Polish Notation</a>.  For example, an
expression <code class="docutils literal"><span class="pre">3</span> <span class="pre">-</span> <span class="pre">4</span> <span class="pre">+</span> <span class="pre">5</span></code> would be represented as follows:</p>
<table border="1" class="docutils">
<colgroup>
<col width="100%" />
</colgroup>
<tbody valign="top">
<tr class="row-odd"><td><code class="docutils literal"><span class="pre">IntegerLiteral(5)</span></code></td>
</tr>
<tr class="row-even"><td><code class="docutils literal"><span class="pre">IntegerLiteral(4)</span></code></td>
</tr>
<tr class="row-odd"><td><code class="docutils literal"><span class="pre">IntegerLiteral(3)</span></code></td>
</tr>
<tr class="row-even"><td><code class="docutils literal"><span class="pre">BinaryOperator(-)</span></code></td>
</tr>
<tr class="row-odd"><td><code class="docutils literal"><span class="pre">BinaryOperator(+)</span></code></td>
</tr>
<tr class="row-even"><td><code class="docutils literal"><span class="pre">STOP</span></code></td>
</tr>
</tbody>
</table>
<p>When reading this representation, Clang evaluates each expression record it
encounters, builds the appropriate abstract syntax tree node, and then pushes
that expression onto a stack.  When a record contains <em>N</em> subexpressions
(<code class="docutils literal"><span class="pre">BinaryOperator</span></code> has two of them), those <em>N</em> expressions are popped from the
top of the stack.  The special STOP code indicates that we have reached the end
of a serialized expression or statement; other expression or statement records
may follow, but they are part of a different expression.</p>
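<p>The stack-based reading procedure can be sketched in a few lines of Python.
The tuple layout of a record (kind, value, operand count) and the function name
<code class="docutils literal"><span class="pre">read_expression</span></code> are invented for this example.  The two operator records
are labelled <code class="docutils literal"><span class="pre">BinaryOperator</span></code> here, following the description of
<code class="docutils literal"><span class="pre">BinaryOperator</span></code> as a record with two subexpressions.</p>
<div class="highlight-python"><div class="highlight"><pre>
```python
def read_expression(records):
    """Rebuild one expression tree from records in reverse Polish order."""
    stack = []
    for kind, value, operand_count in records:
        if kind == "STOP":
            break
        # Subexpressions were written in reverse order, so the first one
        # popped is the first operand of the owning expression.
        operands = [stack.pop() for _ in range(operand_count)]
        stack.append((kind, value, *operands))
    assert len(stack) == 1, "a complete expression leaves exactly one node"
    return stack[0]


# Records for the expression 3 - 4 + 5, in the order they would be stored.
records = [
    ("IntegerLiteral", 5, 0),
    ("IntegerLiteral", 4, 0),
    ("IntegerLiteral", 3, 0),
    ("BinaryOperator", "-", 2),
    ("BinaryOperator", "+", 2),
    ("STOP", None, 0),
]
tree = read_expression(records)
```
</pre></div>
</div>
<p>The resulting <code class="docutils literal"><span class="pre">tree</span></code> is the addition node whose first operand is the
subtraction of 4 from 3 and whose second operand is the literal 5.</p>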
</div>
<div class="section" id="pchinternals-ident-table">
<span id="identifier-table-block"></span><h3><a class="toc-backref" href="#id10">Identifier Table Block</a><a class="headerlink" href="#pchinternals-ident-table" title="Permalink to this headline"></a></h3>
<p>The identifier table block contains an on-disk hash table that maps each
identifier mentioned within the AST file to the serialized representation of
the identifier’s information (e.g., the <code class="docutils literal"><span class="pre">IdentifierInfo</span></code> structure).  The
serialized representation contains:</p>
<ul class="simple">
<li>The actual identifier string.</li>
<li>Flags that describe whether this identifier is the name of a built-in, a
poisoned identifier, an extension token, or a macro.</li>
<li>If the identifier names a macro, the offset of the macro definition within
the <a class="reference internal" href="#pchinternals-preprocessor"><span class="std std-ref">Preprocessor Block</span></a>.</li>
<li>If the identifier names one or more declarations visible from translation
unit scope, the <a class="reference internal" href="#pchinternals-decls"><span class="std std-ref">declaration IDs</span></a> of these
declarations.</li>
</ul>
<p>When an AST file is loaded, the AST file reader mechanism introduces itself
into the identifier table as an external lookup source.  Thus, when the user
program refers to an identifier that has not yet been seen, Clang will perform
a lookup into the identifier table.  If an identifier is found, its contents
(macro definitions, flags, top-level declarations, etc.) will be deserialized,
at which point the corresponding <code class="docutils literal"><span class="pre">IdentifierInfo</span></code> structure will have the
same contents it would have after parsing the headers in the AST file.</p>
<p>Within the AST file, the identifiers used to name declarations are represented
with an integral value.  A separate table provides a mapping from this integral
value (the identifier ID) to the location within the on-disk hash table where
that identifier is stored.  This mapping is used when deserializing the name of
a declaration, the identifier of a token, or any other construct in the AST
file that refers to a name.</p>
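<p>The two access paths described in this section, lookup by name on first use and lookup by numeric identifier ID, can be modeled roughly as follows.  This is a sketch under stated assumptions: <code class="docutils literal"><span class="pre">ToyASTFile</span></code>, <code class="docutils literal"><span class="pre">ToyIdentifierTable</span></code>, and <code class="docutils literal"><span class="pre">StoredIdentifier</span></code> are hypothetical names, and in-memory containers stand in for the on-disk hash table.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a serialized identifier entry.
struct StoredIdentifier {
  std::string name;
  bool isMacro = false;
  std::vector<std::uint32_t> declIDs;  // top-level declarations with this name
};

// Stand-in for the AST file. `entries` plays the role of the hash table
// payload, `byName` is the hash index, and `offsetOfID` is the separate table
// that maps identifier IDs (starting at 1) to entry offsets.
class ToyASTFile {
public:
  std::uint32_t add(StoredIdentifier id) {
    std::uint32_t offset = static_cast<std::uint32_t>(entries.size());
    byName[id.name] = offset;
    entries.push_back(std::move(id));
    offsetOfID.push_back(offset);
    return static_cast<std::uint32_t>(offsetOfID.size());  // new identifier ID
  }
  const StoredIdentifier *findByName(const std::string &name) const {
    auto it = byName.find(name);
    return it == byName.end() ? nullptr : &entries[it->second];
  }
  const StoredIdentifier &getByID(std::uint32_t id) const {
    assert(id >= 1 && id <= offsetOfID.size() && "identifier ID out of range");
    return entries[offsetOfID[id - 1]];
  }

private:
  std::vector<StoredIdentifier> entries;
  std::unordered_map<std::string, std::uint32_t> byName;
  std::vector<std::uint32_t> offsetOfID;
};

// In-memory identifier table that consults the file only on a miss.
class ToyIdentifierTable {
public:
  explicit ToyIdentifierTable(const ToyASTFile &f) : file(f) {}

  const StoredIdentifier &get(const std::string &name) {
    auto it = seen.find(name);
    if (it != seen.end()) return it->second;  // already seen: no file access
    ++externalLookups;
    StoredIdentifier info;
    if (const StoredIdentifier *stored = file.findByName(name))
      info = *stored;    // "deserialize" flags and declaration IDs
    else
      info.name = name;  // identifier is new to this translation unit
    return seen.emplace(name, std::move(info)).first->second;
  }

  int externalLookups = 0;  // counts how often the file was consulted

private:
  const ToyASTFile &file;
  std::unordered_map<std::string, StoredIdentifier> seen;
};
```

<p>In this model, the first reference to a name costs one lookup in the file; later references are served from the in-memory table, which mirrors the lazy behavior described above.</p>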
</div>
<div class="section" id="method-pool-block">
<span id="pchinternals-method-pool"></span><h3><a class="toc-backref" href="#id11">Method Pool Block</a><a class="headerlink" href="#method-pool-block" title="Permalink to this headline">¶</a></h3>
<p>The method pool block is represented as an on-disk hash table that serves two
purposes: it provides a mapping from the names of Objective-C selectors to the
set of Objective-C instance and class methods that have that particular
selector (which is required for semantic analysis in Objective-C) and also
stores all of the selectors used by entities within the AST file.  The design
of the method pool is similar to that of the <a class="reference internal" href="#pchinternals-ident-table"><span class="std std-ref">identifier table</span></a>: the first time a particular selector is formed
during the compilation of the program, Clang will search in the on-disk hash
table of selectors; if found, Clang will read the Objective-C methods
associated with that selector into the appropriate front-end data structure
(<code class="docutils literal"><span class="pre">Sema::InstanceMethodPool</span></code> and <code class="docutils literal"><span class="pre">Sema::FactoryMethodPool</span></code> for instance and
class methods, respectively).</p>
<p>As with identifiers, selectors are represented by numeric values within the AST
file.  A separate index maps these numeric selector values to the offset of the
selector within the on-disk hash table; this index is used when deserializing an
Objective-C method declaration (or other Objective-C construct) that refers to
the selector.</p>
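<p>A rough model of the method pool's first-use behavior is shown below.  The class and member names are hypothetical, methods are represented as plain strings, and a <code class="docutils literal"><span class="pre">std::map</span></code> stands in for the on-disk hash table; only the split into an instance pool and a factory pool follows the text above.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical serialized entry: the methods that share one selector.
struct StoredMethods {
  std::vector<std::string> instanceMethods;
  std::vector<std::string> factoryMethods;  // class methods
};

class ToyMethodPool {
public:
  explicit ToyMethodPool(std::map<std::string, StoredMethods> stored)
      : onDisk(std::move(stored)) {}

  const std::vector<std::string> &instanceMethodsFor(const std::string &sel) {
    load(sel);
    return instancePool[sel];
  }
  const std::vector<std::string> &factoryMethodsFor(const std::string &sel) {
    load(sel);
    return factoryPool[sel];
  }

  int diskReads = 0;  // counts searches of the stand-in on-disk table

private:
  // Populates both pools the first time a selector is formed.
  void load(const std::string &sel) {
    if (instancePool.count(sel)) return;  // already populated (maybe empty)
    ++diskReads;
    instancePool[sel];  // create empty entries so the search is not repeated
    factoryPool[sel];
    auto it = onDisk.find(sel);
    if (it == onDisk.end()) return;
    instancePool[sel] = it->second.instanceMethods;
    factoryPool[sel] = it->second.factoryMethods;
    assert(instancePool.count(sel) == 1 && factoryPool.count(sel) == 1);
  }

  std::map<std::string, StoredMethods> onDisk;
  std::map<std::string, std::vector<std::string>> instancePool;
  std::map<std::string, std::vector<std::string>> factoryPool;
};
```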
</div>
</div>
<div class="section" id="ast-reader-integration-points">
<h2><a class="toc-backref" href="#id12">AST Reader Integration Points</a><a class="headerlink" href="#ast-reader-integration-points" title="Permalink to this headline">¶</a></h2>
<p>The “lazy” deserialization behavior of AST files requires their integration
into several completely different submodules of Clang.  For example, lazily
deserializing the declarations during name lookup requires that the name-lookup
routines be able to query the AST file to find entities stored there.</p>
<p>For each Clang data structure that requires direct interaction with the AST
reader logic, there is an abstract class that provides the interface between
the two modules.  The <code class="docutils literal"><span class="pre">ASTReader</span></code> class, which handles the loading of an AST
file, inherits from all of these abstract classes to provide lazy
deserialization of Clang’s data structures.  <code class="docutils literal"><span class="pre">ASTReader</span></code> implements the
following abstract classes:</p>
<dl class="docutils">
<dt><code class="docutils literal"><span class="pre">ExternalSLocEntrySource</span></code></dt>
<dd>This abstract interface is associated with the <code class="docutils literal"><span class="pre">SourceManager</span></code> class, and
is used whenever the <a class="reference internal" href="#pchinternals-sourcemgr"><span class="std std-ref">source manager</span></a> needs to
load the details of a file, buffer, or macro instantiation.</dd>
<dt><code class="docutils literal"><span class="pre">IdentifierInfoLookup</span></code></dt>
<dd>This abstract interface is associated with the <code class="docutils literal"><span class="pre">IdentifierTable</span></code> class, and
is used whenever the program source refers to an identifier that has not yet
been seen.  In this case, the AST reader searches for this identifier within
its <a class="reference internal" href="#pchinternals-ident-table"><span class="std std-ref">identifier table</span></a> to load any top-level
declarations or macros associated with that identifier.</dd>
<dt><code class="docutils literal"><span class="pre">ExternalASTSource</span></code></dt>
<dd>This abstract interface is associated with the <code class="docutils literal"><span class="pre">ASTContext</span></code> class, and is
used whenever the abstract syntax tree nodes need to be loaded from the AST
file.  It provides the ability to deserialize declarations and types
identified by their numeric values, read the bodies of functions when
required, and read the declarations stored within a declaration context
(either for iteration or for name lookup).</dd>
<dt><code class="docutils literal"><span class="pre">ExternalSemaSource</span></code></dt>
<dd>This abstract interface is associated with the <code class="docutils literal"><span class="pre">Sema</span></code> class, and is used
whenever semantic analysis needs to read information from the <a class="reference internal" href="#pchinternals-method-pool"><span class="std std-ref">global
method pool</span></a>.</dd>
</dl>
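<p>The pattern of one reader class implementing several narrow abstract interfaces can be illustrated with a deliberately small model.  The interfaces below are hypothetical simplifications with a single method each; they are not the signatures of Clang's <code class="docutils literal"><span class="pre">IdentifierInfoLookup</span></code> or <code class="docutils literal"><span class="pre">ExternalASTSource</span></code>.</p>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical, single-method stand-ins for the abstract interfaces.
struct ToyIdentifierLookup {
  virtual ~ToyIdentifierLookup() = default;
  // Returns true if the external source knows this identifier.
  virtual bool lookupIdentifier(const std::string &name) = 0;
};

struct ToyExternalASTSource {
  virtual ~ToyExternalASTSource() = default;
  // Returns the name of the declaration with this ID ("" if unknown).
  virtual std::string getDeclName(unsigned id) = 0;
};

// Each consumer holds a pointer only to the interface it needs.
class ToyIdentifierTable {
public:
  void setExternalLookup(ToyIdentifierLookup *src) { external = src; }
  bool isKnown(const std::string &name) {
    if (known.count(name)) return true;
    if (external && external->lookupIdentifier(name)) {
      known.insert(name);  // cache so the source is asked only once
      return true;
    }
    return false;
  }

private:
  ToyIdentifierLookup *external = nullptr;
  std::set<std::string> known;
};

class ToyASTContext {
public:
  void setExternalSource(ToyExternalASTSource *src) { external = src; }
  std::string declName(unsigned id) {
    assert(external && "no external source attached");
    return external->getDeclName(id);
  }

private:
  ToyExternalASTSource *external = nullptr;
};

// One reader object serves both consumers through multiple inheritance.
class ToyReader : public ToyIdentifierLookup, public ToyExternalASTSource {
public:
  explicit ToyReader(std::map<unsigned, std::string> stored)
      : decls(std::move(stored)) {}
  bool lookupIdentifier(const std::string &name) override {
    for (const auto &entry : decls)
      if (entry.second == name) return true;
    return false;
  }
  std::string getDeclName(unsigned id) override {
    auto it = decls.find(id);
    return it == decls.end() ? std::string() : it->second;
  }

private:
  std::map<unsigned, std::string> decls;
};
```

<p>The point of the arrangement is that neither consumer depends on the reader's concrete type: each sees only the interface it was handed, so the same subsystem works with or without an AST file attached.</p>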
</div>
<div class="section" id="chained-precompiled-headers">
<span id="pchinternals-chained"></span><h2><a class="toc-backref" href="#id13">Chained precompiled headers</a><a class="headerlink" href="#chained-precompiled-headers" title="Permalink to this headline">¶</a></h2>
<p>Chained precompiled headers were initially intended to improve the performance
of IDE-centric operations such as syntax highlighting and code completion while
a particular source file is being edited by the user.  To minimize the amount
of reparsing required after a change to the file, a form of precompiled header
— called a precompiled <em>preamble</em> — is automatically generated by parsing
all of the headers in the source file, up to and including the last
<code class="docutils literal"><span class="pre">#include</span></code>.  When only the source file changes (and none of the headers it
depends on), reparsing of that source file can use the precompiled preamble and
start parsing after the <code class="docutils literal"><span class="pre">#include</span></code>s, so parsing time is proportional to the
size of the source file (rather than all of its includes).  However, the
compilation of that translation unit may already use a precompiled header: in
this case, Clang will create the precompiled preamble as a chained precompiled
header that refers to the original precompiled header.  This drastically
reduces the time needed to serialize the precompiled preamble for use in
reparsing.</p>
<p>Chained precompiled headers get their name because each precompiled header can
depend on one other precompiled header, forming a chain of dependencies.  A
translation unit will then include the precompiled header that starts the chain
(i.e., nothing depends on it).  This linearity of dependencies is important for
the semantic model of chained precompiled headers, because the most-recent
precompiled header can provide information that overrides the information
provided by the precompiled headers it depends on, just like a header file
<code class="docutils literal"><span class="pre">B.h</span></code> that includes another header <code class="docutils literal"><span class="pre">A.h</span></code> can modify the state produced by
parsing <code class="docutils literal"><span class="pre">A.h</span></code>, e.g., by <code class="docutils literal"><span class="pre">#undef</span></code>’ing a macro defined in <code class="docutils literal"><span class="pre">A.h</span></code>.</p>
<p>There are several ways in which chained precompiled headers generalize the AST
file model:</p>
<dl class="docutils">
<dt>Numbering of IDs</dt>
<dd>Many different kinds of entities — identifiers, declarations, types, etc.
— have ID numbers that start at 1 or some other predefined constant and
grow upward.  Each precompiled header records the maximum ID number it has
assigned in each category.  Then, when a new precompiled header is generated
that depends on (chains to) another precompiled header, it will start
counting at the next available ID number.  This way, one can determine, given
an ID number, which AST file actually contains the entity.</dd>
<dt>Name lookup</dt>
<dd>When writing a chained precompiled header, Clang attempts to write only
information that has changed from the precompiled header on which it is
based.  This changes the lookup algorithm for the various tables, such as the
<a class="reference internal" href="#pchinternals-ident-table"><span class="std std-ref">identifier table</span></a>: the search starts at the
most-recent precompiled header.  If no entry is found, lookup then proceeds
to the identifier table in the precompiled header it depends on, and so on.
Once a lookup succeeds, that result is considered definitive, overriding any
results from earlier precompiled headers.</dd>
<dt>Update records</dt>
<dd>There are various ways in which a later precompiled header can modify the
entities described in an earlier precompiled header.  For example, later
precompiled headers can add entries into the various name-lookup tables for
the translation unit or namespaces, or add new categories to an Objective-C
class.  Each of these updates is captured in an “update record” that is
stored in the chained precompiled header file and will be loaded along with
the original entity.</dd>
</dl>
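<p>The ID numbering and newest-first lookup rules above can be sketched together.  Everything here is an illustrative assumption: the <code class="docutils literal"><span class="pre">ToyPCH</span></code> layout, the use of a single ID category, and in particular the choice to model an <code class="docutils literal"><span class="pre">#undef</span></code> as an entry marked "not defined" in the later file.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Assumption: a removed macro is modeled as an entry with defined == false.
struct MacroEntry {
  bool defined;
  std::string body;
};

// One precompiled header in a chain; it owns IDs [firstID, firstID + numIDs).
struct ToyPCH {
  std::string fileName;
  unsigned firstID;
  unsigned numIDs;
  // Only entries that changed relative to the PCH this one depends on.
  std::map<std::string, MacroEntry> macros;
};

// chain.front() is the oldest file; chain.back() is the most recent.
using Chain = std::vector<ToyPCH>;

// Appends a PCH whose IDs start right after the previous maximum.
void appendPCH(Chain &chain, std::string fileName, unsigned numIDs) {
  assert(numIDs > 0 && "each file in this sketch assigns at least one ID");
  unsigned first =
      chain.empty() ? 1 : chain.back().firstID + chain.back().numIDs;
  chain.push_back(ToyPCH{std::move(fileName), first, numIDs, {}});
}

// Returns the file that owns `id`, or nullptr if no file does.
const ToyPCH *ownerOf(const Chain &chain, unsigned id) {
  for (const ToyPCH &pch : chain)
    if (id >= pch.firstID && id < pch.firstID + pch.numIDs) return &pch;
  return nullptr;
}

// Newest-first lookup: the first file that has an entry is definitive.
// Returns nullptr if no file in the chain mentions the name.
const MacroEntry *lookupMacro(const Chain &chain, const std::string &name) {
  for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
    auto found = it->macros.find(name);
    if (found != it->macros.end()) return &found->second;
  }
  return nullptr;
}
```

<p>With a 100-ID base header followed by a 20-ID preamble, IDs 1 through 100 resolve to the base file and 101 through 120 to the preamble.  A macro that the preamble marks as undefined hides the base header's definition, because the search stops at the first file with an entry.</p>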
</div>
<div class="section" id="modules">
<span id="pchinternals-modules"></span><h2><a class="toc-backref" href="#id14">Modules</a><a class="headerlink" href="#modules" title="Permalink to this headline">¶</a></h2>
<p>Modules generalize the chained precompiled header model yet further, from a
linear chain of precompiled headers to an arbitrary directed acyclic graph
(DAG) of AST files.  All of the same techniques used to make chained
precompiled headers work (ID numbering, name lookup, update records) are
shared with modules.  However, the DAG nature of modules introduces a number of
additional complications to the model:</p>
<dl class="docutils">
<dt>Numbering of IDs</dt>
<dd>The simple, linear numbering scheme used in chained precompiled headers falls
apart with the module DAG, because different modules may end up with
different numbering schemes for entities they imported from common shared
modules.  To account for this, each module file provides information about
which modules it depends on and which ID numbers it assigned to the entities
in those modules, as well as which ID numbers it took for its own new
entities.  The AST reader then maps these “local” ID numbers into a “global”
ID number space for the current translation unit, providing a one-to-one mapping
between entities (in whatever AST file they inhabit) and global ID numbers.
If that translation unit is then serialized into an AST file, this mapping
will be stored for use when the AST file is imported.</dd>
<dt>Declaration merging</dt>
<dd>It is possible for a given entity (from the language’s perspective) to be
declared multiple times in different places.  For example, two different
headers can have the declaration of <code class="docutils literal"><span class="pre">printf</span></code> or could forward-declare
<code class="docutils literal"><span class="pre">struct</span> <span class="pre">stat</span></code>.  If each of those headers is included in a module, and some
third party imports both of those modules, there is a potentially serious
problem: name lookup for <code class="docutils literal"><span class="pre">printf</span></code> or <code class="docutils literal"><span class="pre">struct</span> <span class="pre">stat</span></code> will find both
declarations, but the AST nodes are unrelated.  This would result in a
compilation error, due to an ambiguity in name lookup.  Therefore, the AST
reader performs declaration merging according to the appropriate language
semantics, ensuring that the two disjoint declarations are merged into a
single redeclaration chain (with a common canonical declaration), so that it
is as if one of the headers had been included before the other.</dd>
<dt>Name visibility</dt>
<dd>Modules allow certain names that occur during module creation to be “hidden”,
so that they are not part of the public interface of the module and are not
visible to its clients.  The AST reader maintains a “visible” bit on various
AST nodes (declarations, macros, etc.) to indicate whether that particular
AST node is currently visible; the various name lookup mechanisms in Clang
inspect the visible bit to determine whether that entity, which is still in
the AST (because other, visible AST nodes may depend on it), can actually be
found by name lookup.  When a new (sub)module is imported, it may make
existing, non-visible, already-deserialized AST nodes visible; it is the
responsibility of the AST reader to find and update these AST nodes when it
is notified of the import.</dd>
</dl>
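<p>The local-to-global ID translation described under "Numbering of IDs" can be sketched as follows.  This is a simplification under stated assumptions: each module numbers only its own entities with local IDs starting at 1, the reader reserves a contiguous block of global IDs per module, and all names (<code class="docutils literal"><span class="pre">ToyModule</span></code>, <code class="docutils literal"><span class="pre">ToyReader</span></code>) are hypothetical.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A module file as seen by this sketch (hypothetical layout).
struct ToyModule {
  std::string name;
  std::vector<std::string> entities;  // entities[i] has local ID i + 1
};

class ToyReader {
public:
  // Loads a module once and reserves a contiguous block of global IDs for it.
  // Returns the module's base offset, where global ID = base + local ID.
  unsigned load(const ToyModule &m) {
    auto it = baseOf.find(m.name);
    if (it != baseOf.end()) return it->second;  // shared module already loaded
    unsigned base = static_cast<unsigned>(globalEntities.size());
    for (const std::string &e : m.entities) globalEntities.push_back(e);
    baseOf[m.name] = base;
    return base;
  }

  // Translates a module-local ID into the translation unit's global ID space.
  unsigned toGlobal(const std::string &module, unsigned localID) const {
    auto it = baseOf.find(module);
    assert(it != baseOf.end() && "module not loaded");
    assert(localID >= 1 && "local IDs start at 1 in this sketch");
    return it->second + localID;
  }

  const std::string &entity(unsigned globalID) const {
    assert(globalID >= 1 && globalID <= globalEntities.size());
    return globalEntities[globalID - 1];
  }

private:
  std::map<std::string, unsigned> baseOf;   // module name -> base offset
  std::vector<std::string> globalEntities;  // index = global ID - 1
};
```

<p>Because a shared module is loaded only once, two importers that both depend on it end up referring to the same global IDs for its entities, regardless of the order in which the importers themselves were loaded.</p>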
</div>
</div>


      </div>

    <div class="footer" role="contentinfo">
        &#169; Copyright 2007-2018, The Clang Team.
      Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.6.7.
    </div>
  </body>
</html>