<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- This manual is for GNU Diffutils
(version 3.3, 23 March 2013),
and documents the GNU diff, diff3,
sdiff, and cmp commands for showing the
differences between files and the GNU patch command for
using their output to update files.
Copyright (C) 1992-1994, 1998, 2001-2002, 2004, 2006, 2009-2013 Free
Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
A copy of the license is included in the section entitled
"GNU Free Documentation License." -->
<!-- Created by GNU Texinfo 6.0, http://www.gnu.org/software/texinfo/ -->
<head>
<title>Comparing and Merging Files: Comparison</title>
<meta name="description" content="Comparing and Merging Files: Comparison">
<meta name="keywords" content="Comparing and Merging Files: Comparison">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="Index.html#Index" rel="index" title="Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="index.html#Top" rel="up" title="Top">
<link href="Output-Formats.html#Output-Formats" rel="next" title="Output Formats">
<link href="Overview.html#Overview" rel="prev" title="Overview">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nocodebreak {white-space: nowrap}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: serif; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
ul.no-bullet {list-style: none}
-->
</style>
</head>
<body lang="en">
<a name="Comparison"></a>
<div class="header">
<p>
Next: <a href="Output-Formats.html#Output-Formats" accesskey="n" rel="next">Output Formats</a>, Previous: <a href="Overview.html#Overview" accesskey="p" rel="prev">Overview</a>, Up: <a href="index.html#Top" accesskey="u" rel="up">Top</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="What-Comparison-Means"></a>
<h2 class="chapter">1 What Comparison Means</h2>
<a name="index-introduction"></a>
<p>There are several ways to think about the differences between two files.
One way to think of the differences is as a series of lines that were
deleted from, inserted in, or changed in one file to produce the other
file. <code>diff</code> compares two files line by line, finds groups of
lines that differ, and reports each group of differing lines. It can
report the differing lines in several formats, which have different
purposes.
</p>
<p><acronym>GNU</acronym> <code>diff</code> can show whether files are different
without detailing the differences. It also provides ways to suppress
certain kinds of differences that are not important to you. Most
commonly, such differences are changes in the amount of white space
between words or lines. <code>diff</code> also provides ways to suppress
differences in alphabetic case or in lines that match a regular
expression that you provide. These options can accumulate; for
example, you can ignore changes in both white space and alphabetic
case.
</p>
<p>Another way to think of the differences between two files is as a
sequence of pairs of bytes that can be either identical or
different. <code>cmp</code> reports the differences between two files
byte by byte, instead of line by line. As a result, it is often
more useful than <code>diff</code> for comparing binary files. For text
files, <code>cmp</code> is useful mainly when you want to know only whether
two files are identical, or whether one file is a prefix of the other.
</p>
<p>To illustrate the effect that considering changes byte by byte
can have compared with considering them line by line, think of what
happens if a single newline character is added to the beginning of a
file. If that file is then compared with an otherwise identical file
that lacks the newline at the beginning, <code>diff</code> will report that a
blank line has been added to the file, while <code>cmp</code> will report that
almost every byte of the two files differs.
</p>
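<p>As a rough sketch of this contrast, the following shell commands build two
small scratch files and compare them both ways. The file names <samp>orig</samp>
and <samp>shifted</samp> and their contents are made up for illustration, and
the commands assume that the <acronym>GNU</acronym> versions of <code>diff</code>
and <code>cmp</code> are installed.
</p>

```shell
# Scratch directory and file names are arbitrary choices for this sketch.
tmp=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n'   > "$tmp/orig"
printf '\nalpha\nbeta\ngamma\n' > "$tmp/shifted"   # same text after one extra newline

# Line-oriented view: a single inserted blank line.
diff_out=$(diff "$tmp/orig" "$tmp/shifted")
printf '%s\n' "$diff_out"

# Byte-oriented view: count how many byte positions differ.
# (The end-of-file notice that cmp writes to standard error is discarded.)
cmp_lines=$(cmp -l "$tmp/orig" "$tmp/shifted" 2>/dev/null | wc -l)
echo "cmp -l listed $cmp_lines differing bytes"

rm -r "$tmp"
```

<p>Here <code>diff</code> describes one added line, while <code>cmp -l</code>
lists 16 of the 17 byte positions it compares, because every later byte has
moved by one position.
</p>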
<p><code>diff3</code> normally compares three input files line by line, finds
groups of lines that differ, and reports each group of differing lines.
Its output is designed to make it easy to inspect two different sets of
changes to the same file.
</p>
<table class="menu" border="0" cellspacing="0">
<tr><td align="left" valign="top">• <a href="#Hunks" accesskey="1">Hunks</a>:</td><td> </td><td align="left" valign="top">Groups of differing lines.
</td></tr>
<tr><td align="left" valign="top">• <a href="#White-Space" accesskey="2">White Space</a>:</td><td> </td><td align="left" valign="top">Suppressing differences in white space.
</td></tr>
<tr><td align="left" valign="top">• <a href="#Blank-Lines" accesskey="3">Blank Lines</a>:</td><td> </td><td align="left" valign="top">Suppressing differences whose lines are all blank.
</td></tr>
<tr><td align="left" valign="top">• <a href="#Specified-Lines" accesskey="4">Specified Lines</a>:</td><td> </td><td align="left" valign="top">Suppressing differences whose lines all match a pattern.
</td></tr>
<tr><td align="left" valign="top">• <a href="#Case-Folding" accesskey="5">Case Folding</a>:</td><td> </td><td align="left" valign="top">Suppressing differences in alphabetic case.
</td></tr>
<tr><td align="left" valign="top">• <a href="#Brief" accesskey="6">Brief</a>:</td><td> </td><td align="left" valign="top">Summarizing which files are different.
</td></tr>
<tr><td align="left" valign="top">• <a href="#Binary" accesskey="7">Binary</a>:</td><td> </td><td align="left" valign="top">Comparing binary files or forcing text comparisons.
</td></tr>
</table>
<hr>
<a name="Hunks"></a>
<div class="header">
<p>
Next: <a href="#White-Space" accesskey="n" rel="next">White Space</a>, Up: <a href="#Comparison" accesskey="u" rel="up">Comparison</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Hunks-1"></a>
<h3 class="section">1.1 Hunks</h3>
<a name="index-hunks"></a>
<p>When comparing two files, <code>diff</code> finds sequences of lines common to
both files, interspersed with groups of differing lines called
<em>hunks</em>. Comparing two identical files yields one sequence of
common lines and no hunks, because no lines differ. Comparing two
entirely different files yields no common lines and one large hunk that
contains all lines of both files. In general, there are many ways to
match up lines between two given files. <code>diff</code> tries to minimize
the total hunk size by finding large sequences of common lines
interspersed with small hunks of differing lines.
</p>
<p>For example, suppose the file <samp>F</samp> contains the three lines
‘<samp>a</samp>’, ‘<samp>b</samp>’, ‘<samp>c</samp>’, and the file <samp>G</samp> contains the same
three lines in reverse order ‘<samp>c</samp>’, ‘<samp>b</samp>’, ‘<samp>a</samp>’. If
<code>diff</code> finds the line ‘<samp>c</samp>’ as common, then the command
‘<samp>diff F G</samp>’ produces this output:
</p>
<div class="example">
<pre class="example">1,2d0
< a
< b
3a2,3
> b
> a
</pre></div>
<p>But if <code>diff</code> notices the common line ‘<samp>b</samp>’ instead, it produces
this output:
</p>
<div class="example">
<pre class="example">1c1
< a
---
> c
3c3
< c
---
> a
</pre></div>
<p>It is also possible to find ‘<samp>a</samp>’ as the common line. <code>diff</code>
does not always find an optimal matching between the files; it takes
shortcuts to run faster. But its output is usually close to the
shortest possible. You can adjust this tradeoff with the
<samp>--minimal</samp> (<samp>-d</samp>) option (see <a href="diff-Performance.html#diff-Performance">diff Performance</a>).
</p>
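<p>One way to see which matching a particular <code>diff</code> build picks is
to recreate the two files and run both forms of the command. This is only a
sketch: the scratch directory is arbitrary, and the hunks printed may differ
between versions, so no particular output is shown here.
</p>

```shell
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/F"
printf 'c\nb\na\n' > "$tmp/G"

# Default behavior: diff may take shortcuts when matching lines.
default_out=$(diff "$tmp/F" "$tmp/G"); default_status=$?

# --minimal asks diff to try harder to find a smaller set of changes.
minimal_out=$(diff --minimal "$tmp/F" "$tmp/G"); minimal_status=$?

printf 'default:\n%s\n\nminimal:\n%s\n' "$default_out" "$minimal_out"
rm -r "$tmp"
```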
<hr>
<a name="White-Space"></a>
<div class="header">
<p>
Next: <a href="#Blank-Lines" accesskey="n" rel="next">Blank Lines</a>, Previous: <a href="#Hunks" accesskey="p" rel="prev">Hunks</a>, Up: <a href="#Comparison" accesskey="u" rel="up">Comparison</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Suppressing-Differences-in-Blank-and-Tab-Spacing"></a>
<h3 class="section">1.2 Suppressing Differences in Blank and Tab Spacing</h3>
<a name="index-blank-and-tab-difference-suppression"></a>
<a name="index-tab-and-blank-difference-suppression"></a>
<p>The <samp>--ignore-tab-expansion</samp> (<samp>-E</samp>) option ignores the
distinction between tabs and spaces on input. A tab is considered to be
equivalent to the number of spaces to the next tab stop (see <a href="Adjusting-Output.html#Tabs">Tabs</a>).
</p>
<p>The <samp>--ignore-trailing-space</samp> (<samp>-Z</samp>) option ignores white
space at line end.
</p>
<p>The <samp>--ignore-space-change</samp> (<samp>-b</samp>) option is stronger than
<samp>-E</samp> and <samp>-Z</samp> combined.
It ignores white space at line end, and considers all other sequences of
one or more white space characters within a line to be equivalent. With this
option, <code>diff</code> considers the following two lines to be equivalent,
where ‘<samp>$</samp>’ denotes the line end:
</p>
<div class="example">
<pre class="example">Here lyeth  muche rychnesse  in lytell space.   -- John Heywood$
Here lyeth muche rychnesse in lytell space. -- John Heywood   $
</pre></div>
<p>The <samp>--ignore-all-space</samp> (<samp>-w</samp>) option is stronger still.
It ignores differences even if one line has white space where
the other line has none. <em>White space</em> characters include
tab, vertical tab, form feed, carriage return, and space;
some locales may define additional characters to be white space.
With this option, <code>diff</code> considers the
following two lines to be equivalent, where ‘<samp>$</samp>’ denotes the line
end and ‘<samp>^M</samp>’ denotes a carriage return:
</p>
<div class="example">
<pre class="example">Here lyeth muche rychnesse in lytell space.-- John Heywood$
He relyeth much erychnes seinly tells pace. --John Heywood ^M$
</pre></div>
<p>For many other programs newline is also a white space character, but
<code>diff</code> is a line-oriented program and a newline character
always ends a line. Hence the <samp>-w</samp> or
<samp>--ignore-all-space</samp> option does not ignore newline-related
changes; it ignores only other white space changes.
</p>
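<p>The relative strength of these options can be checked with exit statuses,
where 0 means that no differences were found and 1 means that some were. The
sketch below assumes <acronym>GNU</acronym> <code>diff</code>; the three scratch
files <samp>a</samp>, <samp>b</samp>, and <samp>c</samp> are invented for the
illustration.
</p>

```shell
tmp=$(mktemp -d)
printf 'Here lyeth  muche rychnesse\n'   > "$tmp/a"   # two spaces after "lyeth"
printf 'Here lyeth muche rychnesse   \n' > "$tmp/b"   # one space, plus trailing spaces
printf 'Herelyeth muche rychnesse\n'     > "$tmp/c"   # no space after "Here"

diff    "$tmp/a" "$tmp/b" > /dev/null; plain=$?   # no option: files differ
diff -Z "$tmp/a" "$tmp/b" > /dev/null; z_ab=$?    # trailing space ignored, inner spacing still differs
diff -b "$tmp/a" "$tmp/b" > /dev/null; b_ab=$?    # runs of white space treated as equivalent
diff -b "$tmp/a" "$tmp/c" > /dev/null; b_ac=$?    # -b does not match space against no space
diff -w "$tmp/a" "$tmp/c" > /dev/null; w_ac=$?    # -w ignores white space entirely

echo "plain=$plain -Z=$z_ab -b(a,b)=$b_ab -b(a,c)=$b_ac -w(a,c)=$w_ac"
rm -r "$tmp"
```

<p>With these files the statuses should come out as 1, 1, 0, 1, and 0
respectively.
</p>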
<hr>
<a name="Blank-Lines"></a>
<div class="header">
<p>
Next: <a href="#Specified-Lines" accesskey="n" rel="next">Specified Lines</a>, Previous: <a href="#White-Space" accesskey="p" rel="prev">White Space</a>, Up: <a href="#Comparison" accesskey="u" rel="up">Comparison</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Suppressing-Differences-Whose-Lines-Are-All-Blank"></a>
<h3 class="section">1.3 Suppressing Differences Whose Lines Are All Blank</h3>
<a name="index-blank-line-difference-suppression"></a>
<p>The <samp>--ignore-blank-lines</samp> (<samp>-B</samp>) option ignores changes
that consist entirely of blank lines. With this option, for example, a
file containing
</p><div class="example">
<pre class="example">1. A point is that which has no part.

2. A line is breadthless length.
-- Euclid, The Elements, I
</pre></div>
<p>is considered identical to a file containing
</p><div class="example">
<pre class="example">1. A point is that which has no part.
2. A line is breadthless length.


-- Euclid, The Elements, I
</pre></div>
<p>Normally this option affects only lines that are completely empty, but
if you also specify an option that ignores trailing spaces,
lines are also affected if they look empty but contain white space.
In other words, <samp>-B</samp> is equivalent to ‘<samp>-I '^$'</samp>’ by
default, but it is equivalent to ‘<samp>-I '^[[:space:]]*$'</samp>’ if
<samp>-b</samp>, <samp>-w</samp> or <samp>-Z</samp> is also specified.
</p>
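<p>A small experiment along these lines, assuming <acronym>GNU</acronym>
<code>diff</code>, is sketched below. The file names <samp>tight</samp>,
<samp>loose</samp>, and <samp>spacey</samp> are hypothetical; the third file
contains a line that looks empty but holds two spaces.
</p>

```shell
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\n'       > "$tmp/tight"
printf 'one\n\ntwo\n\n\nthree\n' > "$tmp/loose"    # only empty lines added
printf 'one\n  \ntwo\nthree\n'   > "$tmp/spacey"   # added line contains two spaces

diff       "$tmp/tight" "$tmp/loose"  > /dev/null; plain=$?        # blank lines count as changes
diff -B    "$tmp/tight" "$tmp/loose"  > /dev/null; blank=$?        # empty-line changes ignored
diff -B    "$tmp/tight" "$tmp/spacey" > /dev/null; b_only=$?       # line is not completely empty
diff -B -b "$tmp/tight" "$tmp/spacey" > /dev/null; b_and_space=$?  # -b also makes -B cover white-space-only lines

echo "plain=$plain -B=$blank -B(spacey)=$b_only -B -b(spacey)=$b_and_space"
rm -r "$tmp"
```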
<hr>
<a name="Specified-Lines"></a>
<div class="header">
<p>
Next: <a href="#Case-Folding" accesskey="n" rel="next">Case Folding</a>, Previous: <a href="#Blank-Lines" accesskey="p" rel="prev">Blank Lines</a>, Up: <a href="#Comparison" accesskey="u" rel="up">Comparison</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Suppressing-Differences-Whose-Lines-All-Match-a-Regular-Expression"></a>
<h3 class="section">1.4 Suppressing Differences Whose Lines All Match a Regular Expression</h3>
<a name="index-regular-expression-suppression"></a>
<p>To ignore insertions and deletions of lines that match a
<code>grep</code>-style regular expression, use the
<samp>--ignore-matching-lines=<var>regexp</var></samp> (<samp>-I <var>regexp</var></samp>) option.
You should escape
regular expressions that contain shell metacharacters to prevent the
shell from expanding them. For example, ‘<samp>diff -I '^[[:digit:]]'</samp>’ ignores
all changes to lines beginning with a digit.
</p>
<p>However, <samp>-I</samp> ignores the insertion or deletion of lines that
match the regular expression only if every changed line in the hunk (every
insertion and every deletion) matches the regular expression. In other
words, for each nonignorable change, <code>diff</code> prints the complete set
of changes in its vicinity, including the ignorable ones.
</p>
<p>You can specify more than one regular expression for lines to ignore by
using more than one <samp>-I</samp> option. <code>diff</code> tries to match each
line against each regular expression.
</p>
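<p>The all-or-nothing rule for a hunk can be illustrated as follows. This is a
sketch that assumes <acronym>GNU</acronym> <code>diff</code>; the files
<samp>old</samp>, <samp>renumbered</samp>, and <samp>mixed</samp> are invented
for the example.
</p>

```shell
tmp=$(mktemp -d)
printf '1 apple\nkeep\n' > "$tmp/old"
printf '2 apple\nkeep\n' > "$tmp/renumbered"   # only the digit-led line changes
printf '2 apple\nKEEP\n' > "$tmp/mixed"        # an adjacent line changes as well

# Every changed line starts with a digit, so the whole hunk is ignored.
diff -I '^[[:digit:]]' "$tmp/old" "$tmp/renumbered"; ignored=$?

# The hunk now also contains lines that do not match, so all of it is
# printed, including the lines that would otherwise be ignorable.
mixed_out=$(diff -I '^[[:digit:]]' "$tmp/old" "$tmp/mixed"); mixed=$?
printf '%s\n' "$mixed_out"

echo "ignored=$ignored mixed=$mixed"
rm -r "$tmp"
```

<p>The first command prints nothing and exits with status 0. The second exits
with status 1, and its output still includes the ‘<samp>1 apple</samp>’ and
‘<samp>2 apple</samp>’ lines.
</p>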
<hr>
<a name="Case-Folding"></a>
<div class="header">
<p>
Next: <a href="#Brief" accesskey="n" rel="next">Brief</a>, Previous: <a href="#Specified-Lines" accesskey="p" rel="prev">Specified Lines</a>, Up: <a href="#Comparison" accesskey="u" rel="up">Comparison</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Suppressing-Case-Differences"></a>
<h3 class="section">1.5 Suppressing Case Differences</h3>
<a name="index-case-difference-suppression"></a>
<p><acronym>GNU</acronym> <code>diff</code> can treat lower case letters as
equivalent to their upper case counterparts, so that, for example, it
considers ‘<samp>Funky Stuff</samp>’, ‘<samp>funky STUFF</samp>’, and ‘<samp>fUNKy
stuFf</samp>’ to all be the same. To request this, use the <samp>-i</samp> or
<samp>--ignore-case</samp> option.
</p>
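<p>Because the suppression options accumulate, <samp>-i</samp> can be combined
with a white space option. In the sketch below (the file names are arbitrary
and <acronym>GNU</acronym> <code>diff</code> is assumed), the two files differ
in both case and spacing, so neither option is enough alone.
</p>

```shell
tmp=$(mktemp -d)
printf 'Funky Stuff\n'   > "$tmp/f"
printf 'fUNKy   stuFf\n' > "$tmp/g"   # different case and extra spaces

diff -i    "$tmp/f" "$tmp/g" > /dev/null; case_only=$?    # spacing still differs
diff -b    "$tmp/f" "$tmp/g" > /dev/null; space_only=$?   # case still differs
diff -i -b "$tmp/f" "$tmp/g" > /dev/null; both=$?         # both kinds of difference ignored

echo "-i=$case_only -b=$space_only -i -b=$both"
rm -r "$tmp"
```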
<hr>
<a name="Brief"></a>
<div class="header">
<p>
Next: <a href="#Binary" accesskey="n" rel="next">Binary</a>, Previous: <a href="#Case-Folding" accesskey="p" rel="prev">Case Folding</a>, Up: <a href="#Comparison" accesskey="u" rel="up">Comparison</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Summarizing-Which-Files-Differ"></a>
<h3 class="section">1.6 Summarizing Which Files Differ</h3>
<a name="index-summarizing-which-files-differ"></a>
<a name="index-brief-difference-reports"></a>
<p>When you only want to find out whether files are different, and you
don’t care what the differences are, you can use the summary output
format. In this format, instead of showing the differences between the
files, <code>diff</code> simply reports whether files differ. The
<samp>--brief</samp> (<samp>-q</samp>) option selects this output format.
</p>
<p>This format is especially useful when comparing the contents of two
directories. It is also much faster than doing the normal line by line
comparisons, because <code>diff</code> can stop analyzing the files as soon as
it knows that there are any differences.
</p>
<p>You can also get a brief indication of whether two files differ by using
<code>cmp</code>. For files that are identical, <code>cmp</code> produces no
output. When the files differ, by default, <code>cmp</code> outputs the byte
and line number where the first difference occurs, or reports that one
file is a prefix of the other. You can use
the <samp>-s</samp>, <samp>--quiet</samp>, or <samp>--silent</samp> option to
suppress that information, so that <code>cmp</code>
produces no output and reports whether the files differ using only its
exit status (see <a href="Invoking-cmp.html#Invoking-cmp">Invoking cmp</a>).
</p>
<p>Unlike <code>diff</code>, <code>cmp</code> cannot compare directories; it can only
compare two files.
</p>
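<p>In a script, the two approaches might be used as in the following sketch,
which assumes <acronym>GNU</acronym> <code>diff</code> and <code>cmp</code> and
uses two made-up files, one of which is a prefix of the other.
</p>

```shell
tmp=$(mktemp -d)
printf 'same\n'       > "$tmp/x"
printf 'same\nmore\n' > "$tmp/y"   # x is a prefix of y

# diff -q prints a one-line report and exits with status 1 when files differ.
brief_out=$(diff -q "$tmp/x" "$tmp/y"); brief=$?
printf '%s\n' "$brief_out"

# cmp -s prints nothing; only the exit status carries the answer.
cmp -s "$tmp/x" "$tmp/y"; differ=$?
cmp -s "$tmp/x" "$tmp/x"; same=$?

echo "diff -q=$brief cmp -s(x,y)=$differ cmp -s(x,x)=$same"
rm -r "$tmp"
```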
<hr>
<a name="Binary"></a>
<div class="header">
<p>
Previous: <a href="#Brief" accesskey="p" rel="prev">Brief</a>, Up: <a href="#Comparison" accesskey="u" rel="up">Comparison</a> [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="Index.html#Index" title="Index" rel="index">Index</a>]</p>
</div>
<a name="Binary-Files-and-Forcing-Text-Comparisons"></a>
<h3 class="section">1.7 Binary Files and Forcing Text Comparisons</h3>
<a name="index-binary-file-diff"></a>
<a name="index-text-versus-binary-diff"></a>
<p>If <code>diff</code> thinks that either of the two files it is comparing is
binary (a non-text file), it normally treats that pair of files much as
if the summary output format had been selected (see <a href="#Brief">Brief</a>), and
reports only that the binary files are different. This is because line
by line comparisons are usually not meaningful for binary files.
</p>
<p><code>diff</code> determines whether a file is text or binary by checking an
initial portion of the file; the exact number of bytes examined is system
dependent, but it is typically several thousand. If every byte in
that part of the file is non-null, <code>diff</code> considers the file to be
text; otherwise it considers the file to be binary.
</p>
<p>Sometimes you might want to force <code>diff</code> to consider files to be
text. For example, you might be comparing text files that contain
null characters; <code>diff</code> would erroneously decide that those are
non-text files. Or you might be comparing documents that are in a
format used by a word processing system that uses null characters to
indicate special formatting. You can force <code>diff</code> to consider all
files to be text files, and compare them line by line, by using the
<samp>--text</samp> (<samp>-a</samp>) option. If the files you compare using this
option do not in fact contain text, they will probably contain few
newline characters, and the <code>diff</code> output will consist of hunks
showing differences between long lines of whatever characters the files
contain.
</p>
<p>You can also force <code>diff</code> to report only whether files differ
(but not how). Use the <samp>--brief</samp> (<samp>-q</samp>) option for
this.
</p>
<p>Normally, differing binary files count as trouble because the
resulting <code>diff</code> output does not capture all the differences.
This trouble causes <code>diff</code> to exit with status 2. However,
this trouble cannot occur with the <samp>--text</samp> (<samp>-a</samp>)
option, or with the <samp>--brief</samp> (<samp>-q</samp>) option, as these
options both cause <code>diff</code> to generate a form of output that
represents differences as requested.
</p>
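<p>The effect of a null byte on the comparison, and on the exit status, can be
observed with a sketch like the one below. It assumes <acronym>GNU</acronym>
<code>diff</code>; the files <samp>one</samp> and <samp>two</samp> are
hypothetical, and <code>cat -v</code> is used only to make the null byte
visible on a terminal.
</p>

```shell
tmp=$(mktemp -d)
# Otherwise ordinary text lines that each contain one null byte (\000).
printf 'abc\000def\n' > "$tmp/one"
printf 'abc\000xyz\n' > "$tmp/two"

# Default: the pair is treated as binary and only a summary is reported.
diff "$tmp/one" "$tmp/two"; plain=$?

# --text (-a): ordinary line-by-line hunks.
diff -a "$tmp/one" "$tmp/two" | cat -v
diff -a "$tmp/one" "$tmp/two" > /dev/null; text=$?

# --brief (-q): a summary was requested, so differing files are not trouble.
diff -q "$tmp/one" "$tmp/two" > /dev/null; brief=$?

echo "plain=$plain --text=$text --brief=$brief"
rm -r "$tmp"
```

<p>As described above, the first status is normally 2, while the
<samp>--text</samp> and <samp>--brief</samp> runs exit with status 1.
</p>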
<p>In operating systems that distinguish between text and binary files,
<code>diff</code> normally reads and writes all data as text. Use the
<samp>--binary</samp> option to force <code>diff</code> to read and write binary
data instead. This option has no effect on a <acronym>POSIX</acronym>-compliant system
like <acronym>GNU</acronym> or traditional Unix. However, many personal computer
operating systems represent the end of a line with a carriage return
followed by a newline. On such systems, <code>diff</code> normally ignores
these carriage returns on input and generates them at the end of each
output line, but with the <samp>--binary</samp> option <code>diff</code> treats
each carriage return as just another input character, and does not
generate a carriage return at the end of each output line. This can be
useful when dealing with non-text files that are meant to be
interchanged with <acronym>POSIX</acronym>-compliant systems.
</p>
<p>The <samp>--strip-trailing-cr</samp> option causes <code>diff</code> to treat input
lines that end in carriage return followed by newline as if they end
in plain newline. This can be useful when comparing text that is
imperfectly imported from many personal computer operating systems.
This option affects how lines are read, which in turn affects how they
are compared and output.
</p>
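<p>For instance, the same text saved with two different line-ending
conventions compares as follows. This is a sketch assuming
<acronym>GNU</acronym> <code>diff</code> on a <acronym>POSIX</acronym>-style
system; the file names are invented.
</p>

```shell
tmp=$(mktemp -d)
printf 'line one\r\nline two\r\n' > "$tmp/crlf"   # carriage return + newline endings
printf 'line one\nline two\n'     > "$tmp/lf"     # plain newline endings

diff "$tmp/crlf" "$tmp/lf" > /dev/null; plain=$?                        # every line differs
diff --strip-trailing-cr "$tmp/crlf" "$tmp/lf" > /dev/null; stripped=$? # carriage returns dropped on input

echo "plain=$plain --strip-trailing-cr=$stripped"
rm -r "$tmp"
```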
<p>If you want to compare two files byte by byte, you can use the
<code>cmp</code> program with the <samp>--verbose</samp> (<samp>-l</samp>)
option to show the values of each differing byte in the two files.
With <acronym>GNU</acronym> <code>cmp</code>, you can also use the <samp>-b</samp> or
<samp>--print-bytes</samp> option to show the <acronym>ASCII</acronym> representation of
those bytes. See <a href="Invoking-cmp.html#Invoking-cmp">Invoking cmp</a>, for more information.
</p>
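<p>A minimal sketch of such a byte-by-byte report, assuming
<acronym>GNU</acronym> <code>cmp</code> and two made-up four-byte files that
differ only in their third byte:
</p>

```shell
tmp=$(mktemp -d)
printf 'abc\n' > "$tmp/left"
printf 'abd\n' > "$tmp/right"

# One output line per differing byte: its offset, then each file's byte value in octal.
verbose_out=$(cmp -l "$tmp/left" "$tmp/right")

# With -b, each value is also shown as a printable character.
bytes_out=$(cmp -l -b "$tmp/left" "$tmp/right")

printf '%s\n%s\n' "$verbose_out" "$bytes_out"
rm -r "$tmp"
```

<p>The first report names byte 3 with the octal values 143 and 144, which are
the codes for ‘<samp>c</samp>’ and ‘<samp>d</samp>’.
</p>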
<p>If <code>diff3</code> thinks that any of the files it is comparing is binary
(a non-text file), it normally reports an error, because such
comparisons are usually not useful. <code>diff3</code> uses the same test as
<code>diff</code> to decide whether a file is binary. As with <code>diff</code>, if
the input files contain a few non-text bytes but otherwise are like
text files, you can force <code>diff3</code> to consider all files to be text
files and compare them line by line by using the <samp>-a</samp> or
<samp>--text</samp> option.
</p>
</body>
</html>