/usr/share/doc/png-definitive-guide/html/chapter09.html is in png-definitive-guide 20060430-1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099 1100 1101 1102 1103 1104 1105 1106 1107 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138 1139 1140 1141 1142 1143 1144 1145 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205 1206 1207 1208 1209 1210 1211 1212 1213 1214 1215 1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 1563 1564 1565 1566 1567 1568 1569 1570 1571 1572 1573 1574 1575 1576 1577 1578 1579 1580 1581 1582 1583 1584 1585 1586 1587 1588 1589 1590 1591 1592 1593 1594 1595 1596 1597 1598 1599 1600 1601 1602 1603 1604 1605 1606 1607 1608 1609 1610 1611 1612 1613 1614 1615 1616 1617 1618 1619 1620 1621 1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<HEAD>
<TITLE>Compression and Filtering (PNG: The Definitive Guide)</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<!-- http://www.w3.org/TR/REC-CSS2/box.html -->
<STYLE TYPE="text/css">
P { margin-bottom: 0em }
UL {
margin-bottom: 0em;
margin-top: 0em;
list-style: disc;
}
LI {
padding: 0px 0px 0px 0px;
margin: 0px 0px 0px 0px;
}
</STYLE>
<LINK REV="made" HREF="http://pobox.com/~newt/greg_contact.html">
<!-- Copyright (c) 1999 O'Reilly and Associates. -->
<!-- Copyright (c) 2002-2006 Greg Roelofs. -->
</HEAD>
<body bgcolor="#ffffff" text="#000000">
<hr> <!-- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- -->
<a href="chapter08.html"><img width=24 height=13 border=0 align="left"
src="images/prev.png" alt="<-"></a>
<a href="chapter10.html"><img width=24 height=13 border=0 align="right"
src="images/next.png" alt="->"></a>
<div align="center">
<a href="chapter08.html"><font size="-1" color="#000000"
><b>PREVIOUS</b></font></a> <a
href="toc.html"><font size="-1" color="#000000"
><b>CONTENTS</b></font></a> <a
href="chapter10.html"><font size="-1" color="#000000"
><b>NEXT</b></font></a>
</div>
<hr> <!-- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- -->
<h1 class="chapter">Chapter 9. Compression and Filtering</h1>
<div class="htmltoc"><h4 class="tochead">Contents:</h4><p>
<a href="#png.ch09.div.1">9.1. Filtering</a><br />
<a href="#png.ch09.div.2">9.2. The Deflate Compression Algorithm</a><br />
<a href="#png.ch09.div.2.1">9.2.1. A Final Word on Patents</a><br />
<a href="#png.ch09.div.3">9.3. Real-World Comparisons</a><br />
<a href="#png.ch09.div.4">9.4. Practical Compression Tips</a><br />
<a href="#png.ch09.div.4.1">9.4.1. Tips for Users</a><br />
<a href="#png.ch09.div.4.2">9.4.2. Tips for Programmers</a><br />
</p></div>
<p>One of PNG's strengths, particularly in comparison to the GIF and TIFF image
formats, is its compression. As I noted in
<a href="chapter01.html">Chapter 1, "An Introduction to PNG"</a>, a primary motivation
driving the design of the Portable Network
Graphics format was to create a replacement for GIF that was not only free
but also an improvement over it in essentially all respects. As a result,
PNG compression is completely lossless--that is, the original image data can be reconstructed exactly, bit for
bit--just as in GIF and most forms of TIFF.<a href="#FOOTNOTE-67">[67]</a>
</p><blockquote class="footnote">
<a name="FOOTNOTE-67" />
<a name="INDEX-672" />
<a name="INDEX-673" />
<p>
[67] And as a corollary, PNG file sizes are usually considerably larger
than ordinary JPEG, since the latter uses lossy compression--that is,
it throws away some information. TIFF also supports JPEG compression as
one of its many options, but the more common methods are lossless and
based on either run-length encoding (RLE) or the same LZW algorithm used
in GIF.
</p>
</blockquote>
<!-- GRR POST-1999 UPDATE -->
<font color="#006600">
<p>
<em>I wrote a longer, more technically detailed chapter on PNG compression
for the</em> <a href="http://www.elsevier-international.com/catalogue/title.cfm?ISBN=0126208611">Lossless Compression Handbook</a>,
<em>edited by Khalid Sayood and published in December 2002 by Academic Press
(now Elsevier Science). It includes more rigorous test data, as well. A
near-final draft is available in PDF format at the following link:</em>
</p>
<!-- was "http://www.amazon.com/exec/obidos/ASIN/0126208611" -->
<!-- was
"http://www.apcatalog.com/cgi-bin/AP?ISBN=0126208611&FORM=FORM2"
should be visible at US site, but usually comes up missing:
"http://www.bhusa.com/apcatalog/us/subindex.asp?isbn=0126208611"
-->
<blockquote>
<p>
<pre><a href=
"LCH-png-chapter.pdf"
>http://www.libpng.org/pub/png/book/LCH-png-chapter.pdf</a></pre>
</p>
</blockquote>
<p>
<em>I will update it to the final version and convert it to HTML format
when time permits.</em>
</p>
</font>
<!-- END GRR POST-1999 UPDATE -->
<div class="sect1"><a name="png.ch09.div.1" />
<h2 class="sect1">9.1. Filtering</h2>
<p><a name="INDEX-674" />We'll look at the compression engine itself shortly, but PNG's performance
is not due solely to an improved compression algorithm. PNG also supports a
precompression step called <em class="emphasis">filtering</em>.
Filtering is a method of reversibly transforming the image data so that the
main compression engine can operate more efficiently. As a simple example,
consider a sequence of bytes increasing uniformly from 1 to 255. Since there
is no repetition in the sequence, it compresses either very poorly or not at
all. But a trivial modification of the sequence--namely, leaving the first
byte alone but replacing each subsequent byte by the difference between it
and its predecessor--transforms the sequence into an extremely compressible
set of 255 identical bytes, each having the value 1.</p>
<p>As a real-life example of this (though still not particularly realistic),
consider the image known as <em class="emphasis">16million.png</em>. This
24-bit, 512 × 32,768 RGB image contains one pixel of every
possible color--more than 16 million of them altogether. As raw data, it
therefore requires 48 MB to
store. Simple PNG-style compression with no filtering brings it down to
36 MB, only a 25% reduction in size. But with filtering turned on, the same
compression engine reduces it to 115,989 bytes, more than 300 times better
than the nonfiltered case, for a total compression factor of 434!<a href="#FOOTNOTE-68">[68]</a>
<em class="emphasis">Zowie</em>.</p><blockquote class="footnote">
<a name="FOOTNOTE-68" /><p>[68] Actually, it gets even better. The dimensions of the image were chosen
for convenient web-browser scrolling, but a 4096 × 4096 version created
by Paul Schmidt is half the size--a mere 59,852 bytes (841 times compression).
And just wait until we get to the chapter on MNG!</p>
</blockquote>
<p>Actual image data is rarely that perfect, but filtering does improve
compression in grayscale and truecolor images, and it can help on some
palette images as well. PNG supports five types of filters, and an encoder
may choose to use a different filter for each row of pixels in the image.
<a href="#png.ch09.tbl.1">Table 9-1</a> lists the five filter types.
</p>
<a name="png.ch09.tbl.1" />
<div class="table" align="center">
<p>
<table width="502" border="0">
<tr>
<td>
<b class="emphasis-bold">Table 9-1.</b>
<i>PNG Filter Types</i>
</td>
</tr>
</table>
</p>
<p>
<table width="502" border="1">
<tr>
<td><b class="emphasis-bold"> Name </b></td>
<td><b class="emphasis-bold">Description</b></td>
</tr>
<tr>
<td> None </td>
<td>Each byte is unchanged.</td>
</tr>
<tr>
<td> Sub </td>
<td><p>Each byte is replaced with the difference between it and the
``corresponding byte'' to its left.</p></td>
</tr>
<tr>
<td> Up </td>
<td><p>Each byte is replaced with the difference between it and the byte
above it (in the previous row, as it was before filtering).
<a name="INDEX-675" /></p></td>
</tr>
<tr>
<td> Average </td>
<td><p>Each byte is replaced with the difference between it and the average
of the corresponding bytes to its left and above it, truncating any
fractional part.
<a name="INDEX-676" /></p></td>
</tr>
<tr>
<td> Paeth </td>
<td><p>Each byte is replaced with the difference between it and the <em class="emphasis">Paeth
predictor</em> of the corresponding bytes to its left, above it, and to
its upper left.
<?x-space 5p?></p></td>
</tr>
</table>
</p>
</div>
<p><a name="INDEX-677" />
<a name="INDEX-678" />
<a name="INDEX-679" />The last method requires some explanation. Invented by Alan Paeth, the
Paeth predictor is computed by first calculating a <em class="emphasis">base value</em>, equal
to the sum of the corresponding bytes to the left and above, minus the byte
to the upper left. (For example, the base value might equal
228 + 228 - 227 = 229.) Then the difference between the base value
and each of the three corresponding bytes is calculated, and the byte that
gave the smallest absolute difference--that is, the one that was closest
to the base value--is used as the predictor and subtracted from the target
byte to give the filtered value. In case of ties, the corresponding byte
to the left has precedence as the predicted value, followed by the one
directly above. Note that all calculations to produce the Paeth predictor
are done using exact integer arithmetic. The final filter calculation, on
the other hand, is done using base-256 modular arithmetic; this is true for
all of the filter types.
</p>
<p>Though the concept is simple, there are quite a few subtleties in the actual
mechanics of filtering. Most important among these is that filtering always
operates on bytes, not pixels. For images with pixels smaller than eight bits,
this means that the filter algorithms actually operate on more than one pixel
at a time; for example, in a 2-bit palette or grayscale image, there are
four pixels per byte. This approach improves the efficiency of decoders
by avoiding bit-level manipulations.</p>
<p>At the other end of the spectrum, large pixels (e.g., 24-bit RGB or 64-bit
RGBA) are also operated on as bytes, but only <em class="emphasis">corresponding</em> bytes are
compared. For any given byte, the corresponding byte to its left is the
one offset by the number of bytes per pixel. This means that red bytes in a
truecolor image are compared with red bytes, green with green, and blue with
blue. If there's an alpha channel, the alpha bytes are always compared;
if the sample depth is 16 bits, upper (most significant) bytes are compared
with upper bytes in the same color channel, and lower bytes are compared with
lower. In other words, similar values will always be compared and operated
on, in hopes of improving compression efficiency. Consider an RGB image,
for example; the red, green, and blue values of any given pixel may be quite
different, but neighboring pairs of red, green, and blue will often be
similar. Thus the transformed bytes will tend to be close to zero even if
the original bytes weren't. This is the real point of filtering: most of
the transformed bytes will cluster around zero, thus giving the compression
engine a smaller, more predictable range of byte values to cope with.
</p>
<p><a name="INDEX-680" />What about edges? If the ``corresponding byte'' to the left or above doesn't
exist, the algorithm does not wrap around and use bytes from the other side
of the image; instead, it treats the missing byte as zero. The wraparound
method was, in fact, considered, but aside from the fact that one cannot wrap
the top edge of the image without completely breaking the ability to stream
and progressively display a PNG image, the designers felt that only a few
images would benefit (and minimally, at that), which did not justify the
potential additional complexity.</p>
<p><a name="INDEX-681" />Interlacing is also a bit of a wrench in the works. For the purposes of
filtering, each interlace pass is treated as a separate image with its own
width and height. For example, in a 256 × 256 interlaced image, the
passes would be treated as seven smaller images with dimensions 32 × 32,
32 × 32, 64 × 32, 64 × 64, 128 × 64, 128 × 128, and
256 × 128, respectively.<a href="#FOOTNOTE-69">[69]</a>
This avoids the nasty problem of how to define corresponding bytes between
rows of different widths.</p><blockquote class="footnote">
<a name="FOOTNOTE-69" /><p>[69] Yes, that adds up to the right number of pixels. (Go ahead, add it
up.) Note that things may not come out quite so cleanly in cases in which
one or both image dimensions are not evenly divisible by eight; the
width of each pass is rounded up, if necessary.</p>
</blockquote>
<p><a name="INDEX-682" />So how does an encoder actually choose the proper filter for each row?
Testing all possible combinations is clearly impossible: even a 20-row
image would require testing over 95 trillion combinations, where ``testing''
would involve filtering and compressing the entire image. A simpler approach,
though still computationally expensive, is to incrementally test-compress each
row, save the smallest result, and repeat for the next row. This amounts to
filtering and compressing the entire image five times, which may be a
reasonable trade-off for an image that will be transmitted and decoded many
times.</p>
<p>But users often have barely enough patience to wait for a single round of
compression, so the PNG development group has come up with a few rules of thumb
(or heuristics) for choosing filters wisely. The first rule is that filters
<a name="INDEX-683" />
<a name="INDEX-684" />are rarely useful on palette images, so don't even bother with them. Note,
however, that one has considerable freedom in choosing how to order entries
in the palette itself, so it is possible that a particular method of ordering
would actually result in image data that benefits significantly from filtering.
No one has yet proven this, however, and the most likely approaches would
involve doing statistics on every pair of pixels in the image. Such
approaches would be quite expensive for larger images.</p>
<p>Filters are also rarely useful on low-bit-depth (grayscale) images, although
there have been rare cases in which promoting such an image to 8 bits and then
filtering has been effective. In general, however, filter type None is best.</p>
<p><a name="INDEX-685" />
<a name="INDEX-686" />
<a name="INDEX-687" />
<a name="INDEX-688" />For grayscale and truecolor images of 8 or more bits per sample, with or
without alpha channels, dynamic filtering is almost always beneficial. The
approach that has by now become standard is known as the <em class="emphasis">minimum sum of
absolute differences</em> heuristic and was first proposed by Lee Daniel Crocker
in February 1995. In this approach, the filtered bytes are treated as signed
values--that is, any value over 127 is treated as negative; 128 becomes
-128 and 255 becomes -1. The absolute value of each is then summed, and
the filter type that produces the smallest sum is chosen. This approach
effectively gives preference to sequences that are close to zero and
therefore is biased against filter type None.</p>
<p><a name="INDEX-689" />A related heuristic--still experimental at the time of this writing--is
to use the <em class="emphasis">weighted</em> sum of absolute differences. The theory, to some
extent based on empirical evidence, is that switching filters too often can
have a deleterious effect on the main compression engine. A better approach
might be to favor the most recently used filter even if its absolute sum of
differences is slightly larger than that of other filters, in order to produce
a more homogeneous data stream for the compressor--in effect, to allow
short-term losses in return for long-term gains. The standard PNG library
contains code to enable this heuristic, but a considerable amount of
experimentation is yet to be done to determine the best combination of
weighting factors, compression levels, and image types.</p>
<p>One can also imagine heuristics involving higher-order distance metrics (e.g.,
root-mean-square sums), sliding averages, and other statistical methods, but
to date there has been little research in this area. Lossless compression is
a necessity for many applications, but cutting-edge research in image
compression tends to focus almost exclusively on lossy methods, since the
payoff there is so much greater. Even within the lossless domain,
preconditioning the data stream is likely to have less effect than changing
the back-end compression algorithm itself. So let's take a look at that
next.
<a name="INDEX-690" /></p>
</div>
</div>
<div class="sect1"><a name="png.ch09.div.2" />
<h2 class="sect1">9.2. The Deflate Compression Algorithm</h2>
<p><a name="INDEX-691" />
<a name="INDEX-692" />
<a name="INDEX-693" />In some ways compression is responsible for the very existence of the Portable
Network Graphics format (recall <a href="chapter01.html">Chapter 1, "An Introduction to PNG"</a>), and it is undoubtedly one of the
most important components of PNG. The PNG specification defines a single
compression method, the <em class="emphasis">deflate</em> algorithm, for all image types.</p>
<p><a name="INDEX-694" />Part of the LZ77 class of compression algorithms, deflate was defined
by PKWARE in 1991 as part of the 1.93a beta version of their PKZIP
<a name="INDEX-695" />
archiver. Independently implemented by Jean-loup Gailly
<a name="INDEX-696" />
and Mark Adler,
<a name="INDEX-696.01-new" />
first for Info-ZIP's Zip and UnZip utilities and shortly
<a name="INDEX-696.02-new" />
thereafter for the GNU <em class="emphasis">gzip</em> utility, the deflate algorithm is
battle-tested and today is probably the most commonly used
file-compression algorithm on the Internet. Although it is not the
best-compressing algorithm known,<a href="#FOOTNOTE-70">[70]</a>
deflate has a very desirable mix of characteristics: high
reliability, good compression, good encoding speed, excellent decoding speed,
minimal overhead on incompressible data, and modest, well-defined memory
footprints for both encoding and decoding.
</p><blockquote class="footnote">
<a name="INDEX-696.03-new" /> <!-- arithmetic coding -->
<a name="INDEX-696.04-new" /> <!-- bzip2 -->
<a name="FOOTNOTE-70" /><p>[70] Arithmetic coding
has been around for a long time and significantly outperforms deflate; the
relatively recently published Burrows-Wheeler block transform coding
(implemented in <em class="emphasis">bzip2</em>, for example) shows considerable promise as
well; and the patented BitJazz condensation method is likewise quite
impressive.</p>
</blockquote>
<p><a name="INDEX-697" />As an LZ77-derived algorithm, deflate is fundamentally based on the concept
of a <em class="emphasis">sliding window</em>. One begins with the premise that many types of
interesting data, from binary computer instructions to source code to
ordinary text to images, are repetitious to varying degrees. The basic
idea of a sliding window is to imagine a window of some width immediately
preceding the current position in the data stream (and therefore
sliding along as the current position is updated), which one can use as
a kind of dictionary to encode subsequent data. For example, if the text
of this chapter is the data stream, then the current position at the very
instant you read this is <em class="emphasis">here</em>. Preceding that point is a little more
than 13,000 bytes of text, which includes, among other things, six copies of
the fragment ``or example'' (whoa, there's another one!). Instead of encoding
such strings as literal text, deflate replaces each with a pair of numbers
indicating its length (in this case, 10 bytes) and the distance back to one
of the previous instances (perhaps 950 bytes between the fifth and sixth).
The greater the length of the string, the greater the savings in encoding it
as a pointer into the window.
</p>
<p><a name="INDEX-698" />
<a name="INDEX-699" />
<a name="INDEX-700" />There are various ways to implement LZ77; the approach used by deflate is
a ``greedy'' algorithm originally devised by James Storer and Thomas
Szymanski--hence its name, LZSS. LZSS employs a look-ahead buffer and
finds the longest match for the buffer within the sliding window. If the
match exceeds a given threshold length, the string is encoded as a
length/distance pair and the buffer is advanced a corresponding amount. If the
longest match is <em class="emphasis">not</em> sufficiently long, the first character in the
look-ahead buffer is output as a literal value, and the buffer is advanced
by one. Either way, the algorithm continues by seeking the longest match
for the new contents of the buffer.
</p>
<p>The deflate algorithm is actually a bit more clever than the preceding
description would suggest. Rather than simply storing the length/distance
pairs and literal bytes as is, it further compresses the data by
<a name="INDEX-701" />
<a name="INDEX-702" />Huffman-encoding the LZ77 output. This approach is generically
referred to as LZH; deflate's uniqueness lies in its method of combining
literals and lengths into a single Huffman tree, its use of both fixed and
dynamic Huffman codes, and its division of the output stream into blocks
so that regions of incompressible data can be stored as is, rather than
expanding significantly, as can happen with the LZW algorithm.</p>
<p><a name="INDEX-703" />The PNG specification further dictates that the deflate data stream
must conform to the zlib 1.0 format. In particular, the size of the
sliding window must be a power of 2 between 256 bytes and 32
kilobytes, inclusive, and a small zlib header and trailer are
required. The latter includes a 32-bit checksum on the
<em class="emphasis">uncompressed</em> data; recall that the compressed stream is already
covered by PNG's 32-bit CRC value in each IDAT chunk.</p>
<p>More detailed explanation of the deflate algorithm and the zlib data
format is beyond the scope of this book, but the full zlib and deflate
<!--
specifications are included in Appendixes D and E, respectively.
-->
specifications are available from
<a href="http://www.zlib.org/zlib_docs.html"
>http://www.zlib.org/zlib_docs.html</a> .
In addition, a reference such as
<em class="emphasis">The Data Compression Book</em>, by Mark
Nelson and Jean-loup Gailly, is invaluable for understanding many
compression algorithms, including LZ77 and LZSS.</p>
<a name="INDEX-703.01-new" />
<p>Practically speaking, independent implementation of the deflate algorithm is
both difficult and unnecessary. Almost every PNG implementation available
today makes use of the freely available zlib compression library, and
the examples in Part III, <em class="emphasis">Programming with PNG</em>, do so as well.<a href="#FOOTNOTE-71">[71]</a>
For now I merely note that zlib supports ten compression levels
(including one with no compression at all), differing in the algorithms used
to find matching strings and in the thresholds for terminating the search
prematurely.</p><blockquote class="footnote">
<a name="FOOTNOTE-71" /><p>[71] Nevertheless, at least one alternative (in C++) is available as part of
Colosseum Builders' Image Library, and it is also described in a book by
John Miano, <em class="emphasis">The Programmer's Guide to Compressed Image Files</em>.</p>
</blockquote>
<p>As an aside, note that the efficiency of the compression engine increases
as more data is processed, with peak efficiency being reached when there is
sufficient data to fill the sliding window. This occurs mainly because there
are fewer strings available in the ``dictionary,'' but also, initially, because
those strings that do exist are limited in length--obviously, they cannot be
any longer than the amount of data in the window. Thus, for example, when 25%
of the compressed data has been received, it may correspond to only 20% of the
pixels. But because of data buffering in network protocols and applications,
any large disparities due to the truly low-efficiency encoding at startup will
tend to be washed out at the 512-byte level (or higher). That is, even though
the first 50 bytes might represent only 1% compression, those bytes generally
will not be available until after the 512th byte has been received, by
which point the compression efficiency may have reached 10% or better. And
since this is generally true of most compression algorithms, including those
used by both GIF and PNG, it is reasonable to compare (as I did in <a href="chapter01.html">Chapter 1, "An Introduction to PNG"</a>)
the appearance of the <em class="emphasis">uncompressed</em> pixels at an instant when equal
amounts of <em class="emphasis">compressed</em> data have been received.
</p>
<a name="png.ch09.div.2.1" /><div class="sect2">
<h3 class="sect2">9.2.1. A Final Word on Patents</h3>
<p><a name="INDEX-704" />
<a name="INDEX-705" />As mentioned at the end of
<a href="chapter07.html">Chapter 7, "History of the Portable Network Graphics Format"</a>,
Stac has reportedly claimed that the
deflate algorithm is covered by two of their patents. In fact, there
are a number of patents that <em class="emphasis">can</em> be infringed upon by a
compliant deflate implementation, including one held by PKWARE itself
that involves sorted hash tables. But the deflate specification
includes a section on implementing the algorithm without
infringing,<a href="#FOOTNOTE-72">[72]</a>
and, of course, zlib itself follows that prescription. While these
things are never 100% certain unless and until they are tested in court,
developers and users can be reasonably confident that the use of zlib
and its implementation of the deflate algorithm is not subject to licensing
fees.
<a name="INDEX-706" />
<a name="INDEX-707" />
</p><blockquote class="footnote">
<a name="FOOTNOTE-72" /><p>[72] From Section 4 of the deflate specification,
<em class="emphasis">Compression algorithm details</em>:
``...it is strongly recommended that the implementor of a
compressor follow the general algorithm presented here, which is known
not to be patented per se.''</p>
</blockquote>
</div>
</div>
<div class="sect1"><a name="png.ch09.div.3" />
<h2 class="sect1">9.3. Real-World Comparisons</h2>
<p><a name="INDEX-708" />
<a name="INDEX-709" />The only convincing way to demonstrate the compression benefits of one
image format over another is to do an actual comparison of the two on
a set of real images. The problem is choosing the set of
images--what works for one person may not work for another. What
I've done here is to gather together results from a number of
real-world tests performed over the past few years. Readers can
expect to achieve similar results on similar sets of images, but keep
in mind that one can always choose a <em class="emphasis">particular</em> set of images
for which the results will be dramatically different. I'll explain
that remark after we see a few cases.</p>
<p><a name="INDEX-710" />For starters, let's look at a small, very unscientifically chosen set
of images: seven nonanimated GIF images that happened to be the only
ones readily available on my machine one fine day in June 1998.</p>
<a name="png.ch09.tbl.2" />
<div class="table" align="center">
<p>
<table width="502" border="0">
<tr>
<td>
<b class="emphasis-bold">Table 9-2.</b>
<i>Seven Non-Animated, Non-Scientifically Selected GIF Images</i>
</td>
</tr>
</table>
</p>
<p>
<!-- table width="502" border="1" -->
<table border="1">
<tr>
<td align="center"><b class="emphasis-bold"> Name </b></td>
<td align="center"><b class="emphasis-bold"> GIF Size </b></td>
</tr>
<tr>
<td> linux-penguins </td>
<td align="right"> 38,280 </td>
</tr>
<tr>
<td> linux-tinypeng </td>
<td align="right"> 1,249 </td>
</tr>
<tr>
<td> linux_bigcrash </td>
<td align="right"> 298,529 </td>
</tr>
<tr>
<td> linux_lgeorges </td>
<td align="right"> 20,224 </td>
</tr>
<tr>
<td> linux_rasterman </td>
<td align="right"> 4,584 </td>
</tr>
<tr>
<td> sun-tinylogo </td>
<td align="right"> 1,226 </td>
</tr>
<tr>
<td> techweb-scsi-compare </td>
<td align="right"> 27,660 </td>
</tr>
<tr>
<td><b class="emphasis-bold"> TOTAL </b></td>
<td align="right"><b class="emphasis-bold"> 391,752 </b></td>
</tr>
</table>
</p>
</div>
<p>The images ranged in size from just over a kilobyte to nearly 300
kilobytes (the exact byte sizes are given in
<a href="#png.ch09.tbl.2">Table 9-2</a>) and
in dimension from 32 × 32 to 800 × 600. All but the first
and last were interlaced. When converted to PNG with the gif2png
<a name="INDEX-711" />
<a name="INDEX-712" />utility (<a href="chapter05.html">Chapter 5, "Applications: Image Converters"</a>), preserving
interlacing manually and introducing no new text annotations, things
improved somewhat; <a href="#png.ch09.tbl.3">Table 9-3</a> shows the
preliminary results.</p>
<a name="png.ch09.tbl.3" />
<div class="table" align="center">
<p>
<table width="502" border="0">
<tr>
<td>
<b class="emphasis-bold">Table 9-3.</b>
<i>Same Seven GIF Images After Conversion to PNG</i>
</td>
</tr>
</table>
</p>
<p>
<!-- table width="502" border="1" -->
<table border="1">
<tr>
<td align="center"><b class="emphasis-bold"> Name </b></td>
<td align="center"><b class="emphasis-bold"> PNG Size </b></td>
<td align="center"><b class="emphasis-bold"> Change </b></td>
</tr>
<tr>
<td> linux-penguins </td>
<td align="right"> 35,224 </td>
<td align="right"> -8.0% </td>
</tr>
<tr>
<td> linux-tinypeng </td>
<td align="right"> 722 </td>
<td align="right"> -42.2% </td>
</tr>
<tr>
<td> linux_bigcrash </td>
<td align="right"> 283,839 </td>
<td align="right"> -4.9% </td>
</tr>
<tr>
<td> linux_lgeorges </td>
<td align="right"> 20,476 </td>
<td align="right"> +1.2% </td>
</tr>
<tr>
<td> linux_rasterman </td>
<td align="right"> 4,812 </td>
<td align="right"> +5.0% </td>
</tr>
<tr>
<td> sun-tinylogo </td>
<td align="right"> 566 </td>
<td align="right"> -53.8% </td>
</tr>
<tr>
<td> techweb-scsi-compare </td>
<td align="right"> 20,704 </td>
<td align="right"> -25.1% </td>
</tr>
<tr>
<td><b class="emphasis-bold"> TOTAL </b></td>
<td align="right"><b class="emphasis-bold"> 366,343 </b></td>
<td align="right"><b class="emphasis-bold"> -6.5% </b></td>
</tr>
</table>
</p>
</div>
<a name="INDEX-712.01-new" />
<p>Five of the images shrank when converted to PNG--three of them quite
significantly--while two grew. Overall, the set achieved a 6.5% improvement
in byte size. Since gif2png uses the standard settings of the
libpng reference code,<a href="#FOOTNOTE-73">[73]</a>
its results may be considered typical of ``good'' PNG encoders. But the owner
of a web site will often be willing to spend a little more time up front on
compression in return for additional bandwidth savings in the long run. That's
<a name="INDEX-713" />
<a name="INDEX-714" />
where pngcrush (also discussed in
<a href="chapter05.html">Chapter 5, "Applications: Image Converters"</a>)
comes in. <a href="#png.ch09.tbl.4">Table 9-4</a> shows its results;
the percentages are again relative to the original GIF file sizes.</p>
<a name="png.ch09.tbl.4" />
<div class="table" align="center">
<p>
<table width="502" border="0">
<tr>
<td>
<b class="emphasis-bold">Table 9-4.</b>
<i>Same Seven GIF Images After PNG Conversion and Optimization</i>
</td>
</tr>
</table>
</p>
<p>
<!-- table width="502" border="1" -->
<table border="1">
<tr>
<td align="center"><b class="emphasis-bold"> Name </b></td>
<td align="center"><b class="emphasis-bold"> Optimized <br /> PNG Size </b></td>
<td align="center"><b class="emphasis-bold"> Change </b></td>
</tr>
<tr>
<td> linux-penguins </td>
<td align="right"> 34,546 </td>
<td align="right"> -9.8% </td>
</tr>
<tr>
<td> linux-tinypeng </td>
<td align="right"> 710 </td>
<td align="right"> -43.2% </td>
</tr>
<tr>
<td> linux_bigcrash </td>
<td align="right"> 282,948 </td>
<td align="right"> -5.2% </td>
</tr>
<tr>
<td> linux_lgeorges </td>
<td align="right"> 19,898 </td>
<td align="right"> -1.6% </td>
</tr>
<tr>
<td> linux_rasterman </td>
<td align="right"> 4,731 </td>
<td align="right"> +3.2% </td>
</tr>
<tr>
<td> sun-tinylogo </td>
<td align="right"> 550 </td>
<td align="right"> -55.1% </td>
</tr>
<tr>
<td> techweb-scsi-compare </td>
<td align="right"> 19,155 </td>
<td align="right"> -30.7% </td>
</tr>
<tr>
<td><b class="emphasis-bold"> TOTAL </b></td>
<td align="right"><b class="emphasis-bold"> 362,538 </b></td>
<td align="right"><b class="emphasis-bold"> -7.5% </b></td>
</tr>
</table>
</p>
</div>
<blockquote class="footnote">
<a name="FOOTNOTE-73" />
<p>[73] libpng is discussed at length in
<a href="chapter13.html">Chapter 13, "Reading PNG Images"</a>,
<a href="chapter14.html">Chapter 14, "Reading PNG Images Progressively"</a> and
<a href="chapter15.html">Chapter 15, "Writing PNG Images"</a>, which
demonstrate how to use libpng to read and write PNG images.</p>
</blockquote>
<p>So we see that the current state-of-the-art PNG encoder ekes out
another percentage point in the total size, with all but one of the
images now smaller than the original. That lone holdout is worth a
closer look in this case. I already noted that
<em class="emphasis">linux_rasterman.gif</em> was one of the interlaced images; suppose it
had not been? The noninterlaced GIF version is 4,568 bytes, only 16
bytes smaller than the original. But the noninterlaced PNG version is
either 4,067 bytes (gif2png) or 4,000 bytes (pngcrush)--a savings of
11.0% or 12.4% over the noninterlaced GIF. In other words, PNG's
two-dimensional interlacing scheme can have a <em class="emphasis">significant negative
impact on compression</em>, particularly for small images. This is an
important point to consider when creating images: is interlacing
really needed for a 152 × 96 image (as in this case) when the
penalty is more than 18% of the file size?</p>
<p>This example may have been instructive, but seven images do not constitute a
statistically valid sample.<a href="#FOOTNOTE-74">[74]</a>
<a name="INDEX-715" />
<a name="INDEX-716" />
<a name="INDEX-717" />
So let's consider a few more data sets. One real-life example comes from
the course entitled ``Authoring Compelling and Efficient VRML 2.0 Worlds''
at the VRML98 conference in Monterey, California. Though the content of the
course was otherwise outstanding, one slide comparing image formats for 3D
textures was rather alarming from a PNG perspective. It showed the information
displayed in <a href="#png.ch09.tbl.5">Table 9-5</a>,
together with the textures themselves (which are omitted here):</p>
<a name="png.ch09.tbl.5" />
<div class="table" align="center">
<p>
<table width="502" border="0">
<tr>
<td>
<b class="emphasis-bold">Table 9-5.</b>
<i>Original PNG, GIF, and JPEG Comparison from VRML98 Course</i>
</td>
</tr>
</table>
</p>
<p>
<table width="502" border="1">
<tr>
<td align="middle"><b class="emphasis-bold">Name</b></td>
<td align="middle"><b class="emphasis-bold">Dimensions</b></td>
<td align="middle"><b class="emphasis-bold">Type</b></td>
<td align="middle"><b class="emphasis-bold">JPEG Size</b></td>
<td align="middle"><b class="emphasis-bold">GIF Size</b></td>
<td align="middle"><b class="emphasis-bold">PNG Size</b></td>
</tr>
<tr>
<td> linoleum1 </td>
<td align="center">128 × 128</td>
<td align="center">grayscale</td>
<td align="right">10,956</td>
<td align="right">7,055</td>
<td align="right">16,008</td>
</tr>
<tr>
<td> doggie </td>
<td align="center">128 × 256</td>
<td align="center">color</td>
<td align="right">9,897</td>
<td align="right">24,605</td>
<td align="right">89,022</td>
</tr>
<tr>
<td> fog </td>
<td align="center">128 × 128</td>
<td align="center">grayscale + alpha</td>
<td align="right">--</td>
<td align="right">--</td>
<td align="right">26,732</td>
</tr>
<tr>
<td> circlefade </td>
<td align="center">128 × 128</td>
<td align="center">grayscale + alpha</td>
<td align="right">--</td>
<td align="right">--</td>
<td align="right">15,735</td>
</tr>
<tr>
<td> buttfly </td>
<td align="center">128 × 128</td>
<td align="center">color + transparency</td>
<td align="right">--</td>
<td align="right">4,367</td>
<td align="right">--</td>
</tr>
<?x-space 5p?>
</table>
</p>
</div>
<blockquote class="footnote">
<a name="FOOTNOTE-74" /><p>[74] That would be a small understatement.</p>
</blockquote>
<p>Even with no more details than are shown here, at least one problem is
apparent: in row 2, the JPEG image is 24 bits deep, while the GIF is only
8 bits. Judging by the size of the corresponding PNG, one might assume
(correctly) that the PNG is also 24 bits. So on the one hand, PNG is being
compared with an image of the same depth that uses lossy compression, while
on the other it is being compared with an image only one-third as deep, which
also amounts to lossy compression. That hurts.</p>
<p>As it turned out, there were other problems: the PNG images were created
with an image editor not known for its compression capabilities, and some
of the PNGs were interlaced even though their GIF counterparts were not.
(And since this was a VRML course, I should note that no VRML browser in
existence actually uses interlacing to render textures progressively, so
there is generally no point in creating such images.) The upshot is that
all of these factors--JPEG's lossy compression, mixing 24-bit and 8-bit
images, mixing interlaced and noninterlaced images, and using a particularly
poor encoder to compress the PNGs--worked against our favorite image format.</p>
<p>After evening the playing field by using the GIFs as the source images for
the PNGs, turning off interlacing, and using a combination of conversion and
encoding tools (including pngcrush), the results were considerably
better for PNG, as shown in
<a href="#png.ch09.tbl.6">Table 9-6</a>.
</p>
<a name="png.ch09.tbl.6" />
<div class="table" align="center">
<p>
<table width="502" border="0">
<tr>
<td>
<b class="emphasis-bold">Table 9-6.</b>
<i>Updated PNG, GIF, and JPEG Comparison for VRML98 Course Images</i>
</td>
</tr>
</table>
</p>
<p>
<table width="502" border="1">
<tr>
<td align="middle"><b class="emphasis-bold">Name</b></td>
<td align="middle"><b class="emphasis-bold">JPEG Size</b></td>
<td align="middle"><b class="emphasis-bold">GIF Size</b></td>
<td align="middle"><b class="emphasis-bold">Original<br />PNG Size</b></td>
<td align="middle"><b class="emphasis-bold">Optimized<br />PNG Size</b></td>
<td align="middle"><b class="emphasis-bold">PNG<br />Change</b></td>
</tr>
<tr>
<td>linoleum1</td>
<td align="right">10,956</td>
<td align="right">7,055</td>
<td align="right">16,008</td>
<td align="right"><b class="emphasis-bold">6,753</b></td>
<td align="right">-57.8%</td>
</tr>
<tr>
<td>doggie</td>
<td align="right"><b class="emphasis-bold">9,897</b></td>
<td align="right">24,605</td>
<td align="right">89,022</td>
<td align="right">22,555</td>
<td align="right">-74.7%</td>
</tr>
<tr>
<td>fog</td>
<td align="right">--</td>
<td align="right">--</td>
<td align="right">26,732</td>
<td align="right"><b class="emphasis-bold">16,221</b></td>
<td align="right">-39.3%</td>
</tr>
<tr>
<td>circlefade</td>
<td align="right">--</td>
<td align="right">--</td>
<td align="right">15,735</td>
<td align="right"><b class="emphasis-bold">6,638</b></td>
<td align="right">-57.8%</td>
</tr>
<tr>
<td>buttfly</td>
<td align="right">--</td>
<td align="right">4,367</td>
<td align="right">--</td>
<td align="right"><b class="emphasis-bold">3,965</b></td>
<td align="right">--</td>
</tr>
<?x-space 5p?>
</table>
</p>
</div>
<p><a name="INDEX-718" />Here, I've marked the smallest version of each image
in boldface type; the only one that isn't a PNG is the color JPEG, which is
hardly surprising. What is interesting is that the grayscale JPEG
(<em class="emphasis">linoleum1.jpg</em>) is
larger than both the GIF and optimized PNG versions, despite the presumed
benefits of lossy compression. There are at least three reasons for this.
First, GIF and PNG both get an automatic factor-of-three savings from the fact
that each pixel is only 1 byte deep instead of 3 bytes. Second, JPEG
is at a relative disadvantage when dealing with grayscale images, because most
of its compression benefits arise from how it treats the color components
of an image. Third, this particular image is more artificial than natural,
with quite a few relatively sharp features, which makes it particularly
ill suited to JPEG-style compression.</p>
<p>But perhaps the most striking feature of
<a href="#png.ch09.tbl.6">Table 9-6</a> is just how poorly the original
encoder did on its PNG
images. Realizable savings of 40% to 75% are unusual, but thanks to
poor encoding software, they are not as unusual as one might hope.</p>
<p>As another real example (but one that is perhaps more representative
of what a typical web site might expect), the owner of
<a href="http://www.feynman.com/">http://www.feynman.com/</a> found that
when he converted 54
nonanimated GIFs to PNGs, the collection grew in size from 270,431
bytes to 327,590 bytes. Insofar as all of the original images had
depths of 8 bits or less--and even the worst PNG encoder will, on
average, do as well or better than GIF on colormapped PNG
images--the most likely explanation for the 21% increase in size is
that the conversion utility produced 24-bit RGB PNGs. Indeed, the
owner indicated that he had used ImageMagick's convert utility, older
versions of which reportedly had the unfortunate habit of creating
24-bit PNGs unless explicitly given the <b class="emphasis-bold">-depth 8</b>
option. (This problem seems to have been fixed in more recent versions,
but even current versions include 160 bytes' worth of text and background
chunks per image.) When the original GIFs were converted to PNG with gif2png
instead, the total size dropped to 215,668 bytes, for a 20% overall
savings over GIF. Individually, the GIFs were smaller in 15 of the 54
cases, but never by more than 340 bytes. Of the 39 images in which
the PNG version was smaller, one-third of them differed by more than a
kilobyte, and one was 14 KB smaller.</p>
<p><a name="INDEX-719" />For the last GIF comparison, I downloaded the World Wide Web
Consortium's icon collection, consisting of 448 noncorrupted GIF
images. Of these, 43 had embedded text comments and 39 were
interlaced. Most of the images were icon-sized (as would be
expected), at 64 × 64 or smaller, but there were a handful of
larger images, too. The total size of the files was 1,810,239 bytes.
Conversion to PNG via <em class="emphasis">gif2png</em>, handling the interlaced and
noninterlaced images separately in order to preserve their status,
resulted in a total PNG size of 1,587,337 bytes, or a 12.3% reduction.
Additional compression via pngcrush resulted in a total of 1,554,965
bytes, or a 14.1% reduction (relative to the GIF size). Out of the
448 images, PNG won the size comparison in 285 cases, lost in 161
cases, and tied in 2 cases. As in the previous test, however, the
magnitude of the differences was the critical factor: GIF won by more
than a kilobyte in only 1 case, while PNG won by that amount in 37
cases--4 of which were greater than 10 KB, 1 more than 64 KB. The
average difference for the 285 cases in which PNG was smaller was 940
bytes; for the 161 GIF cases, it was a mere 78 bytes.
</p>
<p><a name="INDEX-720" />
<a name="INDEX-721" />Finally, I've mentioned an upcoming JPEG standard for lossless compression
a couple of times; it's worth a quick look, too. JPEG-LS, as the standard
will be known,<a href="#FOOTNOTE-75">[75]</a>
is based on Hewlett-Packard's LOCO-I algorithm. As this is written, it
is implemented in version 0.90 of HP's locoe encoder, available only
in binary form for the HP-UX, Solaris, IRIX, and 32-bit Windows
platforms. (An independent implementation is available as C source
code from the University of British Columbia.) In a comparison
<a name="INDEX-722" />performed by Adam Costello, the HP encoder was tested against pnmtopng
and pngcrush on the eight standard color images in the Waterloo
<?x-need 10?>BragZone's ColorSet. pnmtopng is of interest only for speed reasons;
even though it is moderately fast, locoe was considerably faster. I
have omitted its size results from the comparison since, as expected,
pngcrush outperformed it in all cases, though at a considerable cost
in speed.
</p><blockquote class="footnote">
<a name="FOOTNOTE-75" /><p>[75] In December 1998 it became an ISO Draft International Standard, the final
voting stage before becoming a full International Standard. It will
officially be known as ISO/IEC 14495-1 upon approval. It has already been
approved as International Telecommunication Union (ITU) Recommendation T.87.</p>
</blockquote>
<p>The results were fascinating. In the five test images categorized by the
University of Waterloo as ``natural,'' JPEG-LS beat PNG by between 5% and
11%--not a huge difference, but certainly significant. However, in the
three images marked ``artistic,'' PNG proved superior by wide margins, with
one image more than three times smaller than the corresponding JPEG-LS version.
These results are summarized in
<a href="#png.ch09.tbl.7">Table 9-7</a>; once again, the byte size of the
winning format for each image is highlighted in boldface type.
</p>
<a name="png.ch09.tbl.7" />
<div class="table" align="center">
<p>
<table width="502" border="0">
<tr>
<td>
<b class="emphasis-bold">Table 9-7.</b>
<i>PNG and JPEG-LS Comparison for Waterloo BragZone Color Images</i>
</td>
</tr>
</table>
</p>
<p>
<table width="502" border="1">
<tr>
<td align="center"><b class="emphasis-bold">Classification</b></td>
<td align="center"><b class="emphasis-bold">Name</b></td>
<td align="center"><b class="emphasis-bold">Total<br />Pixels</b></td>
<td align="center"><b class="emphasis-bold">JPEG-LS<br />Size</b></td>
<td align="center"><b class="emphasis-bold">PNG<br />IDAT Size</b></td>
<td align="center"><b class="emphasis-bold">Relative<br />Difference</b></td>
</tr>
<tr>
<td align="center" rowspan=5>``natural''</td>
<td>lena</td>
<td align="right">262,144</td>
<td align="right"><b class="emphasis-bold">445,799</b></td>
<td align="right">475,430</td>
<td align="right">+6.6%</td>
</tr>
<tr>
<td>monarch</td>
<td align="right">393,216</td>
<td align="right"><b class="emphasis-bold">555,012</b></td>
<td align="right">615,260</td>
<td align="right">+10.9%</td>
</tr>
<tr>
<td>peppers</td>
<td align="right">262,144</td>
<td align="right"><b class="emphasis-bold">385,047</b></td>
<td align="right">425,560</td>
<td align="right">+10.5%</td>
</tr>
<tr>
<td>sail</td>
<td align="right">393,216</td>
<td align="right"><b class="emphasis-bold">767,374</b></td>
<td align="right">808,606</td>
<td align="right">+5.4%</td>
</tr>
<tr>
<td>tulips</td>
<td align="right">393,216</td>
<td align="right"><b class="emphasis-bold">616,536</b></td>
<td align="right">680,881</td>
<td align="right">+10.4%</td>
</tr>
<?x-space 2p?><?x-space 2p?><tr>
<td align="center" rowspan=3>``artistic''</td>
<td>clegg</td>
<td align="right">716,320</td>
<td align="right">653,299</td>
<td align="right"><b class="emphasis-bold">484,589</b></td>
<td align="right">-25.8%</td>
</tr>
<tr>
<td>frymire</td>
<td align="right">1,235,390</td>
<td align="right">935,285</td>
<td align="right"><b class="emphasis-bold">251,865</b></td>
<td align="right">-73.1%</td>
</tr>
<tr>
<td>serrano</td>
<td align="right">499,426</td>
<td align="right">293,532</td>
<td align="right"><b class="emphasis-bold">106,765</b></td>
<td align="right">-63.6%</td>
</tr>
<?x-space 5p?>
</table>
</p>
</div>
<p>Note that in the final column I used the JPEG-LS size as the
reference, which effectively works against PNG--had I used PNG
instead, the <em class="emphasis">frymire</em> image, for example, would show JPEG-LS as
271.3% larger, which looks much more impressive! Also note that I
used the size of the PNG IDAT data for comparison rather than the
actual PNG file size; this was done because locoe appears to encode
raw JPEG data, with none of the overhead of standard JPEG file formats
like JFIF and SPIFF.</p>
<p>In any case, the results are only slightly more statistically valid than
the first comparison of GIF images was. Eight samples, even if they are a
carefully chosen set of standard research images, cannot tell the full
story. And results as intriguing as these certainly deserve more extensive
testing, which will no doubt happen in due course.
<a name="INDEX-723" />
<a name="INDEX-724" />
</p>
</div>
<div class="sect1"><a name="png.ch09.div.4" />
<h2 class="sect1">9.4. Practical Compression Tips</h2>
<p><a name="INDEX-725" />
<a name="INDEX-726" />I could hardly end this chapter without some practical pointers on optimizing
PNG compression, both for users and for programmers. Herewith are some
rough guidelines, arranged in descending order of effectiveness. Note that,
as with any set of rules, there will always be exceptions.</p>
<a name="png.ch09.div.4.1" /><div class="sect2">
<h3 class="sect2">9.4.1. Tips for Users</h3>
<p>Following is a list of tips for users of PNG-supporting software:</p>
<dl>
<dt><b><em class="emphasis">Use the correct image format</em></b></dt><dd><p><a name="INDEX-727" />
<a name="INDEX-728" />
<a name="INDEX-729" />If you have photographic images and their quality as JPEGs is
acceptable, use JPEG! JPEG will almost always be smaller than PNG,
especially for color images. Conversely, if you have images with just
a few colors and/or sharp edges (such as text and simple graphics),
JPEG is almost never the correct solution; use PNG or GIF instead.
For binary transparency, also use PNG or GIF; for partial transparency
or lossless RGB, use PNG or TIFF; for animations, use MNG or GIF.</p></dd>
<dt><b><em class="emphasis">Use the correct pixel depth</em></b></dt><dd><p><a name="INDEX-730" />
<a name="INDEX-731" />
<a name="INDEX-732" />For example, don't convert a GIF (which, from a practical perspective,
always has a depth of 8 bits or less) to a 24-bit PNG; that will
automatically boost the file size by a factor of three. Similarly, if
given the option, don't save a grayscale image as RGB; save it as
grayscale or, at worst, as a palette-based PNG. Likewise, don't use a
full alpha channel if single-color transparency (à la GIF) would
suffice; it doubles the size of grayscale images and adds 33% to the
size of RGB.</p></dd>
<dt><b><em class="emphasis">Corollary: Quantize and dither truecolor images to a palette if quality is acceptable</em></b></dt><dd><p><a name="INDEX-733" />
<a name="INDEX-734" />
<a name="INDEX-735" />
<a name="INDEX-736" />
<a name="INDEX-737" />Likewise, quantize and dither RGBA or gray+alpha
PNGs to a palette, if possible. This is something that only you, the user, can
judge; no reasonable image application will ever quantize (which is a lossy
transformation) unless instructed to do so by you. This is not an issue for
GIF, which realistically supports <em class="emphasis">only</em> colormapped images (i.e., your
choice of GIF as an output format amounts to an explicit instruction to
quantize) nor is it an issue for JPEG, which supports only grayscale and
truecolor. Only PNG supports colormapped, grayscale, and truecolor images, as
well as alpha channels.</p></dd>
<dt><b><em class="emphasis">Use interlacing with care</em></b></dt><dd><p><a name="INDEX-738" />Interlacing is a way to transmit the useful
parts of an image more quickly, particularly on the Web, so that the end user
can click on a hot-linked region before the image is fully downloaded, if she
so chooses. But as I saw earlier, PNG's two-dimensional interlacing scheme can
degrade compression by 15% in some cases, especially for small images. Since
small images are transmitted over the network fairly quickly anyway, they
usually do not need to be interlaced.</p></dd>
<dt><b><em class="emphasis">Use the correct tools</em></b></dt><dd><p>In the first six chapters, I discussed a number of PNG-supporting
applications and noted their limitations wherever possible; use that
as a guide when choosing your tools, assuming you have a choice. Even
if your program generally compresses PNG images well, consider using
an optimizer such as pngcrush on everything when you're done;<a href="#FOOTNOTE-76">[76]</a>
definitely do so if your program is not known for its compression
performance. For converting GIFs to PNGs, the dedicated gif2png is
the most capable solution, even given its permanently beta version
number; it preserves both transparency and embedded text comments.</p><blockquote class="footnote">
<a name="FOOTNOTE-76" /><p>[76] It is one of my favorite tools, in case that wasn't already
apparent. As of April 1999, there are still a few optimization tricks
it doesn't do, but its author is addressing those even as this is
written.</p>
</blockquote></dd>
<dt><b><em class="emphasis">Don't include unnecessary information</em></b></dt><dd><p>A lengthy copyright message or
other text can add 100 bytes or more, which is a lot for icons and other small
images.</p></dd>
</dl>
</div>
<a name="png.ch09.div.4.2" /><div class="sect2">
<h3 class="sect2">9.4.2. Tips for Programmers</h3>
<p>Following is a list of tips for programmers:</p>
<dl>
<dt><b><em class="emphasis">Use the correct pixel depth</em></b></dt><dd><p><a name="INDEX-739" />
<a name="INDEX-740" />
<a name="INDEX-741" />Count colors! Or at least do so
when the compression setting is ``best'' and you don't know that the image is
grayscale--it doesn't take that long, and computers are good
at that sort of thing. If there are 256 or fewer colors, write a colormapped
image; doing so will translate to a factor-of-three savings in the PNG file
size relative to an RGB image.</p></dd>
<dt><b><em class="emphasis">Use the correct pixel depth II</em></b></dt><dd><p>If the image is colormapped, don't
assume that the pixels must be 8 bits deep. If there are only one or two
colors, write a 1-bit image. If there are three or four colors, write a
2-bit image. If there are between 5 and 16 colors, write a 4-bit image.
(These are the only useful cases for PNG.) The compression engine cannot
compensate for bloated pixels! Choosing the correct depth for a palette-based
image will reduce the file size by a factor of anywhere from two to eight
relative to an 8-bit image.</p></dd>
<dt><b><em class="emphasis">Use grayscale if possible</em></b></dt><dd><p><a name="INDEX-742" />If you know the image is gray, see if it can be written more compactly
as a grayscale PNG than as a colormapped PNG--this is automatically
true if there are more than 16 shades of gray. Doing so will save up
to 780 bytes by eliminating the palette. But don't assume that 16 or
fewer shades automatically means the image can be written as 4-bit (or
smaller) grayscale. Grayscale necessarily implies that the shades are
evenly distributed from black to white. If, for example, the 16 shades
are bunched up in one part of the gray spectrum, the image must be
written as 8-bit grayscale or 4-bit palette-based. For larger images,
the palette-based approach is almost certainly better; for small ones
it depends, but the 8-bit grayscale case may end up being smaller. Try
both, if possible; it's very fast for small images.</p></dd>
<a name="INDEX-742.01-new" />
<dt><b><em class="emphasis">Set the compression and filtering options intelligently</em></b></dt><dd><p>For programs that use libpng (discussed at length in <a href="part3.html">Part III, "Programming with PNG"</a>), this
is not a serious issue; it will automatically do the right thing if
left to itself. But if you are writing custom PNG code, follow the
guidelines in the PNG specification for matching filter strategies
with image types. In particular, use filter type None for colormapped
images and for grayscale images less than 8 bits deep. Use adaptive
filtering (the ``minimum sum of absolute differences'' heuristic) for
all other cases.</p></dd>
<dt><b><em class="emphasis">Truncate the palette</em></b></dt><dd><p>Unlike GIF, PNG's palette size is determined by
the chunk size, so there is no need to include 256 entries if only 173 are
used in the image. At 3 bytes per entry, wasted slots can make a big
difference in icons and other small images.</p></dd>
<dt><b><em class="emphasis">Truncate the transparency chunk</em></b></dt><dd><p><a name="INDEX-743" />
<a name="INDEX-744" />It is extremely rare for every palette entry to be partially or fully
transparent. If there are any opaque entries--in particular, if all
but one are opaque--reorder the palette so that the opaque entries
are at the end. The transparency entries corresponding to these opaque
colors can then be omitted. The absolute worst possible approach is to
put the single transparent entry at the <em class="emphasis">end</em> of the palette!
Those 255 extra bytes are a lot for a file that would otherwise be 500
(or even 150) bytes long.</p></dd>
<dt><b><em class="emphasis">Do transparency intelligently</em></b></dt><dd><p>Understand how PNG's alpha channels and tRNS chunk work. If the alpha
mask is binary (that is, either fully transparent or fully opaque),
see if the transparent parts correspond to a single color or gray
shade; if so, eliminate the alpha channel from the PNG file and use
<a name="INDEX-745" />the tRNS chunk (``cheap transparency'') instead. Alternatively, see
if the total number of color+alpha combinations is 256 or fewer; if
so, write a colormapped image with a tRNS chunk. If the user requests
that an RGBA image be converted to indexed color, do so intelligently.
The combination of PNG's PLTE and tRNS chunks amounts to a palette
whose entries are RGBA values. The exact same algorithms that
quantize and dither a 24-bit RGB image down to an 8-bit palette-based
image can be used to quantize and dither a 32-bit RGBA or 16-bit
grayscale+alpha image down to an 8-bit RGBA palette. In particular,
you cannot treat color values and transparency values as if they are
separate, unrelated entities; attempting to partition the palette into
a ``color part'' and a ``transparent part'' makes no more sense than
attempting to partition a standard RGB palette into red, green, and
blue parts. If you do cheap transparency poorly, the user will be
forced to use a full alpha channel, quadrupling her file size. For
grayscale, an alpha channel ``merely'' doubles the size. Note that
the icicle image in <a href="fig_C1.html">Figure C-1</a> in the color insert is
actually colormapped. Aside from the garish background--which was actually
generated by the viewing application--the full-resolution half looks pretty
darned good, doesn't it?</p></dd>
<dt><b><em class="emphasis">Don't include unnecessary chunks in small images</em></b></dt><dd><p>Gamma information (or
<a name="INDEX-746" />the sRGB chunk) is always good, but a full ICC profile may quadruple the size
of a small image file. Consider not including a Software text chunk or tIME
chunk, or do so only for images larger than, say, 100 × 100 pixels.
Include dots-per-inch information (pHYs chunk) only if it is actually relevant
to the image; but the user may be the only one who can make that call.</p></dd>
<dt><b><em class="emphasis">Offer the user reasonable options</em></b></dt><dd><p>Don't overwhelm him with unnecessary detail about filters or other
technical jargon. For example, offer a simple checkbox to turn on
interlacing. Offer a simple dial or even just two or three choices
for compression level--<em class="emphasis">fastest</em>, <em class="emphasis">typical</em>, and <em class="emphasis">best,</em>
perhaps. Even though it will make the file bigger, offer to include
at least a few text annotations--Author, Title, Description, and/or
Copyright, for example. On the other hand, offer to omit certain
optional information, such as that described in the previous item.</p></dd>
<dt><b><em class="emphasis">Warn the user about data loss</em></b></dt><dd><p><a name="INDEX-747" />If a region is completely transparent,
don't zero out the underlying color pixels in order to improve compression
unless you've notified the user in some way. Make sure she understands that
quantization and dithering are lossy transformations, but don't make this an
overly scary issue.
<a name="INDEX-748" />
<a name="INDEX-749" />
<a name="INDEX-750" />
</p></dd>
</dl>
</div>
<pre>
</pre>
<hr> <!-- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- -->
<a href="chapter08.html"><img width=24 height=13 border=0 align="left"
src="images/prev.png" alt="<-"></a>
<a href="chapter10.html"><img width=24 height=13 border=0 align="right"
src="images/next.png" alt="->"></a>
<div align="center">
<a href="chapter08.html"><font size="-1" color="#000000"
><b>PREVIOUS</b></font></a> <a
href="toc.html"><font size="-1" color="#000000"
><b>CONTENTS</b></font></a> <a
href="chapter10.html"><font size="-1" color="#000000"
><b>NEXT</b></font></a>
</div>
<hr> <!-- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- -->
</body></html>
|