# add Tests "Add variables to model"
The selected variables are added to the previous model and the new model estimated. A test statistic for the joint significance of the added variables is printed, along with its p-value.
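The script equivalent is straightforward. A minimal sketch, assuming a dependent variable <@lit="y"> and regressors <@lit="x1">, <@lit="x2"> and <@lit="x3"> (all hypothetical names):
<code>
# estimate a baseline model, then add two regressors and re-estimate
ols y 0 x1
add x2 x3
</code>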
Menu path: Model window, /Tests/Add variables
Script command: <@ref="add">
# addline Graphs "Add line to graph"
This dialog box allows you to add a line, defined via a formula, to a graph. The formula must be an expression acceptable to gnuplot. Use <@lit="x"> to denote the value of the variable on the x-axis. Please note that gnuplot uses <@lit="**"> for exponentiation (raising to a power), and that the decimal character must be given as ".". Examples:
<code>
10+0.35*x
100+5.3*x-0.12*x**2
sin(x)
exp(sqrt(pi*x))
</code>
# adf Tests "Augmented Dickey-Fuller test"
This command needs an integer lag order; if the order is zero a standard (not augmented) Dickey–Fuller test is run. Computes a set of Dickey–Fuller tests on the selected variable, the null hypothesis being that the variable has a unit root. (But if the differencing option is selected, the first difference of the variable is taken prior to testing, and the discussion below must be taken as referring to the transformed variable.)
In all cases the dependent variable is the first difference of the specified variable, <@itl="y">, and the key independent variable is the first lag of <@itl="y">. The model is constructed so that the coefficient on lagged <@itl="y"> equals the root in question minus 1. For example, the model with a constant may be written as
<@fig="adf1">
Under the null hypothesis of a unit root the coefficient on lagged <@itl="y"> equals zero; under the alternative that <@itl="y"> is stationary this coefficient is negative.
If the lag order, <@itl="k">, is greater than 0, then <@itl="k"> lags of the dependent variable are included on the right-hand side of the test regressions, subject to the following qualification. If the box labeled "test down from maximum lag" is checked, the selected lag order is taken as a maximum and the actual lag order used is obtained by testing down, using this algorithm:
<indent>
1. Estimate the Dickey–Fuller regression with <@itl="k"> lags of the dependent variable.
</indent>
<indent>
2. Is the last lag significant? If so, execute the test with lag order <@itl="k">. Otherwise, let <@itl="k"> = <@itl="k"> – 1; if <@itl="k"> equals 0, execute the test with lag order 0, else go to step 1.
</indent>
In the context of step 2 above, "significant" means that the <@itl="t">-statistic for the last lag has an asymptotic two-sided <@itl="p">-value, against the normal distribution, of 0.10 or less.
<@itl="P">-values for the Dickey–Fuller tests are based on MacKinnon (1996). The relevant code is included by kind permission of the author. In the case of the test with linear trend using GLS these <@itl="P">-values are not applicable; critical values from Table 1 in Elliott, Rothenberg and Stock (1996) are shown instead.
Menu path: /Variable/Unit root tests/Augmented Dickey-Fuller test
Script command: <@ref="adf">
# anova Statistics "ANOVA"
Analysis of Variance: <@var="response"> is a series measuring some effect of interest and <@var="treatment"> must be a discrete variable that codes for two or more types of treatment (or non-treatment). For two-way ANOVA, the <@var="block"> variable (which should also be discrete) codes for the values of some control variable.
The null hypothesis for the <@itl="F">-test is that the mean response is invariant with respect to the treatment type, or, in other words, that the treatment has no effect. Strictly speaking, the test is valid only if the variance of the response is the same for all treatment types.
Note that the results shown by this command are in fact a subset of the information given by the following procedure, which is easily implemented in gretl. Create a set of dummy variables coding for all but one of the treatment types. For two-way ANOVA, in addition create a set of dummies coding for all but one of the "blocks". Then regress <@var="response"> on a constant and the dummies using <@ref="ols">. For a one-way design the ANOVA table is printed via the <@opt="--anova"> option to <@lit="ols">. In the two-way case the relevant <@itl="F">-test is found by using the <@ref="omit"> command. For example (assuming <@lit="y"> is the response, <@lit="xt"> codes for the treatment, and <@lit="xb"> codes for blocks):
<code>
# one-way
list dxt = dummify(xt)
ols y 0 dxt --anova
# two-way
list dxb = dummify(xb)
ols y 0 dxt dxb
# test joint significance of dxt
omit dxt --quiet
</code>
Menu path: /Model/Other linear models/ANOVA
Script command: <@ref="anova">
# ar Estimation "Autoregressive estimation"
Computes parameter estimates using the generalized Cochrane–Orcutt iterative procedure; see Section 9.5 of Ramanathan (2002). Iteration is terminated when successive error sums of squares do not differ by more than 0.005 percent or after 20 iterations.
The "list of AR lags" specifies the structure of the error process. For example, the entry "1 3 4" corresponds to the process:
<@fig="arlags">
Menu path: /Model/Time series/Autoregressive estimation
Script command: <@ref="ar">
# ar1 Estimation "AR(1) estimation"
Computes feasible GLS estimates for a model in which the error term is assumed to follow a first-order autoregressive process.
The default method is the Cochrane–Orcutt iterative procedure; see for example section 9.4 of Ramanathan (2002). Iteration is terminated when successive estimates of the autocorrelation coefficient do not differ by more than 0.001 or after 20 iterations.
If the <@opt="--hilu"> option is given, the Hildreth–Lu search procedure is used. The results are then fine-tuned using the Cochrane–Orcutt method, unless the <@opt="--no-corc"> flag is specified. (The latter option is ignored if <@opt="--hilu"> is not specified.)
If the <@opt="--pwe"> option is given, the Prais–Winsten estimator is used. This involves an an iteration similar to Cochrane–Orcutt; the difference is that while Cochrane–Orcutt discards the first observation, Prais–Winsten makes use of it. See, for example, Chapter 13 of Greene's <@itl="Econometric Analysis"> (2000) for details.
Menu path: /Model/Time series/Cochrane-Orcutt
Menu path: /Model/Time series/Hildreth-Lu
Menu path: /Model/Time series/Prais-Winsten
Script command: <@ref="ar1">
# arch Estimation "ARCH model"
This command is retained at present for backward compatibility, but you are better off using the maximum likelihood estimator offered by the <@ref="garch"> command; for a plain ARCH model, set the first GARCH parameter to 0.
Estimates the given model specification allowing for ARCH (Autoregressive Conditional Heteroskedasticity). The model is first estimated via OLS, then an auxiliary regression is run, in which the squared residual from the first stage is regressed on its own lagged values. The final step is weighted least squares estimation, using as weights the reciprocals of the fitted error variances from the auxiliary regression. (If the predicted variance of any observation in the auxiliary regression is not positive, then the corresponding squared residual is used instead).
The <@lit="alpha"> values displayed below the coefficients are the estimated parameters of the ARCH process from the auxiliary regression.
See also <@ref="garch"> and <@ref="modtest"> (the <@opt="--arch"> option).
Menu path: /Model/Time series/ARCH
Script command: <@ref="arch">
# arima Estimation "ARMA model"
Estimates an ARMA model, with or without exogenous regressors. If the order of differencing is greater than zero the model becomes ARIMA. If the data have a frequency greater than 1 the option of including a seasonal component is presented.
If you wish to include only specified AR or MA lags in the model (as opposed to all lags up to a given order) check the box to the right of the spinner and type a list of lags, separated by spaces, into the entry field. Alternatively, if you have defined a matrix containing the desired set of lags you can type its name into the entry field.
The default is to use the "native" gretl ARMA functionality, with estimation by exact ML using the Kalman filter; estimation via conditional ML is available as an option. (If X-12-ARIMA is installed you have the option of using it instead of native code.) For details regarding these options, please see <@pdf="the Gretl User's Guide">.
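A minimal script sketch (the series <@lit="y"> and regressor <@lit="x"> are hypothetical):
<code>
arima 1 0 1 ; y 0 x                  # ARMA(1,1) with constant and x, exact ML
arima 1 0 1 ; y 0 x --conditional    # conditional ML instead
</code>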
The AIC value given in connection with ARIMA models is calculated according to the definition used in X-12-ARIMA, namely
<@fig="aic">
where <@fig="ell"> is the log-likelihood and <@itl="k"> is the total number of parameters estimated. Note that X-12-ARIMA does not produce information criteria such as AIC when estimation is by conditional ML.
The AR and MA roots shown in connection with ARMA estimation are based on the following representation of an ARMA(p, q) process:
<mono>
(1 - a_1*L - a_2*L^2 - ... - a_p*L^p)Y =
c + (1 + b_1*L + b_2*L^2 + ... + b_q*L^q) e_t
</mono>
The AR roots are therefore the solutions to
<mono>
1 - a_1*z - a_2*z^2 - ... - a_p*z^p = 0
</mono>
and stability requires that these roots lie outside the unit circle.
The "frequency" figure printed in connection with AR and MA roots is the λ value that solves <@itl="z"> = <@itl="r"> * exp(i*2*π*λ) where <@itl="z"> is the root in question and <@itl="r"> is its modulus.
Menu path: /Model/Time series/ARIMA
Other access: Main window pop-up menu (single selection)
Script command: <@ref="arima">
# bfgs-config Estimation "BFGS options"
This dialog allows you to control some aspects of the operation of the BFGS maximizer. In case the maximizer fails to converge it may help matters, in some cases, to increase the number of iterations allowed and/or to increase (make more permissive) the convergence tolerance. However, you should be suspicious of results obtained using a high tolerance and should consider the possibility that the model you are estimating is misspecified.
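The same settings can also be adjusted in a script via the <@ref="set"> command; for example (the values shown are arbitrary):
<code>
set bfgs_maxiter 500      # maximum number of BFGS iterations
set bfgs_toler 1.0e-10    # convergence tolerance
</code>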
For most applications we recommend use of the regular BFGS maximizer but for some problems the "limited memory" variant of the algorithm, L-BFGS-B, may produce more rapid convergence. When L-BFGS-B is selected, you have the option of setting the number of corrections used in the limited memory matrix (between 3 and 20, with a default of 8).
# bootstrap Tests "Bootstrap options"
In this dialog you get to choose:
<indent>
• The variable/coefficient to examine. (You can test only one coefficient at a time using this method.)
</indent>
<indent>
• The sort of analysis to perform. The default (95 percent) confidence interval is based directly on the quantiles of the bootstrap coefficient estimates. The "studentized" version is as per Davidson and MacKinnon's <@itl="Economic Theory and Methods"> (ETM), chapter 5: at each bootstrap replication a <@itl="t">-ratio is formed as (a) the difference between the current and the baseline coefficient estimate, divided by (b) the baseline estimated standard error. Then the confidence interval is formed based on the quantiles of this t-ratio, as explained in ETM. The P-value option is based on the distribution of the bootstrap <@itl="t">-ratio: it is the proportion of the replications where the absolute value of this statistic exceeds the absolute value of the baseline <@itl="t">-ratio.
</indent>
<indent>
• Resampled residuals versus simulated normal errors. In the first case the original residuals (rescaled as suggested in ETM) are resampled with replacement. In the second case pseudo-random normal values are generated with the original residual variance.
</indent>
<indent>
• The number of replications to perform. Note that when you're constructing a 95 percent confidence interval it is desirable that 0.05(<@itl="B"> + 1)/2 is an integer (where <@itl="B"> is the number of replications). So gretl may adjust the chosen number of replications to ensure this is the case.
</indent>
<indent>
• Whether or not to produce a graph of the bootstrap distribution. This option employs gretl's kernel density estimation facility.
</indent>
# boxplot Graphs "Boxplots"
These plots display the distribution of a variable. The central box encloses the middle 50 percent of the data, i.e. it is bounded by the first and third quartiles. The "whiskers" extend to the minimum and maximum values. A line is drawn across the box at the median. A "+" sign is used to indicate the mean. If the option of showing a confidence interval for the median is selected, this is computed via the bootstrap method and shown in the form of dashed horizontal lines above and/or below the median.
The "factorized" option allows you to examine the distribution of a chosen variable conditional on the value of some discrete factor. For example, if a data set contains wages and a gender dummy variable you can select the wage variable as the target and gender as the factor, to see side-by-side boxplots of male and female wages.
Menu path: /View/Graph specified vars/Boxplots
Script command: <@ref="boxplot">
# bwfilter Transformations "The Butterworth filter"
The Butterworth filter is an approximation to an ideal square-wave filter which allows frequencies over a certain range to pass at full strength while stopping all others.
Higher values of the order parameter, <@itl="n">, produce a closer approximation to the ideal filter, in principle, but at the possible cost of numerical instability. The "cutoff" value sets the boundary between the pass band and the stop band. It is expressed in degrees, and must be greater than 0 and less than 180° (or π radians, corresponding to the highest frequency in the data). Smaller values of the cutoff produce a smoother trend.
Inspecting the periodogram of the target series is a useful preliminary when you wish to apply this filter. See <@pdf="the Gretl User's Guide"> for details.
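In scripts, a comparable filter is provided by the <@lit="bwfilt"> function, assuming it is available in your gretl version; the series name <@lit="y"> below is hypothetical:
<code>
# order 8, cutoff 67 degrees: extract the low-frequency trend of y
series y_trend = bwfilt(y, 8, 67)
</code>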
Menu path: /Variable/Filter/Butterworth
# chow Tests "Chow test"
This command needs either an observation number (or date, with dated data), or the name of a dummy variable.
Must follow an OLS regression. If an observation number or date is given, provides a test for the null hypothesis of no structural break at the given split point. The procedure is to create a dummy variable which equals 1 from the split point specified by <@var="obs"> to the end of the sample, 0 otherwise, and also interaction terms between this dummy and the original regressors. If a dummy variable is given, tests the null hypothesis of structural homogeneity with respect to that dummy. Again, interaction terms are added. In either case an augmented regression is run including the additional terms.
By default an <@itl="F"> statistic is calculated, taking the augmented regression as the unrestricted model and the original as the restricted. But if the original model used a robust estimator for the covariance matrix, the test statistic is a Wald chi-square value based on a robust estimator of the covariance matrix for the augmented regression.
Menu path: Model window, /Tests/Chow test
Script command: <@ref="chow">
# coeffsum Tests "Sum of coefficients"
This command needs a list of variables, selected from the set of independent variables in a given model.
Calculates the sum of the coefficients on the variables in the specified list. Prints this sum along with its standard error and the p-value for the null hypothesis that the sum is zero.
Note the difference between this and <@ref="omit">, which tests the null hypothesis that the coefficients on a specified subset of independent variables are <@itl="all"> equal to zero.
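A brief script illustration (hypothetical names):
<code>
ols y 0 x1 x2 x3
coeffsum x1 x2    # test H0: b1 + b2 = 0
</code>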
Menu path: Model window, /Tests/Sum of coefficients
Script command: <@ref="coeffsum">
# coint Tests "Engle-Granger cointegration test"
The Engle–Granger cointegration test. The default procedure is: (1) carry out Dickey–Fuller tests on the null hypothesis that each of the variables listed has a unit root; (2) estimate the cointegrating regression; and (3) run a DF test on the residuals from the cointegrating regression. If the box labeled "skip initial DF tests" is checked, however, the first of these steps is omitted.
If the lag order, <@itl="k">, is greater than 0, then <@itl="k"> lags of the dependent variable are included on the right-hand side of each test regression, unless the box labeled "test down from maximum lag" is checked: in that case the selected lag order is taken as a maximum and the actual lag order used is obtained by testing down. See the <@ref="adf"> command for details of this procedure.
By default, the cointegrating regression contains a constant. If you wish to suppress the constant, or to add a linear or quadratic trend, select the appropriate option from the set of radio buttons in the Cointegration dialog box.
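Script usage follows the same pattern as <@ref="adf">; for instance, with hypothetical series <@lit="y">, <@lit="x1"> and <@lit="x2">:
<code>
# Engle-Granger test, maximum 4 lags with testing down, constant included
coint 4 y x1 x2 --test-down
</code>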
<@itl="P-">values for this test are based on MacKinnon (1996). The relevant code is included by kind permission of the author.
Menu path: /Model/Time series/Cointegration test/Engle-Granger
Script command: <@ref="coint">
# coint2 Tests "Johansen cointegration test"
Carries out the Johansen test for cointegration among the listed variables for the selected lag order. For details of this test see, for example, Hamilton, <@itl="Time Series Analysis"> (1994), Chapter 20. P-values are computed via Doornik's (1998) gamma approximation.
The inclusion of deterministic terms in the model is controlled by the drop-down option list. The default is to include an "unrestricted constant", which allows for the presence of a non-zero intercept in the cointegrating relations as well as a trend in the levels of the endogenous variables. In the literature stemming from the work of Johansen (see for example his 1995 book) this is often referred to as "case 3". The other four options produce cases 1, 2, 4 and 5 respectively. The meaning of these cases and the criteria for selecting a case are explained in <@pdf="the Gretl User's Guide">.
You may control for exogenous variables by adding them to the lower list box. By default these enter the model in unrestricted form (indicated by a <@lit="U"> next to the name of the variable). If you want a certain exogenous variable to be restricted to the cointegrating space, right-click on it and select "Restricted" from the pop-up menu. The symbol next to the variable will change to <@lit="R">.
If the data are quarterly or monthly, a check box is shown that allows you to include a set of centered seasonal dummy variables. In all cases, an additional check box ("Show details") allows for the printing of the auxiliary regressions that form the starting point of the Johansen maximum likelihood estimation procedure.
The following table is offered as a guide to the interpretation of the results shown for the test, for the 3-variable case. <@lit="H0"> denotes the null hypothesis, <@lit="H1"> the alternative hypothesis, and <@lit="c"> the number of cointegrating relations.
<mono>
Rank    Trace test          Lmax test
        H0      H1          H0      H1
---------------------------------------
 0      c = 0   c = 3       c = 0   c = 1
 1      c = 1   c = 3       c = 1   c = 2
 2      c = 2   c = 3       c = 2   c = 3
---------------------------------------
</mono>
See also the <@ref="vecm"> command.
Menu path: /Model/Time series/Cointegration test/Johansen
Script command: <@ref="coint2">
# compact Dataset "Compact data"
When you add to a dataset a series that is of higher frequency, it is necessary to "compact" the new series. For instance, a monthly series will have to be compacted to fit into a quarterly dataset.
In addition, you may sometimes want to compact an entire dataset to a lower frequency (perhaps, prior to adding a lower-frequency variable to the dataset).
Gretl offers four options for compacting:
<indent>
• Averaging: The value written to the dataset will be the arithmetic mean of the relevant series values. For instance the value written for the first quarter of 1990 will be the average of the values for January, February and March of 1990.
</indent>
<indent>
• Summing: The value written to the dataset will be the sum of the relevant higher-frequency values. For example, the first-quarter value will be the sum of the January, February and March values.
</indent>
<indent>
• End-of-period values: The value written to the dataset is the last relevant value from the higher-frequency data. For example, the first quarter of 1990 will get the March 1990 value.
</indent>
<indent>
• Start-of-period values: The value written to the dataset is the first relevant value from the higher-frequency data. For example, the first quarter of 1990 will get the January 1990 value.
</indent>
In the case of compacting an entire dataset, the choice you make in this dialog box sets the default method. But if you have set a compaction method for an individual variable (menu item "Variable/Edit attributes") that method is used rather than the default. If the compaction method is already set for all variables, the choice of a default compaction method is not presented.
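In a script, an entire dataset can be compacted with the <@lit="dataset"> command; for example, to move to quarterly frequency (the method keyword is optional, averaging being the default):
<code>
dataset compact 4         # compact to quarterly, using averages
dataset compact 4 last    # or use end-of-period values
</code>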
# controlled Graphs "Scatterplot with control"
This command requires the selection of three variables, one for the X axis, one for the Y axis, and one for which you wish to control (call it Z). The plot shows adjusted Y against adjusted X, where the adjusted version of the variable is the residual from an OLS regression on Z.
Example: You have data on wages, experience and education level for a sample of people. You wish to plot wages against education, controlling for experience. In that case you select wages for the Y axis, education for the X axis, and experience as the control. The plot shows wages against education, with both variables "purged" of the effect of experience.
# corr Statistics "Correlation coefficients"
Prints the pairwise correlation coefficients (Pearson's product-moment correlation) for the selected variables. The default behavior is to use all available observations for computing each pairwise coefficient, but if the option box is checked the sample is limited (if necessary) so that the same set of observations is used for all the coefficients. This option has an effect only if there are differing numbers of missing values for the variables used.
Menu path: /View/Correlation matrix
Other access: Main window pop-up menu (multiple selection)
Script command: <@ref="corr">
# corrgm Statistics "Correlogram"
Prints the values of the autocorrelation function for <@var="series">, which may be specified by name or number. The values are defined as <@fig="autocorr"> where <@itl="u"><@sub="t"> is the <@itl="t">th observation of the variable <@itl="u"> and <@itl="s"> is the number of lags.
The partial autocorrelations (calculated using the Durbin–Levinson algorithm) are also shown: these are net of the effects of intervening lags. In addition the Ljung–Box <@itl="Q"> statistic is printed. This may be used to test the null hypothesis that the series is "white noise"; it is asymptotically distributed as chi-square with degrees of freedom equal to the number of lags used. Unless the <@opt="--quiet"> option is given, a plot of the correlogram is printed.
If an <@var="order"> value is specified the length of the correlogram is limited to at most that number of lags, otherwise the length is determined automatically, as a function of the frequency of the data and the number of observations.
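Script usage, assuming a series named <@lit="y">:
<code>
corrgm y 12    # correlogram up to 12 lags
</code>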
Menu path: /Variable/Correlogram
Other access: Main window pop-up menu (single selection)
Script command: <@ref="corrgm">
# count-model Estimation "Models for count data"
The dependent variable is taken to represent a count of the occurrence of events of some sort, and must have only non-negative integer values. By default the Poisson distribution is used, but the drop-down selector gives the option of using the Negative Binomial distribution instead. (The variant NegBin 2 is commonly used in econometrics, but the lesser-used NegBin 1 is also available.)
Optionally, you may add an "offset" variable to the specification. This is a scale variable, the log of which is added to the linear regression function (implicitly, with a coefficient of 1.0). This makes sense if you expect the number of occurrences of the event in question to be proportional, other things equal, to some known factor. For example, the number of traffic accidents might be supposed to be proportional to traffic volume, other things equal, and in that case traffic volume could be specified as an "offset" in a model of the accident rate. The offset variable must be strictly positive.
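In script form, Poisson estimation with an offset can be sketched as follows (all series names are hypothetical):
<code>
# accidents modeled as a Poisson count, with traffic volume as offset
poisson accidents 0 x1 x2 ; volume
</code>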
By default, standard errors are computed using a numerical approximation to the Hessian at convergence. But if the "Robust standard errors" box is checked then QML standard errors are calculated, using a "sandwich" of the inverse of the Hessian and the outer product of the gradient.
# curve Graphs "Plot a curve"
This dialog box allows you to create a gnuplot graph by specifying a formula. This must be an expression acceptable to gnuplot. Use <@lit="x"> to denote the value of the variable on the x-axis. Please note that gnuplot uses <@lit="**"> for exponentiation (raising to a power), and that the decimal character must be given as ".". Examples:
<code>
10+0.35*x
100+5.3*x-0.12*x**2
sin(x)
exp(sqrt(pi*x))
</code>
To put an additional line onto a graph created in this way, click on the graph and select "Edit", select the "Lines" tab in the graph editing dialog, and use the "Add line" button.
# cusum Tests "CUSUM test"
Must follow the estimation of a model via OLS. Performs the CUSUM test—or if the <@opt="--squares"> option is given, the CUSUMSQ test—for parameter stability. A series of one-step ahead forecast errors is obtained by running a series of regressions: the first regression uses the first <@itl="k"> observations and is used to generate a prediction of the dependent variable at observation <@itl="k"> + 1; the second uses the first <@itl="k"> + 1 observations and generates a prediction for observation <@itl="k"> + 2, and so on (where <@itl="k"> is the number of parameters in the original model).
The cumulated sum of the scaled forecast errors, or the squares of these errors, is printed and graphed. The null hypothesis of parameter stability is rejected at the 5 percent significance level if the cumulated sum strays outside of the 95 percent confidence band.
In the case of the CUSUM test, the Harvey–Collier <@itl="t">-statistic for testing the null hypothesis of parameter stability is also printed. See Greene's <@itl="Econometric Analysis"> for details. For the CUSUMSQ test, the 95 percent confidence band is calculated using the algorithm given in Edgerton and Wells (1994).
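A minimal script sketch (hypothetical names):
<code>
ols y 0 x1 x2
cusum              # CUSUM test
cusum --squares    # CUSUMSQ test
</code>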
Menu path: Model window, /Tests/CUSUM(SQ)
Script command: <@ref="cusum">
# datasort Dataset "Sorting data"
The selected variable is used as a sort key for the entire data set. The observations on all variables are re-ordered by increasing value of the key variable, or by decreasing value if you select the "Descending" option.
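The script counterpart uses the <@lit="dataset"> command; for a hypothetical key variable <@lit="x">:
<code>
dataset sortby x     # ascending order
dataset dsortby x    # descending order
</code>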
# density Statistics "Kernel density estimation"
Kernel density estimation proceeds by defining a set of evenly spaced reference points, over a suitable range in relation to the range of the data, and attributing a density to each reference point based on the actual observations in the vicinity.
The formula used to compute the estimated density at each reference point, <@itl="x">, is
<@fig="kernel1">
where <@itl="n"> denotes the number of data points, <@itl="h"> is a "bandwidth" parameter, and <@itl="k">() is the kernel function. The larger the value of the bandwidth parameter, the smoother the estimated density.
You are given the choice of using a Gaussian kernel (the standard normal density) or the Epanechnikov kernel. By default, the bandwidth is that suggested as a rule of thumb by Silverman (1986), namely
<@fig="kernel2">
where <@itl="s"> denotes the standard deviation of the data and IQR denotes the inter-quartile range. You can widen or shrink the bandwidth via the "bandwidth adjustment factor": the actual bandwidth used is obtained by multiplying the Silverman value by the adjustment factor.
For a good introductory discussion of kernel density estimation see Chapter 15 of Davidson and MacKinnon's <@itl="Econometric Theory and Methods">.
# dfgls Tests "The ADF-GLS test"
The ADF-GLS test is a variant of the Dickey–Fuller test for a unit root, for the case where the variable to be tested is assumed to have a non-zero mean or to exhibit a linear trend. The difference is that the de-meaning or de-trending of the variable is done using the GLS procedure suggested by Elliott, Rothenberg and Stock (1996). This gives a test of greater power than the standard Dickey–Fuller approach.
See also the <@ref="adf"> command and the <@opt="--gls"> option.
Menu path: /Variable/ADF-GLS test
# dialog Estimation "Model dialog box"
To select the dependent variable, highlight a variable in the list on the left and press the "Choose" button pointing to the Dependent variable slot. If you check the "Set as default" box, the selected variable will be pre-selected as dependent when the model dialog is next opened. Short-cut: double-click on a variable on the left to select it as the dependent variable and also set it as the default.
To select independent variables, highlight them on the left and press the "Add" button (or click the right mouse button). You can highlight several contiguous variables by dragging with the mouse. You can highlight a group of non-contiguous variables by clicking on them with the <@lit="Ctrl"> key pressed.
# dpanel Estimation "Dynamic panel models"
Carries out estimation of dynamic panel data models (that is, panel models including one or more lags of the dependent variable) using either the GMM-DIF or GMM-SYS method.
The dependent variable and regressors should be given in levels form; they will be differenced automatically (since this estimator uses differencing to cancel out the individual effects).
As regards the handling of instruments, please see the documentation for the script version of this command. Currently you cannot specify instruments explicitly in the GUI: all the independent variables are taken to be strictly exogenous.
By default the results of 1-step estimation are reported (with robust standard errors). You may select 2-step estimation as an option. In both cases tests for autocorrelation of orders 1 and 2 are provided, as well as the Sargan overidentification test and a Wald test for the joint significance of the regressors. Note that in this differenced model first-order autocorrelation is not a threat to the validity of the model, but second-order autocorrelation violates the maintained statistical assumptions.
For further details and examples, please see <@pdf="the Gretl User's Guide">.
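A script sketch with hypothetical variables (one lag of the dependent variable):
<code>
dpanel 1 ; y x1 x2               # GMM-DIF, 1-step with robust std. errors
dpanel 1 ; y x1 x2 --two-step    # 2-step estimation
dpanel 1 ; y x1 x2 --system      # GMM-SYS
</code>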
Menu path: /Model/Panel/Dynamic panel model
Script command: <@ref="dpanel">
# expand Dataset "Expand data"
If you wish to add to a dataset a series that is of lower frequency, it is necessary to "expand" the new series. For instance, a quarterly series will have to be expanded to fit into a monthly dataset. In addition, you may sometimes want to expand an entire dataset to a higher frequency (perhaps, prior to adding a higher-frequency variable to the dataset).
Expansion of data should be considered an "expert" option; you need to know what you are doing. When combining series of differing original frequencies within one dataset, you should probably consider compacting the higher-frequency data rather than expanding the lower-frequency series.
That said, gretl offers two options: higher-frequency values can be interpolated using the method of Chow and Lin (1971), or the values of the lower-frequency series can be repeated as many times as required.
The Chow-Lin method is regression-based, using a constant and quadratic trend and assuming a first-order autoregressive process for the disturbances. Four degrees of freedom are used up by this procedure. As for the repetition of values, suppose we have a quarterly series with the value 35.5 in 1990:1, the first quarter of 1990. On expansion to monthly, the value 35.5 will be assigned to the observations for January, February and March of 1990. The expanded variable is therefore useless for fine-grained time-series analysis, outside of the special case where you know that the variable in question does in fact remain constant over the sub-periods.
# export Dataset "Export data"
You may export data in Comma-Separated Values (CSV) format: such data may be opened in spreadsheets and many other application programs. If you select this option you will get some further options regarding the specific format of the CSV file.
You also have the option of exporting data in the form of a "native" gretl datafile, or (if the data are suitable) exporting to a gretl database. See <@url="gretl.sourceforge.net/gretl_data.html"> for an account of gretl databases.
You may also export data in a format suitable for use with the following programs:
<indent>
• GNU R (<@url="www.r-project.org">)
</indent>
<indent>
• GNU octave (<@url="www.gnu.org/software/octave">)
</indent>
<indent>
• JMulTi (<@url="www.jmulti.de">)
</indent>
<indent>
• PcGive (<@url="www.pcgive.com">)
</indent>
If you wish to export data by copying to the clipboard rather than writing to a file on disk, select the series you want to copy in the main window, right-click, and select "Copy to clipboard". (Only CSV format is supported in this context.)
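In scripts, the closest equivalent is the <@ref="store"> command, which deduces the output format from the filename extension; for example (filename and series names hypothetical):
<code>
store mydata.csv x1 x2 x3    # CSV export
store mydata.gdt x1 x2 x3    # native gretl data file
</code>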
# factorized Graphs "Factorized plot"
This command requires the selection of three variables, the last of which must be a dummy variable (values 1 or 0). The Y variable is plotted against the X variable, with the data points colored differently depending on the value of the third.
Example: You have data on wages and educational attainment for a sample of people; you also have a dummy variable with value 1 for men and 0 for women (as in Ramanathan's <@lit="data7-2">). A "factorized plot" of <@lit="WAGE"> against <@lit="EDUC"> using the <@lit="GENDER"> dummy as factor will show the data points for men in one color and those for women in another (with a legend to identify them).
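The script counterpart is the <@ref="gnuplot"> command with the <@opt="--dummy"> option; using Ramanathan's variable names:
<code>
gnuplot WAGE EDUC GENDER --dummy
</code>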
# fcast Prediction "Generate forecasts"
Must follow an estimation command. Forecasts are generated for the specified range of observations. Depending on the nature of the model, standard errors may also be generated (see below).
The choice between a static and a dynamic forecast applies only in the case of dynamic models, with an autoregressive error process or including one or more lagged values of the dependent variable as regressors. Static forecasts are one step ahead, based on realized values from the previous period, while dynamic forecasts employ the chain rule of forecasting. For example, if a forecast for <@itl="y"> in 2008 requires as input a value of <@itl="y"> for 2007, a static forecast is impossible without actual data for 2007. A dynamic forecast for 2008 is possible if a prior forecast can be substituted for <@itl="y"> in 2007.
The default is to give a static forecast for any portion of the forecast range that lies within the sample range over which the model was estimated, and a dynamic forecast (if relevant) out of sample. The <@lit="dynamic"> option requests a dynamic forecast from the earliest possible date, and the <@lit="static"> option requests a static forecast even out of sample. In script use, the <@opt="--plot"> option can be given to save a plot of the forecast to a file; for example,
<code>
fcast --plot=fc.pdf
</code>
will generate a graphic in PDF format. Absolute pathnames are respected, otherwise files are written to the gretl working directory.
The nature of the forecast standard errors (if available) depends on the nature of the model and the forecast. For static linear models standard errors are computed using the method outlined by Davidson and MacKinnon (2004); they incorporate both uncertainty due to the error process and parameter uncertainty (summarized in the covariance matrix of the parameter estimates). For dynamic models, forecast standard errors are computed only in the case of a dynamic forecast, and they do not incorporate parameter uncertainty. For nonlinear models, forecast standard errors are not presently available.
Menu path: Model window, /Analysis/Forecasts
Script command: <@ref="fcast">
# fractint Statistics "Fractional integration"
Tests the specified series for fractional integration ("long memory"). The null hypothesis is that the integration order of the series is zero. By default the local Whittle estimator (Robinson, 1995) is used but if the <@opt="--gph"> option is given the GPH test (Geweke and Porter-Hudak, 1983) is performed instead. If the <@opt="--all"> flag is given then the results of both tests are printed.
For details on this sort of test, see Phillips and Shimotsu (2004).
If the optional <@var="order"> argument is not given the order for the test(s) is set automatically as the lesser of <@itl="T">/2 and <@itl="T"><@sup="0.6">.
The results can be retrieved using the accessors <@lit="$test"> and <@lit="$pvalue">. These values are based on the Local Whittle Estimator unless the <@opt="--gph"> option is given.
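Script usage for a hypothetical series <@lit="y">:
<code>
fractint y            # local Whittle estimator
fractint y --gph      # GPH test
fractint y 50 --all   # both tests, with order set to 50
</code>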
Menu path: /Variable/Unit root tests/Fractional integration
Script command: <@ref="fractint">
# freq Statistics "Frequency distribution"
In the frequency plot dialog box you can control the characteristics of the plot in either of two ways.
First, you may choose the number of bins. In this case the width and placement of the bins are calculated automatically.
Alternatively, you may specify the lower limit of the left-most bin, and the width of the bins. In this case the number of bins is calculated automatically.
If you wish to align the bins on round numbers, here is one way to proceed: start by specifying the number of bins you want, and take a look at the plot that is produced. If it's not to your liking, take note of the modification that is required (for example, make the left-most bin start at 100 and impose a bin width of 200). Then make a second pass where you specify the left-hand limit and bin width.
This dialog also allows you to select a theoretical distribution to be plotted against the data: either the normal or the gamma. If the normal option is selected the Doornik–Hansen test for normality is computed. If the gamma option is selected, gretl computes Locke's nonparametric test for the null hypothesis that the variable follows the gamma distribution. Note that the parameterization of the gamma distribution used in gretl is (shape, scale).
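The script command offers comparable options; for example, with a hypothetical series <@lit="x">:
<code>
freq x --normal     # test against the normal distribution
freq x --gamma      # test against the gamma distribution
freq x --nbins=20   # control the number of bins
</code>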
Menu path: /Variable/Frequency distribution
Script command: <@ref="freq">
# garch Estimation "GARCH model"
Estimates a GARCH model (GARCH = Generalized Autoregressive Conditional Heteroskedasticity), either a univariate model or, if independent variables are selected, including the given exogenous variables. The conditional variance equation is shown below.
<@fig="garch_h">
The parameter <@var="p"> therefore represents the Generalized (or "AR") order, while <@var="q"> represents the regular ARCH (or "MA") order. If <@var="p"> is non-zero, <@var="q"> must also be non-zero otherwise the model is unidentified. However, you can estimate a regular ARCH model by setting <@var="q"> to a positive value and <@var="p"> to zero. The sum of <@var="p"> and <@var="q"> must be no greater than 5.
By default native gretl code is used in estimation of GARCH models, but you also have the option of using the algorithm of Fiorentini, Calzolari and Panattoni (1996). The former uses the BFGS maximizer while the latter uses the information matrix to maximize the likelihood, with fine-tuning via the Hessian.
Several variant estimates of the coefficient covariance matrix are available with this command. By default, the Hessian is used unless the "Robust standard errors" box is checked, in which case the QML (White) covariance matrix is used. Other possibilities (e.g. the information matrix, or the Bollerslev–Wooldridge estimator) can be specified using the <@ref="set"> command.
The estimated conditional variance, along with the residuals and various other model statistics, can be accessed and added to the dataset using the "Model data" menu in the window where the model is displayed. If the box marked "Standardize the residuals" is checked, the residuals are divided by the square root of the conditional variance.
Menu path: /Model/Time series/GARCH
Script command: <@ref="garch">
# genr Dataset "Generate a new variable"
Use this box to define a new variable, on the pattern <@var="name"> = <@var="formula">. The formula should be a well-formed combination of variable names, constants, operators and functions (details below). To ensure you get the type of variable you want, you can prefix the formula with a type-name, e.g. <@lit="scalar">, <@lit="series"> or <@lit="matrix">. For example, to create a series that has a constant value of 10, you can type
<code>
series c = 10
</code>
(otherwise <@lit="c = 10"> would create a scalar variable).
Supported <@itl="arithmetical operators"> are, in order of precedence: <@lit="^"> (exponentiation); <@lit="*">, <@lit="/"> and <@lit="%"> (modulus or remainder); <@lit="+"> and <@lit="-">.
The available <@itl="Boolean operators"> are (again, in order of precedence): <@lit="!"> (negation), <@lit="&&"> (logical AND), <@lit="||"> (logical OR), <@lit=">">, <@lit="<">, <@lit="=">, <@lit=">="> (greater than or equal), <@lit="<="> (less than or equal) and <@lit="!="> (not equal). The Boolean operators can be used in constructing dummy variables: for instance <@lit="(x > 10)"> returns 1 if <@lit="x"> > 10, 0 otherwise.
Built-in constants are <@lit="pi"> and <@lit="NA">. The latter is the missing value code: you can initialize a variable to the missing value with <@lit="scalar x = NA">.
The <@lit="genr"> command supports a wide range of mathematical and statistical functions, including all the common ones plus several that are special to econometrics. In addition it offers access to numerous internal variables that are defined in the course of running regressions, doing hypothesis tests, and so on. For a listing of functions and accessors, see <@gfr="the Gretl function reference">.
Besides the operators and functions noted above there are some special uses of <@lit="genr">:
<indent>
• <@lit="genr time"> creates a time trend variable (1,2,3,…) called <@lit="time">. <@lit="genr index"> does the same thing except that the variable is called <@lit="index">.
</indent>
<indent>
• <@lit="genr dummy"> creates dummy variables up to the periodicity of the data. In the case of quarterly data (periodicity 4), the program creates <@lit="dq1"> = 1 for first quarter and 0 in other quarters, <@lit="dq2"> = 1 for the second quarter and 0 in other quarters, and so on. With monthly data the dummies are named <@lit="dm1">, <@lit="dm2">, and so on. With other frequencies the names are <@lit="dummy_1">, <@lit="dummy_2">, etc.
</indent>
<indent>
• <@lit="genr unitdum"> and <@lit="genr timedum"> create sets of special dummy variables for use with panel data. The first codes for the cross-sectional units and the second for the time period of the observations.
</indent>
<@itl="Note">: In the command-line program, <@lit="genr"> commands that retrieve model-related data always reference the model that was estimated most recently. This is also true in the GUI program, if one uses <@lit="genr"> in the "gretl console" or enters a formula using the "Define new variable" option under the Variable menu in the main window. With the GUI, however, you have the option of retrieving data from any model currently displayed in a window (whether or not it's the most recent model). You do this under the "Model data" menu in the model's window.
The special variable <@lit="obs"> serves as an index of the observations. For instance <@lit="genr dum = (obs=15)"> will generate a dummy variable that has value 1 for observation 15, 0 otherwise. You can also use this variable to pick out particular observations by date or name. For example, <@lit="genr d = (obs>1986:4)">, <@lit="genr d = (obs>"2008/04/01")">, or <@lit="genr d = (obs="CA")">. If daily dates or observation labels are used in this context, they should be enclosed in double quotes. Quarterly and monthly dates (with a colon) may be used unquoted. Note that in the case of annual time series data, the year is not distinguishable syntactically from a plain integer; therefore if you wish to compare observations against <@lit="obs"> by year you must use the function <@lit="obsnum"> to convert the year to a 1-based index value, as in <@lit="genr d = (obs>obsnum(1986))">.
Scalar values can be pulled from a series in the context of a <@lit="genr"> formula, using the syntax <@var="varname"><@lit="["><@var="obs"><@lit="]">. The <@var="obs"> value can be given by number or date. Examples: <@lit="x[5]">, <@lit="CPI[1996:01]">. For daily data, the form <@var="YYYY/MM/DD"> should be used, e.g. <@lit="ibm[1970/01/23]">.
An individual observation in a series can be modified via <@lit="genr">. To do this, a valid observation number or date, in square brackets, must be appended to the name of the variable on the left-hand side of the formula. For example, <@lit="genr x[3] = 30"> or <@lit="genr x[1950:04] = 303.7">.
Menu path: /Add/Define new variable
Other access: Main window pop-up menu
Script command: <@ref="genr">
# genrand Programming "Generating random variables"
In this dialog you must give a name for the variable to be created, plus some additional information depending on the distribution.
<indent>
• Uniform: the lower and upper bounds for the distribution.
</indent>
<indent>
• Normal: the mean and (positive) standard deviation.
</indent>
<indent>
• Chi-square and Student's t: the degrees of freedom, which must be positive.
</indent>
<indent>
• F: both numerator and denominator degrees of freedom.
</indent>
<indent>
• gamma: shape and scale parameters (both positive).
</indent>
<indent>
• Binomial: the "success" probability and the integer number of trials.
</indent>
<indent>
• Poisson: the positive mean (which also equals the variance).
</indent>
If you want to generate repeatable sequences of pseudo-random numbers, you can set the seed, under the Tools menu.
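Script equivalents use the random-number functions within <@lit="genr">/<@lit="series">; for example (names hypothetical):
<code>
series u = uniform(0, 100)    # uniform on (0, 100)
series z = normal(0, 1)       # normal with mean 0, std. dev. 1
</code>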
# genseed Programming "Setting the seed for random numbers"
The "seed" controls the starting point for the sequence of pseudo-random numbers generated in a given gretl session. By default the seed is set when the program is started, using the system time. This ensures that you get a different sequence of random numbers each time you run the program. If you want to obtain repeatable sequences, you need to set the seed manually (and take note of the value you used).
Note that whenever you click "OK" in this dialog box, the generator is re-started, using the given seed. So, for example, if you (a) set the seed to (say) 147; (b) generate a series from the standard normal distribution; (c) revisit this dialog and click "OK" again with the seed still at 147; then (d) generate a second series from the standard normal distribution, the two generated series will be identical.
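The script equivalent of the sequence just described (the seed value 147 is arbitrary):
<code>
set seed 147
series z1 = normal()
set seed 147
series z2 = normal()
# z1 and z2 are now identical
</code>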
# gmm Estimation "GMM estimation"
Performs Generalized Method of Moments (GMM) estimation using the BFGS (Broyden, Fletcher, Goldfarb, Shanno) algorithm. You must specify one or more commands for updating the relevant quantities (typically GMM residuals), one or more sets of orthogonality conditions, an initial matrix of weights, and a listing of the parameters to be estimated, all enclosed between the tags <@lit="gmm"> and <@lit="end gmm">. Any options should be appended to the <@lit="end gmm"> line.
Please see <@pdf="the Gretl User's Guide"> for details on this command. Here we just illustrate with a simple example.
<code>
gmm e = y - X*b
orthog e ; W
weights V
params b
end gmm
</code>
In the example above we assume that <@lit="y"> and <@lit="X"> are data matrices, <@lit="b"> is an appropriately sized vector of parameter values, <@lit="W"> is a matrix of instruments, and <@lit="V"> is a suitable matrix of weights. The statement
<code>
orthog e ; W
</code>
indicates that the residual vector <@lit="e"> is in principle orthogonal to each of the instruments composing the columns of <@lit="W">.
Menu path: /Model/GMM
Script command: <@ref="gmm">
# graphing Graphs "Graphing"
Gretl calls a separate program, namely gnuplot, to generate graphs. Gnuplot is a very full-featured graphing program with myriad options. Gretl gives you direct access, via a graphical interface, to a subset of these options and it tries to choose sensible values for you; it also allows you to take complete control over graph details if you wish.
With a graph displayed, you can click on the graph window for a pop-up menu with the following options:
<indent>
• Save as postscript: save the graph in encapsulated postscript (EPS) format
</indent>
<indent>
• Save as PNG: save in Portable Network Graphics format
</indent>
<indent>
• Save to session as icon: the graph will appear in iconic form when you select "Icon view" from the Session menu
</indent>
<indent>
• Zoom: lets you select an area within the graph for closer inspection
</indent>
<indent>
• Print: (on the Gnome desktop and MS Windows only) lets you print the graph directly
</indent>
<indent>
• Copy to clipboard: (MS Windows only) lets you paste the graph into Windows applications such as MS Word
</indent>
<indent>
• Edit: opens a controller for the plot which lets you adjust various aspects of its appearance
</indent>
<indent>
• Close: closes the graph window
</indent>
If you know something about gnuplot and wish to get finer control over the appearance of a graph than is available via the graphical controller ("Edit" option), you have two further options:
<indent>
• Once the graph is saved as a session icon, you can right-click on its icon for a further pop-up menu. One of the options here is "Edit plot commands", which opens an editing window with the actual gnuplot commands displayed. You can edit these commands and either save them for future processing or send them to gnuplot (with the execute toolbar icon in the plot commands editing window).
</indent>
<indent>
• Another way to save the plot commands (or to save the displayed plot in formats other than EPS or PNG) is to use the "Edit" item on a graph's pop-up menu to invoke the graphical controller, then click on the "Output to file" tab in the controller. You are then presented with a drop-down menu of formats in which to save the graph.
</indent>
To find out more about gnuplot, see http://www.gnuplot.info
# graphpg Graphs "Gretl graph page"
The session "graph page" will work only if you have the LaTeX typesetting system installed, and are able to generate and view PDF or PostScript output.
In the session icon window, you can drag up to eight graphs onto the graph page icon. When you double-click on the graph page (or right-click and select "Display"), a page containing the selected graphs will be composed and opened in a suitable viewer. From there you should be able to print the page.
To clear the graph page, right-click on its icon and select "Clear".
In script (or console) mode, you can add a graph to the graph page by issuing the command <@lit="graphpg add"> after saving a named graph, as in
<code>
grf1 <- gnuplot Y X
graphpg add
</code>
Also in script mode you can call for display of the graph page using the command <@lit="graphpg show">, and can clear the page via <@lit="graphpg free">.
Note that on systems other than MS Windows, you may have to adjust the setting for the program used to view PDF or PostScript files. Find that under the "Programs" tab in the gretl Preferences dialog box (under the Tools menu in the main window).
Script command: <@ref="graphpg">
# 3-D Graphs "3-dimensional plots"
This feature works best if you have gnuplot 3.8 or higher installed. In that case you can manipulate the 3-D plot with the mouse (rotate it, and expand or shrink the axes).
In composing a 3-D plot, note that the Z-axis will be shown as the vertical axis. Thus if you have some dependent variable that you think may be influenced by two independent variables, you should put the dependent variable on the Z-axis, and the independent variables on the X and Y axes.
Unlike most other gretl graphs, 3-D plots are controlled by gnuplot rather than gretl itself. The gretl graph-editing menu is not available.
# gui-htest Tests "Test statistic calculator"
Gretl's test calculator computes test statistics and p-values for various common hypothesis tests concerning one or two populations. The required input takes the form of sample statistics derived from one or two samples, depending on the test chosen. These statistics can be typed in as numerical values. Alternatively, if you have a data file open, you can get gretl to calculate sample statistics for a selected variable or variables (in the case of means and variances, but not in the case of proportions).
If you want to base your test on a variable in the data set, first activate this option by checking the box titled "Use variable from dataset". Then the drop-down list of variables will become active and you can select a variable. When you select a variable from the list, the relevant statistics are automatically entered in the boxes below.
In addition to the simple selection of a variable, you have the option of specifying a restriction on the selected variable (that is, defining a sub-sample). For example, suppose you have wage data in a variable called "wage" and you also have a dummy variable called "gender" that equals 1 for males and 0 for females (or vice versa). Then, in the test for the difference of two means, you could select "wage" in both slots, but add to the top slot "(gender=0)" and to the bottom "(gender=1)". This would then give you a test for the difference between the mean male and the mean female wage. Note that when you type a restriction in this way, you must then press the Enter key to have the sample statistics calculated.
The sub-sampling restriction must be placed in parentheses following the selected variable, and in general the restriction takes the form "var op value", where var is the name of a variable in the current data set, value is a numerical value, and op is a comparison operator chosen from =, !=, <, >, <= or >= (respectively equality, inequality, less than, greater than, less than or equal, and greater than or equal). The spaces around the operator are optional.
# gui-htest-np Tests "Nonparametric tests"
Under the "Difference test" tab you can carry out a nonparametric test for a difference between two populations or groups, the specific test depending on the option selected.
Sign test: This test is based on the fact that if two samples, <@itl="x"> and <@itl="y">, are drawn randomly from the same distribution, the probability that <@itl="x"><@sub="i"> > <@itl="y"><@sub="i">, for each observation <@itl="i">, should equal 0.5. The test statistic is <@itl="w">, the number of observations for which <@itl="x"><@sub="i"> > <@itl="y"><@sub="i">. Under the null hypothesis this follows the Binomial distribution with parameters (<@itl="n">, 0.5), where <@itl="n"> is the number of observations.
Rank sum test: The Wilcoxon rank-sum test is performed. This test proceeds by ranking the observations from both samples jointly, from smallest to largest, then finding the sum of the ranks of the observations from one of the samples. The two samples do not have to be of the same size, and if they differ the smaller sample is used in calculating the rank-sum. Under the null hypothesis that the samples are drawn from populations with the same median, the probability distribution of the rank-sum can be computed for any given sample sizes; and for reasonably large samples a close Normal approximation exists.
Signed rank test: The Wilcoxon signed-rank test is performed. This is designed for matched data pairs such as, for example, the values of a variable for a sample of individuals before and after some treatment. The test proceeds by finding the differences between the paired observations, <@itl="x"><@sub="i"> – <@itl="y"><@sub="i">, ranking these differences by absolute value, then assigning to each pair a signed rank, the sign agreeing with the sign of the difference. One then calculates <@itl="W"><@sub="+">, the sum of the positive signed ranks. As with the rank-sum test, this statistic has a well-defined distribution under the null that the median difference is zero, which converges to the Normal for samples of reasonable size.
Under the "Runs test" tab you can carry out a test for the randomness of a given variable, based on the number of runs of consecutive positive or negative values. If you select the option "Use first difference", the variable is differenced prior to the analysis and hence the runs are interpreted as runs of increasing or decreasing values of the original variable. The test statistic is based on a normal approximation to the distribution of the number of runs under the null of randomness.
# hausman Tests "Panel diagnostics"
This test is available only after estimating an OLS model using panel data (see also <@lit="setobs">). It tests the simple pooled model against the principal alternatives, the fixed effects and random effects models.
The fixed effects model allows the intercept of the regression to vary across the cross-sectional units. An <@itl="F">-test is reported for the null hypothesis that the intercepts do not differ. The random effects model decomposes the residual variance into two parts, one part specific to the cross-sectional unit and the other specific to the particular observation. (This estimator can be computed only if the number of cross-sectional units in the data set exceeds the number of parameters to be estimated.) The Breusch–Pagan LM statistic tests the null hypothesis that the pooled OLS estimator is adequate against the random effects alternative.
The pooled OLS model may be rejected against both of the alternatives, fixed effects and random effects. Provided the unit- or group-specific error is uncorrelated with the independent variables, the random effects estimator is more efficient than the fixed effects estimator; otherwise the random effects estimator is inconsistent and the fixed effects estimator is to be preferred. The null hypothesis for the Hausman test is that the group-specific error is not so correlated (and therefore the random effects model is preferable). A low p-value for this test counts against the random effects model and in favor of fixed effects.
Menu path: Model window, /Tests/Panel diagnostics
Script command: <@ref="hausman">
# hccme Estimation "Robust standard errors"
You are offered several variant calculations for standard errors that are robust in the presence of heteroskedasticity (and, in the case of the HAC estimator, autocorrelation).
HC0 produces the original "White's standard errors"; HC1, HC2, HC3 and HC3a are subsequent variations that are generally reckoned to produce superior (more reliable) results. For details of the estimators, see MacKinnon and White (Journal of Econometrics, 1985) or Davidson and MacKinnon, Econometric Theory and Methods (Oxford, 2004). The labels given here are those used by Davidson and MacKinnon. Variant "HC3a" is the jackknife, as described in MacKinnon and White; HC3 is a close approximation to the jackknife.
If you use the HAC estimator for OLS on time-series data, you are able to fine-tune the lag-length using the <@lit="set"> command. Please see the gretl manual or the script commands help file for details.
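For example, in script form one might set the HAC lag length before estimating with robust standard errors (a sketch with hypothetical variables <@lit="y">, <@lit="x1"> and <@lit="x2">):
<code>
set hac_lag 4
ols y const x1 x2 --robust
</code>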
When estimating a model via OLS using panel data, the default robust estimator of the covariance matrix is that given by Arellano. The alternative is Beck and Katz's Panel Corrected Standard Errors (PCSE). The latter take into account heteroskedasticity but not autocorrelation.
Two robust estimators of the covariance matrix are offered for GARCH models: QML is the Quasi-Maximum Likelihood Estimator, and BW is the Bollerslev-Wooldridge estimator.
# hsk Estimation "Heteroskedasticity-corrected estimates"
This command is applicable where heteroskedasticity is present in the form of an unknown function of the regressors which can be approximated by a quadratic relationship. In that context it offers the possibility of consistent standard errors and more efficient parameter estimates as compared with OLS.
The procedure involves (a) OLS estimation of the model of interest, followed by (b) an auxiliary regression to generate an estimate of the error variance, then finally (c) weighted least squares, using as weight the reciprocal of the estimated variance.
In the auxiliary regression (b) we regress the log of the squared residuals from the first OLS on the original regressors and their squares. The log transformation is performed to ensure that the estimated variances are non-negative. Call the fitted values from this regression <@itl="u"><@sup="*">. The weight series for the final WLS is then formed as 1/exp(<@itl="u"><@sup="*">).
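For illustration only, the following sketch replicates the procedure by hand for a hypothetical model with regressors <@lit="x1"> and <@lit="x2"> (the built-in command carries out all of these steps automatically):
<code>
ols y const x1 x2
genr lusq = log($uhat^2)
genr x1sq = x1^2
genr x2sq = x2^2
ols lusq const x1 x2 x1sq x2sq
genr w = 1/exp($yhat)
wls w y const x1 x2
</code>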
Menu path: /Model/Other linear models/Heteroskedasticity corrected
Script command: <@ref="hsk">
# hurst Statistics "Hurst exponent"
Calculates the Hurst exponent (a measure of persistence or long memory) for a time-series variable having at least 128 observations.
The Hurst exponent is discussed by Mandelbrot. In theoretical terms it is the exponent, <@itl="H">, in the relationship
<@fig="hurst">
where RS is the "rescaled range" of the variable <@itl="x"> in samples of size <@itl="n"> and <@itl="a"> is a constant. The rescaled range is the range (maximum minus minimum) of the cumulated value or partial sum of <@itl="x"> over the sample period (after subtraction of the sample mean), divided by the sample standard deviation.
As a reference point, if <@itl="x"> is white noise (zero mean, zero persistence) then the range of its cumulated "wandering" (which forms a random walk), scaled by the standard deviation, grows as the square root of the sample size, giving an expected Hurst exponent of 0.5. Values of the exponent significantly in excess of 0.5 indicate persistence, and values less than 0.5 indicate anti-persistence (negative autocorrelation). In principle the exponent is bounded by 0 and 1, although in finite samples it is possible to get an estimated exponent greater than 1.
In gretl, the exponent is estimated using binary sub-sampling: we start with the entire data range, then the two halves of the range, then the four quarters, and so on. For sample sizes smaller than the data range, the RS value is the mean across the available samples. The exponent is then estimated as the slope coefficient in a regression of the log of RS on the log of sample size.
Menu path: /Variable/Hurst exponent
Script command: <@ref="hurst">
# intreg Estimation "Interval regression model"
Estimates an interval regression model. This model arises when the dependent variable is imperfectly observed for some (possibly all) observations. In other words, the data generating process is assumed to be
<@itl="y* = x b + u">
but we only observe <@itl="m <= y* <= M"> (the interval may be left- or right-unbounded). Note that for some observations <@itl="m"> may equal <@itl="M">. The variables <@var="minvar"> and <@var="maxvar"> must contain <@lit="NA">s for left- and right-unbounded observations, respectively.
In the model specification dialog, <@var="minvar"> and <@var="maxvar"> are identified as the Lower bound variable and the Upper bound variable respectively.
The model is estimated by maximum likelihood, assuming normality of the disturbance term.
By default, standard errors are computed using the negative inverse of the Hessian. If the "Robust standard errors" box is checked, then QML or Huber–White standard errors are calculated instead. In this case the estimated covariance matrix is a "sandwich" of the inverse of the estimated Hessian and the outer product of the gradient.
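In script form, assuming bound series named <@lit="lo"> and <@lit="hi"> (arbitrary names, containing <@lit="NA">s for unbounded observations), the model might be specified as:
<code>
intreg lo hi const x1 x2
</code>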
Menu path: /Model/Nonlinear models/Interval regression
Script command: <@ref="intreg">
# irfboot Graphs "Impulse response plots"
If you select the bootstrap option when plotting impulse responses, gretl computes a confidence interval for the responses using the bootstrap method. The residuals from the original VAR (or VECM) are resampled with replacement; an artificial dataset is constructed based on the original parameter estimates and the resampled residuals; the system is re-estimated and the impulse responses are re-evaluated. This is repeated 999 times and the α/2 and 1 – α/2 quantiles for the responses are found and plotted along with the point estimates. This option is not currently available for restricted VECMs.
This dialog also supports reordering of the variables for the Cholesky decomposition of the cross-equation covariance matrix. The default is given by the order in which the variables are entered into the model specification, but the up and down arrows can be used to promote or demote a selected variable.
# kalman Estimation "Kalman filter"
Opens a block of statements to set up a Kalman filter. This block should end with the line <@lit="end kalman">, to which any options may be appended. The intervening lines specify the matrices that compose the filter. For example,
<code>
kalman
obsy y
obsymat H
statemat F
statevar Q
end kalman
</code>
Please see <@pdf="the Gretl User's Guide"> for details.
See also <@xrf="kfilter">, <@xrf="ksimul">, <@xrf="ksmooth">.
Script command: <@ref="kalman">
# kpss Tests "KPSS stationarity test"
Computes the KPSS test (Kwiatkowski, Phillips, Schmidt and Shin, Journal of Econometrics, 1992) for stationarity of the given variable (or its first difference, if the differencing option is selected). The null hypothesis is that the variable in question is stationary, either around a level or, if the "include a trend" box is checked, around a deterministic linear trend.
The selected lag order determines the size of the window used for Bartlett smoothing. If the "show regression results" box is checked the results of the auxiliary regression are printed, along with the estimated variance of the random walk component of the variable.
The critical values shown for the test statistic are based on the response surfaces estimated by Sephton (Economics Letters, 1995), which are more accurate for small samples than the values given in the original KPSS article. When the test statistic lies between the 10 percent and 1 percent critical values a p-value is shown; this is obtained by linear interpolation and should not be taken too literally.
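A script-mode sketch, using a hypothetical series <@lit="x1"> with lag order 4 and the trend option:
<code>
kpss 4 x1 --trend
</code>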
Menu path: /Variable/Unit root tests/KPSS test
Script command: <@ref="kpss">
# lad Estimation "Least Absolute Deviation estimation"
Calculates a regression that minimizes the sum of the absolute deviations of the observed from the fitted values of the dependent variable. Coefficient estimates are derived using the Barrodale–Roberts simplex algorithm; a warning is printed if the solution is not unique.
Standard errors are derived using the bootstrap procedure with 500 drawings. The covariance matrix for the parameter estimates, printed when the <@opt="--vcv"> flag is given, is based on the same bootstrap.
Menu path: /Model/Robust estimation/Least Absolute Deviation
Script command: <@ref="lad">
# lags-dialog Estimation "Lag selection box"
In this dialog you can select the lag order for the independent variables in a time-series model, and in some cases for the dependent variable also. (But note that the common lag order for vector models such as VARs and VECMs is handled separately, via a selection spinner in the main model dialog box.)
The spinners on the left let you select a range of consecutive lags for any given variable. To specify non-consecutive lags, click the check box next to the entry field titled "specific lags". This activates the entry box, into which you can type a list of lags, separated by spaces.
The row marked "default" offers a quick way to set a common lag specification for all the independent variables: values set in that row are copied to all the others (apart from the dependent variable, if present).
The dependent variable is treated specially: the minimum lag must be zero, which places the current value of the variable on the left-hand side of the model. Any higher lags appear with the independent variables on the right-hand side of the model.
Values selected in this dialog are remembered for the duration of your session with a given dataset.
# leverage Tests "Influential observations"
Must immediately follow an <@lit="ols"> command. Calculates the leverage (<@itl="h">, which must lie in the range 0 to 1) for each data point in the sample on which the previous model was estimated. Displays the residual (<@itl="u">) for each observation along with its leverage and a measure of its influence on the estimates, <@fig="influence">. "Leverage points" for which the value of <@itl="h"> exceeds 2<@itl="k">/<@itl="n"> (where <@itl="k"> is the number of parameters being estimated and <@itl="n"> is the sample size) are flagged with an asterisk. For details on the concepts of leverage and influence see Davidson and MacKinnon (1993), Chapter 2.
DFFITS values are also shown: these are "studentized residuals" (predicted residuals divided by their standard errors) multiplied by <@fig="dffit">. For discussions of studentized residuals and DFFITS see chapter 12 of Maddala's Introduction to Econometrics or Belsley, Kuh and Welsch (1980).
Briefly, a "predicted residual" is the difference between the observed value of the dependent variable at observation <@itl="t">, and the fitted value for observation <@itl="t"> obtained from a regression in which that observation is omitted (or a dummy variable with value 1 for observation <@itl="t"> alone has been added); the studentized residual is obtained by dividing the predicted residual by its standard error.
The "+" icon at the top of the leverage test window brings up a dialog box that allows you to save one or more of the test variables to the current data set.
Menu path: Model window, /Tests/Influential observations
Script command: <@ref="leverage">
# levinlin Tests "Levin-Lin-Chu test"
Carries out the panel unit-root test described by Levin, Lin and Chu (2002). The null hypothesis is that all of the individual time series exhibit a unit root, and the alternative is that none of the series has a unit root. (That is, a common AR(1) coefficient is assumed, although in other respects the statistical properties of the series are allowed to vary across individuals.)
Menu path: /Variable/Unit root tests/Levin-Lin-Chu test
Script command: <@ref="levinlin">
# logistic Estimation "Logistic regression"
Logistic regression: carries out an OLS regression using the logistic transformation of the dependent variable,
<@fig="logistic1">
You are presented with a dialog box that allows you to specify a different maximum if you wish. The supplied <@itl="y"><@sup="*"> value must be greater than all of the observed values of the dependent variable.
The fitted values and residuals from the regression are automatically transformed using
<@fig="logistic2">
where <@itl="x"> represents either a fitted value or a residual from the OLS regression using the transformed dependent variable. The reported values are therefore comparable with the original dependent variable.
Note that if the dependent variable is binary, you should use the <@ref="logit"> command instead.
Menu path: /Model/Nonlinear models/Logistic
Script command: <@ref="logistic">
# logit Estimation "Logit regression"
If the dependent variable is a binary variable (all values are 0 or 1) maximum likelihood estimates of the coefficients on <@var="indepvars"> are obtained via the "binary response model regression" (BRMR) method outlined by Davidson and MacKinnon (2004). As the model is nonlinear the slopes depend on the values of the independent variables. By default the slopes with respect to each of the independent variables are calculated (at the means of those variables) and these slopes replace the usual p-values in the regression output. This behavior can be suppressed by giving the <@opt="--p-values"> option. The chi-square statistic tests the null hypothesis that all coefficients are zero apart from the constant.
By default, standard errors are computed using the negative inverse of the Hessian. If the "Robust standard errors" box is checked, then QML or Huber–White standard errors are calculated instead. In this case the estimated covariance matrix is a "sandwich" of the inverse of the estimated Hessian and the outer product of the gradient. See chapter 10 of Davidson and MacKinnon for details.
If the dependent variable is not binary but is discrete, then by default it is interpreted as an ordinal response, and Ordered Logit estimates are obtained. However, if the <@opt="--multinomial"> option is given, the dependent variable is interpreted as an unordered response, and Multinomial Logit estimates are produced. (In either case, if the variable selected as dependent is not discrete an error is flagged.) In the multinomial case, the accessor <@lit="$mnlprobs"> is available after estimation, to get a matrix containing the estimated probabilities of the outcomes at each observation (observations in rows, outcomes in columns).
If you want to use logit for analysis of proportions (where the dependent variable is the proportion of cases having a certain characteristic, at each observation, rather than a 1 or 0 variable indicating whether the characteristic is present or not) you should not use the <@lit="logit"> command, but rather construct the logit variable, as in
<code>
genr lgt_p = log(p/(1 - p))
</code>
and use this as the dependent variable in an OLS regression. See chapter 12 of Ramanathan (2002).
Menu path: /Model/Nonlinear models/Logit
Script command: <@ref="logit">
# mahal Statistics "Mahalanobis distances"
The Mahalanobis distance is the distance between two points in a <@itl="k">-dimensional space, scaled by the statistical variation in each dimension of the space. For example, if <@itl="p"> and <@itl="q"> are two observations on a set of <@itl="k"> variables with covariance matrix <@itl="C">, then the Mahalanobis distance between the observations is given by
<@fig="mahal">
where (<@itl="p"> – <@itl="q">) is a <@itl="k">-vector. This reduces to Euclidean distance if the covariance matrix is the identity matrix.
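In gretl's matrix language this calculation can be sketched as follows, assuming <@lit="p"> and <@lit="q"> are <@itl="k">×1 column vectors and <@lit="C"> is the <@itl="k">×<@itl="k"> covariance matrix:
<code>
matrix d = p - q
matrix md = sqrt(d' * inv(C) * d)   # 1x1 result: the Mahalanobis distance
</code>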
The space for which distances are computed is defined by the selected variables. For each observation in the current sample range, the distance is computed between the observation and the centroid of the selected variables. This distance is the multidimensional counterpart of a standard <@itl="z">-score, and can be used to judge whether a given observation "belongs" with a group of other observations.
If the number of variables selected is 4 or less, the covariance matrix and its inverse are printed. Clicking the "+" button at the top of the window displaying the distances gives you the option of adding the distances to the dataset as a new variable.
Menu path: /View/Mahalanobis distances
Script command: <@ref="mahal">
# markers Dataset "Add case markers"
This command needs the name of a file containing "case markers", that is, short identifying strings for the individual observations in the data set (for example, country or city names or codes). These marker strings should be no more than 8 characters long. The file should contain one marker per line, and there should be just as many markers as observations in the current dataset. If these conditions are met and the specified file is found, the case markers will be added; they will be visible when you choose "Display values" under gretl's Data menu.
# meantest Tests "Difference of means"
Calculates the t statistic for the null hypothesis that the population means are equal for two selected variables, and shows its p-value. By default the test statistic is calculated on the assumption that the variances are equal for the two variables; with the <@opt="--unequal-vars"> option the variances are assumed to differ. This choice affects the test statistic only if there are different numbers of non-missing observations for the two variables.
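In script form, for two hypothetical series <@lit="x1"> and <@lit="x2">:
<code>
meantest x1 x2 --unequal-vars
</code>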
Menu path: /Model/Bivariate tests/Difference of means
Script command: <@ref="meantest">
# missing Dataset "Missing data values"
Set a numerical value that will be interpreted as "missing" or "not available", either for a particular data series (under the Variable menu) or globally for the entire data set (under the Sample menu).
Gretl has its own internal coding for missing values, but sometimes imported data may employ a different code. For example, if a particular series is coded such that a value of -1 indicates "not applicable", you can select "Set missing value code" under the Variable menu and type in the value "-1" (without the quotes). Gretl will then read the -1s as missing observations.
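The same thing can be done in script form with the <@lit="setmiss"> command, for example (the series name <@lit="x1"> is arbitrary):
<code>
setmiss -1 x1     # code -1 as missing for series x1 only
setmiss -999      # code -999 as missing for all series
</code>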
# mle Estimation "Maximum likelihood estimation"
Performs Maximum Likelihood (ML) estimation using the BFGS (Broyden, Fletcher, Goldfarb, Shanno) algorithm. You must specify the log-likelihood function; it is recommended that you also supply expressions for the derivatives of this function with respect to each of the parameters if possible.
Simple example: Suppose we have a series <@lit="X"> with values 0 or 1 and we wish to obtain the maximum likelihood estimate of the probability, <@lit="p">, that <@lit="X"> = 1. (In this simple case we can guess in advance that the ML estimate of <@lit="p"> will simply equal the proportion of Xs equal to 1 in the sample.)
The parameter <@lit="p"> must first be added to the dataset and given an initial value. This can be done using the genr command or via menu choices. Appropriate "genr" lines may be typed into the MLE specification window prior to the specification of the log-likelihood function.
In the MLE window we type the following lines:
<code>
loglik = X*log(p) + (1-X)*log(1-p)
deriv p = X/p - (1-X)/(1-p)
</code>
The first line specifies the log-likelihood function, and the next line supplies the derivative of that function with respect to the parameter p. If no "deriv" lines are given, a numerical approximation to the derivatives is computed.
If the parameter p was not previously declared we could preface the above lines with something like the following:
<code>
genr p = 0.5
</code>
By default, standard errors are based on the Outer Product of the Gradient. If the robust standard errors box is checked, a QML estimator is used (namely, a sandwich of the negative inverse of the Hessian and the covariance matrix of the gradient). The Hessian is approximated numerically.
Menu path: /Model/Maximum likelihood
Script command: <@ref="mle">
# modeltab Utilities "The model table"
In econometric research it is common to estimate several models with a common dependent variable—the models differing in respect of which independent variables are included, or perhaps in respect of the estimator used. In this situation it is convenient to present the regression results in the form of a table, where each column contains the results (coefficient estimates and standard errors) for a given model, and each row contains the estimates for a given variable across the models.
Gretl provides a means of constructing such a table (and copying it in plain text, LaTeX or Rich Text Format). Here is how to do it:
<indent>
1. Estimate a model which you wish to include in the table, and in the model display window, under the File menu, select "Save to session as icon" or "Save as icon and close".
</indent>
<indent>
2. Repeat step 1 for the other models to be included in the table (up to a total of six models).
</indent>
<indent>
3. When you are done estimating the models, open the icon view of your gretl session (by selecting "icon view" under the Session menu in the main gretl window, or by clicking the "session icon view" icon on the gretl toolbar).
</indent>
<indent>
4. In session icon view, there is an icon labeled "Model table". Decide which model you wish to appear in the left-most column of the model table and add it to the table, either by dragging its icon onto the Model table icon, or by right-clicking on the model icon and selecting "Add to model table" from the pop-up menu.
</indent>
<indent>
5. Repeat step 4 for the other models you wish to include in the table. The second model selected will appear in the second column from the left, and so on.
</indent>
<indent>
6. When you are finished composing the model table, display it by double-clicking on its icon. Under the Edit menu in the window which appears, you have the option of copying the table to the clipboard in various formats.
</indent>
<indent>
7. If the ordering of the models in the table is not what you wanted, right-click on the model table icon and select "Clear table". Then go back to step 4 above and try again.
</indent>
Menu path: Session window, Model table icon
Script command: <@ref="modeltab">
# mpols Estimation "Multiple-precision OLS"
Computes OLS estimates for the specified model using multiple precision floating-point arithmetic. This command is available only if gretl is compiled with support for the Gnu Multiple Precision (GMP) library. By default 256 bits of precision are used for the calculations, but this can be increased via the environment variable <@lit="GRETL_MP_BITS">. For example, when using the bash shell one could issue the following command, before starting gretl, to set a precision of 1024 bits.
<code>
export GRETL_MP_BITS=1024
</code>
Menu path: /Model/Other linear models/High precision OLS
Script command: <@ref="mpols">
# negbin Estimation "Negative Binomial regression"
Estimates a Negative Binomial model. The dependent variable is taken to represent a count of the occurrence of events of some sort, and must have only non-negative integer values. By default the model NegBin 2 is used, in which the conditional variance of the count is given by μ(1 + αμ), where μ denotes the conditional mean. But if the <@opt="--model1"> option is given the conditional variance is μ(1 + α).
The optional <@lit="offset"> series works in the same way as for the <@ref="poisson"> command. The Poisson model is a restricted form of the Negative Binomial in which α = 0 by construction.
By default, standard errors are computed using a numerical approximation to the Hessian at convergence. But if the <@opt="--opg"> option is given the covariance matrix is based on the Outer Product of the Gradient (OPG), or if the <@opt="--robust"> option is given QML standard errors are calculated, using a "sandwich" of the inverse of the Hessian and the OPG.
Menu path: /Model/Nonlinear models/Count data...
Script command: <@ref="negbin">
# nls Estimation "Nonlinear Least Squares"
Performs Nonlinear Least Squares (NLS) estimation using a modified version of the Levenberg–Marquardt algorithm. You must supply a function specification; it is recommended but not required that you also supply expressions for the derivatives of this function with respect to each of the parameters if possible. If you do not supply derivatives you should instead give a list of the parameters to be estimated (separated by spaces or commas), preceded by the keyword <@lit="params">; these can be either scalars, or vectors, or any combination of the two.
Example: Suppose we have a data set with variables <@itl="C"> and <@itl="Y"> (e.g. <@lit="greene11_3.gdt">) and we wish to estimate a nonlinear consumption function of the form
<@fig="greene_Cfunc">
The parameters alpha, beta and gamma must first be added to the dataset and given initial values. This can be done using the genr command or via menu choices. Appropriate "genr" lines may be typed into the NLS specification window prior to the function specification.
In the NLS window we type the following lines:
<code>
C = alpha + beta * Y^gamma
deriv alpha = 1
deriv beta = Y^gamma
deriv gamma = beta * Y^gamma * log(Y)
</code>
The first line specifies the regression function, and the next three lines supply the derivatives of that function with respect to each of the parameters in turn. If the "deriv" lines are not given, a numerical approximation to the Jacobian is computed.
If the parameters alpha, beta and gamma were not previously declared we could preface the above lines with something like the following:
<code>
genr alpha = 1
genr beta = 1
genr gamma = 1
</code>
For further details on NLS estimation please see <@pdf="the Gretl User's Guide">.
Menu path: /Model/Nonlinear models/Nonlinear Least Squares
Script command: <@ref="nls">
# normtest Tests "Normality test"
Carries out a test for normality for the given <@var="series">. The specific test is controlled by the option flags (but if no flag is given, the Doornik–Hansen test is performed). Note: the Doornik–Hansen and Shapiro–Wilk tests are recommended over the others, on account of their superior small-sample properties.
The test statistic and its p-value may be retrieved using the accessors <@lit="$test"> and <@lit="$pvalue">. Please note that if the <@opt="--all"> option is given, the result recorded is that from the Doornik–Hansen test.
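A script-mode sketch, using a hypothetical series <@lit="x1">:
<code>
normtest x1 --all
scalar dh = $test      # Doornik-Hansen statistic
scalar pv = $pvalue
</code>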
Menu path: /Variable/Normality test
Script command: <@ref="normtest">
# nulldata Dataset "Creating a blank dataset"
Establishes a "blank" data set, containing only a constant and an index variable, with periodicity 1 and the specified number of observations. This may be used for simulation purposes: some of the <@lit="genr"> commands (e.g. <@lit="genr uniform()">, <@lit="genr normal()">) will generate dummy data from scratch to fill out the data set. This command may be useful in conjunction with <@lit="loop">. See also the "seed" option to the <@ref="set"> command.
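For example, a minimal simulation script (with arbitrary names and parameter values) might be:
<code>
nulldata 100
set seed 547
genr x = uniform()
genr y = 1 + 2*x + normal()
ols y const x
</code>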
By default, this command cleans out all data in gretl's current workspace. If you give the <@opt="--preserve"> option, however, any currently defined matrices are retained.
Menu path: /File/New data set
Script command: <@ref="nulldata">
# ols Estimation "Ordinary Least Squares"
Computes ordinary least squares (OLS) estimates for the specified model.
Besides coefficient estimates and standard errors, the program also prints p-values for <@itl="t"> (two-tailed) and <@itl="F">-statistics. A p-value below 0.01 indicates statistical significance at the 1 percent level and is marked with <@lit="***">. <@lit="**"> indicates significance between 1 and 5 percent and <@lit="*"> indicates significance between the 5 and 10 percent levels. Model selection statistics (the Akaike Information Criterion or AIC and Schwarz's Bayesian Information Criterion) are also printed. The formula used for the AIC is that given by Akaike (1974), namely minus two times the maximized log-likelihood plus two times the number of parameters estimated.
Following estimation, various results may be retrieved via accessor variables: for example, <@lit="genr uh = $uhat"> saves the residuals under the name <@lit="uh">. See the "accessors" section of the gretl function reference for details.
Menu path: /Model/Ordinary Least Squares
Other access: Beta-hat button on toolbar
Script command: <@ref="ols">
# omit Tests "Omit variables"
This command re-estimates the given model after omitting the specified variables, or after sequentially omitting insignificant variables if the relevant box is available and is checked. Besides the usual model output, it prints a test for the joint significance of the omitted variables. The null hypothesis is that the true coefficients on all the omitted variables equal zero.
Sequential elimination works as follows: at each step the variable with the highest p-value is omitted, until all remaining variables have a p-value no greater than some cutoff. The default cutoff is 10 percent (two-sided); this can be adjusted via the spin button. By default this process operates on all variables in the model (apart from the constant). If you want to confine it to a subset of the variables, check the box labeled "Test only selected variables" and make a selection.
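For example, in script form (with hypothetical variables), a joint test for dropping two regressors is:
<code>
ols y const x1 x2 x3 x4
omit x3 x4    # joint test for dropping x3 and x4
</code>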
Menu path: Model window, /Tests/Omit variables
Script command: <@ref="omit">
# online Dataset "Access online databases"
Gretl is able to access databases at Wake Forest University (your computer must be connected to the internet for this to work).
Under the "File, Browse databases" menu, select the item "on database server". A window should appear, showing a listing of the gretl databases available at Wake Forest. (Depending on your location and the speed of your internet connection, this may take a few seconds.) Along with the name of the database and a short description, there will appear a "Local status" entry: this shows whether you have the database installed locally (on the hard drive of your computer) and if so, whether or not it is up to date with the version on the server.
If you have a given database installed locally, and it is up to date, there is no advantage in accessing it via the server. But for a database that is not already installed and up to date, you may wish to get a listing of the data series: click on "Get series listing". This brings up a further window, from which you can display the values of a chosen data series, graph those values, or import them into gretl's workspace. These tasks can be accomplished using the "Series" menu, or via the popup menu that appears when you click the right mouse button on a given series. You can also search the listing for a variable of interest (the "Find" menu item).
If you want faster access to the data, or wish to access the database offline, then select the line showing the database you want, in the initial database window, and press the "Install" button. This will download the database in compressed format, then uncompress it and install it on your hard drive. Thereafter you should be able to find it under the "File, Browse databases, gretl native" menu.
# panel Estimation "Panel models"
Estimates a panel model. By default the fixed effects estimator is used; this is implemented by subtracting the group or unit means from the original data.
If the "Random effects" button is checked, random effects GLS estimates are computed, using the method of Swamy and Arora.
For more details on panel estimation, please see <@pdf="the Gretl User's Guide">.
Menu path: /Model/Panel
Script command: <@ref="panel">
# panel-between Estimation "Between groups model"
This dialog allows you to enter a specification for the "between model" in the context of panel data. This regression uses the group-means of the data, thereby ignoring the variation within the groups. This model is rarely of great interest in its own right, but may be useful for purposes of comparison (for example, with the fixed effects model).
# panel-mode Dataset "Panel data organization"
This dialog offers up to three options with regard to defining a data set as a panel. The first two options require that the data set is already organized in a panel format (although this may not yet be recognized by gretl). The third option requires that the data set contains variables that represent the panel structure.
<@itl="Stacked time series">: Let there be <@var="N"> cross-sectional units in the data set, and let <@var="T"> = the number of time-series observations per unit. By selecting this option you are telling gretl that the data set is currently composed of <@var="N"> consecutive blocks of <@var="T"> time-series observations, one for each cross-sectional unit. The next step will be to specify the value of <@var="N">.
<@itl="Stacked cross sections">: You are telling gretl that the data set is currently composed of <@var="T"> consecutive blocks of <@var="N"> cross-sectional observations, one for each time period. The next step, again, will be to specify the value of <@var="N">.
If the total number of observations in the current dataset is prime, the above options are not available.
<@itl="Use index variables">: You are saying that the data set is currently organized any old way (it doesn't matter how), but that it contains two variables that index the cross-sectional units and the time periods respectively. The next step will be to select those two variables. Panel index variables must have nothing but non-negative integer values, with no missing values. If there are no such variables in the dataset this option is not available.
# panel-wls Estimation "Groupwise weighted least squares"
Groupwise weighted least squares for panel data. Computes weighted least squares (WLS) estimates, with the weights based on the estimated error variances for the respective cross-sectional units in the sample.
If the iteration option is selected, the procedure is iterated: at each round the residuals are re-computed using the current WLS parameter estimates, which gives rise to a new set of estimates of the error variances, and hence a new set of weights. Iteration stops when the maximum difference in the parameter estimates from one round to the next falls below 0.0001 or the number of iterations reaches 20. If the iteration converges, the resulting estimates are Maximum Likelihood.
# pca Statistics "Principal Components Analysis"
Principal Components Analysis. Prints the eigenvalues of the correlation matrix (or the covariance matrix if the option box is checked) for the variables in <@var="varlist">, along with the proportion of the joint variance accounted for by each component. Also prints the corresponding eigenvectors (or "component loadings").
In the window displaying the results, you have the option of saving the principal components to the dataset as series.
Menu path: /View/Principal components
Other access: Main window pop-up (multiple selection)
Script command: <@ref="pca">
# pergm Statistics "Periodogram"
Computes and displays (and if not in batch mode, graphs) the spectrum of the specified series. By default the sample periodogram is given, but optionally a Bartlett lag window is used in estimating the spectrum (see, for example, Greene's <@itl="Econometric Analysis"> for a discussion of this). The default width of the Bartlett window is twice the square root of the sample size but this can be set manually using the <@var="bandwidth"> parameter, up to a maximum of half the sample size.
If the <@opt="--log"> option is given the spectrum is represented on a logarithmic scale.
The (mutually exclusive) options <@opt="--radians"> and <@opt="--degrees"> influence the appearance of the frequency axis when the periodogram is graphed. By default the frequency is scaled by the number of periods in the sample, but these options cause the axis to be labeled from 0 to π radians or from 0 to 180°, respectively.
Menu path: /Variable/Periodogram
Other access: Main window pop-up menu (single selection)
Script command: <@ref="pergm">
# polyweights Transformations "Polynomial trend fitting"
In fitting a polynomial trend to a time series it may be desirable to give extra weight to the observations at the start and end of the sample. (Points in the middle of the sample range have neighbours on both sides that are likely to be pulling the fit in the same general direction.)
The weighting schemes offered here (quadratic, cosine-bell and steps) can be used to this effect. If you select one of these schemes two additional settings must be chosen: first, what maximum weight should be used (the minimum, baseline weight is 1.0)? Second, what central fraction of the sample should be given a uniform (minimal) weighting?
Suppose, for example, you select a maximum weight of 3.0 and a central fraction of 0.4. This means that the middle 40 percent of the data get a weight of 1.0. If the steps shape is selected the first and last 30 percent of the observations get a weight of 3.0; otherwise, for the first 30 percent of observations the weights decline gradually from 3.0 to 1.0; and for the last 30 percent the weights increase from 1.0 to 3.0.
# poisson Estimation "Poisson estimation"
Estimates a Poisson regression. The dependent variable is taken to represent the occurrence of events of some sort, and must take on only non-negative integer values.
If a discrete random variable <@itl="Y"> follows the Poisson distribution, then
<@fig="poisson1">
for <@itl="y"> = 0, 1, 2,…. The mean and variance of the distribution are both equal to <@itl="v">. In the Poisson regression model, the parameter <@itl="v"> is represented as a function of one or more independent variables. The most common version (and the only one supported by gretl) has
<@fig="poisson2">
or in other words the log of <@itl="v"> is a linear function of the independent variables.
Optionally, you may add an "offset" variable to the specification. This is a scale variable, the log of which is added to the linear regression function (implicitly, with a coefficient of 1.0). This makes sense if you expect the number of occurrences of the event in question to be proportional, other things equal, to some known factor. For example, the number of traffic accidents might be supposed to be proportional to traffic volume, other things equal, and in that case traffic volume could be specified as an "offset" in a Poisson model of the accident rate. The offset variable must be strictly positive.
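A script-mode sketch of the traffic example, with hypothetical series names and using the script syntax in which the offset series follows a semicolon:
<code>
poisson accidents const density speed ; volume
</code>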
By default, standard errors are computed using the negative inverse of the Hessian. If the <@opt="--robust"> flag is given, then QML or Huber–White standard errors are calculated instead. In this case the estimated covariance matrix is a "sandwich" of the inverse of the estimated Hessian and the outer product of the gradient.
See also <@ref="negbin">.
Menu path: /Model/Nonlinear models/Count data...
Script command: <@ref="poisson">
# probit Estimation "Probit model"
If the dependent variable is a binary variable (all values are 0 or 1) maximum likelihood estimates of the coefficients on <@var="indepvars"> are obtained via the "binary response model regression" (BRMR) method outlined by Davidson and MacKinnon (2004). As the model is nonlinear the slopes depend on the values of the independent variables. By default the slopes with respect to each of the independent variables are calculated (at the means of those variables) and these slopes replace the usual p-values in the regression output. This behavior can be suppressed by giving the <@opt="--p-values"> option. The chi-square statistic tests the null hypothesis that all coefficients are zero apart from the constant.
By default, standard errors are computed using the negative inverse of the Hessian. If the "Robust standard errors" box is checked, then QML or Huber–White standard errors are calculated instead. In this case the estimated covariance matrix is a "sandwich" of the inverse of the estimated Hessian and the outer product of the gradient. See chapter 10 of Davidson and MacKinnon for details.
If the dependent variable is not binary but is discrete, then Ordered Probit estimates are obtained. (If the variable selected as dependent is not discrete, an error is flagged.)
Probit for analysis of proportions is not implemented in gretl at this point.
Menu path: /Model/Nonlinear models/Probit
Script command: <@ref="probit">
# qlrtest Tests "Quandt likelihood ratio test"
For a model estimated on time-series data via OLS, performs the Quandt likelihood ratio (QLR) test for a structural break at an unknown point in time, with 15 percent trimming at the beginning and end of the sample period.
For each potential break point within the central 70 percent of the observations, a Chow test is performed (see <@ref="chow">). The QLR test statistic is the maximum of the <@itl="F"> values from these tests. It follows a non-standard distribution, the critical values of which are taken from Stock and Watson's <@itl="Introduction to Econometrics"> (2003). If the QLR statistic exceeds the critical value at the chosen level of significance, one can infer that the parameters of the model are not constant. This statistic can be used to detect forms of instability other than a single discrete break (such as multiple breaks or a slow drifting of the parameters).
Menu path: Model window, /Tests/QLR test
Script command: <@ref="qlrtest">
# qqplot Graphs "Q-Q plot"
With just one series selected, displays a plot of the empirical quantiles of the given series against the quantiles of the normal distribution. The series must include at least 20 valid observations in the current sample range. By default the empirical quantiles are plotted against quantiles of the normal distribution having the same mean and variance as the sample data, but two alternatives are available: the data may be standardized (converted to z-scores) before plotting, or the "raw" empirical quantiles may be plotted against the quantiles of the standard normal distribution.
Given two series arguments, <@var="y"> and <@var="x">, displays a plot of the empirical quantiles of <@var="y"> against those of <@var="x">. The data values are not standardized.
Menu path: /Variable/Normal Q-Q plot
Menu path: /View/Graph specified vars/Q-Q plot
Script command: <@ref="qqplot">
# quantreg Estimation "Quantile regression"
Quantile regression. By default standard errors are computed according to the asymptotic formula given by Koenker and Bassett (<@itl="Econometrica">, 1978), but if the "robust" box is checked we use the heteroskedasticity-robust variant from Koenker and Zhao (<@itl="Journal of Nonparametric Statistics">, 1994).
If the "Compute confidence intervals" option is checked gretl will calculate confidence intervals for the coefficients, in place of standard errors. The "robust" check-box still has an effect: if it is not checked, the intervals are computed on the assumption of IID errors; with it, gretl uses the robust estimator developed by Koenker and Machado (<@itl="Journal of the American Statistical Association">, 1999). Note that these intervals are not just "plus or minus so many standard errors"; in general, they are asymmetrical about the point estimates of the coefficients.
You may give a list of quantiles (see the drop-down list for some pre-defined possibilities). In that case gretl will calculate quantile estimates and either standard errors or confidence intervals for each of the specified values.
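For example, in script form (hypothetical variables), median regression with confidence intervals could be requested as:
<code>
quantreg 0.5 y const x1 x2 --intervals
</code>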
To follow up on the references given above, please see <@pdf="the Gretl User's Guide">.
Menu path: /Model/Robust estimation/Quantile regression
Script command: <@ref="quantreg">
# reset Tests "Ramsey's RESET"
Must follow the estimation of a model via OLS. Carries out Ramsey's RESET test for model specification (non-linearity) by adding the square and/or the cube of the fitted values to the regression and calculating the <@itl="F"> statistic for the null hypothesis that the parameters on the added terms are zero.
Menu path: Model window, /Tests/Ramsey's RESET
Script command: <@ref="reset">
# restrict-model Tests "Restrictions on a model"
Each restriction in the set should be expressed as an equation, with a linear combination of parameters on the left and a numeric value to the right of the equals sign. Parameters may be referenced in the form <@lit="b["><@var="i"><@lit="]">, where <@var="i"> represents the position in the list of regressors (starting at 1), or <@lit="b["><@var="varname"><@lit="]">, where <@var="varname"> is the name of the regressor in question.
The <@lit="b"> terms in the equation representing a restriction may be prefixed with a numeric multiplier, using <@lit="*"> to represent multiplication, for example <@lit="3.5*b[4]">.
Here is an example of a set of restrictions:
<code>
b[1] = 0
b[2] - b[3] = 0
b[4] + 2*b[5] = 1
</code>
# restrict-system Tests "Restrictions on a system of equations"
Each restriction in the set should be expressed as an equation, with a linear combination of parameters on the left and a numeric value to the right of the equals sign. Parameters are referenced using <@lit="b"> plus two numbers in square brackets. The leading number represents the position of the equation within the system and the second number indicates position in the list of regressors, starting at 1 in both cases. For example <@lit="b[2,1]"> denotes the first parameter in the second equation, and <@lit="b[3,2]"> the second parameter in the third equation.
The <@lit="b"> terms in the equation representing a restriction may be prefixed with a numeric multiplier, using <@lit="*"> to represent multiplication, for example <@lit="3.5*b[1,4]">.
Here is an example of a set of restrictions:
<code>
b[1,1] = 0
b[1,2] - b[2,2] = 0
b[3,4] + 2*b[3,5] = 1
</code>
# restrict-vecm Tests "Restrictions on a VECM"
Use this command to place linear restrictions on the cointegrating relations (beta) and/or adjustment coefficients (alpha) in a vector error-correction model (VECM).
Each restriction should be expressed as an equation, with a linear combination of parameters to the left of the equals sign and a numerical value on the right. Restrictions on beta may be non-homogeneous (non-zero on the right), but alpha restrictions must be homogeneous (zero on the right).
If the VECM is of rank 1, the elements of beta are referenced in the form <@lit="b["><@var="i"><@lit="]">, where <@var="i"> represents position in the cointegrating vector, starting at 1. For example, <@lit="b[2]"> denotes the second element in beta. If the rank is greater than 1, use <@lit="b"> plus two numbers in square brackets. For example, <@lit="b[2,1]"> denotes the first element in the second cointegrating vector.
To reference elements of alpha, use <@lit="a"> instead of <@lit="b">.
The parameter identifiers in the equation representing a restriction may be prefixed with a numeric multiplier, using <@lit="*"> to represent multiplication, for example <@lit="3.5*b[4]">.
Here is an example of a set of restrictions on a VECM of rank 1.
<code>
b[1] + b[2] = 0
b[1] + b[3] = 0
</code>
See also <@pdf="the Gretl User's Guide">.
# rmplot Graphs "Range-mean plot"
Range–mean plot: this command creates a simple graph to help in deciding whether a time series, <@itl="y">(t), has constant variance or not. We take the full sample t=1,...,T and divide it into small subsamples of arbitrary size <@itl="k">. The first subsample is formed by <@itl="y">(1),...,<@itl="y">(k), the second is <@itl="y">(k+1), ..., <@itl="y">(2k), and so on. For each subsample we calculate the sample mean and range (= maximum minus minimum), and we construct a graph with the means on the horizontal axis and the ranges on the vertical. So each subsample is represented by a point in this plane. If the variance of the series is constant we would expect the subsample range to be independent of the subsample mean; if we see the points approximate an upward-sloping line this suggests the variance of the series is increasing in its mean; and if the points approximate a downward sloping line this suggests the variance is decreasing in the mean.
Besides the graph, gretl displays the means and ranges for each subsample, along with the slope coefficient for an OLS regression of the range on the mean and the p-value for the null hypothesis that this slope is zero. If the slope coefficient is significant at the 10 percent significance level then the fitted line from the regression of range on mean is shown on the graph. The <@itl="t">-statistic for the null, and the corresponding p-value, are recorded and may be retrieved using the accessors <@lit="$test"> and <@lit="$pvalue"> respectively.
Menu path: /Variable/Range-mean graph
Script command: <@ref="rmplot">
# runs Tests "Runs test"
Carries out the nonparametric "runs" test for randomness of the specified <@var="series">, where runs are defined as sequences of consecutive positive or negative values. If you want to test for randomness of deviations from the median, for a variable named <@lit="x1"> with a non-zero median, you can do the following:
<code>
genr signx1 = x1 - median(x1)
runs signx1
</code>
If the <@opt="--difference"> option is given, the variable is differenced prior to the analysis, hence the runs are interpreted as sequences of consecutive increases or decreases in the value of the variable.
If the <@opt="--equal"> option is given, the null hypothesis incorporates the assumption that positive and negative values are equiprobable, otherwise the test statistic is invariant with respect to the "fairness" of the process generating the sequence, and the test focuses on independence alone.
Menu path: /Tools/Nonparametric tests
Script command: <@ref="runs">
# sampling Dataset "Setting the sample"
The Sample menu offers several ways of selecting a sub-sample from the current dataset.
If you choose "Sample/Restrict based on criterion..." you need to supply a Boolean (logical) expression, of the same sort that you would use to define a dummy variable. For example the expression "sqft > 1400" will select only cases for which the variable sqft has a value greater than 1400. Conditions may be concatenated using the logical operators "&&" (AND) and "||" (OR), and may be negated using "!" (NOT). If the dataset already contains dummy variables, you are also given the option of selecting one of these to define the sample (observations with a value of 1 for the selected dummy will be included, and others excluded).
The menu item "Sample/Drop all obs with missing values" redefines the sample to exclude all observations for which values of one or more variables are missing (leaving only complete cases).
To select observations for which a particular variable has no missing values, use "Restrict based on criterion..." and supply the Boolean condition "!missing(varname)" (replace "varname" with the name of the variable you want to use).
If the observations are labeled, you can exclude particular observations using, for example, <@lit="obs!="France""> as the Boolean criterion. The observation name must be enclosed in double quotes.
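The script counterpart of this dialog is the <@lit="smpl"> command with the <@opt="--restrict"> flag; for instance, the floor-space criterion mentioned above could be imposed, and later lifted, as in the following sketch:
<code>
smpl sqft > 1400 --restrict
# ... work with the sub-sample here ...
smpl full
</code>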
One point should be noted about defining a sample based on a dummy variable, a Boolean expression, or on the missing values criterion: Any "structural" information in the data header file (regarding the time series or panel nature of the data) is lost. You may reimpose structure with "Sample/Set frequency, startobs...".
Please see <@pdf="the Gretl User's Guide"> for further details.
# save-labels Utilities "Save or remove series labels"
If you choose Export here, gretl will write a file containing the descriptive labels of any series in the current dataset that have such labels. This is a plain text file with one line per variable. The line will be empty for variables that have no descriptive label.
If you choose Remove, the descriptive labels will be removed for all series that have such labels. This would be appropriate only if the current labels have somehow been added in error.
# add-labels Utilities "Add series labels"
If you choose Yes here, you are offered a file-open dialog box to select a plain text file containing descriptive labels for the series in the current dataset. The file should contain one label per line; a blank line means no label. Gretl will attempt to read as many labels as there are series in the dataset, excluding the constant.
# save-script Utilities "Save commands?"
If you choose Yes here, gretl will write a file containing a record of the commands you executed in the current session. Most commands that you execute via "point and click" have a "script" counterpart, and it is these script commands that will be saved. You could take the file as the basis for writing a gretl command script.
If you don't care to be prompted to save a record of commands on exit, uncheck the tick box in the save commands dialog.
# save-session Utilities "Save this gretl session?"
If you choose Yes here, gretl will write a file containing a "snapshot" of the current session, including a copy of the working dataset along with any models, graphs or other objects that you have saved "as icons". You can re-open this file later to recreate the state of gretl as of the time you quit the session (see the "File/Session files" menu).
If you mostly work with gretl using command scripts (which we recommend for "serious" econometric work) you probably don't need to save the session, but you should be sure to save any changes to your script that you wish to keep. You may also want to save any changes to your dataset, unless these are of a sort that can easily be recreated by running a script.
If you work with scripts and don't care to be prompted to save your session on exit, uncheck the tick box in the save session dialog.
# scatters Graphs "Multiple pairwise graphs"
Generates pairwise graphs of the selected "Y-axis variable" against each of the selected "X-axis variables" in turn. (Or you can select several variables for the Y-axis and one for the X-axis.) Scanning a set of such plots can be a useful step in exploratory data analysis. The maximum number of plots is six; any extra variables will be ignored.
Menu path: /View/Multiple graphs
Script command: <@ref="scatters">
# setinfo Dataset "Edit attributes of variable"
In this dialog box you can:
* Rename a (series) variable.
* Add or edit a description of the variable: this appears next to the variable name in the gretl main window.
* Add or edit the "display name" for the variable (if the variable is a series, not a scalar). This string (maximum 19 characters) is shown in place of the variable name when the variable is displayed in a graph. Thus for instance you can associate a more comprehensible string such as "T-bill rate" with a cryptically named variable such as "tb3".
* (For time-series data) set the compaction method for the variable. This method will be used if you decide to reduce the frequency of the dataset, or if you update the variable by importing from a database where the variable is at a higher frequency than in the working dataset.
* Mark a variable as discrete (for series with integer values only). This affects the way the variable is handled when you ask for a frequency plot.
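The description and display name mentioned above can also be set via the <@ref="setinfo"> script command; a minimal sketch, using the <@lit="tb3"> example (the <@lit="-d"> flag supplies the description and <@lit="-n"> the display name):
<code>
setinfo tb3 -d "Three-month Treasury bill rate" -n "T-bill rate"
</code>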
Menu path: /Variable/Edit attributes
Other access: Main window pop-up menu
Script command: <@ref="setinfo">
# setmiss Dataset "Missing value code"
Set a numerical value that will be interpreted as "missing" or "not applicable", either for a particular data series (under the Variable menu) or globally for the entire data set (under the Sample menu).
Gretl has its own internal coding for missing values, but sometimes imported data may employ a different code. For example, if a particular series is coded such that a value of -1 indicates "not applicable", you can select "Set missing value code" under the Variable menu and type in the value "-1" (without the quotes). Gretl will then read the -1s as missing observations.
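In a script the same effect is achieved by giving the code value to the <@ref="setmiss"> command, followed optionally by the name of a particular series; with no series name the code applies to the entire dataset. For example (series name illustrative):
<code>
setmiss -1 x1   # -1 means missing, for series x1 only
setmiss 100     # 100 means missing, for the whole dataset
</code>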
Menu path: /Data/Set missing value code
Script command: <@ref="setmiss">
# spearman Statistics "Spearman's rank correlation"
Prints Spearman's rank correlation coefficient for a specified pair of variables. The variables do not have to be ranked manually in advance; the function takes care of this.
The automatic ranking is from largest to smallest (i.e. the largest data value gets rank 1). If you need to invert this ranking, create a new variable which is the negative of the original. For example:
<code>
genr altx = -x
spearman altx y
</code>
Menu path: /Model/Robust estimation/Rank correlation
Script command: <@ref="spearman">
# store Dataset "Save data"
Saves either the entire dataset or, if a <@var="varlist"> is supplied, a specified subset of the series in the current dataset, to the file given by <@var="filename">.
By default the data are saved in "native" gretl format, but the option flags permit saving in several alternative formats. CSV (Comma-Separated Values) data may be read into spreadsheet programs, and can also be manipulated using a text editor. The formats of Octave, R and PcGive are designed for use with the respective programs. Gzip compression may be useful for large datasets. See <@pdf="the Gretl User's Guide"> for details on the various formats.
The option flags <@opt="--omit-obs"> and <@opt="--no-header"> are applicable only when saving data in CSV format. By default, if the data are time series or panel, or if the dataset includes specific observation markers, the CSV file includes a first column identifying the observations (e.g. by date). If the <@opt="--omit-obs"> flag is given this column is omitted. The <@opt="--no-header"> flag suppresses the usual printing of the names of the variables at the top of the columns.
The option of saving in gretl database format is intended to help with the construction of large sets of series, possibly having mixed frequencies and ranges of observations. At present this option is available only for annual, quarterly or monthly time-series data. If you save to a file that already exists, the default action is to append the newly saved series to the existing content of the database. In this context it is an error if one or more of the variables to be saved has the same name as a variable that is already present in the database. The <@opt="--overwrite"> flag has the effect that, if there are variable names in common, the newly saved variable replaces the variable of the same name in the original dataset.
The <@opt="--comment"> option is available when saving data as a database or in CSV format. The required parameter is a double-quoted one-line string, attached to the option flag with an equals sign. The string is inserted as a comment into the database index file or at the top of the CSV output.
Menu path: /File/Save data; /File/Export data
Script command: <@ref="store">
# system Estimation "Systems of equations"
In this window you can define a system of equations and choose an estimator for the system. Four sorts of statement may be given here, as follows:
<indent>
• <@ref="equation">: specify an equation within the system. At least two such statements must be provided.
</indent>
<indent>
• <@lit="instr">: for a system to be estimated via Three-Stage Least Squares, a list of instruments (by variable name or number). Alternatively, you can put this information into the <@lit="equation"> line using the same syntax as in the <@ref="tsls"> command.
</indent>
<indent>
• <@lit="endog">: for a system of simultaneous equations, a list of endogenous variables. This is primarily intended for use with FIML estimation, but with Three-Stage Least Squares this approach may be used instead of giving an <@lit="instr"> list; then all the variables not identified as endogenous will be used as instruments.
</indent>
<indent>
• <@lit="identity">: for use with FIML, an identity linking two or more of the variables in the system. This sort of statement is ignored when an estimator other than FIML is used.
</indent>
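In script form these statements are wrapped in a <@lit="system"> ... <@lit="end system"> block. Here is a minimal sketch of a two-equation system estimated by Three-Stage Least Squares (all variable names are illustrative):
<code>
system method=3sls
equation y1 const x1 x2
equation y2 const x2 x3
instr const x2 x3 z1 z2
end system
</code>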
Menu path: /Model/Simultaneous equations
Script command: <@ref="system">
# tobit Estimation "Tobit model"
Estimates a Tobit model, which may be appropriate when the dependent variable is "censored". For example, positive and zero values of purchases of durable goods on the part of individual households are observed, and no negative values, yet decisions on such purchases may be thought of as outcomes of an underlying, unobserved disposition to purchase that may be negative in some cases.
By default it is assumed that the dependent variable is censored at zero on the left and is uncensored on the right. However you can use the entry boxes marked "left bound" and "right bound" to specify a different pattern of censoring. Enter either a numerical value or <@lit="NA"> for no censoring.
The Tobit model is a special case of interval regression, which is supported via the <@ref="intreg"> command.
Menu path: /Model/Nonlinear models/Tobit
Script command: <@ref="tobit">
# transpos Dataset "Transpose data"
Transposes the current data set. That is, each observation (row) in the current data set will be treated as a variable (column), and each variable as an observation. This command may be useful if data have been read from some external source in which the rows of the data table represent variables.
See also <@ref="dataset">.
Menu path: /Data/Transpose data
# tsls Estimation "Instrumental variables regression"
This command requires the selection of two lists of variables: the independent variables to appear in the given model and a set of instruments. Note that any exogenous regressors should appear in both lists.
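In script form the two lists are given explicitly, separated by a semicolon. In the sketch below (names illustrative), <@lit="x2"> is the endogenous regressor, <@lit="z1"> and <@lit="z2"> are its instruments, and the exogenous terms <@lit="const"> and <@lit="x1"> appear in both lists:
<code>
tsls y const x1 x2 ; const x1 z1 z2
</code>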
Output for two-stage least squares estimates includes the Hausman test and, if the model is over-identified, the Sargan over-identification test. In the Hausman test, the null hypothesis is that OLS estimates are consistent, or, in other words, that estimation by means of instrumental variables is not really required. A model of this sort is over-identified if there are more instruments than are strictly required. The Sargan test is based on an auxiliary regression of the residuals from the two-stage least squares model on the full list of instruments. The null hypothesis is that all the instruments are valid, and suspicion is thrown on this hypothesis if the auxiliary regression has a significant degree of explanatory power. For a good explanation of both tests see chapter 8 of Davidson and MacKinnon (2004).
For both TSLS and LIML estimation, an additional test result is shown provided that the model is estimated under the assumption of i.i.d. errors (that is, the <@opt="--robust"> option is not selected). This is a test for weakness of the instruments. Weak instruments can lead to serious problems in IV regression: biased estimates and/or incorrect size of hypothesis tests based on the covariance matrix, with rejection rates well in excess of the nominal significance level (Stock, Wright and Yogo, 2002). The test statistic is the first-stage <@itl="F">-test if the model contains just one endogenous regressor, otherwise it is the smallest eigenvalue of the matrix counterpart of the first stage <@itl="F">. Critical values based on the Monte Carlo analysis of Stock and Yogo (2003) are shown when available.
The R-squared value printed for models estimated via two-stage least squares is the square of the correlation between the dependent variable and the fitted values.
Menu path: /Model/Other linear models/Two-Stage Least Squares
Script command: <@ref="tsls">
# var Estimation "Vector Autoregression"
This command requires specification of:
<indent>
• the lag order, that is, the number of lags of each variable that should be included in the system;
</indent>
<indent>
• any exogenous variables (but note that a constant is included automatically unless you specify otherwise, a trend can be added using the trend checkbox, and seasonal dummy variables can be added using the seasonals checkbox); and
</indent>
<indent>
• a list of endogenous variables, lags of which will be included on the right-hand side of each equation (note: do not include lagged variables in this list -- they will be added automatically).
</indent>
A separate regression will be run for each variable in the system. Output for each equation includes <@itl="F">-tests for zero restrictions on all lags of each of the variables and an <@itl="F">-test for the maximum lag, along with (optionally) forecast variance decompositions and impulse response functions.
Forecast variance decompositions and impulse responses are based on the Cholesky decomposition of the contemporaneous covariance matrix, and in this context the order in which the (stochastic) variables are given matters. The first variable in the list is assumed to be "most exogenous" within-period. The horizon for variance decompositions and impulse responses can be set using the <@ref="set"> command.
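For reference, the script counterpart gives the lag order followed by the list of endogenous variables, with any exogenous variables after a semicolon; the horizon may be set beforehand. A sketch with illustrative names, requesting impulse responses and variance decompositions via option flags:
<code>
set horizon 12
var 4 y1 y2 y3 --impulse-responses --variance-decomp
</code>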
Menu path: /Model/Time series/Vector autoregression
Script command: <@ref="var">
# VAR-lagselect Tests "VAR lag-length selection"
In this dialog box you specify a VAR as usual, but use the lag order spin button to set the maximum number of lags to test.
Output will consist of a table showing the values of the Akaike (AIC), Schwarz (BIC) and Hannan–Quinn (HQC) information criteria computed from VARs of order 1 to the chosen maximum. This is intended to help with the selection of the optimal lag order.
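A script counterpart, assuming the <@opt="--lagselect"> option of the <@lit="var"> command, would give the maximum order to be tested (variable names illustrative):
<code>
var 8 y1 y2 y3 --lagselect
</code>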
# VAR-omit Tests "Test exogenous variables in VAR"
Use this dialog box to specify a subset of exogenous variables in a VAR. These variables will be omitted from the original VAR, and the system re-estimated.
A Likelihood Ratio test is reported, where the null hypothesis is that the true parameter values are zero, in all equations of the VAR, for the omitted variables. The test is based on the difference between the log-determinant of the variance matrix for the unrestricted system, and that for the restricted system with the selected variables omitted.
# vartest Tests "Difference of variances"
Calculates the <@itl="F"> statistic for the null hypothesis that the population variances are equal for the two selected variables, and shows its p-value.
Menu path: /Model/Bivariate tests/Difference of variances
Script command: <@ref="vartest">
# vecm Estimation "Vector Error Correction Model"
A VECM is a form of vector autoregression or VAR (see <@ref="var">), applicable where the variables in the model are individually integrated of order 1 (that is, are random walks, with or without drift), but exhibit cointegration. This command is closely related to the Johansen test for cointegration (see <@ref="coint2">).
The lag order selected in the VECM dialog box is that of the VAR system. The number of lags in the VECM itself (where the dependent variable is given as a first difference) is one less than this number.
The "cointegration rank" represents the number of cointegrating vectors. This must be greater than zero and less than or equal to (generally, less than) the number of endogenous variables selected.
In the "Endogenous variables" box you select the vector of endogenous variables, in levels. The inclusion of deterministic terms in the model is controlled by the option buttons. The default is to include an "unrestricted constant", which allows for the presence of a non-zero intercept in the cointegrating relations as well as a trend in the levels of the endogenous variables. In the literature stemming from the work of Johansen (see for example his 1995 book) this is often referred to as "case 3". The other four options produce cases 1, 2, 4 and 5 respectively. The meaning of these cases and the criteria for selecting a case are explained in <@pdf="the Gretl User's Guide">.
In the "Exogenous variables" box you may add specific exogenous variables. By default these enter the model in unrestricted form (indicated by a <@lit="U"> next to the name of the variable). If you want a certain exogenous variable to be restricted to the cointegrating space, right-click on it and select "Restricted" from the pop-up menu. The symbol next to the variable will change to R.
If the data are quarterly or monthly, a check box is shown that allows you to include a set of centered seasonal dummy variables. In all cases, an additional check box ("Show details") allows for the printing of the auxiliary regressions that form the starting point of the Johansen maximum likelihood estimation procedure.
Menu path: /Model/Time series/VECM
Script command: <@ref="vecm">
# wls Estimation "Weighted Least Squares"
Let "wtvar" denote the variable selected in the "Weight variable" box. An OLS regression is run, where the dependent variable is the product of the positive square root of wtvar and the selected dependent variable, and the independent variables are also multiplied by the square root of wtvar. Statistics such as <@itl="R">-squared are based on the weighted data. If wtvar is a dummy variable, weighted least squares estimation is equivalent to eliminating all observations with value zero for wtvar.
Menu path: /Model/Other linear models/Weighted Least Squares
Script command: <@ref="wls">
# working-dir Utilities "Working directory"
The "working directory" is where gretl looks by default when reading or writing data files or scripts via the file Open and Save dialogs.
In addition the working directory is the default location for
<indent>
• reading files via the script commands <@lit="append">, <@lit="open">, <@lit="run"> and <@lit="include">; and
</indent>
<indent>
• writing files via the commands <@lit="eqnprint">, <@lit="tabprint">, <@lit="gnuplot">, <@lit="outfile"> and <@lit="store">.
</indent>
The option of having gretl use the current directory (as determined via the shell) at start-up may be useful to people who are in the habit of launching gretl from a command prompt rather than a menu or icon.
This dialog also allows you to set the behavior of the GUI file selector: when you open or save a file in a given folder, should the selector remember and return to the same folder on the next invocation? Or should the selector always visit the chosen working directory?
Menu path: /File/Working directory
# x12a Utilities "X-12-ARIMA"
There are two procedural options here, controlled by the lower set of radio-buttons.
If you select "Execute X-12-ARIMA directly" then gretl writes a command file for X-12-ARIMA and calls the x12a program to execute the commands. In this case you have the option of producing a graph and/or saving selected output series to the gretl dataset.
If you select "Make X-12-ARIMA command file" gretl writes a command file for X-12-ARIMA, as above, but then opens this file in an editor window. In that window you are able to make changes and to save the file under a chosen name. You are also able to send the file for execution by x12a (by clicking the "Run" button on the editor window toolbar) and view the output. But in this case you do not have the option of saving data as gretl series or producing a gretl graph.
# xcorrgm Statistics "Cross-correlogram"
Prints and graphs the cross-correlogram for variables <@var="var1"> and <@var="var2">, which may be specified by name or number. The values are the sample correlation coefficients between the current value of <@var="var1"> and successive leads and lags of <@var="var2">.
If an <@var="order"> value is specified the length of the cross-correlogram is limited to at most that number of leads and lags, otherwise the length is determined automatically, as a function of the frequency of the data and the number of observations.
Menu path: /View/Cross-correlogram
Other access: Main window pop-up menu (multiple selection)
Script command: <@ref="xcorrgm">
# xtab Statistics "Cross-tabulate variables"
Displays a contingency table or cross-tabulation for each combination of the selected variables. Note that all the variables must be discrete.
By default, frequency count values are shown in the cells and on the margins of the table. However, you can choose to display either row or column percentages instead.
By default, cells with a zero count are shown as empty, but you can choose to show zero values explicitly.
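A possible script counterpart, assuming the <@opt="--row"> option for row percentages (use <@opt="--column"> for column percentages), with illustrative names:
<code>
xtab x1 x2 --row
</code>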
Pearson's chi-square test for independence is displayed if the expected frequency under independence is at least 1.0e-7 for all cells. A common rule of thumb for the validity of this statistic is that at least 80 percent of cells should have expected frequencies of 5 or greater; if this criterion is not met a warning is printed.
If the contingency table is 2 by 2, Fisher's Exact Test for independence is computed. Note that this test is based on the assumption that the row and column totals are fixed, which may or may not be appropriate depending on how the data were generated. The left p-value should be used when the alternative to independence is negative association (values tend to cluster in the lower left and upper right cells); the right p-value should be used if the alternative is positive association. The two-tailed p-value for this test is calculated by method (b) in section 2.1 of Agresti (1992): it is the sum of the probabilities of all possible tables having the given row and column totals and having a probability less than or equal to that of the observed table.
Script command: <@ref="xtab">