This file is indexed.

/usr/share/gretl/gretlgui.hlp is in gretl-common 1.9.6-1build1.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

   1
   2
   3
   4
   5
   6
   7
   8
   9
  10
  11
  12
  13
  14
  15
  16
  17
  18
  19
  20
  21
  22
  23
  24
  25
  26
  27
  28
  29
  30
  31
  32
  33
  34
  35
  36
  37
  38
  39
  40
  41
  42
  43
  44
  45
  46
  47
  48
  49
  50
  51
  52
  53
  54
  55
  56
  57
  58
  59
  60
  61
  62
  63
  64
  65
  66
  67
  68
  69
  70
  71
  72
  73
  74
  75
  76
  77
  78
  79
  80
  81
  82
  83
  84
  85
  86
  87
  88
  89
  90
  91
  92
  93
  94
  95
  96
  97
  98
  99
 100
 101
 102
 103
 104
 105
 106
 107
 108
 109
 110
 111
 112
 113
 114
 115
 116
 117
 118
 119
 120
 121
 122
 123
 124
 125
 126
 127
 128
 129
 130
 131
 132
 133
 134
 135
 136
 137
 138
 139
 140
 141
 142
 143
 144
 145
 146
 147
 148
 149
 150
 151
 152
 153
 154
 155
 156
 157
 158
 159
 160
 161
 162
 163
 164
 165
 166
 167
 168
 169
 170
 171
 172
 173
 174
 175
 176
 177
 178
 179
 180
 181
 182
 183
 184
 185
 186
 187
 188
 189
 190
 191
 192
 193
 194
 195
 196
 197
 198
 199
 200
 201
 202
 203
 204
 205
 206
 207
 208
 209
 210
 211
 212
 213
 214
 215
 216
 217
 218
 219
 220
 221
 222
 223
 224
 225
 226
 227
 228
 229
 230
 231
 232
 233
 234
 235
 236
 237
 238
 239
 240
 241
 242
 243
 244
 245
 246
 247
 248
 249
 250
 251
 252
 253
 254
 255
 256
 257
 258
 259
 260
 261
 262
 263
 264
 265
 266
 267
 268
 269
 270
 271
 272
 273
 274
 275
 276
 277
 278
 279
 280
 281
 282
 283
 284
 285
 286
 287
 288
 289
 290
 291
 292
 293
 294
 295
 296
 297
 298
 299
 300
 301
 302
 303
 304
 305
 306
 307
 308
 309
 310
 311
 312
 313
 314
 315
 316
 317
 318
 319
 320
 321
 322
 323
 324
 325
 326
 327
 328
 329
 330
 331
 332
 333
 334
 335
 336
 337
 338
 339
 340
 341
 342
 343
 344
 345
 346
 347
 348
 349
 350
 351
 352
 353
 354
 355
 356
 357
 358
 359
 360
 361
 362
 363
 364
 365
 366
 367
 368
 369
 370
 371
 372
 373
 374
 375
 376
 377
 378
 379
 380
 381
 382
 383
 384
 385
 386
 387
 388
 389
 390
 391
 392
 393
 394
 395
 396
 397
 398
 399
 400
 401
 402
 403
 404
 405
 406
 407
 408
 409
 410
 411
 412
 413
 414
 415
 416
 417
 418
 419
 420
 421
 422
 423
 424
 425
 426
 427
 428
 429
 430
 431
 432
 433
 434
 435
 436
 437
 438
 439
 440
 441
 442
 443
 444
 445
 446
 447
 448
 449
 450
 451
 452
 453
 454
 455
 456
 457
 458
 459
 460
 461
 462
 463
 464
 465
 466
 467
 468
 469
 470
 471
 472
 473
 474
 475
 476
 477
 478
 479
 480
 481
 482
 483
 484
 485
 486
 487
 488
 489
 490
 491
 492
 493
 494
 495
 496
 497
 498
 499
 500
 501
 502
 503
 504
 505
 506
 507
 508
 509
 510
 511
 512
 513
 514
 515
 516
 517
 518
 519
 520
 521
 522
 523
 524
 525
 526
 527
 528
 529
 530
 531
 532
 533
 534
 535
 536
 537
 538
 539
 540
 541
 542
 543
 544
 545
 546
 547
 548
 549
 550
 551
 552
 553
 554
 555
 556
 557
 558
 559
 560
 561
 562
 563
 564
 565
 566
 567
 568
 569
 570
 571
 572
 573
 574
 575
 576
 577
 578
 579
 580
 581
 582
 583
 584
 585
 586
 587
 588
 589
 590
 591
 592
 593
 594
 595
 596
 597
 598
 599
 600
 601
 602
 603
 604
 605
 606
 607
 608
 609
 610
 611
 612
 613
 614
 615
 616
 617
 618
 619
 620
 621
 622
 623
 624
 625
 626
 627
 628
 629
 630
 631
 632
 633
 634
 635
 636
 637
 638
 639
 640
 641
 642
 643
 644
 645
 646
 647
 648
 649
 650
 651
 652
 653
 654
 655
 656
 657
 658
 659
 660
 661
 662
 663
 664
 665
 666
 667
 668
 669
 670
 671
 672
 673
 674
 675
 676
 677
 678
 679
 680
 681
 682
 683
 684
 685
 686
 687
 688
 689
 690
 691
 692
 693
 694
 695
 696
 697
 698
 699
 700
 701
 702
 703
 704
 705
 706
 707
 708
 709
 710
 711
 712
 713
 714
 715
 716
 717
 718
 719
 720
 721
 722
 723
 724
 725
 726
 727
 728
 729
 730
 731
 732
 733
 734
 735
 736
 737
 738
 739
 740
 741
 742
 743
 744
 745
 746
 747
 748
 749
 750
 751
 752
 753
 754
 755
 756
 757
 758
 759
 760
 761
 762
 763
 764
 765
 766
 767
 768
 769
 770
 771
 772
 773
 774
 775
 776
 777
 778
 779
 780
 781
 782
 783
 784
 785
 786
 787
 788
 789
 790
 791
 792
 793
 794
 795
 796
 797
 798
 799
 800
 801
 802
 803
 804
 805
 806
 807
 808
 809
 810
 811
 812
 813
 814
 815
 816
 817
 818
 819
 820
 821
 822
 823
 824
 825
 826
 827
 828
 829
 830
 831
 832
 833
 834
 835
 836
 837
 838
 839
 840
 841
 842
 843
 844
 845
 846
 847
 848
 849
 850
 851
 852
 853
 854
 855
 856
 857
 858
 859
 860
 861
 862
 863
 864
 865
 866
 867
 868
 869
 870
 871
 872
 873
 874
 875
 876
 877
 878
 879
 880
 881
 882
 883
 884
 885
 886
 887
 888
 889
 890
 891
 892
 893
 894
 895
 896
 897
 898
 899
 900
 901
 902
 903
 904
 905
 906
 907
 908
 909
 910
 911
 912
 913
 914
 915
 916
 917
 918
 919
 920
 921
 922
 923
 924
 925
 926
 927
 928
 929
 930
 931
 932
 933
 934
 935
 936
 937
 938
 939
 940
 941
 942
 943
 944
 945
 946
 947
 948
 949
 950
 951
 952
 953
 954
 955
 956
 957
 958
 959
 960
 961
 962
 963
 964
 965
 966
 967
 968
 969
 970
 971
 972
 973
 974
 975
 976
 977
 978
 979
 980
 981
 982
 983
 984
 985
 986
 987
 988
 989
 990
 991
 992
 993
 994
 995
 996
 997
 998
 999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
1474
1475
1476
1477
1478
1479
1480
1481
1482
1483
1484
1485
1486
1487
1488
1489
1490
1491
1492
1493
1494
1495
1496
1497
1498
1499
1500
1501
1502
1503
1504
1505
1506
1507
1508
1509
1510
1511
1512
1513
1514
1515
1516
1517
1518
1519
1520
1521
1522
1523
1524
1525
1526
1527
1528
1529
1530
1531
1532
1533
1534
1535
1536
1537
1538
1539
1540
1541
1542
1543
1544
1545
1546
1547
1548
1549
1550
1551
1552
1553
1554
1555
1556
1557
1558
1559
1560
1561
1562
1563
1564
1565
1566
1567
1568
1569
1570
1571
1572
1573
1574
1575
1576
1577
1578
1579
1580
1581
1582
1583
1584
1585
1586
1587
1588
1589
1590
1591
1592
1593
1594
1595
1596
1597
1598
1599
1600
1601
1602
1603
1604
1605
1606
1607
1608
1609
1610
1611
1612
1613
1614
1615
1616
1617
1618
1619
1620
1621
1622
1623
1624
1625
1626
1627
1628
1629
1630
1631
1632
1633
1634
1635
1636
1637
1638
1639
1640
1641
1642
1643
1644
# add Tests "Add variables to model"

The selected variables are added to the previous model and the new model estimated. A test statistic for the joint significance of the added variables is printed, along with its p-value. 

Menu path: Model window, /Tests/Add variables

Script command: <@ref="add">

# addline Graphs "Add line to graph"

This dialog box allows you to add a line, defined via a formula, to a graph. The formula must be an expression acceptable to gnuplot. Use <@lit="x"> to denote the value of the variable on the x-axis. Please note that gnuplot uses <@lit="**"> for exponentiation (raising to a power), and that the decimal character must be given as ".". Examples: 

<code>          
	10+0.35*x
	100+5.3*x-0.12*x**2
	sin(x)
	exp(sqrt(pi*x))
</code>

# adf Tests "Augmented Dickey-Fuller test"

This command needs an integer lag order; if the order is zero a standard (not augmented) Dickey–Fuller test is run. Computes a set of Dickey–Fuller tests on the selected variable, the null hypothesis being that the variable has a unit root. (But if the differencing option is selected, the first difference of the variable is taken prior to testing, and the discussion below must be taken as referring to the transformed variable.) 

In all cases the dependent variable is the first difference of the specified variable, <@itl="y">, and the key independent variable is the first lag of <@itl="y">. The model is constructed so that the coefficient on lagged <@itl="y"> equals the root in question minus 1. For example, the model with a constant may be written as 

  <@fig="adf1">

Under the null hypothesis of a unit root the coefficient on lagged <@itl="y"> equals zero; under the alternative that <@itl="y"> is stationary this coefficient is negative. 

If the lag order, <@itl="k">, is greater than 0, then <@itl="k"> lags of the dependent variable are included on the right-hand side of the test regressions, subject to the following qualification. If the box labeled "test down from maximum lag" is checked, the selected lag order is taken as a maximum and the actual lag order used is obtained by testing down, using this algorithm: 

<indent>
1. Estimate the Dickey–Fuller regression with <@itl="k"> lags of the dependent variable. 
</indent>

<indent>
2. Is the last lag significant? If so, execute the test with lag order <@itl="k">. Otherwise, let <@itl="k"> = <@itl="k"> – 1; if <@itl="k"> equals 0, execute the test with lag order 0, else go to step 1. 
</indent>

In the context of step 2 above, "significant" means that the <@itl="t">-statistic for the last lag has an asymptotic two-sided <@itl="p">-value, against the normal distribution, of 0.10 or less. 

<@itl="P">-values for the Dickey–Fuller tests are based on MacKinnon (1996). The relevant code is included by kind permission of the author. In the case of the test with linear trend using GLS these <@itl="P">-values are not applicable; critical values from Table 1 in Elliott, Rothenberg and Stock (1996) are shown instead. 

Menu path: /Variable/Unit root tests/Augmented Dickey-Fuller test

Script command: <@ref="adf">

# anova Statistics "ANOVA"

Analysis of Variance: <@var="response"> is a series measuring some effect of interest and <@var="treatment"> must be a discrete variable that codes for two or more types of treatment (or non-treatment). For two-way ANOVA, the <@var="block"> variable (which should also be discrete) codes for the values of some control variable. 

The null hypothesis for the <@itl="F">-test is that the mean response is invariant with respect to the treatment type, or in words that the treatment has no effect. Strictly speaking, the test is valid only if the variance of the response is the same for all treatment types. 

Note that the results shown by this command are in fact a subset of the information given by the following procedure, which is easily implemented in gretl. Create a set of dummy variables coding for all but one of the treatment types. For two-way ANOVA, in addition create a set of dummies coding for all but one of the "blocks". Then regress <@var="response"> on a constant and the dummies using <@ref="ols">. For a one-way design the ANOVA table is printed via the <@opt="--anova"> option to <@lit="ols">. In the two-way case the relevant <@itl="F">-test is found by using the <@ref="omit"> command. For example (assuming <@lit="y"> is the response, <@lit="xt"> codes for the treatment, and <@lit="xb"> codes for blocks): 

<code>          
	# one-way
	list dxt = dummify(xt)
	ols y 0 dxt --anova
	# two-way
	list dxb = dummify(xb)
	ols y 0 dxt dxb
	# test joint significance of dxt
	omit dxt --quiet
</code>

Menu path: /Model/Other linear models/ANOVA

Script command: <@ref="anova">

# ar Estimation "Autoregressive estimation"

Computes parameter estimates using the generalized Cochrane–Orcutt iterative procedure; see Section 9.5 of Ramanathan (2002). Iteration is terminated when successive error sums of squares do not differ by more than 0.005 percent or after 20 iterations. 

The "list of AR lags" specifies the structure of the error process. For example, the entry "1 3 4" corresponds to the process: 

  <@fig="arlags">

Menu path: /Model/Time series/Autoregressive estimation

Script command: <@ref="ar">

# ar1 Estimation "AR(1) estimation"

Computes feasible GLS estimates for a model in which the error term is assumed to follow a first-order autoregressive process. 

The default method is the Cochrane–Orcutt iterative procedure; see for example section 9.4 of Ramanathan (2002). Iteration is terminated when successive estimates of the autocorrelation coefficient do not differ by more than 0.001 or after 20 iterations. 

If the <@opt="--hilu"> option is given, the Hildreth–Lu search procedure is used. The results are then fine-tuned using the Cochrane–Orcutt method, unless the <@opt="--no-corc"> flag is specified. (The latter option is ignored if <@opt="--hilu"> is not specified.) 

If the <@opt="--pwe"> option is given, the Prais–Winsten estimator is used. This involves an an iteration similar to Cochrane–Orcutt; the difference is that while Cochrane–Orcutt discards the first observation, Prais–Winsten makes use of it. See, for example, Chapter 13 of Greene's <@itl="Econometric Analysis"> (2000) for details. 

Menu path: /Model/Time series/Cochrane-Orcutt
Menu path: /Model/Time series/Hildreth-Lu
Menu path: /Model/Time series/Prais-Winsten

Script command: <@ref="ar1">

# arch Estimation "ARCH model"

This command is retained at present for backward compatibility, but you are better off using the maximum likelihood estimator offered by the <@ref="garch"> command; for a plain ARCH model, set the first GARCH parameter to 0. 

Estimates the given model specification allowing for ARCH (Autoregressive Conditional Heteroskedasticity). The model is first estimated via OLS, then an auxiliary regression is run, in which the squared residual from the first stage is regressed on its own lagged values. The final step is weighted least squares estimation, using as weights the reciprocals of the fitted error variances from the auxiliary regression. (If the predicted variance of any observation in the auxiliary regression is not positive, then the corresponding squared residual is used instead). 

The <@lit="alpha"> values displayed below the coefficients are the estimated parameters of the ARCH process from the auxiliary regression. 

See also <@ref="garch"> and <@ref="modtest"> (the <@opt="--arch"> option). 

Menu path: /Model/Time series/ARCH

Script command: <@ref="arch">

# arima Estimation "ARMA model"

Estimates an ARMA model, with or without exogenous regressors. If the order of differencing is greater than zero the model becomes ARIMA. If the data have a frequency greater than 1 the option of including a seasonal component is presented. 

If you wish to include only specified AR or MA lags in the model (as opposed to all lags up to a given order) check the box to the right of the spinner and type a list of lags, separated by spaces, into the entry field. Alternatively, if you have defined a matrix containing the desired set of lags you can type its name into the entry field. 

The default is to use the "native" gretl ARMA functionality, with estimation by exact ML using the Kalman filter; estimation via conditional ML is available as an option. (If X-12-ARIMA is installed you have the option of using it instead of native code.) For details regarding these options, please see <@pdf="the Gretl User's Guide">. 

The AIC value given in connection with ARIMA models is calculated according to the definition used in X-12-ARIMA, namely 

  <@fig="aic">

where <@fig="ell"> is the log-likelihood and <@itl="k"> is the total number of parameters estimated. Note that X-12-ARIMA does not produce information criteria such as AIC when estimation is by conditional ML. 

The AR and MA roots shown in connection with ARMA estimation are based on the following representation of an ARMA(p, q) process: 

<mono>          
	(1 - a_1*L - a_2*L^2 - ... - a_p*L^p)Y =
          c + (1 + b_1*L + b_2*L^2 + ... + b_q*L^q) e_t
</mono>

The AR roots are therefore the solutions to 

<mono>          
         1 - a_1*z - a_2*z^2 - ... - a_p*L^p = 0
</mono>

and stability requires that these roots lie outside the unit circle. 

The "frequency" figure printed in connection with AR and MA roots is the λ value that solves <@itl="z"> = <@itl="r"> * exp(i*2*π*λ) where <@itl="z"> is the root in question and <@itl="r"> is its modulus. 

Menu path: /Model/Time series/ARIMA
Other access: Main window pop-up menu (single selection)

Script command: <@ref="arima">

# bfgs-config Estimation "BFGS options"

This dialog allows you to control some aspects of the operation of the BFGS maximizer. In case the maximizer fails to converge it may help matters, in some cases, to increase the number of iterations allowed and/or to increase (make more permissive) the convergence tolerance. However, you should be suspicious of results obtained using a high tolerance and should consider the possibility that the model you are estimating is misspecified. 

For most applications we recommend use of the regular BFGS maximizer but for some problems the "limited memory" variant of the algorithm, L-BFGS-B, may produce more rapid convergence. When L-BFGS-B is selected, you have the option of setting the number of corrections used in the limited memory matrix (between 3 and 20, with a default of 8). 

# bootstrap Tests "Bootstrap options"

In this dialog you get to choose: 

<indent>
• The variable/coefficient to examine. (You can test only one coefficient at a time using this method.) 
</indent>

<indent>
• The sort of analysis to perform. The default (95 percent) confidence interval is based directly on the quantiles of the bootstrap coefficient estimates. The "studentized" version is as per Davidson and MacKinnon's <@itl="Economic Theory and Methods"> (ETM), chapter 5: at each bootstrap replication a <@itl="t">-ratio is formed as (a) the difference between the current and the baseline coefficient estimate, divided by (b) the baseline estimated standard error. Then the confidence interval is formed based on the quantiles of this t-ratio, as explained in ETM. The P-value option is based on the distribution of the bootstrap <@itl="t">-ratio: it is the proportion of the replications where the absolute value of this statistic exceeds the absolute value of the baseline <@itl="t">-ratio. 
</indent>

<indent>
• Resampled residuals versus simulate normal errors. In the first case the original residuals (rescaled as suggested in ETM) are resampled with replacement. In the second case pseudo-random normal values are generated with the original residual variance. 
</indent>

<indent>
• The number of replications to perform. Note that when you're constructing a 95 percent confidence interval it is desirable that 0.05(<@itl="B"> + 1)/2 is an integer (where <@itl="B"> is the number of replications). So gretl may adjust the chosen number of replications to ensure this is the case. 
</indent>

<indent>
• Whether or not to produce a graph of the bootstrap distribution. This option employs gretl's kernel density estimation facility. 
</indent>

# boxplot Graphs "Boxplots"

These plots display the distribution of a variable. The central box encloses the middle 50 percent of the data, i.e. it is bounded by the first and third quartiles. The "whiskers" extend to the minimum and maximum values. A line is drawn across the box at the median. A "+" sign is used to indicate the mean. If the option of showing a confidence interval for the median is selected, this is computed via the bootstrap method and shown in the form of dashed horizontal lines above and/or below the median. 

The "factorized" option allows you to examine the distribution of a chosen variable conditional on the value of some discrete factor. For example, if a data set contains wages and a gender dummy variable you can select the wage variable as the target and gender as the factor, to see side-by-side boxplots of male and female wages. 

Menu path: /View/Graph specified vars/Boxplots

Script command: <@ref="boxplot">

# bwfilter Transformations "The Butterworth filter"

The Butterworth filter is an appromixation to an ideal square-wave filter which allows frequencies over a certain range to pass at full strength while stopping all others. 

Higher values of the order parameter, <@itl="n">, produce a closer approximation to the ideal filter, in principle, but at the possible cost of numerical instability. The "cutoff" value sets the boundary between the pass band and the stop band. It is expressed in degrees, and must be greater than 0 and less than 180° (or π radians, corresponding to the highest frequency in the data). Smaller values of the cutoff produce a smoother trend. 

Inspecting the periodogram of the target series is a useful preliminary when you wish to apply this filter. See <@pdf="the Gretl User's Guide"> for details. 

Menu path: /Variable/Filter/Butterworth

# chow Tests "Chow test"

This command needs either an observation number (or date, with dated data), or the name of a dummy variable. 

Must follow an OLS regression. If an observation number or date is given, provides a test for the null hypothesis of no structural break at the given split point. The procedure is to create a dummy variable which equals 1 from the split point specified by <@var="obs"> to the end of the sample, 0 otherwise, and also interaction terms between this dummy and the original regressors. If a dummy variable is given, tests the null hypothesis of structural homogeneity with respect to that dummy. Again, interaction terms are added. In either case an augmented regression is run including the additional terms. 

By default an <@itl="F"> statistic is calculated, taking the augmented regression as the unrestricted model and the original as the restricted. But if the original model used a robust estimator for the covariance matrix, the test statistic is a Wald chi-square value based on a robust estimator of the covariance matrix for the augmented regression. 

Menu path: Model window, /Tests/Chow test

Script command: <@ref="chow">

# coeffsum Tests "Sum of coefficients"

This command needs a list of variables, selected from the set of independent variables in a given model. 

Calculates the sum of the coefficients on the variables in the specified list. Prints this sum along with its standard error and the p-value for the null hypothesis that the sum is zero. 

Note the difference between this and <@ref="omit">, which tests the null hypothesis that the coefficients on a specified subset of independent variables are <@itl="all"> equal to zero. 

Menu path: Model window, /Tests/Sum of coefficients

Script command: <@ref="coeffsum">

# coint Tests "Engle-Granger cointegration test"

The Engle–Granger cointegration test. The default procedure is: (1) carry out Dickey–Fuller tests on the null hypothesis that each of the variables listed has a unit root; (2) estimate the cointegrating regression; and (3) run a DF test on the residuals from the cointegrating regression. If the box labeled "skip initial DF tests" is checked, however, the first of these steps is omitted. 

If the lag order, <@itl="k">, is greater than 0, then <@itl="k"> lags of the dependent variable are included on the right-hand side of each test regression, unless the box labeled "test down from maximum lag" is checked: in that case the selected lag order is taken as a maximum and the actual lag order used is obtained by testing down. See the <@ref="adf"> command for details of this procedure. 

By default, the cointegrating regression contains a constant. If you wish to suppress the constant, or to add a linear or quadratic trend, select the appropriate option from the set of radio buttons in the Cointegration dialog box. 

<@itl="P-">values for this test are based on MacKinnon (1996). The relevant code is included by kind permission of the author. 

Menu path: /Model/Time series/Cointegration test/Engle-Granger

Script command: <@ref="coint">

# coint2 Tests "Johansen cointegration test"

Carries out the Johansen test for cointegration among the listed variables for the selected lag order. For details of this test see, for example, Hamilton, <@itl="Time Series Analysis"> (1994), Chapter 20. P-values are computed via Doornik's (1998) gamma approximation. 

The inclusion of deterministic terms in the model is controlled by the drop-down option list. The default is to include an "unrestricted constant", which allows for the presence of a non-zero intercept in the cointegrating relations as well as a trend in the levels of the endogenous variables. In the literature stemming from the work of Johansen (see for example his 1995 book) this is often referred to as "case 3". The other four options produce cases 1, 2, 4 and 5 respectively. The meaning of these cases and the criteria for selecting a case are explained in <@pdf="the Gretl User's Guide">. 

You may control for exogenous variables by adding them to the lower list box. By default these enter the model in unrestricted form (indicated by a <@lit="U"> next to the name of the variable). If you want a certain exogenous variable to be restricted to the cointegrating space, right-click on it and select "Restricted" from the pop-up menu. The symbol next to the variable will change to R. 

If the data are quarterly or monthly, a check box is shown that allows you to include a set of centered seasonal dummy variables. In all cases, an additional check box ("Show details") allows for the printing of the auxiliary regressions that form the starting point of the Johansen maximum likelihood estimation procedure. 

The following table is offered as a guide to the interpretation of the results shown for the test, for the 3-variable case. <@lit="H0"> denotes the null hypothesis, <@lit="H1"> the alternative hypothesis, and <@lit="c"> the number of cointegrating relations. 

<mono>          
         Rank     Trace test         Lmax test
                  H0     H1          H0     H1
         ---------------------------------------
          0      c = 0  c = 3       c = 0  c = 1
          1      c = 1  c = 3       c = 1  c = 2
          2      c = 2  c = 3       c = 2  c = 3
         ---------------------------------------
</mono>

See also the <@ref="vecm"> command. 

Menu path: /Model/Time series/Cointegration test/Johansen

Script command: <@ref="coint2">

# compact Dataset "Compact data"

When you add to a dataset a series that is of higher frequency, it is necessary to "compact" the new series. For instance, a monthly series will have to be compacted to fit into a quarterly dataset. 

In addition, you may sometimes want to compact an entire dataset to a lower frequency (perhaps, prior to adding a lower-frequency variable to the dataset). 

Gretl offers four options for compacting: 

<indent>
• Averaging: The value written to the dataset will be the arithmetic mean of the relevant series values. For instance the value written for the first quarter of 1990 will be the average of the values for January, February and March of 1990. 
</indent>

<indent>
• Summing: The value written to the dataset will be the sum of the relevant higher-frequency values. For example, the first-quarter value will be the sum of the January, February and March values. 
</indent>

<indent>
• End-of-period values: The value written to the dataset is the last relevant value from the higher-frequency data. For example, the first quarter of 1990 will get the March 1990 value. 
</indent>

<indent>
• Start-of-period values: The value written to the dataset is the first relevant value from the higher-frequency data. For example, the first quarter of 1990 will get the January 1990 value. 
</indent>

In the case of compacting an entire dataset, the choice you make in this dialog box sets the default method. But if you have set a compaction method for an individual variable (menu item "Variable/Edit attributes") that method is used rather than the default. If the compaction method is already set for all variables, the choice of a default compaction method is not presented. 

# controlled Graphs "Scatterplot with control"

This command requires the selection of three variables, one for the X axis, one for the Y axis, and one for which you wish to control (call it Z). The plot shows adjusted Y against adjusted X, where the adjusted version of the variable is the residual from an OLS regression on Z. 

Example: You have data on wages, experience and education level for a sample of people. You wish to plot wages against education, controlling for experience. In that case you select wages for the Y axis, education for the X axis, and experience as the control. The plot shows wages against education, with both variables "purged" of the effect of experience. 

# corr Statistics "Correlation coefficients"

Prints the pairwise correlation coefficients (Pearson's product-moment correlation) for the selected variables. The default behavior is to use all available observations for computing each pairwise coefficient, but if the option box is checked the sample is limited (if necessary) so that the same set of observations is used for all the coefficients. This option has an effect only if there are differing numbers of missing values for the variables used. 

Menu path: /View/Correlation matrix
Other access: Main window pop-up menu (multiple selection)

Script command: <@ref="corr">

# corrgm Statistics "Correlogram"

Prints the values of the autocorrelation function for <@var="series">, which may be specified by name or number. The values are defined as <@fig="autocorr"> where <@itl="u"><@sub="t"> is the <@itl="t">th observation of the variable <@itl="u"> and <@itl="s"> is the number of lags. 

The partial autocorrelations (calculated using the Durbin–Levinson algorithm) are also shown: these are net of the effects of intervening lags. In addition the Ljung–Box <@itl="Q"> statistic is printed. This may be used to test the null hypothesis that the series is "white noise"; it is asymptotically distributed as chi-square with degrees of freedom equal to the number of lags used. Unless the <@opt="--quiet"> option is given, a plot of the correlogram is printed. 

If a <@var="order"> value is specified the length of the correlogram is limited to at most that number of lags, otherwise the length is determined automatically, as a function of the frequency of the data and the number of observations. 

Menu path: /Variable/Correlogram
Other access: Main window pop-up menu (single selection)

Script command: <@ref="corrgm">

# count-model Estimation "Models for count data"

The dependent variable is taken to represent a count of the occurrence of events of some sort, and must have only non-negative integer values. By default the Poisson distribution is used, but the drop-down selector gives the options of using the Negative Binomial distribution. (The variant NegBin 2 is commonly used in econometrics, but the lesser used NegBin 1 is also available.) 

Optionally, you may add an "offset" variable to the specification. This is a scale variable, the log of which is added to the linear regression function (implicitly, with a coefficient of 1.0). This makes sense if you expect the number of occurrences of the event in question to be proportional, other things equal, to some known factor. For example, the number of traffic accidents might be supposed to be proportional to traffic volume, other things equal, and in that case traffic volume could be specified as an "offset" in a model of the accident rate. The offset variable must be strictly positive. 

By default, standard errors are computed using a numerical approximation to the Hessian at convergence. But if the "Robust standard errors" box is checked then QML standard errors are calculated, using a "sandwich" of the inverse of the Hessian and the outer product of the gradient. 

# curve Graphs "Plot a curve"

This dialog box allows you to create a gnuplot graph by specifying a formula. This must be an expression acceptable to gnuplot. Use <@lit="x"> to denote the value of the variable on the x-axis. Please note that gnuplot uses <@lit="**"> for exponentiation (raising to a power), and that the decimal character must be given as ".". Examples: 

<code>          
	10+0.35*x
	100+5.3*x-0.12*x**2
	sin(x)
	exp(sqrt(pi*x))
</code>

To put an additional line onto a graph created in this way, click on the graph and select "Edit", select the "Lines" tab in the graph editing dialog, and use the "Add line" button. 

# cusum Tests "CUSUM test"

Must follow the estimation of a model via OLS. Performs the CUSUM test—or if the <@opt="--squares"> option is given, the CUSUMSQ test—for parameter stability. A series of one-step ahead forecast errors is obtained by running a series of regressions: the first regression uses the first <@itl="k"> observations and is used to generate a prediction of the dependent variable at observation <@itl="k"> + 1; the second uses the first <@itl="k"> + 1 observations and generates a prediction for observation <@itl="k"> + 2, and so on (where <@itl="k"> is the number of parameters in the original model). 

The cumulated sum of the scaled forecast errors, or the squares of these errors, is printed and graphed. The null hypothesis of parameter stability is rejected at the 5 percent significance level if the cumulated sum strays outside of the 95 percent confidence band. 

In the case of the CUSUM test, the Harvey–Collier <@itl="t">-statistic for testing the null hypothesis of parameter stability is also printed. See Greene's <@itl="Econometric Analysis"> for details. For the CUSUMSQ test, the 95 percent confidence band is calculated using the algorithm given in Edgerton and Wells (1994). 

Menu path: Model window, /Tests/CUSUM(SQ)

Script command: <@ref="cusum">

# datasort Dataset "Sorting data"

The selected variable is used as a sort key for the entire data set. The observations on all variables are re-ordered by increasing value of the key variable, or by decreasing value if you select the "Descending" option. 

# density Statistics "Kernel density estimation"

Kernel density estimation proceeds by defining a set of evenly spaced reference points, over a suitable range in relation to the range of the data, and attributing a density to each reference point based on the actual observations in the vicinity. 

The formula used to compute the estimated density at each reference point, <@itl="x">, is 

  <@fig="kernel1">

where <@itl="n"> denotes the number of data points, <@itl="h"> is a "bandwidth" parameter, and <@itl="k">() is the kernel function. The larger the value of the bandwidth parameter, the smoother the estimated density. 

You are given the choice of using a Gaussian kernel (the standard normal density) or the Epanechnikov kernel. By default, the bandwidth is that suggested as a rule of thumb by Silverman (1986), namely 

  <@fig="kernel2">

where <@itl="s"> denotes the standard deviation of the data and IQR denotes the inter-quartile range. You can widen or shrink the bandwidth via the "bandwidth adjustment factor": the actual bandwidth used is obtained by multiplying the Silverman value by the adjustment factor. 

For a good introductory discussion of kernel density estimation see Chapter 15 of Davidson and MacKinnon's <@itl="Econometric Theory and Methods">. 

# dfgls Tests "The ADF-GLS test"

The ADF-GLS test is a variant of the Dickey–Fuller test for a unit root, for the case where the variable to be tested is assumed to have a non-zero mean or to exhibit a linear trend. The difference is that the de-meaning or de-trending of the variable is done using the GLS procedure suggested by Elliott, Rothenberg and Stock (1996). This gives a test of greater power than the standard Dickey–Fuller approach. 

See also the <@ref="adf"> command and the <@opt="--gls"> option. 

Menu path: /Variable/ADF-GLS test

# dialog Estimation "Model dialog box"

To select the dependent variable, highlight a variable in the list on the left and press the "Choose" button pointing to the Dependent variable slot. If you check the "Set as default" box, the selected variable will be pre-selected as dependent when the model dialog is next opened. Short-cut: double-click on a variable on the left to select it as the dependent variable and also set it as the default. 

To select independent variables, highlight them on the left and press the "Add" button (or click the right mouse button). You can highlight several contiguous variables by dragging with the mouse. You can highlight a group of non-contiguous variables by clicking on them with the <@lit="Ctrl"> key pressed. 

# dpanel Estimation "Dynamic panel models"

Carries out estimation of dynamic panel data models (that is, panel models including one or more lags of the dependent variable) using either the GMM-DIF or GMM-SYS method. 

The dependent variable and regressors should be given in levels form; they will be differenced automatically (since this estimator uses differencing to cancel out the individual effects). 

As regards the handling of instruments, please see the documentation for the script version of this command. Currently you cannot specify instruments explicitly in the GUI: all the independent variables are taken to be strictly exogenous. 

By default the results of 1-step estimation are reported (with robust standard errors). You may select 2-step estimation as an option. In both cases tests for autocorrelation of orders 1 and 2 are provided, as well as the Sargan overidentification test and a Wald test for the joint significance of the regressors. Note that in this differenced model first-order autocorrelation is not a threat to the validity of the model, but second-order autocorrelation violates the maintained statistical assumptions. 

For further details and examples, please see <@pdf="the Gretl User's Guide">. 

Menu path: /Model/Panel/Dynamic panel model

Script command: <@ref="dpanel">

# expand Dataset "Expand data"

If you wish to add to a dataset a series that is of lower frequency, it is necessary to "expand" the new series. For instance, a quarterly series will have to be expanded to fit into a monthly dataset. In addition, you may sometimes want to expand an entire dataset to a higher frequency (perhaps, prior to adding a higher-frequency variable to the dataset). 

Expansion of data should be considered an "expert" option; you need to know what you are doing. When combining series of differing original frequencies within one dataset, you should probably consider compacting the higher-frequency data rather than expanding the lower-frequency series. 

That said, gretl offers two options: higher-frequency values can be interpolated using the method of Chow and Lin (1971), or the values of the lower-frequency series can be repeated as many times as required. 

The Chow-Lin method is regression-based, using a constant and quadratic trend and assuming a first-order autoregressive process for the disturbances. Four degrees of freedom are used up by this procedure. As for the repetition of values, suppose we have a quarterly series with the value 35.5 in 1990:1, the first quarter of 1990. On expansion to monthly, the value 35.5 will be assigned to the observations for January, February and March of 1990. The expanded variable is therefore useless for fine-grained time-series analysis, outside of the special case where you know that the variable in question does in fact remain constant over the sub-periods. 

# export Dataset "Export data"

You may export data in Comma-Separated Values (CSV) format: such data may be opened in spreadsheets and many other application programs. If you select this option you will get some further options regarding the specific format of the CSV file. 

You also have the option of exporting data in the form of a "native" gretl datafile, or (if the data are suitable) exporting to a gretl database. See <@url="gretl.sourceforge.net/gretl_data.html"> for an account of gretl databases. 

You may also export data in a format suitable for use with the following programs: 

<indent>
• GNU R (<@url="www.r-project.org">) 
</indent>

<indent>
• GNU octave (<@url="www.gnu.org/software/octave">) 
</indent>

<indent>
• JMulTi (<@url="www.jmulti.de">) 
</indent>

<indent>
• PcGive (<@url="www.pcgive.com">) 
</indent>

If you wish to export data by copying to the clipboard rather than writing to a file on disk, select the series you want to copy in the main window, right-click, and select "Copy to clibboard". (Only CSV format is supported in this context.) 

# factorized Graphs "Factorized plot"

This command requires the selection of three variables, the last of which must be a dummy variable (values 1 or 0). The Y variable is plotted against the X variable, with the data points colored differently depending on the value of the third. 

Example: You have data on wages and educational attainment for a sample of people; you also have a dummy variable with value 1 for men and 0 for women (as in Ramanathan's <@lit="data7-2">). A "factorized plot" of <@lit="WAGE"> against <@lit="EDUC"> using the <@lit="GENDER"> dummy as factor will show the data points for men in one color and those for women in another (with a legend to identify them). 

# fcast Prediction "Generate forecasts"

Must follow an estimation command. Forecasts are generated for the specified range of observations. Depending on the nature of the model, standard errors may also be generated (see below). 

The choice between a static and a dynamic forecast applies only in the case of dynamic models, with an autoregressive error process or including one or more lagged values of the dependent variable as regressors. Static forecasts are one step ahead, based on realized values from the previous period, while dynamic forecasts employ the chain rule of forecasting. For example, if a forecast for <@itl="y"> in 2008 requires as input a value of <@itl="y"> for 2007, a static forecast is impossible without actual data for 2007. A dynamic forecast for 2008 is possible if a prior forecast can be substituted for <@itl="y"> in 2007. 

The default is to give a static forecast for any portion of the forecast range that lies within the sample range over which the model was estimated, and a dynamic forecast (if relevant) out of sample. The <@lit="dynamic"> option requests a dynamic forecast from the earliest possible date, and the <@opt="static"> option requests a static forecast even out of sample. 

<code>          
	fcast --plot=fc.pdf
</code>

will generate a graphic in PDF format. Absolute pathnames are respected, otherwise files are written to the gretl working directory. 

The nature of the forecast standard errors (if available) depends on the nature of the model and the forecast. For static linear models standard errors are computed using the method outlined by Davidson and MacKinnon (2004); they incorporate both uncertainty due to the error process and parameter uncertainty (summarized in the covariance matrix of the parameter estimates). For dynamic models, forecast standard errors are computed only in the case of a dynamic forecast, and they do not incorporate parameter uncertainty. For nonlinear models, forecast standard errors are not presently available. 

Menu path: Model window, /Analysis/Forecasts

Script command: <@ref="fcast">

# fractint Statistics "Fractional integration"

Tests the specified series for fractional integration ("long memory"). The null hypothesis is that the integration order of the series is zero. By default the local Whittle estimator (Robinson, 1995) is used but if the <@opt="--gph"> option is given the GPH test (Geweke and Porter-Hudak, 1983) is performed instead. If the <@opt="--all"> flag is given then the results of both tests are printed. 

For details on this sort of test, see Phillips and Shimotsu (2004). 

If the optional <@var="order"> argument is not given the order for the test(s) is set automatically as the lesser of <@itl="T">/2 and <@itl="T"><@sup="0.6">. 

The results can be retrieved using the accessors <@lit="$test"> and <@lit="$pvalue">. These values are based on the Local Whittle Estimator unless the <@opt="--gph"> option is given. 

Menu path: /Variable/Unit root tests/Fractional integration

Script command: <@ref="fractint">

# freq Statistics "Frequency distribution"

In the frequency plot dialog box you can control the characteristics of the plot in either of two ways. 

First, you may choose the number of bins. In this case the width and placement of the bins are calculated automatically. 

Alternatively, you may specify the lower limit of the left-most bin, and the width of the bins. In this case the number of bins is calculated automatically. 

If you wish to align the bins on round numbers, here is one way to proceed: start by specifying the number of bins you want, and take a look at the plot that is produced. If it's not to your liking, take note of the modification that is required (for example, make the left-most bin start at 100 and impose a bin width of 200). Then make a second pass where you specify the left-hand limit and bin width. 

This dialog also allows you to select a theoretical distribution to be plotted against the data: either the normal or the gamma. If the normal option is selected the Doornik–Hansen test for normality is computed. If the gamma option is selected, gretl computes Locke's nonparametric test for the null hypothesis that the variable follows the gamma distribution. Note that the parameterization of the gamma distribution used in gretl is (shape, scale). 

Menu path: /Variable/Frequency distribution

Script command: <@ref="freq">

# garch Estimation "GARCH model"

Estimates a GARCH model (GARCH = Generalized Autoregressive Conditional Heteroskedasticity), either a univariate model or, if independent variables are selected, including the given exogenous variables. The conditional variance equation is shown below. 

  <@fig="garch_h">

The parameter <@var="p"> therefore represents the Generalized (or "AR") order, while <@var="q"> represents the regular ARCH (or "MA") order. If <@var="p"> is non-zero, <@var="q"> must also be non-zero otherwise the model is unidentified. However, you can estimate a regular ARCH model by setting <@var="q"> to a positive value and <@var="p"> to zero. The sum of <@var="p"> and <@var="q"> must be no greater than 5. 

By default native gretl code is used in estimation of GARCH models, but you also have the option of using the algorithm of Fiorentini, Calzolari and Panattoni (1996). The former uses the BFGS maximizer while the latter uses the information matrix to maximize the likelihood, with fine-tuning via the Hessian. 

Several variant estimates of the coefficient covariance matrix are available with this command. By default, the Hessian is used unless the "Robust standard errors" box is checked, in which case the QML (White) covariance matrix is used. Other possibilities (e.g. the information matrix, or the Bollerslev–Wooldridge estimator) can be specified using the <@ref="set"> command. 

The estimated conditional variance, along with the residuals and various other model statistics, can be accessed and added to the dataset using the "Model data" menu in the window where the model is displayed. If the box marked "Standardize the residuals" is checked, the residuals are divided by the square root of te conditional variance. 

Menu path: /Model/Time series/GARCH

Script command: <@ref="garch">

# genr Dataset "Generate a new variable"

Use this box to define a new variable, on the pattern <@var="name"> = <@var="formula">. The formula should be a well-formed combination of variable names, constants, operators and functions (details below). To ensure you get the type of variable you want, you can prefix the formula with a type-name, e.g. <@lit="scalar">, <@lit="series"> or <@lit="matrix">. For example, to create a series that has a constant value of 10, you can type 

<code>          
	series c = 10
</code>

(otherwise <@lit="c = 10"> would create a scalar variable). 

Supported <@itl="arithmetical operators"> are, in order of precedence: <@lit="^"> (exponentiation); <@lit="*">, <@lit="/"> and <@lit="%"> (modulus or remainder); <@lit="+"> and <@lit="-">. 

The available <@itl="Boolean operators"> are (again, in order of precedence): <@lit="!"> (negation), <@lit="&&"> (logical AND), <@lit="||"> (logical OR), <@lit=">">, <@lit="<">, <@lit="=">, <@lit=">="> (greater than or equal), <@lit="<="> (less than or equal) and <@lit="!="> (not equal). The Boolean operators can be used in constructing dummy variables: for instance <@lit="(x > 10)"> returns 1 if <@lit="x"> > 10, 0 otherwise. 

Built-in constants are <@lit="pi"> and <@lit="NA">. The latter is the missing value code: you can initialize a variable to the missing value with <@lit="scalar x = NA">. 

The <@lit="genr"> command supports a wide range of mathematical and statistical functions, including all the common ones plus several that are special to econometrics. In addition it offers access to numerous internal variables that are defined in the course of running regressions, doing hypothesis tests, and so on. For a listing of functions and accessors, see <@gfr="the Gretl function reference">. 

Besides the operators and functions noted above there are some special uses of <@lit="genr">: 

<indent>
• <@lit="genr time"> creates a time trend variable (1,2,3,…) called <@lit="time">. <@lit="genr index"> does the same thing except that the variable is called <@lit="index">. 
</indent>

<indent>
• <@lit="genr dummy"> creates dummy variables up to the periodicity of the data. In the case of quarterly data (periodicity 4), the program creates <@lit="dq1"> = 1 for first quarter and 0 in other quarters, <@lit="dq2"> = 1 for the second quarter and 0 in other quarters, and so on. With monthly data the dummies are named <@lit="dm1">, <@lit="dm2">, and so on. With other frequencies the names are <@lit="dummy_1">, <@lit="dummy_2">, etc. 
</indent>

<indent>
• <@lit="genr unitdum"> and <@lit="genr timedum"> create sets of special dummy variables for use with panel data. The first codes for the cross-sectional units and the second for the time period of the observations. 
</indent>

<@itl="Note">: In the command-line program, <@lit="genr"> commands that retrieve model-related data always reference the model that was estimated most recently. This is also true in the GUI program, if one uses <@lit="genr"> in the "gretl console" or enters a formula using the "Define new variable" option under the Variable menu in the main window. With the GUI, however, you have the option of retrieving data from any model currently displayed in a window (whether or not it's the most recent model). You do this under the "Model data" menu in the model's window. 

The special variable <@lit="obs"> serves as an index of the observations. For instance <@lit="genr dum = (obs=15)"> will generate a dummy variable that has value 1 for observation 15, 0 otherwise. You can also use this variable to pick out particular observations by date or name. For example, <@lit="genr d = (obs>1986:4)">, <@lit="genr d = (obs>"2008/04/01")">, or <@lit="genr d = (obs="CA")">. If daily dates or observation labels are used in this context, they should be enclosed in double quotes. Quarterly and monthly dates (with a colon) may be used unquoted. Note that in the case of annual time series data, the year is not distinguishable syntactically from a plain integer; therefore if you wish to compare observations against <@lit="obs"> by year you must use the function <@lit="obsnum"> to convert the year to a 1-based index value, as in <@lit="genr d = (obs>obsnum(1986))">. 

Scalar values can be pulled from a series in the context of a <@lit="genr"> formula, using the syntax <@var="varname"><@lit="["><@var="obs"><@lit="]">. The <@var="obs"> value can be given by number or date. Examples: <@lit="x[5]">, <@lit="CPI[1996:01]">. For daily data, the form <@var="YYYY/MM/DD"> should be used, e.g. <@lit="ibm[1970/01/23]">. 

An individual observation in a series can be modified via <@lit="genr">. To do this, a valid observation number or date, in square brackets, must be appended to the name of the variable on the left-hand side of the formula. For example, <@lit="genr x[3] = 30"> or <@lit="genr x[1950:04] = 303.7">. 

Menu path: /Add/Define new variable
Other access: Main window pop-up menu

Script command: <@ref="genr">

# genrand Programming "Generating random variables"

In this dialog you must give a name for the variable to be created, plus some additional information depending on the distribution. 

<indent>
• Uniform: the lower and upper bounds for the distribution. 
</indent>

<indent>
• Normal: the mean and (positive) standard deviation. 
</indent>

<indent>
• Chi-square and Student's t: the degrees of freedom, which must be positive. 
</indent>

<indent>
• F: both numerator and denominator degrees of freedom. 
</indent>

<indent>
• gamma: shape and scale parameters (both positive). 
</indent>

<indent>
• Binomial: the "success" probability and the integer number of trials. 
</indent>

<indent>
• Poisson: the positive mean (which also equals the variance). 
</indent>

If you want to generate repeatable sequences of pseudo-random numbers, you can set the seed, under the Tools menu. 

# genseed Programming "Setting the seed for random numbers"

The "seed" controls the starting point for the sequence of pseudo-random numbers generated in a given gretl session. By default the seed is set when the program is started, using the system time. This ensures that you get a different sequence of random numbers each time you run the program. If you want to obtain repeatable sequences, you need to set the seed manually (and take note of the value you used). 

Note that whenever you click "OK" in this dialog box, the generator is re-started, using the given seed. So, for example, if you (a) set the seed to (say) 147; (b) generate a series from the standard normal distribution; (c) revisit this dialog and click "OK" again with the seed still at 147; then (d) generate a second series from the standard normal distribution, the two generated series will be identical. 

# gmm Estimation "GMM estimation"

Performs Generalized Method of Moments (GMM) estimation using the BFGS (Broyden, Fletcher, Goldfarb, Shanno) algorithm. You must specify one or more commands for updating the relevant quantities (typically GMM residuals), one or more sets of orthogonality conditions, an initial matrix of weights, and a listing of the parameters to be estimated, all enclosed between the tags <@lit="gmm"> and <@lit="end gmm">. Any options should be appended to the <@lit="end gmm"> line. 

Please see <@pdf="the Gretl User's Guide"> for details on this command. Here we just illustrate with a simple example. 

<code>          
	gmm e = y - X*b
	  orthog e ; W
	  weights V
	  params b
	end gmm
</code>

In the example above we assume that <@lit="y"> and <@lit="X"> are data matrices, <@lit="b"> is an appropriately sized vector of parameter values, <@lit="W"> is a matrix of instruments, and <@lit="V"> is a suitable matrix of weights. The statement 

<code>          
	orthog e ; W
</code>

indicates that the residual vector <@lit="e"> is in principle orthogonal to each of the instruments composing the columns of <@lit="W">. 

Menu path: /Model/GMM

Script command: <@ref="gmm">

# graphing Graphs "Graphing"

Gretl calls a separate program, namely gnuplot, to generate graphs. Gnuplot is a very full-featured graphing program with myriad options. Gretl gives you direct access, via a graphical interface, to a subset of these options and it tries to choose sensible values for you; it also allows you to take complete control over graph details if you wish. 

With a graph displayed, you can click on the graph window for a pop-up menu with the following options: 

<indent>
• Save as postscript: save the graph in encapsulated postscript (EPS) format 
</indent>

<indent>
• Save as PNG: save in Portable Network Graphics format 
</indent>

<indent>
• Save to session as icon: the graph will appear in iconic form when you select "Icon view" from the Session menu 
</indent>

<indent>
• Zoom: lets you select an area within the graph for closer inspection 
</indent>

<indent>
• Print: (on the Gnome desktop and MS Windows only) lets you print the graph directly 
</indent>

<indent>
• Copy to clipboard: (MS Windows only) lets you paste the graph into Windows applications such as MS Word 
</indent>

<indent>
• Edit: opens a controller for the plot which lets you adjust various aspects of its appearance 
</indent>

<indent>
• Close: closes the graph window 
</indent>

If you know something about gnuplot and wish to get finer control over the appearance of a graph than is available via the graphical controller ("Edit" option), you have two further options: 

<indent>
• Once the graph is saved as a session icon, you can right-click on its icon for a further pop-up menu. One of the options here is "Edit plot commands", which opens an editing window with the actual gnuplot commands displayed. You can edit these commands and either save them for future processing or send them to gnuplot (with the execute toolbar icon in the plot commands editing window). 
</indent>

<indent>
• Another way to save the plot commands (or to save the displayed plot in formats other than EPS or PNG) is to use "Edit" item on a graph's pop-up menu to invoke the graphical controller, then click on the "Output to file" tab in the controller. You are then presented with a drop-down menu of formats in which to save the graph. 
</indent>

To find out more about gnuplot, see http://www.gnuplot.info 

# graphpg Graphs "Gretl graph page"

The session "graph page" will work only if you have the LaTeX typesetting system installed, and are able to generate and view PDF or PostScript output. 

In the session icon window, you can drag up to eight graphs onto the graph page icon. When you double-click on the graph page (or right-click and select "Display"), a page containing the selected graphs will be composed and opened in a suitable viewer. From there you should be able to print the page. 

To clear the graph page, right-click on its icon and select "Clear". 

In script (or console) mode, you can add a graph to the graph page by issuing the command <@lit="graphpg add"> after saving a named graph, as in 

<code>          
	grf1 <- gnuplot Y X
	graphpg add
</code>

Also in script mode you can call for display of the graph page using the command <@lit="graphpg show">, and can clear the page via <@lit="graphpg free">. 

Note that on systems other than MS Windows, you may have to adjust the setting for the program used to view PDF or PostScript files. Find that under the "Programs" tab in the gretl Preferences dialog box (under the Tools menu in the main window). 

Script command: <@ref="graphpg">

# 3-D Graphs "3-dimensional plots"

This feature works best if you have gnuplot 3.8 or higher installed. In that case you can manipulate the 3-D plot with the mouse (rotate it, and expand or shrink the axes). 

In composing a 3-D plot, note that the Z-axis will be shown as the vertical axis. Thus if you have some dependent variable that you think may be influenced by two independent variables, you should put the dependent variable on the Z-axis, and the independent variables on the X and Y axes. 

Unlike most other gretl graphs, 3-D plots are controlled by gnuplot rather than gretl itself. The gretl graph-editing menu is not available. 

# gui-htest Tests "Test statistic calculator"

Gretl's test calculator computes test statistics and p-values for various common hypothesis tests concerning one or two populations. The required input takes the form of sample statistics derived from one or two samples, depending on the test chosen. These statistics can be typed in as numerical values. Alternatively, if you have a data file open, you can get gretl to calculate sample statistics for a selected variable or variables (in the case of means and variances, but not in the case of proportions). 

If you want to base your test on a variable in the data set, first activate this option by checking the box titled "Use variable from dataset". Then the drop-down list of variables will become active and you can select a variable. When you select a variable from the list, the relevant statistics are automatically entered in the boxes below. 

In addition to the simple selection of a variable, you have the option of specifying a restriction on the selected variable (that is, defining a sub-sample). For example, suppose you have wage data in a variable called "wage" and you also have a dummy variable called "gender" that equals 1 for males and 0 for females (or vice versa). Then, in the test for the difference of two means, you could select "wage" in both slots, but add to the top slot "(gender=0)" and to the bottom "(gender=1)". This would then give you a test for the difference between mean male income and mean female income. Note that when you type a restriction in this way, you must then press the Enter key to have the sample statistics calculated. 

The sub-sampling restriction must be placed in parentheses following the selected variable, and in general the restriction takes the form "var2 op value," where var2 is the name of a variable in the current data set, val is a numerical value, and op is a comparison operator chosen from =, !=, <, >, <= or >= (respectively equality, inequality, less than, greater than, less than or equal, and greater than or equal). The spaces around the operator are optional. 

# gui-htest-np Tests "Nonparametric tests"

Under the "Difference test" tab you can carry out a nonparametric test for a difference between two populations or groups, the specific test depending on the option selected. 

Sign test: This test is based on the fact that if two samples, <@itl="x"> and <@itl="y">, are drawn randomly from the same distribution, the probability that <@itl="x"><@sub="i"> > <@itl="y"><@sub="i">, for each observation <@itl="i">, should equal 0.5. The test statistic is <@itl="w">, the number of observations for which <@itl="x"><@sub="i"> > <@itl="y"><@sub="i">. Under the null hypothesis this follows the Binomial distribution with parameters (<@itl="n">, 0.5), where <@itl="n"> is the number of observations. 

Rank sum test: The Wilcoxon rank-sum test is performed. This test proceeds by ranking the observations from both samples jointly, from smallest to largest, then finding the sum of the ranks of the observations from one of the samples. The two samples do not have to be of the same size, and if they differ the smaller sample is used in calculating the rank-sum. Under the null hypothesis that the samples are drawn from populations with the same median, the probability distribution of the rank-sum can be computed for any given sample sizes; and for reasonably large samples a close Normal approximation exists. 

Signed rank test: The Wilcoxon signed-rank test is performed. This is designed for matched data pairs such as, for example, the values of a variable for a sample of individuals before and after some treatment. The test proceeds by finding the differences between the paired observations, <@itl="x"><@sub="i"> – <@itl="y"><@sub="i">, ranking these differences by absolute value, then assigning to each pair a signed rank, the sign agreeing with the sign of the difference. One then calculates <@itl="W"><@sub="+">, the sum of the positive signed ranks. As with the rank-sum test, this statistic has a well-defined distribution under the null that the median difference is zero, which converges to the Normal for samples of reasonable size. 

Under the "Runs test" tab you can carry out a test for the randomness of a given variable, based on the number of runs of consecutive positive or negative values. If you select the option "Use first difference", the variable is differenced prior to the analysis and hence the runs are interpreted as runs of increasing or decreasing values of the original variable. The test statistic is based on a normal approximation to the distribution of the number of runs under the null of randomness. 

# hausman Tests "Panel diagnostics"

This test is available only after estimating an OLS model using panel data (see also <@lit="setobs">). It tests the simple pooled model against the principal alternatives, the fixed effects and random effects models. 

The fixed effects model allows the intercept of the regression to vary across the cross-sectional units. An <@itl="F">-test is reported for the null hypotheses that the intercepts do not differ. The random effects model decomposes the residual variance into two parts, one part specific to the cross-sectional unit and the other specific to the particular observation. (This estimator can be computed only if the number of cross-sectional units in the data set exceeds the number of parameters to be estimated.) The Breusch–Pagan LM statistic tests the null hypothesis that the pooled OLS estimator is adequate against the random effects alternative. 

The pooled OLS model may be rejected against both of the alternatives, fixed effects and random effects. Provided the unit- or group-specific error is uncorrelated with the independent variables, the random effects estimator is more efficient than the fixed effects estimator; otherwise the random effects estimator is inconsistent and the fixed effects estimator is to be preferred. The null hypothesis for the Hausman test is that the group-specific error is not so correlated (and therefore the random effects model is preferable). A low p-value for this test counts against the random effects model and in favor of fixed effects. 

Menu path: Model window, /Tests/Panel diagnostics

Script command: <@ref="hausman">

# hccme Estimation "Robust standard errors"

You are offered several variant calculations for standard errors that are robust in the presence of heteroskedasticity (and, in the case of the HAC estimator, autocorrelation). 

HC0 produces the original "White's standard errors"; HC1, HC2, HC3 and HC3a are subsequent variations that are generally reckoned to produce superior (more reliable) results. For details of the estimators, see MacKinnon and White (Journal of Econometrics, 1985) or Davidson and MacKinnon, Econometric Theory and Methods (Oxford, 2004). The labels given here are those used by Davidson and MacKinnon. Variant "HC3a" is the jackknife, as described in MacKinnon and White; HC3 is a close approximation to the jackknife. 

If you use the HAC estimator for OLS on time-series data, you are able to fine-tune the lag-length using the <@lit="set"> command. Please see the gretl manual or the script commands help file for details. 

When estimating a model via OLS using panel data, the default robust estimator of the covariance matrix is that given by Arellano. The alternative is Beck and Katz's Panel Corrected Standard Errors (PCSE). The latter take into account heteroskedasticity but not autocorrelation. 

Two robust estimators of the covariance matrix are offered for GARCH models: QML is the Quasi-Maximum Likelihood Estimator, and BW is the Bollerslev-Wooldridge estimator. 

# hsk Estimation "Heteroskedasticity-corrected estimates"

This command is applicable where heteroskedasticity is present in the form of an unknown function of the regressors which can be approximated by a quadratic relationship. In that context it offers the possibility of consistent standard errors and more efficient parameter estimates as compared with OLS. 

The procedure involves (a) OLS estimation of the model of interest, followed by (b) an auxiliary regression to generate an estimate of the error variance, then finally (c) weighted least squares, using as weight the reciprocal of the estimated variance. 

In the auxiliary regression (b) we regress the log of the squared residuals from the first OLS on the original regressors and their squares. The log transformation is performed to ensure that the estimated variances are non-negative. Call the fitted values from this regression <@itl="u"><@sup="*">. The weight series for the final WLS is then formed as 1/exp(<@itl="u"><@sup="*">). 

Menu path: /Model/Other linear models/Heteroskedasticity corrected

Script command: <@ref="hsk">

# hurst Statistics "Hurst exponent"

Calculates the Hurst exponent (a measure of persistence or long memory) for a time-series variable having at least 128 observations. 

The Hurst exponent is discussed by Mandelbrot. In theoretical terms it is the exponent, <@itl="H">, in the relationship 

  <@fig="hurst">

where RS is the "rescaled range" of the variable <@itl="x"> in samples of size <@itl="n"> and <@itl="a"> is a constant. The rescaled range is the range (maximum minus minimum) of the cumulated value or partial sum of <@itl="x"> over the sample period (after subtraction of the sample mean), divided by the sample standard deviation. 

As a reference point, if <@itl="x"> is white noise (zero mean, zero persistence) then the range of its cumulated "wandering" (which forms a random walk), scaled by the standard deviation, grows as the square root of the sample size, giving an expected Hurst exponent of 0.5. Values of the exponent significantly in excess of 0.5 indicate persistence, and values less than 0.5 indicate anti-persistence (negative autocorrelation). In principle the exponent is bounded by 0 and 1, although in finite samples it is possible to get an estimated exponent greater than 1. 

In gretl, the exponent is estimated using binary sub-sampling: we start with the entire data range, then the two halves of the range, then the four quarters, and so on. For sample sizes smaller than the data range, the RS value is the mean across the available samples. The exponent is then estimated as the slope coefficient in a regression of the log of RS on the log of sample size. 

Menu path: /Variable/Hurst exponent

Script command: <@ref="hurst">

# intreg Estimation "Interval regression model"

Estimates an interval regression model. This model arises when the dependent variable is imperfectly observed for some (possibly all) observations. In other words, the data generating process is assumed to be 

  <@itl="y* = x b + u">

but we only observe <@itl="m <= y* <= M"> (the interval may be left- or right-unbounded). Note that for some observations <@itl="m"> may equal <@itl="M">. The variables <@var="minvar"> and <@var="maxvar"> must contain <@lit="NA">s for left- and right-unbounded observations, respectively. 

In the model specification dialog, <@var="minvar"> and <@var="maxvar"> are indentified as the Lower bound variable and the Upper bound variable respectively. 

The model is estimated by maximum likelihood, assuming normality of the disturbance term. 

By default, standard errors are computed using the negative inverse of the Hessian. If the "Robust standard errors" box is checked, then QML or Huber–White standard errors are calculated instead. In this case the estimated covariance matrix is a "sandwich" of the inverse of the estimated Hessian and the outer product of the gradient. 

Menu path: /Model/Nonlinear models/Interval regression

Script command: <@ref="intreg">

# irfboot Graphs "Impulse response plots"

If you select the bootstrap option when plotting impulse responses, gretl computes a confidence interval for the responses using the bootstrap method. The residuals from the original VAR (or VECM) are resampled with replacement; an artificial dataset is constructed based on the original parameter estimates and the resampled residuals; the system is re-estimated and the impulse responses are re-evaluated. This is repeated 999 times and the α/2 and 1 – α/2 quantiles for the responses are found and plotted along with the point estimates. This option is not currently available for restricted VECMs. 

This dialog also supports reordering of the variables for the Cholesky decomposition of the cross-equation covariance matrix. The default is given by the order in which the variables are entered into the model specification, but the up and down arrows can be used to promote or demote a selected variable. 

# kalman Estimation "Kalman filter"

Opens a block of statements to set up a Kalman filter. This block should end with the line <@lit="end kalman">, to which the options shown above may be appended. The intervening lines specify the matrices that compose the filter. For example, 

<code>          
	kalman 
	  obsy y
	  obsymat H
	  statemat F
	  statevar Q
	end kalman
</code>

Please see <@pdf="the Gretl User's Guide"> for details. 

See also <@xrf="kfilter">, <@xrf="ksimul">, <@xrf="ksmooth">. 

Script command: <@ref="kalman">

# kpss Tests "KPSS stationarity test"

Computes the KPSS test (Kwiatkowski, Phillips, Schmidt and Shin, Journal of Econometrics, 1992) for stationarity of the given variable (or its first difference, if the differencing option is selected). The null hypothesis is that the variable in question is stationary, either around a level or, if the "include a trend" box is checked, around a deterministic linear trend. 

The selected lag order determines the size of the window used for Bartlett smoothing. If the "show regression results" box is checked the results of the auxiliary regression are printed, along with the estimated variance of the random walk component of the variable. 

The critical values shown for the test statistic are based on the response surfaces estimated by Sephton (Economics Letters, 1995), which are more accurate for small samples than the values given in the original KPSS article. When the test statistic lies between the 10 percent and 1 percent critical values a p-value is shown; this is obtained by linear interpolation and should not be taken too literally. 

Menu path: /Variable/Unit root tests/KPSS test

Script command: <@ref="kpss">

# lad Estimation "Least Absolute Deviation estimation"

Calculates a regression that minimizes the sum of the absolute deviations of the observed from the fitted values of the dependent variable. Coefficient estimates are derived using the Barrodale–Roberts simplex algorithm; a warning is printed if the solution is not unique. 

Standard errors are derived using the bootstrap procedure with 500 drawings. The covariance matrix for the parameter estimates, printed when the <@opt="--vcv"> flag is given, is based on the same bootstrap. 

Menu path: /Model/Robust estimation/Least Absolute Deviation

Script command: <@ref="lad">

# lags-dialog Estimation "Lag selection box"

In this dialog you can select the lag order for the independent variables in a time-series model, and in some cases for the dependent variable also. (But note that the common lag order for vector models such as VARs and VECMs is handled separately, via a selection spinner in the main model dialog box.) 

The spinners on the left let you select a range of consecutive lags for any given variable. To specify non-consecutive lags, click the check box next to the entry field titled "specific lags". This activates the entry box, into which you can type a list of lags, separated by spaces. 

The row marked "default" offers a quick way to set a common lag specification for all the independent variables: values set in that row are copied to all the others (apart from the dependent variable, if present). 

The dependent variable is treated specially: the minimum lag must be zero, which places the current value of the variable on the left-hand side of the model. Any higher lags appear with the independent variables on the right-hand side of the model. 

Values selected in this dialog are remembered for the duration of your session with a given dataset. 

# leverage Tests "Influential observations"

Must immediately follow an <@lit="ols"> command. Calculates the leverage (<@itl="h">, which must lie in the range 0 to 1) for each data point in the sample on which the previous model was estimated. Displays the residual (<@itl="u">) for each observation along with its leverage and a measure of its influence on the estimates, <@fig="influence">. "Leverage points" for which the value of <@itl="h"> exceeds 2<@itl="k">/<@itl="n"> (where <@itl="k"> is the number of parameters being estimated and <@itl="n"> is the sample size) are flagged with an asterisk. For details on the concepts of leverage and influence see Davidson and MacKinnon (1993), Chapter 2. 

DFFITS values are also shown: these are "studentized residuals" (predicted residuals divided by their standard errors) multiplied by <@fig="dffit">. For discussions of studentized residuals and DFFITS see chapter 12 of Maddala's Introduction to Econometrics or Belsley, Kuh and Welsch (1980). 

Briefly, a "predicted residual" is the difference between the observed value of the dependent variable at observation <@itl="t">, and the fitted value for observation <@itl="t"> obtained from a regression in which that observation is omitted (or a dummy variable with value 1 for observation <@itl="t"> alone has been added); the studentized residual is obtained by dividing the predicted residual by its standard error. 

The "+" icon at the top of the leverage test window brings up a dialog box that allows you to save one or more of the test variables to the current data set. 

Menu path: Model window, /Tests/Influential observations

Script command: <@ref="leverage">

# levinlin Tests "Levin-Lin-Chu test"

Carries out the panel unit-root test described by Levin, Lin and Chu (2002). The null hypothesis is that all of the individual time series exhibit a unit root, and the alternative is that none of the series has a unit root. (That is, a common AR(1) coefficient is assumed, although in other respects the statistical properties of the series are allowed to vary across individuals.) 

Menu path: /Variable/Unit root tests/Levin-Lin-Chu test

Script command: <@ref="levinlin">

# logistic Estimation "Logistic regression"

Logistic regression: carries out an OLS regression using the logistic transformation of the dependent variable, 

  <@fig="logistic1">

You are presented with a dialog box that allows you to specify a different maximum if you wish. The supplied <@itl="y"><@sup="*"> value must be greater than all of the observed values of the dependent variable. 

The fitted values and residuals from the regression are automatically transformed using 

  <@fig="logistic2">

where <@itl="x"> represents either a fitted value or a residual from the OLS regression using the transformed dependent variable. The reported values are therefore comparable with the original dependent variable. 

Note that if the dependent variable is binary, you should use the <@ref="logit"> command instead. 

Menu path: /Model/Nonlinear models/Logistic

Script command: <@ref="logistic">

# logit Estimation "Logit regression"

If the dependent variable is a binary variable (all values are 0 or 1) maximum likelihood estimates of the coefficients on <@var="indepvars"> are obtained via the "binary response model regression" (BRMR) method outlined by Davidson and MacKinnon (2004). As the model is nonlinear the slopes depend on the values of the independent variables. By default the slopes with respect to each of the independent variables are calculated (at the means of those variables) and these slopes replace the usual p-values in the regression output. This behavior can be suppressed my giving the <@opt="--p-values"> option. The chi-square statistic tests the null hypothesis that all coefficients are zero apart from the constant. 

By default, standard errors are computed using the negative inverse of the Hessian. If the "Robust standard errors" box is checked, then QML or Huber–White standard errors are calculated instead. In this case the estimated covariance matrix is a "sandwich" of the inverse of the estimated Hessian and the outer product of the gradient. See chapter 10 of Davidson and MacKinnon for details. 

If the dependent variable is not binary but is discrete, then by default it is interpreted as an ordinal response, and Ordered Logit estimates are obtained. However, if the <@opt="--multinomial"> option is given, the dependent variable is interpreted as an unordered response, and Multinomial Logit estimates are produced. (In either case, if the variable selected as dependent is not discrete an error is flagged.) In the multinomial case, the accessor <@lit="$mnlprobs"> is available after estimation, to get a matrix containing the estimated probabilities of the outcomes at each observation (observations in rows, outcomes in columns). 

If you want to use logit for analysis of proportions (where the dependent variable is the proportion of cases having a certain characteristic, at each observation, rather than a 1 or 0 variable indicating whether the characteristic is present or not) you should not use the <@lit="logit"> command, but rather construct the logit variable, as in 

<code>          
	genr lgt_p = log(p/(1 - p))
</code>

and use this as the dependent variable in an OLS regression. See chapter 12 of Ramanathan (2002). 

Menu path: /Model/Nonlinear models/Logit

Script command: <@ref="logit">

# mahal Statistics "Mahalanobis distances"

The Mahalanobis distance is the distance between two points in a <@itl="k">-dimensional space, scaled by the statistical variation in each dimension of the space. For example, if <@itl="p"> and <@itl="q"> are two observations on a set of <@itl="k"> variables with covariance matrix <@itl="C">, then the Mahalanobis distance between the observations is given by 

  <@fig="mahal">

where (<@itl="p"> – <@itl="q">) is a <@itl="k">-vector. This reduces to Euclidean distance if the covariance matrix is the identity matrix. 

The space for which distances are computed is defined by the selected variables. For each observation in the current sample range, the distance is computed between the observation and the centroid of the selected variables. This distance is the multidimensional counterpart of a standard <@itl="z">-score, and can be used to judge whether a given observation "belongs" with a group of other observations. 

If the number of variables selected is 4 or less, the covariance matrix and its inverse are printed. Clicking the "+" button at the top of the window displaying the distances give you the option of adding the distances to the dataset as a new variable. 

Menu path: /View/Mahalanobis distances

Script command: <@ref="mahal">

# markers Dataset "Add case markers"

This command needs the name of a file containing "case markers", that is, short identifying strings for the individual observations in the data set (for example, country or city names or codes). These marker strings should be no more than 8 characters long. The file should contain one marker per line, and there should be just as many markers as observations in the current dataset. If these conditions are met and the specified file is found, the case markers will be added; they will be visible when you choose "Display values" under gretl's Data menu. 

# meantest Tests "Difference of means"

By default the test statistic is calculated on the assumption that the variances are equal for the two variables; with the <@opt="--unequal-vars"> option the variances are assumed to be different. This will make a difference to the test statistic only if there are different numbers of non-missing observations for the two variables. 

Calculates the t statistic for the null hypothesis that the population means are equal for two selected variables, and shows its p-value. The command may be called with or without the assumption that the variances are equal for the two variables (although this will make a difference to the test statistic only if there are different numbers of non-missing observations for the two variables.) 

Menu path: /Model/Bivariate tests/Difference of means

Script command: <@ref="meantest">

# missing Dataset "Missing data values"

Set a numerical value that will be interpreted as "missing" or "not available", either for a particular data series (under the Variable menu) or globally for the entire data set (under the Sample menu). 

Gretl has its own internal coding for missing values, but sometimes imported data may employ a different code. For example, if a particular series is coded such that a value of -1 indicates "not applicable", you can select "Set missing value code" under the Variable menu and type in the value "-1" (without the quotes). Gretl will then read the -1s as missing observations. 

# mle Estimation "Maximum likelihood estimation"

Performs Maximum Likelihood (ML) estimation using the BFGS (Broyden, Fletcher, Goldfarb, Shanno) algorithm. You must specify the log-likelihood function; it is recommended that you also supply expressions for the derivatives of this function with respect to each of the parameters if possible. 

Simple example: Suppose we have a series <@lit="X"> with values 0 or 1 and we wish to obtain the maximum likelihood estimate of the probability, <@lit="p">, that <@lit="X"> = 1. (In this simple case we can guess in advance that the ML estimate of <@lit="p"> will simply equal the proportion of Xs equal to 1 in the sample.) 

The parameter <@lit="p"> must first be added to the dataset and given an initial value. This can be done using the genr command or via menu choices. Appropriate "genr" lines may be typed into the MLE specification window prior to the specification of the log-likelihood function. 

In the MLE window we type the following lines: 

<code>          
	loglik = X*log(p) + (1-X)*log(1-p)
	deriv p = X/p - (1-X)/(1-p)
</code>

The first line specifies the log-likelihood function, and the next line supplies the derivative of that function with respect to the parameter p. If no "deriv" lines are given, a numerical approximation to the derivatives is computed. 

If the parameter p was not previously declared we could preface the above lines with something like the following: 

<code>          
	genr p = 0.5
</code>

By default, standard errors are based on the Outer Product of the Gradient. If the robust standard errors box is checked, a QML estimator is used (namely, a sandwich of the negative inverse of the Hessian and the covariance matrix of the gradient). The Hessian is approximated numerically. 

Menu path: /Model/Maximum likelihood

Script command: <@ref="mle">

# modeltab Utilities "The model table"

In econometric research it is common to estimate several models with a common dependent variable—the models differing in respect of which independent variables are included, or perhaps in respect of the estimator used. In this situation it is convenient to present the regression results in the form of a table, where each column contains the results (coefficient estimates and standard errors) for a given model, and each row contains the estimates for a given variable across the models. 

Gretl provides a means of constructing such a table (and copying it in plain text, LaTeX or Rich Text Format). Here is how to do it: 

<indent>
1. Estimate a model which you wish to include in the table, and in the model display window, under the File menu, select "Save to session as icon" or "Save as icon and close". 
</indent>

<indent>
2. Repeat step 1 for the other models to be included in the table (up to a total of six models). 
</indent>

<indent>
3. When you are done estimating the models, open the icon view of your gretl session (by selecting "icon view" under the Session menu in the main gretl window, or by clicking the "session icon view" icon on the gretl toolbar). 
</indent>

<indent>
4. In session icon view, there is an icon labeled "Model table". Decide which model you wish to appear in the left-most column of the model table and add it to the table, either by dragging its icon onto the Model table icon, or by right-clicking on the model icon and selecting "Add to model table" from the pop-up menu. 
</indent>

<indent>
5. Repeat step 4 for the other models you wish to include in the table. The second model selected will appear in the second column from the left, and so on. 
</indent>

<indent>
6. When you are finished composing the model table, display it by double-clicking on its icon. Under the Edit menu in the window which appears, you have the option of copying the table to the clipboard in various formats. 
</indent>

<indent>
7. If the ordering of the models in the table is not what you wanted, right-click on the model table icon and select "Clear table". Then go back to step 4 above and try again. 
</indent>

Menu path: Session window, Model table icon

Script command: <@ref="modeltab">

# mpols Estimation "Multiple-precision OLS"

Computes OLS estimates for the specified model using multiple precision floating-point arithmetic. This command is available only if gretl is compiled with support for the Gnu Multiple Precision (GMP) library. By default 256 bits of precision are used for the calculations, but this can be increased via the environment variable <@lit="GRETL_MP_BITS">. For example, when using the bash shell one could issue the following command, before starting gretl, to set a precision of 1024 bits. 

<code>          
	export GRETL_MP_BITS=1024
</code>

Menu path: /Model/Other linear models/High precision OLS

Script command: <@ref="mpols">

# negbin Estimation "Negative Binomial regression"

Estimates a Negative Binomial model. The dependent variable is taken to represent a count of the occurrence of events of some sort, and must have only non-negative integer values. By default the model NegBin 2 is used, in which the conditional variance of the count is given by μ(1 + αμ), where μ denotes the conditional mean. But if the <@opt="--model1"> option is given the conditional variance is μ(1 + α). 

The optional <@lit="offset"> series works in the same way as for the <@ref="poisson"> command. The Poisson model is a restricted form of the Negative Binomial in which α = 0 by construction. 

By default, standard errors are computed using a numerical approximation to the Hessian at convergence. But if the <@opt="--opg"> option is given the covariance matrix is based on the Outer Product of the Gradient (OPG), or if the <@opt="--robust"> option is given QML standard errors are calculated, using a "sandwich" of the inverse of the Hessian and the OPG. 

Menu path: /Model/Nonlinear models/Count data...

Script command: <@ref="negbin">

# nls Estimation "Nonlinear Least Squares"

Performs Nonlinear Least Squares (NLS) estimation using a modified version of the Levenberg–Marquardt algorithm. You must supply a function specification; it is recommended but not required that you also supply expressions for the derivatives of this function with respect to each of the parameters if possible. If you do not supply derivatives you should instead give a list of the parameters to be estimated (separated by spaces or commas), preceded by the keyword <@lit="params">; these can be either scalars, or vectors, or any combination of the two. 

Example: Suppose we have a data set with variables <@itl="C"> and <@itl="Y"> (e.g. <@lit="greene11_3.gdt">) and we wish to estimate a nonlinear consumption function of the form 

  <@fig="greene_Cfunc">

The parameters alpha, beta and gamma must first be added to the dataset and given initial values. This can be done using the genr command or via menu choices. Appropriate "genr" lines may be typed into the NLS specification window prior to the function specification. 

In the NLS window we type the following lines: 

<code>          
	C = alpha + beta * Y^gamma
	deriv alpha = 1
	deriv beta = Y^gamma
	deriv gamma = beta * Y^gamma * log(Y)
</code>

The first line specifies the regression function, and the next three lines supply the derivatives of that function with respect to each of the parameters in turn. If the "deriv" lines are not given, a numerical approximation to the Jacobian is computed. 

If the parameters alpha, beta and gamma were not previously declared we could preface the above lines with something like the following: 

<code>          
	genr alpha = 1
	genr beta = 1
	genr gamma = 1
</code>

For further details on NLS estimation please see <@pdf="the Gretl User's Guide">. 

Menu path: /Model/Nonlinear models/Nonlinear Least Squares

Script command: <@ref="nls">

# normtest Tests "Normality test"

Carries out a test for normality for the given <@var="series">. The specific test is controlled by the option flags (but if no flag is given, the Doornik–Hansen test is performed). Note: the Doornik–Hansen and Shapiro–Wilk tests are recommended over the others, on account of their superior small-sample properties. 

The test statistic and its p-value may be retrieved using the accessors <@lit="$test"> and <@lit="$pvalue">. Please note that if the <@opt="--all"> option is given, the result recorded is that from the Doornik–Hansen test. 

Menu path: /Variable/Normality test

Script command: <@ref="normtest">

# nulldata Dataset "Creating a blank dataset"

Establishes a "blank" data set, containing only a constant and an index variable, with periodicity 1 and the specified number of observations. This may be used for simulation purposes: some of the <@lit="genr"> commands (e.g. <@lit="genr uniform()">, <@lit="genr normal()">) will generate dummy data from scratch to fill out the data set. This command may be useful in conjunction with <@lit="loop">. See also the "seed" option to the <@ref="set"> command. 

By default, this command cleans out all data in gretl's current workspace. If you give the <@opt="--preserve"> option, however, any currently defined matrices are retained. 

Menu path: /File/New data set

Script command: <@ref="nulldata">

# ols Estimation "Ordinary Least Squares"

Computes ordinary least squares (OLS) estimates for the specified model. 

Besides coefficient estimates and standard errors, the program also prints p-values for <@itl="t"> (two-tailed) and <@itl="F">-statistics. A p-value below 0.01 indicates statistical significance at the 1 percent level and is marked with <@lit="***">. <@lit="**"> indicates significance between 1 and 5 percent and <@lit="*"> indicates significance between the 5 and 10 percent levels. Model selection statistics (the Akaike Information Criterion or AIC and Schwarz's Bayesian Information Criterion) are also printed. The formula used for the AIC is that given by Akaike (1974), namely minus two times the maximized log-likelihood plus two times the number of parameters estimated. 

saves the residuals under the name <@lit="uh">. See the "accessors" section of the gretl function reference for details. 

Menu path: /Model/Ordinary Least Squares
Other access: Beta-hat button on toolbar

Script command: <@ref="ols">

# omit Tests "Omit variables"

This command re-estimates the given model after omitting the specified variables, or after sequentially omitting insignificant variables if the relevant box is available and is checked. Besides the usual model output, it prints a test for the joint significance of the omitted variables. The null hypothesis is that the true coefficients on all the omitted variables equal zero. 

Sequential elimination works as follows: at each step the variable with the highest p-value is omitted, until all remaining variables have a p-value no greater than some cutoff. The default cutoff is 10 percent (two-sided); this can be adjusted via the spin button. By default this process operates on all variables in the model (apart from the constant). If you want to confine it to a subset of the variables, check the box labeled "Test only selected variables" and make a selection. 

Menu path: Model window, /Tests/Omit variables

Script command: <@ref="omit">

# online Dataset "Access online databases"

Gretl is able to access databases at Wake Forest University (your computer must be connected to the internet for this to work). 

Under the "File, Browse databases" menu, select the item "on database server". A window should appear, showing a listing of the gretl databases available at Wake Forest. (Depending on your location and the speed of your internet connection, this may take a few seconds.) Along with the name of the database and a short description, there will appear a "Local status" entry: this shows whether you have the database installed locally (on the hard drive of your computer) and if so, whether or not it is up to date with the version on the server. 

If you have a given database installed locally, and it is up to date, there is no advantage in accessing it via the server. But for a database that is not already installed and up to date, you may wish to get a listing of the data series: click on "Get series listing". This brings up a further window, from which you can display the values of a chosen data series, graph those values, or import them into gretl's workspace. These tasks can be accomplished using the "Series" menu, or via the popup menu that appears when you click the right mouse button on a given series. You can also search the listing for a variable of interest (the "Find" menu item). 

If you want faster access to the data, or wish to access the database offline, then select the line showing the database you want, in the initial database window, and press the "Install" button. This will download the database in compressed format, then uncompress it and install it on your hard drive. Thereafter you should be able to find it under the "File, Browse databases, gretl native" menu. 

# panel Estimation "Panel models"

Estimates a panel model. By default the fixed effects estimator is used; this is implemented by subtracting the group or unit means from the original data. 

If the "Random effects" button is checked, random effects GLS estimates are computed, using the method of Swamy and Arora. 

For more details on panel estimation, please see <@pdf="the Gretl User's Guide">. 

Menu path: /Model/Panel

Script command: <@ref="panel">

# panel-between Estimation "Between groups model"

This dialog allows you to enter a specification for the "between model" in the context of panel data. This regression uses the group-means of the data, thereby ignoring the variation within the groups. This model is rarely of great interest in its own right, but may be useful for purposes of comparison (for example, with the fixed effects model). 

# panel-mode Dataset "Panel data organization"

This dialog offers up to three options with regard to defining a data set as a panel. The first two options require that the data set is already organized in a panel format (although this may not yet be recognized by gretl). The third option requires that the data set contains variables that represent the panel structure. 

<@itl="Stacked time series">: Let there be <@var="N"> cross-sectional units in the data set, and let <@var="T"> = the number of time-series observations per unit. By selecting this option you are telling gretl that the data set is currently composed of <@var="N"> consecutive blocks of <@var="T"> time-series observations, one for each cross-sectional unit. The next step will be to specify the value of <@var="N">. 

<@itl="Stacked cross sections">: You are telling gretl that the data set is currently composed of <@var="T"> consecutive blocks of <@var="N"> cross-sectional observations, one for each time period. The next step, again, will be to specify the value of <@var="N">. 

If the total number of observations in the current dataset is prime, the above options are not available. 

<@itl="Use index variables">: You are saying that the data set is currently organized any old way (it doesn't matter how), but that it contains two variables that index the cross-sectional units and the time periods respectively. The next step will be to select those two variables. Panel index variables must have nothing but non-negative integer values, with no missing values. If there are no such variables in the dataset this option is not available. 

# panel-wls Estimation "Groupwise weighted least squares"

Groupwise weighted least squares for panel data. Computes weighted least squares (WLS) estimates, with the weights based on the estimated error variances for the respective cross-sectional units in the sample. 

If the iteration option is selected, the procedure is iterated: at each round the residuals are re-computed using the current WLS parameter estimates, which gives rise to a new set of estimates of the error variances, and a hence a new set of weights. Iteration stops when the maximum difference in the parameter estimates from one round to the next falls below 0.0001 or the number of iterations reaches 20. If the iteration converges, the resulting estimates are Maximum Likelihood. 

# pca Statistics "Principal Components Analysis"

Principal Components Analysis. Prints the eigenvalues of the correlation matrix (or the covariance matrix if the option box is checked) for the variables in <@var="varlist">, along with the proportion of the joint variance accounted for by each component. Also prints the corresponding eigenvectors (or "component loadings"). 

In the window displaying the results, you have the option of saving the principal components to the dataset as series. 

Menu path: /View/Principal components
Other access: Main window pop-up (multiple selection)

Script command: <@ref="pca">

# pergm Statistics "Periodogram"

Computes and displays (and if not in batch mode, graphs) the spectrum of the specified series. By default the sample periodogram is given, but optionally a Bartlett lag window is used in estimating the spectrum (see, for example, Greene's <@itl="Econometric Analysis"> for a discussion of this). The default width of the Bartlett window is twice the square root of the sample size but this can be set manually using the <@var="bandwidth"> parameter, up to a maximum of half the sample size. 

If the <@opt="--log"> option is given the spectrum is represented on a logarithmic scale. 

The (mutually exclusive) options <@opt="--radians"> and <@opt="--degrees"> influence the appearance of the frequency axis when the periodogram is graphed. By default the frequency is scaled by the number of periods in the sample, but these options cause the axis to be labeled from 0 to π radians or from 0 to 180°, respectively. 

Menu path: /Variable/Periodogram
Other access: Main window pop-up menu (single selection)

Script command: <@ref="pergm">

# polyweights Transformations "Polynomial trend fitting"

In fitting a polynomial trend to a time series it may be desirable to give extra weight to the observations at the start and end of the sample. (Points in the middle of the sample range have neighbours on both sides that are likely to be pulling the fit in the same general direction.) 

The weighting schemes offered here (quadratic, cosine-bell and steps) can be used to this effect. If you select one of these schemes two additional settings must be chosen: first, what maximum weight should be used (the minimum, baseline weight is 1.0)? Second, what central fraction of the sample should be given a uniform (minimal) weighting? 

Suppose, for example, you select a maximum weight of 3.0 and a central fraction of 0.4. This means that the middle 40 percent of the data get a weight of 1.0. If the steps shape is selected the first and last 30 percent of the observations get a weight of 3.0; otherwise, for the first 30 percent of observations the weights decline gradually from 3.0 to 1.0; and for the last 30 percent the weights increase from 1.0 to 3.0. 

# poisson Estimation "Poisson estimation"

Estimates a poisson regression. The dependent variable is taken to represent the occurrence of events of some sort, and must take on only non-negative integer values. 

If a discrete random variable <@itl="Y"> follows the Poisson distribution, then 

  <@fig="poisson1">

for <@itl="y"> = 0, 1, 2,…. The mean and variance of the distribution are both equal to <@itl="v">. In the Poisson regression model, the parameter <@itl="v"> is represented as a function of one or more independent variables. The most common version (and the only one supported by gretl) has 

  <@fig="poisson2">

or in other words the log of <@itl="v"> is a linear function of the independent variables. 

Optionally, you may add an "offset" variable to the specification. This is a scale variable, the log of which is added to the linear regression function (implicitly, with a coefficient of 1.0). This makes sense if you expect the number of occurrences of the event in question to be proportional, other things equal, to some known factor. For example, the number of traffic accidents might be supposed to be proportional to traffic volume, other things equal, and in that case traffic volume could be specified as an "offset" in a Poisson model of the accident rate. The offset variable must be strictly positive. 

By default, standard errors are computed using the negative inverse of the Hessian. If the <@opt="--robust"> flag is given, then QML or Huber–White standard errors are calculated instead. In this case the estimated covariance matrix is a "sandwich" of the inverse of the estimated Hessian and the outer product of the gradient. 

See also <@ref="negbin">. 

Menu path: /Model/Nonlinear models/Count data...

Script command: <@ref="poisson">

# probit Estimation "Probit model"

If the dependent variable is a binary variable (all values are 0 or 1) maximum likelihood estimates of the coefficients on <@var="indepvars"> are obtained via the "binary response model regression" (BRMR) method outlined by Davidson and MacKinnon (2004). As the model is nonlinear the slopes depend on the values of the independent variables. By default the slopes with respect to each of the independent variables are calculated (at the means of those variables) and these slopes replace the usual p-values in the regression output. This behavior can be suppressed my giving the <@opt="--p-values"> option. The chi-square statistic tests the null hypothesis that all coefficients are zero apart from the constant. 

By default, standard errors are computed using the negative inverse of the Hessian. If the "Robust standard errors" box is checked, then QML or Huber–White standard errors are calculated instead. In this case the estimated covariance matrix is a "sandwich" of the inverse of the estimated Hessian and the outer product of the gradient. See chapter 10 of Davidson and MacKinnon for details. 

If the dependent variable is not binary but is discrete, then Ordered Probit estimates are obtained. (If the variable selected as dependent is not discrete, an error is flagged.) 

Probit for analysis of proportions is not implemented in gretl at this point. 

Menu path: /Model/Nonlinear models/Probit

Script command: <@ref="probit">

# qlrtest Tests "Quandt likelihood ratio test"

For a model estimated on time-series data via OLS, performs the Quandt likelihood ratio (QLR) test for a structural break at an unknown point in time, with 15 percent trimming at the beginning and end of the sample period. 

For each potential break point within the central 70 percent of the observations, a Chow test is performed (see <@ref="chow">). The QLR test statistic is the maximum of the <@itl="F"> values from these tests. It follows a non-standard distribution, the critical values of which are taken from Stock and Watson's <@itl="Introduction to Econometrics"> (2003). If the QLR statistic exceeds the critical value at the chosen level of significance, one can infer that the parameters of the model are not constant. This statistic can be used to detect forms of instability other than a single discrete break (such as multiple breaks or a slow drifting of the parameters). 

Menu path: Model window, /Tests/QLR test

Script command: <@ref="qlrtest">

# qqplot Graphs "Q-Q plot"

With just one series selected, displays a plot of the empirical quantiles of the given series against the quantiles of the normal distribution. The series must include at least 20 valid observations in the current sample range. By default the empirical quantiles are plotted against quantiles of the normal distribution having the same mean and variance as the sample data, but two alternatives are available: the data may be standardized (converted to z-scores) before plotting, or the "raw" empirical quantiles may be plotted against the quantiles of the standard normal distribution. 

Given two series arguments, <@var="y"> and <@var="x">, displays a plot of the empirical quantiles of <@var="y"> against those of <@var="x">. The data values are not standardized. 

Menu path: /Variable/Normal Q-Q plot
Menu path: /View/Graph specified vars/Q-Q plot

Script command: <@ref="qqplot">

# quantreg Estimation "Quantile regression"

Quantile regression. By default standard errors are computed according to the asymptotic formula given by Koenker and Bassett (<@itl="Econometrica">, 1978), but if the "robust" box is checked we use the heteroskedasticity-robust variant from Koenker and Zhao (<@itl="Journal of Nonparametric Statistics">, 1994). 

If the "Compute confidence intervals" option is checked gretl will calculate confidence intervals for the coefficients, in place of standard errors. The "robust" check-box still has an effect: if it is not checked, the intervals are computed on the assumption of IID errors; with it, gretl uses the robust estimator developed by Koenker and Machado (<@itl="Journal of the American Statistical Association">, 1999). Note that these intervals are not just "plus or minus so many standard errors"; in general, they are asymmetrical about the point estimates of the coefficients. 

You may give a list of quantiles (see the drop-down list for some pre-defined possibilities). In that case gretl will calculate quantile estimates and either standard errors or confidence intervals for each of the specified values. 

To Follow up on the references given above, please see <@pdf="the Gretl User's Guide">. 

Menu path: /Model/Robust estimation/Quantile regression

Script command: <@ref="quantreg">

# reset Tests "Ramsey's RESET"

Must follow the estimation of a model via OLS. Carries out Ramsey's RESET test for model specification (non-linearity) by adding the square and/or the cube of the fitted values to the regression and calculating the <@itl="F"> statistic for the null hypothesis that the parameters on the added terms are zero. 

Menu path: Model window, /Tests/Ramsey's RESET

Script command: <@ref="reset">

# restrict-model Tests "Restrictions on a model"

Each restriction in the set should be expressed as an equation, with a linear combination of parameters on the left and a numeric value to the right of the equals sign. Parameters may be referenced in the form <@lit="b["><@var="i"><@lit="]">, where <@var="i"> represents the position in the list of regressors (starting at 1), or <@lit="b["><@var="varname"><@lit="]">, where <@var="varname"> is the name of the regressor in question. 

The <@lit="b"> terms in the equation representing a restriction may be prefixed with a numeric multiplier, using <@lit="*"> to represent multiplication, for example <@lit="3.5*b[4]">. 

Here is an example of a set of restrictions: 

<code>          
	b[1] = 0
	b[2] - b[3] = 0
	b[4] + 2*b[5] = 1
</code>

# restrict-system Tests "Restrictions on a system of equations"

Each restriction in the set should be expressed as an equation, with a linear combination of parameters on the left and a numeric value to the right of the equals sign. Parameters are referenced using <@lit="b"> plus two numbers in square brackets. The leading number represents the position of the equation within the system and the second number indicates position in the list of regressors, starting at 1 in both cases. For example <@lit="b[2,1]"> denotes the first parameter in the second equation, and <@lit="b[3,2]"> the second parameter in the third equation. 

The <@lit="b"> terms in the equation representing a restriction may be prefixed with a numeric multiplier, using <@lit="*"> to represent multiplication, for example <@lit="3.5*b[1,4]">. 

Here is an example of a set of restrictions: 

<code>          
	b[1,1] = 0
	b[1,2] - b[2,2] = 0
	b[3,4] + 2*b[3,5] = 1
</code>

# restrict-vecm Tests "Restrictions on a VECM"

Use this command to place linear restrictions on the cointegrating relations (beta) and/or adjustment coefficients (alpha) in a vector error-correction model (VECM). 

Each restriction should be expressed as an equation, with a linear combination of parameters to the left of the equals sign and a numerical value on the right. Restrictions on beta may be non-homogeneous (non-zero on the right), but alpha restrictions must be homogeneous (zero on the right). 

If the VECM is of rank 1, the elements of beta are referenced in the form <@lit="b["><@var="i"><@lit="]">, where <@var="i"> represents position in the cointegrating vector, starting at 1. For example, <@lit="b[2]"> denotes the second element in beta. If the rank is greater than 1, use <@lit="b"> plus two numbers in square brackets. For example, <@lit="b[2,1]"> denotes the first element in the second cointegrating vector. 

To reference elements of alpha, use <@lit="a"> instead of <@lit="b">. 

The parameter identifiers in the equation representing a restriction may be prefixed with a numeric multiplier, using <@lit="*"> to represent multiplication, for example <@lit="3.5*b[4]">. 

Here is an example of a set of restrictions on a VECM of rank 1. 

<code>          
	b[1] + b[2] = 0
	b[1] + b[3] = 0
</code>

See also <@pdf="the Gretl User's Guide">. 

# rmplot Graphs "Range-mean plot"

Range–mean plot: this command creates a simple graph to help in deciding whether a time series, <@itl="y">(t), has constant variance or not. We take the full sample t=1,...,T and divide it into small subsamples of arbitrary size <@itl="k">. The first subsample is formed by <@itl="y">(1),...,<@itl="y">(k), the second is <@itl="y">(k+1), ..., <@itl="y">(2k), and so on. For each subsample we calculate the sample mean and range (= maximum minus minimum), and we construct a graph with the means on the horizontal axis and the ranges on the vertical. So each subsample is represented by a point in this plane. If the variance of the series is constant we would expect the subsample range to be independent of the subsample mean; if we see the points approximate an upward-sloping line this suggests the variance of the series is increasing in its mean; and if the points approximate a downward sloping line this suggests the variance is decreasing in the mean. 

Besides the graph, gretl displays the means and ranges for each subsample, along with the slope coefficient for an OLS regression of the range on the mean and the p-value for the null hypothesis that this slope is zero. If the slope coefficient is significant at the 10 percent significance level then the fitted line from the regression of range on mean is shown on the graph. The <@itl="t">-statistic for the null, and the corresponding p-value, are recorded and may be retrieved using the accessors <@lit="$test"> and <@lit="$pvalue"> respectively. 

Menu path: /Variable/Range-mean graph

Script command: <@ref="rmplot">

# runs Tests "Runs test"

Carries out the nonparametric "runs" test for randomness of the specified <@var="series">, where runs are defined as sequences of consecutive positive or negative values. If you want to test for randomness of deviations from the median, for a variable named <@lit="x1"> with a non-zero median, you can do the following: 

<code>          
	genr signx1 = x1 - median(x1)
	runs signx1
</code>

If the <@opt="--difference"> option is given, the variable is differenced prior to the analysis, hence the runs are interpreted as sequences of consecutive increases or decreases in the value of the variable. 

If the <@opt="--equal"> option is given, the null hypothesis incorporates the assumption that positive and negative values are equiprobable, otherwise the test statistic is invariant with respect to the "fairness" of the process generating the sequence, and the test focuses on independence alone. 

Menu path: /Tools/Nonparametric tests

Script command: <@ref="runs">

# sampling Dataset "Setting the sample"

The Sample menu offers several ways of selecting a sub-sample from the current dataset. 

If you choose "Sample/Restrict based on criterion..." you need to supply a Boolean (logical) expression, of the same sort that you would use to define a dummy variable. For example the expression "sqft > 1400" will select only cases for which the variable sqft has a value greater than 1400. Conditions may be concatenated using the logical operators "&&" (AND) and "||" (OR), and may be negated using "!" (NOT). If the dataset already contains dummy variables, you are also given the option of selecting one of these to define the sample (observations with a value of 1 for the selected dummy will be included, and others excluded). 

The menu item "Sample/Drop all obs with missing values" redefines the sample to exclude all observations for which values of one or more variables are missing (leaving only complete cases). 

To select observations for which a particular variable has no missing values, use "Restrict based on criterion..." and supply the Boolean condition "!missing(varname)" (replace "varname" with the name of the variable you want to use). 

If the observations are labeled, you can exclude particular observations using, for example, <@lit="obs!="France""> as the Boolean criterion. The observation name must be enclosed in double quotes. 

One point should be noted about defining a sample based on a dummy variable, a Boolean expression, or on the missing values criterion: Any "structural" information in the data header file (regarding the time series or panel nature of the data) is lost. You may reimpose structure with "Sample/Set frequency, startobs...". 

Please see <@pdf="the Gretl User's Guide"> for further details. 

# save-labels Utilities "Save or remove series labels"

If you choose Export here, gretl will write a file containing the descriptive labels of any series in the current dataset that have such labels. This is a plain text file with one line per variable. The line will be empty for variables that have no descriptive label. 

If you choose Remove, the descriptive labels will be removed for all series that have such labels. This would be appropriate only if the current labels have somehow been added in error. 

# add-labels Utilities "Add series labels"

If you choose Yes here, you are offered a file-open dialog box to select a plain text file containing descriptive labels for the series in the current dataset. The file should contain one label per line; a blank line means no label. Gretl will attempt to read as many labels as there are series in the dataset, excluding the constant. 

# save-script Utilities "Save commands?"

If you choose Yes here, gretl will write a file containing a record of the commands you executed in the current session. Most commands that you execute via "point and click" have a "script" counterpart, and it is these script commands that will be saved. You could take the file as the basis for writing a gretl command script. 

If you don't care to be prompted to save a record of commands on exit, uncheck the tick box in the save commands dialog. 

# save-session Utilities "Save this gretl session?"

If you choose Yes here, gretl will write a file containing a "snapshot" of the current session, including a copy of the working dataset along with any models, graphs or other objects that you have saved "as icons". You can re-open this file later to recreate the state of gretl as of the time you quit the session (see the "File/Session files" menu). 

If you mostly work with gretl using command scripts (which we recommend for "serious" econometric work) you probably don't need to save the session, but you should be sure to save any changes to your script that you wish to keep. You may also want to save any changes to your dataset, unless these are of a sort that can easily be recreated by running a script. 

If you work with scripts and don't care to be prompted to save your session on exit, uncheck the tick box in the save session dialog. 

# scatters Graphs "Multiple pairwise graphs"

Generates pairwise graphs of the selected "Y-axis variable" against each of the selected "X-axis variables" in turn. (Or you can select several variables for the Y-axis and one for the X-axis.) Scanning a set of such plots can be a useful step in exploratory data analysis. The maximum number of plots is six; any extra variables will be ignored. 

Menu path: /View/Multiple graphs

Script command: <@ref="scatters">

# setinfo Dataset "Edit attributes of variable"

In this dialog box you can: 

* Rename a (series) variable. 

* Add or edit a description of the variable: this appears next to the variable name in the gretl main window. 

* Add or edit the "display name" for the variable (if the variable is a series, not a scalar). This string (maximum 19 characters) is shown in place of the variable name when the variable is displayed in a graph. Thus for instance you can associate a more comprehensible string such as "T-bill rate" with a cryptically named variable such as "tb3". 

* (For time-series data) set the compaction method for the variable. This method will be used if you decide to reduce the frequency of the dataset, or if you update the variable by importing from a database where the variable is at a higher frequency than in the working dataset. 

* Mark a variable as discrete (for series with integer values only). This affects the way the variable is handled when you ask for a frequency plot. 

Menu path: /Variable/Edit attributes
Other access: Main window pop-up menu

Script command: <@ref="setinfo">

# setmiss Dataset "Missing value code"

Set a numerical value that will be interpreted as "missing" or "not applicable", either for a particular data series (under the Variable menu) or globally for the entire data set (under the Sample menu). 

Gretl has its own internal coding for missing values, but sometimes imported data may employ a different code. For example, if a particular series is coded such that a value of -1 indicates "not applicable", you can select "Set missing value code" under the Variable menu and type in the value "-1" (without the quotes). Gretl will then read the -1s as missing observations. 

Menu path: /Data/Set missing value code

Script command: <@ref="setmiss">

# spearman Statistics "Spearmans's rank correlation"

Prints Spearman's rank correlation coefficient for a specified pair of variables. The variables do not have to be ranked manually in advance; the function takes care of this. 

The automatic ranking is from largest to smallest (i.e. the largest data value gets rank 1). If you need to invert this ranking, create a new variable which is the negative of the original. For example: 

<code>          
	genr altx = -x
	spearman altx y
</code>

Menu path: /Model/Robust estimation/Rank correlation

Script command: <@ref="spearman">

# store Dataset "Save data"

Saves either the entire dataset or, if a <@var="varlist"> is supplied, a specified subset of the series in the current dataset, to the file given by <@var="filename">. 

By default the data are saved in "native" gretl format, but the option flags permit saving in several alternative formats. CSV (Comma-Separated Values) data may be read into spreadsheet programs, and can also be manipulated using a text editor. The formats of Octave, R and PcGive are designed for use with the respective programs. Gzip compression may be useful for large datasets. See <@pdf="the Gretl User's Guide"> for details on the various formats. 

The option flags <@opt="--omit-obs"> and <@opt="--no-header"> are applicable only when saving data in CSV format. By default, if the data are time series or panel, or if the dataset includes specific observation markers, the CSV file includes a first column identifying the observations (e.g. by date). If the <@opt="--omit-obs"> flag is given this column is omitted. The <@opt="--no-header"> flag suppresses the usual printing of the names of the variables at the top of the columns. 

The option of saving in gretl database format is intended to help with the construction of large sets of series, possibly having mixed frequencies and ranges of observations. At present this option is available only for annual, quarterly or monthly time-series data. If you save to a file that already exists, the default action is to append the newly saved series to the existing content of the database. In this context it is an error if one or more of the variables to be saved has the same name as a variable that is already present in the database. The <@opt="--overwrite"> flag has the effect that, if there are variable names in common, the newly saved variable replaces the variable of the same name in the original dataset. 

The <@opt="--comment"> option is available when saving data as a database or in CSV format. The required parameter is a double-quoted one-line string, attached to the option flag with an equals sign. The string is inserted as a comment into the database index file or at the top of the CSV output. 

Menu path: /File/Save data; /File/Export data

Script command: <@ref="store">

# system Estimation "Systems of equations"

In this window you can define a system of equations and choose an estimator for the system. Four sorts of statement may be given here, as follows: 

<indent>
• <@ref="equation">: specify an equation within the system. At least two such statements must be provided. 
</indent>

<indent>
• <@lit="instr">: for a system to be estimated via Three-Stage Least Squares, a list of instruments (by variable name or number). Alternatively, you can put this information into the <@lit="equation"> line using the same syntax as in the <@ref="tsls"> command. 
</indent>

<indent>
• <@lit="endog">: for a system of simultaneous equations, a list of endogenous variables. This is primarily intended for use with FIML estimation, but with Three-Stage Least Squares this approach may be used instead of giving an <@lit="instr"> list; then all the variables not identified as endogenous will be used as instruments. 
</indent>

<indent>
• <@lit="identity">: for use with FIML, an identity linking two or more of the variables in the system. This sort of statement is ignored when an estimator other than FIML is used. 
</indent>

Menu path: /Model/Simultaneous equations

Script command: <@ref="system">

# tobit Estimation "Tobit model"

Estimates a Tobit model, which may be appropriate when the dependent variable is "censored". For example, positive and zero values of purchases of durable goods on the part of individual households are observed, and no negative values, yet decisions on such purchases may be thought of as outcomes of an underlying, unobserved disposition to purchase that may be negative in some cases. 

By default it is assumed that the dependent variable is censored at zero on the left and is uncensored on the right. However you can use the entry boxes marked "left bound" and "right bound" to specify a different pattern of censoring. Enter either a numerical value or <@lit="NA"> for no censoring. 

The Tobit model is a special case of interval regression, which is supported via the <@ref="intreg"> command. 

Menu path: /Model/Nonlinear models/Tobit

Script command: <@ref="tobit">

# transpos Dataset "Transpose data"

Transposes the current data set. That is, each observation (row) in the current data set will be treated as a variable (column), and each variable as an observation. This command may be useful if data have been read from some external source in which the rows of the data table represent variables. 

See also <@ref="dataset">. 

Menu path: /Data/Transpose data

# tsls Estimation "Instrumental variables regression"

This command requires the selection of two lists of variables: the independent variables to appear in the given model and a set of instruments. Note that any exogenous regressors should appear in both lists. 

Output for two-stage least squares estimates includes the Hausman test and, if the model is over-identified, the Sargan over-identification test. In the Hausman test, the null hypothesis is that OLS estimates are consistent, or in other words estimation by means of instrumental variables is not really required. A model of this sort is over-identified if there are more instruments than are strictly required. The Sargan test is based on an auxiliary regression of the residuals from the two-stage least squares model on the full list of instruments. The null hypothesis is that all the instruments are valid, and suspicion is thrown on this hypothesis if the auxiliary regression has a significant degree of explanatory power. For a good explanation of both tests see chapter 8 of Davidson and MacKinnon (2004). 

For both TSLS and LIML estimation, an additional test result is shown provided that the model is estimated under the assumption of i.i.d. errors (that is, the <@opt="--robust"> option is not selected). This is a test for weakness of the instruments. Weak instruments can lead to serious problems in IV regression: biased estimates and/or incorrect size of hypothesis tests based on the covariance matrix, with rejection rates well in excess of the nominal significance level (Stock, Wright and Yogo, 2002). The test statistic is the first-stage <@itl="F">-test if the model contains just one endogenous regressor, otherwise it is the smallest eigenvalue of the matrix counterpart of the first stage <@itl="F">. Critical values based on the Monte Carlo analysis of Stock and Yogo (2003) are shown when available. 

The R-squared value printed for models estimated via two-stage least squares is the square of the correlation between the dependent variable and the fitted values. 

Menu path: /Model/Other linear models/Two-Stage Least Squares

Script command: <@ref="tsls">

# var Estimation "Vector Autoregression"

This command requires specification of: 

<indent>
• - the lag order, that is, the number of lags of each variable that should be included in the system; 
</indent>

<indent>
• - any exogenous variables (but note that a constant is included automatically unless you specify otherwise, a trend can be added using the trend checkbox, and seasonal dummy variables can be added using the seasonals checkbox); and 
</indent>

<indent>
• - a list of endogenous variables, lags of which will be included on the right-hand side of each equation (note: do not include lagged variables in this list -- they will be added automatically). 
</indent>

A separate regression will be run for each variable in the system. Output for each equation includes F-tests for zero restrictions on all lags of each of the variables and an F-test for the maximum lag, along with (optionally) forecast variance decompositions and impulse response functions. 

Forecast variance decompositions and impulse responses are based on the Cholesky decomposition of the contemporaneous covariance matrix, and in this context the order in which the (stochastic) variables are given matters. The first variable in the list is assumed to be "most exogenous" within-period. The horizon for variance decompositions and impulse responses can be set using the <@ref="set"> command. 

Menu path: /Model/Time series/Vector autoregression

Script command: <@ref="var">

# VAR-lagselect Tests "VAR lag-length selection"

In this dialog box you specify a VAR as usual, but use the lag order spin button to set the maximum number of lags to test. 

Output will consist of a table showing the values of the Akaike (AIC), Schwartz (BIC) and Hannan–Quinn (HQC) information criteria computed from VARs of order 1 to the chosen maximum. This is intended to help with the selection of the optimal lag order. 

# VAR-omit Tests "Test exogenous variables in VAR"

Use this dialog box to specify a subset of exogenous variables in a VAR. These variables will be omitted from the original VAR, and the system re-estimated. 

A Likelihood Ratio test is reported, where the null hypothesis is that the true parameter values are zero, in all equations of the VAR, for the omitted variables. The test is based on the difference between the log-determinant of the variance matrix for the unrestricted system, and that for the restricted system with the selected variables omitted. 

# vartest Tests "Difference of variances"

Calculates the <@itl="F"> statistic for the null hypothesis that the population variances are equal for the two selected variables, and shows its p-value. 

Menu path: /Model/Bivariate tests/Difference of variances

Script command: <@ref="vartest">

# vecm Estimation "Vector Error Correction Model"

A VECM is a form of vector autoregression or VAR (see <@ref="var">), applicable where the variables in the model are individually integrated of order 1 (that is, are random walks, with or without drift), but exhibit cointegration. This command is closely related to the Johansen test for cointegration (see <@ref="coint2">). 

The lag order selected in the VECM dialog box is that of the VAR system. The number of lags in the VECM itself (where the dependent variable is given as a first difference) is one less than this number. 

The "cointegration rank" represents the number of cointegrating vectors. This must be greater than zero and less than or equal to (generally, less than) the number of endogenous variables selected. 

In the "Endogenous variables" box you select the vector of endogenous variables, in levels. The inclusion of deterministic terms in the model is controlled by the option buttons. The default is to include an "unrestricted constant", which allows for the presence of a non-zero intercept in the cointegrating relations as well as a trend in the levels of the endogenous variables. In the literature stemming from the work of Johansen (see for example his 1995 book) this is often referred to as "case 3". The other four options produce cases 1, 2, 4 and 5 respectively. The meaning of these cases and the criteria for selecting a case are explained in <@pdf="the Gretl User's Guide">. 

In the "Exogenous variables" box you may add specific exogenous variables. By default these enter the model in unrestricted form (indicated by a <@lit="U"> next to the name of the variable). If you want a certain exogenous variable to be restricted to the cointegrating space, right-click on it and select "Restricted" from the pop-up menu. The symbol next to the variable will change to R. 

If the data are quarterly or monthly, a check box is shown that allows you to include a set of centered seasonal dummy variables. In all cases, an additional check box ("Show details") allows for the printing of the auxiliary regressions that form the starting point of the Johansen maximum likelihood estimation procedure. 

Menu path: /Model/Time series/VECM

Script command: <@ref="vecm">

# wls Estimation "Weighted Least Squares"

Let "wtvar" denote the variable selected in the "Weight variable" box. An OLS regression is run, where the dependent variable is the product of the positive square root of wtvar and the selected dependent variable, and the independent variables are also multiplied by the square root of wtvar. Statistics such as <@itl="R">-squared are based on the weighted data. If wtvar is a dummy variable, weighted least squares estimation is equivalent to eliminating all observations with value zero for wtvar. 

Menu path: /Model/Other linear models/Weighted Least Squares

Script command: <@ref="wls">

# working-dir Utilities "Working directory"

The "working directory" is where gretl looks by default when reading or writing data files or scripts via the file Open and Save dialogs. 

In addition the working directory is the default location for 

<indent>
• reading files via the script commands <@lit="append">, <@lit="open">, <@lit="run"> and <@lit="include">; and 
</indent>

<indent>
• writing files via the commands <@lit="eqnprint">, <@lit="tabprint">, <@lit="gnuplot">, <@lit="outfile"> and <@lit="store">. 
</indent>

The option of having gretl use the current directory (as determined via the shell) at start-up may be useful to people who are in the habit of launching gretl from a command prompt rather than a menu or icon. 

This dialog also allows you to set the behavior of the GUI file selector: when you open or save a file in a given folder, should the selector remember and return to the same folder on the next invocation? Or should the selector always visit the chosen working directory? 

Menu path: /File/Working directory

# x12a Utilities "X-12-ARIMA"

There are two procedural options here, controlled by the lower set of radio-buttons. 

If you select "Execute X-12-ARIMA directly" then gretl writes a command file for X-12-ARIMA and calls the x12a program to execute the commands. In this case you have the option of producing a graph and/or saving selected output series to the gretl dataset. 

If you select "Make X-12-ARIMA command file" gretl writes a command file for X-12-ARIMA, as above, but then opens this file in an editor window. In that window you are able to make changes and to save the file under a chosen name. You are also able to send the file for execution by x12a (by clicking the "Run" button on the editor window toolbar) and view the output. But in this case you do not have the option of saving data as gretl series or producing a gretl graph. 

# xcorrgm Statistics "Cross-correlogram"

Prints and graphs the cross-correlogram for variables <@var="var1"> and <@var="var2">, which may be specified by name or number. The values are the sample correlation coefficients between the current value of <@var="var1"> and successive leads and lags of <@var="var2">. 

If an <@var="order"> value is specified the length of the cross-correlogram is limited to at most that number of leads and lags, otherwise the length is determined automatically, as a function of the frequency of the data and the number of observations. 

Menu path: /View/Cross-correlogram
Other access: Main window pop-up menu (multiple selection)

Script command: <@ref="xcorrgm">

# xtab Statistics "Cross-tabulate variables"

Displays a contingency table or cross-tabulation for each combination of the selected variables. Note that all the variables must be discrete. 

By default, frequency count values are shown in the cells and on the margins of the table. However, you can choose to display either row or column percentages instead. 

By default, cells with a zero count are shown as empty, but you can choose to show zero values explicitly. 

Pearson's chi-square test for independence is displayed if the expected frequency under independence is at least 1.0e-7 for all cells. A common rule of thumb for the validity of this statistic is that at least 80 percent of cells should have expected frequencies of 5 or greater; if this criterion is not met a warning is printed. 

If the contingency table is 2 by 2, Fisher's Exact Test for independence is computed. Note that this test is based on the assumption that the row and column totals are fixed, which may or may not be approriate depending on how the data were generated. The left p-value should be used when the alternative to independence is negative association (values tend to cluster in the lower left and upper right cells); the right p-value should be used if the alternative is positive association. The two-tailed p-value for this test is calculated by method (b) in section 2.1 of Agresti (1992): it is the sum of the probabilities of all possible tables having the given row and column totals and having a probability less than or equal to that of the observed table. 

Script command: <@ref="xtab">