INTERVAL estimates a model like the linear Ordered Probit model, but where the limits are known. Unlike Ordered Probit, the limits may be different for different observations. INTERVAL is also similar to two-limit Tobit, with the difference that when the dependent variable is between the upper and lower bounds, only that fact is observed and not its actual value. INTERVAL is useful when the dependent variable is in a known range, but the actual value has been censored for confidentiality reasons.
INTERVAL (LOWER=<lowerlimit>,UPPER=<upperlimit>, nonlinear options) <dependent variable> <list of independent variables> ;
Usage
The basic INTERVAL statement is like the OLSQ statement: first list the dependent variable and then the independent variables. If you wish to have an intercept term in the regression (usually recommended), include the special variable C or CONSTANT in your list of independent variables. You may have as many independent variables as you like subject to the overall limits on the number of arguments per statement and the amount of working space, as well as the number of data observations you have available.
The LOWER= and UPPER= options are required. Normally these will be series with the lower and upper limits for each observation. The dependent variable should be coded so that it lies between its two bounds. For example, if category=3 means that the dependent variable (Y) is between 10 and 20, then Y should be coded so that 10<Y<20. The lower and upper limits for this observation will take the values 10 and 20. See the examples below.
The observations over which the regression is computed are determined by the current sample. If any of the observations have missing values within the current sample, INTERVAL will print a warning message and will drop those observations.
The list of independent variables on the INTERVAL command may include variables with explicit lags and leads as well as PDL (Polynomial Distributed Lag) variables. These distributed lag variables are a way to reduce the number of free coefficients when entering a large number of lagged variables in a regression by imposing smoothness on the coefficients. See the PDL section for a description of how to specify such variables.
Output
The output of INTERVAL begins with an equation title, followed by the usual starting values and diagnostic output from the iterations, and then the final convergence status. After convergence, the number of observations, the value of the log likelihood, and the Schwarz-Bayes information criterion are printed. These are followed by counts of the lower-bounded, upper-bounded, and double-bounded observations, and by the usual table of right hand side variable names, estimated coefficients, standard errors, and associated t-statistics.
INTERVAL also stores some of these results in data storage for later use. The table below lists the results available after an INTERVAL command.
| variable | type | length | description |
|---|---|---|---|
| @LHV | list | 1 | Name of dependent variable |
| @RNMS | list | #vars | List of names of right hand side variables |
| @IFCONV | scalar | 1 | =1 if convergence achieved, 0 otherwise |
| @YMEAN | scalar | 1 | Mean of the dependent variable |
| @NOB | scalar | 1 | Number of observations |
| @LOGL | scalar | 1 | Log of likelihood function |
| @AIC | scalar | 1 | Akaike information criterion |
| @SBIC | scalar | 1 | Schwarz Bayesian information criterion |
| @NCOEF | scalar | 1 | Number of independent variables (#vars) |
| @NCID | scalar | 1 | Number of identified coefficients |
| @COEF | vector | #vars | Coefficient estimates |
| @SES | vector | #vars | Standard errors |
| @T | vector | #vars | T-statistics |
| %T | vector | #vars | p-values for T-statistics |
| @GRAD | vector | #vars | Gradient of log likelihood at convergence |
| @VCOV | matrix | #vars*#vars | Variance-covariance of estimated coefficients |
| @FIT | series | #obs | Fitted values of dependent variable |
If the regression includes a PDL or SDL variable, the following will also be stored:
| variable | type | length | description |
|---|---|---|---|
| @SLAG | scalar | 1 | Sum of the lag coefficients |
| @MLAG | scalar | 1 | Mean lag coefficient (number of time periods) |
| @LAGF | vector | #lags | Estimated lag coefficients, after "unscrambling" |
Method
Like the binary and ordered Probit models, the Interval model is based on an unobserved continuous dependent variable (y*). The model is
y* = XB + e.
Instead of y*, we observe a category value Y, which implies that y* lies between known limits, where the limits may include minus or plus infinity. In the usual application the set of possible limits is the same for all observations, but this is not necessary. The underlying model is the same as that used for Ordered Probit (that is, e is assumed to be normally distributed), but with known limits.
For example, suppose there are 3 categories, with category values 0, 1, and 2, where the first and the last are open-ended. The model is
Y = 0 if MU0 <= XB + e < MU1 (MU0 = -infinity, MU1 a known value)
Y = 1 if MU1 <= XB + e < MU2 (MU2 a known value)
Y = 2 if MU2 <= XB + e < MU3 (Note: MU3 = infinity)
The terms in the likelihood function for observations with each of the values 0, 1, or 2 are the following:
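Written out from the three-category model above, the log likelihood contributions take the following form. This is a reconstruction from the model definition rather than a copy of the original display: it assumes e has mean zero and standard deviation σ (a symbol the text does not define; set σ = 1 if the variance is normalized). The density φ does not appear in the terms themselves; it enters through their derivatives.

```latex
\begin{aligned}
Y = 0:&\quad \log L_0 = \log \Phi\!\left(\frac{\mathrm{MU}_1 - XB}{\sigma}\right) \\
Y = 1:&\quad \log L_1 = \log\!\left[\Phi\!\left(\frac{\mathrm{MU}_2 - XB}{\sigma}\right)
        - \Phi\!\left(\frac{\mathrm{MU}_1 - XB}{\sigma}\right)\right] \\
Y = 2:&\quad \log L_2 = \log\!\left[1 - \Phi\!\left(\frac{\mathrm{MU}_2 - XB}{\sigma}\right)\right]
\end{aligned}
```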

where Φ(.) and φ(.) denote the cumulative normal distribution and normal density respectively. INTERVAL uses analytic first and second derivatives to obtain maximum likelihood estimates via the Newton-Raphson algorithm. This algorithm usually converges fairly quickly. TSP uses zeros for starting parameter values. @START can be used to provide different starting values (see NONLINEAR). Multicollinearity of the independent variables is handled with generalized inverses, as in the other linear and nonlinear regression procedures in TSP.
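To make the iteration concrete, here is a minimal Python sketch of Newton-Raphson with analytic first and second derivatives, started from zero. It is an illustration only and not TSP's implementation: it assumes an intercept-only model (XB = b), fixes the error standard deviation at 1, and all function names are hypothetical.

```python
import math

def cdf(z):
    """Standard normal CDF; math.erf returns -1/+1 at -inf/+inf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pdf(z):
    """Standard normal density, taken as 0 at +/- infinity."""
    return 0.0 if math.isinf(z) else math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def zpdf(z):
    """z * phi(z), using its limit 0 at +/- infinity."""
    return 0.0 if math.isinf(z) else z * pdf(z)

def derivatives(b, data):
    """Gradient and Hessian of the log likelihood with respect to b.

    data is a list of (lower, upper) limits, one pair per observation.
    Assumption: the error standard deviation is fixed at 1.
    """
    grad = 0.0
    hess = 0.0
    for lo, hi in data:
        l, h = lo - b, hi - b
        p = cdf(h) - cdf(l)            # P(lower <= y* < upper)
        dp = -(pdf(h) - pdf(l))        # dP/db
        d2p = -(zpdf(h) - zpdf(l))     # d2P/db2, since phi'(z) = -z*phi(z)
        grad += dp / p
        hess += d2p / p - (dp / p) ** 2
    return grad, hess

def newton_raphson(data, b=0.0, tol=1e-10, max_iter=50):
    """Iterate b <- b - grad/hess from a zero start; return (b, converged)."""
    for _ in range(max_iter):
        grad, hess = derivatives(b, data)
        step = -grad / hess
        b += step
        if abs(step) < tol:
            return b, True
    return b, False

# Made-up data: three categories with known limits -1 and 1,
# open-ended at both ends.
example = ([(-math.inf, -1.0)] * 2 + [(-1.0, 1.0)] * 5 + [(1.0, math.inf)] * 8)
estimate, converged = newton_raphson(example)
print(estimate, converged)
```

Because the normal interval likelihood is log-concave in b, the Hessian stays negative and the iteration in this sketch settles within a few steps.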
If you wish to estimate a nonstandard ordered probit model (e.g. adjusted for heteroskedasticity or with a nonlinear regression function), use the ML command. See the website for examples of how to do this.
Options
LOWER= scalar or series containing lower bounds (required).
UPPER= scalar or series containing upper bounds (required).
The usual nonlinear options are available - see the NONLINEAR section of this manual.
Examples
A simple example shows how to estimate a binary Probit model using both PROBIT and INTERVAL, with scalars as the lower and upper bounds for the dependent variable.
PROBIT D C X1-X8 ; ? Probit estimation, D=0 or 1.
Q = 2*D-1 ; ? redefine dep variable to be (-1,1)
INTERVAL (LOWER=0,UPPER=0) Q C X1-X8 ;
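The equivalence between the two commands can be checked numerically. The Python sketch below compares the log likelihood contribution of a binary Probit observation with that of the recoded interval observation. It rests on two assumptions that the text implies but does not spell out: the error variance is 1, and equal limits are read as a single bound whose direction is given by the side on which the dependent variable lies. The function names are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_loglik(d, xb):
    """Binary Probit contribution, with P(D=1) = Phi(xb)."""
    p = norm_cdf(xb)
    return math.log(p) if d == 1 else math.log(1.0 - p)

def interval_loglik(q, lower, upper, xb):
    """Interval contribution with unit error variance (assumption).

    Equal limits are treated as a single bound (assumption): an upper
    bound if q lies below it, a lower bound if q lies above it.
    """
    if lower == upper:
        if q < lower:
            p = norm_cdf(upper - xb)          # y* below the bound
        else:
            p = 1.0 - norm_cdf(lower - xb)    # y* above the bound
    else:
        p = norm_cdf(upper - xb) - norm_cdf(lower - xb)
    return math.log(p)

def compare(d_values, xb_values):
    """Return (probit, interval) contribution pairs after recoding Q = 2*D-1."""
    pairs = []
    for d, xb in zip(d_values, xb_values):
        q = 2 * d - 1
        pairs.append((probit_loglik(d, xb), interval_loglik(q, 0.0, 0.0, xb)))
    return pairs

# Made-up index values XB for four observations.
for probit_ll, interval_ll in compare([0, 1, 1, 0], [0.3, -0.5, 1.2, -2.0]):
    print(probit_ll, interval_ll)
```

Under these assumptions the two columns agree up to rounding, because 1 - Φ(-XB) = Φ(XB).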
A more complex example has 4 categories (<40, 40 to 50, 50 to 60, and >60) and shows how to code the lower and upper bounds and the dependent variable. YCAT takes on the values 1 to 4, corresponding to the four categories.
yrec = 35*(ycat=1)+45*(ycat=2)+55*(ycat=3)+65*(ycat=4) ;
ylo = 40*(ycat=1)+40*(ycat=2)+50*(ycat=3)+60*(ycat=4) ;
yhi = 40*(ycat=1)+50*(ycat=2)+60*(ycat=3)+60*(ycat=4) ;
interval (lower=ylo,upper=yhi) yrec c x1-x8 ;
Note that by coding the upper and lower limits to be equal for YCAT=1 and YCAT=4 we have specified that they represent a single bound (upper in the case of YCAT=1 and lower in the case of YCAT=4).
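The coding rule in this example can be mirrored in a short Python sketch that rebuilds the three series and classifies each observation as lower-bounded, upper-bounded, or double-bounded. The cut points come from the example; the function names are hypothetical, and the classification rule (equal limits mean a single bound, with the side given by the dependent variable) is an assumption inferred from the note above, not a description of TSP internals.

```python
def code_observation(ycat):
    """Return (yrec, ylo, yhi) for one category, mirroring the TSP expressions."""
    yrec = 35 * (ycat == 1) + 45 * (ycat == 2) + 55 * (ycat == 3) + 65 * (ycat == 4)
    ylo = 40 * (ycat == 1) + 40 * (ycat == 2) + 50 * (ycat == 3) + 60 * (ycat == 4)
    yhi = 40 * (ycat == 1) + 50 * (ycat == 2) + 60 * (ycat == 3) + 60 * (ycat == 4)
    return yrec, ylo, yhi

def bound_type(yrec, ylo, yhi):
    """Classify one observation (assumed rule).

    Equal limits mean a single bound: an upper bound when yrec is below
    it, a lower bound when yrec is above it. Unequal limits mean the
    observation is bounded on both sides.
    """
    if ylo == yhi:
        return "upper" if yrec < yhi else "lower"
    return "double"

def count_bounds(ycats):
    """Count observations by bound type."""
    counts = {"lower": 0, "upper": 0, "double": 0}
    for ycat in ycats:
        counts[bound_type(*code_observation(ycat))] += 1
    return counts

# Made-up category values for seven observations.
print(count_bounds([1, 2, 2, 3, 4, 4, 4]))
```

The sketch does not define what happens when the dependent variable is coded exactly equal to a single bound, so the recoded value should lie strictly on one side of it, as 35 and 65 do here.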
References
Verbeek, Marno, A Guide to Modern Econometrics, Wiley, 2000, pp. 189-193.