INTERVAL


INTERVAL estimates a model like the linear Ordered Probit model, but where the limits are known. Unlike Ordered Probit, the limits may be different for different observations. INTERVAL is also similar to two-limit Tobit, with the difference that when the dependent variable is between the upper and lower bounds, only that fact is observed and not its actual value. INTERVAL is useful when the dependent variable is in a known range, but the actual value has been censored for confidentiality reasons.

INTERVAL (LOWER=<lowerlimit>,UPPER=<upperlimit>, nonlinear options) <dependent variable> <list of independent variables> ;

Usage

The basic INTERVAL statement is like the OLSQ statement: first list the dependent variable and then the independent variables. If you wish to have an intercept term in the regression (usually recommended), include the special variable C or CONSTANT in your list of independent variables. You may have as many independent variables as you like subject to the overall limits on the number of arguments per statement and the amount of working space, as well as the number of data observations you have available.

The LOWER= and UPPER= options are required. Normally these will be series with the lower and upper limits for each observation. The dependent variable should be coded so that it lies between its two bounds. For example, if category=3 means that the dependent variable (Y) is between 10 and 20, then Y should be coded so that 10<Y<20. The lower and upper limits for this observation will take the values 10 and 20. See the examples below.

The observations over which the regression is computed are determined by the current sample. If any of the observations have missing values within the current sample, INTERVAL will print a warning message and will drop those observations.

The list of independent variables on the INTERVAL command may include variables with explicit lags and leads as well as PDL (Polynomial Distributed Lag) variables. These distributed lag variables are a way to reduce the number of free coefficients when entering a large number of lagged variables in a regression by imposing smoothness on the coefficients. See the PDL section for a description of how to specify such variables.
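As a minimal sketch of the explicit-lag syntax, the statement below adds the first two lags of one regressor. The series names YREC, YLO, YHI and X1 are hypothetical, and the sample range is illustrative only; it assumes the data begin at observation 1.

```
? Hypothetical series: yrec (coded dependent variable), ylo/yhi (bounds), x1
SMPL 3 100 ;    ? start at 3 so that both lags of x1 are available
INTERVAL (LOWER=ylo,UPPER=yhi) yrec c x1 x1(-1) x1(-2) ;
```

Starting the sample at the third observation avoids the missing-value warning that the lagged terms would otherwise trigger for the first two observations.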

Output

The output of INTERVAL begins with an equation title and the usual starting values and diagnostic output from the iterations. Final convergence status is printed. After convergence, the number of observations, the value of the log likelihood, and the Schwarz-Bayes information criterion are printed. This is followed by observation counts for lower, upper, and double bounded observations, and the usual table of right hand side variable names, estimated coefficients, standard errors and associated t-statistics.

INTERVAL also stores some of these results in data storage for later use. The table below lists the results available after an INTERVAL command.

variable   type     length        description
@LHV       list     1             Name of dependent variable
@RNMS      list     #vars         List of names of right hand side variables
@IFCONV    scalar   1             =1 if convergence achieved, 0 otherwise
@YMEAN     scalar   1             Mean of the dependent variable
@NOB       scalar   1             Number of observations
@LOGL      scalar   1             Log of likelihood function
@AIC       scalar   1             Akaike information criterion
@SBIC      scalar   1             Schwarz Bayesian information criterion
@NCOEF     scalar   1             Number of independent variables (#vars)
@NCID      scalar   1             Number of identified coefficients
@COEF      vector   #vars         Coefficient estimates
@SES       vector   #vars         Standard errors
@T         vector   #vars         T-statistics
%T         vector   #vars         p-values for T-statistics
@GRAD      vector   #vars         Gradient of log likelihood at convergence
@VCOV      matrix   #vars*#vars   Variance-covariance of estimated coefficients
@FIT       series   #obs          Fitted values of dependent variable

If the regression includes a PDL or SDL variable, the following will also be stored:

@SLAG      scalar   1             Sum of the lag coefficients
@MLAG      scalar   1             Mean lag coefficient (number of time periods)
@LAGF      vector   #lags         Estimated lag coefficients, after "unscrambling"
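The stored results can be reused in later statements. The sketch below, offered as an illustration rather than a recommended procedure, builds a likelihood ratio test from @LOGL after two INTERVAL runs. The series names (YREC, YLO, YHI, X1-X8) and the scalar names (LLU, LLR, LRSTAT) are hypothetical, and the CDF command is assumed to behave as described in its own section of this manual.

```
? Unrestricted model: print the convergence flag and log likelihood, then save it
INTERVAL (LOWER=ylo,UPPER=yhi) yrec c x1-x8 ;
PRINT @IFCONV @LOGL ;
SET llu = @LOGL ;

? Restricted model: drop x5-x8
INTERVAL (LOWER=ylo,UPPER=yhi) yrec c x1-x4 ;
SET llr = @LOGL ;

? LR statistic, chi-squared with 4 degrees of freedom under the null
SET lrstat = 2*(llu - llr) ;
CDF (CHISQ,DF=4) lrstat ;
```

Each INTERVAL statement overwrites the stored results of the previous one, so @LOGL is copied to a scalar before the second model is estimated.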

Method

Like the binary and ordered Probit models, the Interval model is based on an unobserved continuous dependent variable (y*). The model is

y* = XB + e.

Instead of y*, we observe a category value Y, which implies that y* lies between known limits, where the limits may include minus or plus infinity. In the usual application the set of possible limits is the same for all observations, but this is not necessary. The underlying model is the same as that used for Ordered Probit (that is, e is assumed to be normally distributed), but with known limits.

For example, suppose there are 3 categories, with category values 0, 1, and 2, where the first and the last are open-ended. The model is

Y = 0 if MU0 <= XB + e < MU1 (MU0 = -infinity, MU1 a known value)

Y = 1 if MU1 <= XB + e < MU2 (MU2 a known value)

Y = 2 if MU2 <= XB + e < MU3 (MU3 = +infinity)

The terms in the likelihood function for observations with each of the values 0, 1, or 2 are the following:
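Assuming e has standard deviation σ (a symbol introduced here for illustration; take σ = 1 if the variance is normalized as in Ordered Probit), these terms take the standard form for interval-censored normal data:

```latex
\begin{aligned}
\log L(Y=0) &= \log \Phi\!\left(\frac{MU_1 - XB}{\sigma}\right) \\
\log L(Y=1) &= \log\!\left[\Phi\!\left(\frac{MU_2 - XB}{\sigma}\right)
               - \Phi\!\left(\frac{MU_1 - XB}{\sigma}\right)\right] \\
\log L(Y=2) &= \log\!\left[1 - \Phi\!\left(\frac{MU_2 - XB}{\sigma}\right)\right]
\end{aligned}
```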

where Φ(.) and φ(.) denote the cumulative normal distribution and normal density respectively. INTERVAL uses analytic first and second derivatives to obtain maximum likelihood estimates via the Newton-Raphson algorithm. This algorithm usually converges fairly quickly. TSP uses zeros for starting parameter values. @START can be used to provide different starting values (see NONLINEAR). Multicollinearity of the independent variables is handled with generalized inverses, as in the other linear and nonlinear regression procedures in TSP.

If you wish to estimate a nonstandard ordered probit model (e.g. adjusted for heteroskedasticity or with a nonlinear regression function), use the ML command. See the website for examples of how to do this.

Options

LOWER= scalar or series containing lower bounds (required).

UPPER= scalar or series containing upper bounds (required).

The usual nonlinear options are available - see the NONLINEAR section of this manual.

Example

A simple example, showing how to estimate a binary Probit model using PROBIT and INTERVAL with scalars as the lower and upper bounds for the dependent variable.

PROBIT D C X1-X8 ;    ? Probit estimation, D=0 or 1.

Q = 2*D-1 ;                  ? redefine dep variable to be -1 or 1 (below or above the bound 0)

INTERVAL (LOWER=0,UPPER=0) Q C X1-X8 ;

A more complex example, where there are 4 categories (<40, 40 to 50, 50 to 60, and >60), showing how to code the lower and upper bounds and the dependent variables. YCAT takes on the values 1 to 4 corresponding to the four categories.

yrec = 35*(ycat=1)+45*(ycat=2)+55*(ycat=3)+65*(ycat=4) ;

ylo = 40*(ycat=1)+40*(ycat=2)+50*(ycat=3)+60*(ycat=4) ;

yhi = 40*(ycat=1)+50*(ycat=2)+60*(ycat=3)+60*(ycat=4) ;

interval (lower=ylo,upper=yhi) yrec c x1-x8 ;

Note that by coding the upper and lower limits to be equal for YCAT=1 and YCAT=4 we have specified that they represent a single bound (upper in the case of YCAT=1 and lower in the case of YCAT=4).

Reference

Verbeek, Marno, A Guide to Modern Econometrics, Wiley, 2000, pp. 189-193.