Interaction term in a non-linear model

In a non-linear model (for example, a logit or Poisson model), the coefficient on the interaction term is tricky to interpret. Ai and Norton (2003) point out that this coefficient does not have the interpretation it has in a linear model, namely how much the effect of $x_{1}$ changes with the value of $x_{2}$ . They interpret the interaction effect as a cross-partial derivative of $E (y)$ .

If we have a linear model with interaction:

$E (y) = β_{1} x_{1} + β_{2} x_{2} + β_{12} x_{1} * x_{2}$

Then the cross-partial derivative is

$\frac{\partial^{2} E (y)}{\partial x_{1} \partial x_{2}} = β_{12}$

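That is, $β_{12}$ is the cross-partial (second) derivative of $E (y)$ with respect to $x_{1}$ and $x_{2}$ . The marginal effect of $x_{1}$ is $\frac{\partial E (y)}{\partial x_{1}} = β_{1} + β_{12} x_{2}$ , so it changes by $β_{12}$ for each unit change in $x_{2}$ .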

In a non-linear model,

$F (E (y)) = β_{1} x_{1} + β_{2} x_{2} + β_{12} x_{1} * x_{2}$

$\frac{\partial^{2} F (E (y))}{\partial x_{1} \partial x_{2}} = β_{12}$

Here, the cross-partial derivative of $F (E (y))$ with respect to $x_{1}$ and $x_{2}$ is still $β_{12}$ . However, most people are interested in $\frac{\partial^{2} E (y)}{\partial x_{1} \partial x_{2}}$ , the cross-partial derivative of $E (y)$ itself:

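$\frac{\partial^{2} E (y)}{\partial x_{1} \partial x_{2}} = β_{12} G' (\cdot) + (β_{1} + β_{12} x_{2}) (β_{2} + β_{12} x_{1}) G'' (\cdot)$

where $G (\cdot)$ is the inverse function of $F (\cdot)$ , and $G'$ and $G''$ are its first and second derivatives, evaluated at the linear predictor $β_{1} x_{1} + β_{2} x_{2} + β_{12} x_{1} * x_{2}$ .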
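As a quick numerical check of this expression, the sketch below assumes a logit link, so that $G (\eta) = 1 / (1 + e^{-\eta})$ , $G' = p (1 - p)$ and $G'' = p (1 - p) (1 - 2 p)$ with $p = G (\eta)$ . It compares the analytic cross-partial derivative with a central finite-difference approximation. The coefficient values, the evaluation point, and all function names are made up for illustration.

```python
import math

def inv_logit(eta):
    """G(): inverse of the logit link F(), mapping the linear predictor to E(y)."""
    return 1.0 / (1.0 + math.exp(-eta))

def expected_y(x1, x2, b1, b2, b12):
    """E(y) = G(b1*x1 + b2*x2 + b12*x1*x2)."""
    return inv_logit(b1 * x1 + b2 * x2 + b12 * x1 * x2)

def cross_partial_analytic(x1, x2, b1, b2, b12):
    """b12*G' + (b1 + b12*x2)*(b2 + b12*x1)*G'', using the logit derivatives."""
    p = expected_y(x1, x2, b1, b2, b12)
    g_prime = p * (1.0 - p)                     # G'(eta) for the logit
    g_double_prime = g_prime * (1.0 - 2.0 * p)  # G''(eta) for the logit
    return b12 * g_prime + (b1 + b12 * x2) * (b2 + b12 * x1) * g_double_prime

def cross_partial_numeric(x1, x2, b1, b2, b12, h=1e-4):
    """Central finite-difference approximation of d^2 E(y) / (dx1 dx2)."""
    def f(a, b):
        return expected_y(a, b, b1, b2, b12)
    return (f(x1 + h, x2 + h) - f(x1 + h, x2 - h)
            - f(x1 - h, x2 + h) + f(x1 - h, x2 - h)) / (4.0 * h * h)

# Illustrative (made-up) coefficients and evaluation point.
b1, b2, b12 = 0.5, -0.3, 0.2
x1, x2 = 1.0, 2.0

analytic = cross_partial_analytic(x1, x2, b1, b2, b12)
numeric = cross_partial_numeric(x1, x2, b1, b2, b12)
print(f"analytic: {analytic:.6f}  numeric: {numeric:.6f}  beta_12: {b12}")
```

The two derivative values should agree closely with each other while differing from $β_{12}$ , which is the point of the formula: on the scale of $E (y)$ , the interaction effect is not the coefficient.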

It is true that in a non-linear model with interaction, the marginal effect of $x_{1}$ varies with the value of $x_{2}$ . However, even in a non-linear model without interaction, the marginal effect of $x_{1}$ still varies with the value of $x_{2}$ . To see this, take the model without an interaction term:

$F (E (y)) = β_{1} x_{1} + β_{2} x_{2}$

$\frac{\partial^{2} E (y)}{\partial x_{1} \partial x_{2}} = β_{1} β_{2} G'' (\cdot)$
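A small sketch can make this concrete. It again assumes a logit link and uses made-up coefficients and hypothetical function names; the model has no interaction term, yet the marginal effect of $x_{1}$ changes as $x_{2}$ changes.

```python
import math

def inv_logit(eta):
    """Inverse logit link G(): linear predictor -> E(y)."""
    return 1.0 / (1.0 + math.exp(-eta))

def marginal_effect_x1(x1, x2, b1, b2):
    """dE(y)/dx1 = b1 * G'(eta) for a logit model with no interaction term."""
    p = inv_logit(b1 * x1 + b2 * x2)
    return b1 * p * (1.0 - p)

def cross_partial(x1, x2, b1, b2):
    """d^2 E(y)/(dx1 dx2) = b1 * b2 * G''(eta), with G'' = p(1-p)(1-2p)."""
    p = inv_logit(b1 * x1 + b2 * x2)
    return b1 * b2 * p * (1.0 - p) * (1.0 - 2.0 * p)

# Illustrative (made-up) coefficients; note there is no b12.
b1, b2 = 0.5, 1.0

for x2 in (0.0, 1.0, 2.0):
    me = marginal_effect_x1(0.0, x2, b1, b2)
    cp = cross_partial(0.0, x2, b1, b2)
    print(f"x2 = {x2:.1f}: marginal effect of x1 = {me:.4f}, cross-partial = {cp:.4f}")
```

With these values the marginal effect of $x_{1}$ shrinks as $x_{2}$ grows, purely because of the curvature of $G$ .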

Therefore, when we set up our model,

$F (E (y)) = β_{1} x_{1} + β_{2} x_{2} + β_{12} x_{1} * x_{2}$

we have in mind that $x_{1}$ and $x_{2}$ are allowed to interact in their effect on $F (E (y))$ , not on $E (y)$ .

We agree with Bill Greene (2013). In a nonlinear model, the partial effects (as Greene calls them) are nonlinear, regardless of how the model is specified. For example, in a logit model, even if you don’t have an interaction term in your model, the effect of $x_{1}$ will still be different for every value of $x_{2}$ , simply because it’s a nonlinear model.

As Greene puts it in the summary section, “Build the model based on appropriate statistical procedures and principles. Statistical testing about the model specification is done at this step. Hypothesis tests are about model coefficients and about the structural aspects of the model specifications. Partial effects are neither coefficients nor elements of the specification of the model. They are implications of the specified and estimated model.”

We also agree with Maarten Buis (2010) that we should use multiplicative effects in a non-linear model. That is, in a non-linear model,

$F (E (y)) = β_{1} x_{1} + β_{2} x_{2} + β_{12} x_{1} * x_{2}$

we should pay more attention to

$\frac{\partial^{2} F (E (y))}{\partial x_{1} \partial x_{2}} = β_{12}$

For example, in a logit model,

$\log (P (y = 1) / (1 - P (y = 1))) = β_{1} x_{1} + β_{2} x_{2} + β_{12} x_{1} * x_{2}$

That is, the log odds are a linear function of $x_{1}$ , $x_{2}$ , and their interaction. The interaction effect has the same interpretation as in the linear model, but in terms of the log odds.

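Equivalently, it becomes a multiplicative effect when expressed in terms of odds ratios: $\exp (β_{12})$ is the factor by which the odds ratio for $x_{1}$ is multiplied when $x_{2}$ increases by one unit. Stata’s “margins” command is a great tool for calculating marginal effects in various situations, as shown in Maarten Buis (2010).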
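The multiplicative reading can be illustrated with a short sketch. It is written in Python rather than Stata, uses made-up coefficients and hypothetical function names, and only evaluates the logit model's implied odds; it does not reproduce any particular Stata output.

```python
import math

def odds(x1, x2, b1, b2, b12):
    """Odds P(y=1) / (1 - P(y=1)) implied by the logit model."""
    return math.exp(b1 * x1 + b2 * x2 + b12 * x1 * x2)

def odds_ratio_x1(x2, b1, b2, b12, x1=0.0):
    """Odds ratio for a one-unit increase in x1, holding x2 fixed."""
    return odds(x1 + 1.0, x2, b1, b2, b12) / odds(x1, x2, b1, b2, b12)

# Illustrative (made-up) coefficients.
b1, b2, b12 = 0.5, -0.3, 0.2

or_at_0 = odds_ratio_x1(0.0, b1, b2, b12)  # odds ratio for x1 when x2 = 0
or_at_1 = odds_ratio_x1(1.0, b1, b2, b12)  # odds ratio for x1 when x2 = 1
ratio = or_at_1 / or_at_0                  # ratio of odds ratios

print(f"OR for x1 at x2=0: {or_at_0:.4f}")
print(f"OR for x1 at x2=1: {or_at_1:.4f}")
print(f"ratio of ORs: {ratio:.4f}  exp(beta_12): {math.exp(b12):.4f}")
```

The ratio of the two odds ratios equals $\exp (β_{12})$ whatever starting value of $x_{1}$ is used, which is why the interaction coefficient has a constant interpretation on the odds scale.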