The linear probability model assumes that the conditional probability function is linear and is designed to predict continuous dependent variables. In the case of binary classification, where the dependent variable takes only two values (0 or 1), this is a major flaw: the resulting model does not restrict the predicted response values to lie between 0 and 1. To circumvent this difficulty, non-linear approaches to statistically modeling the conditional probability function of a dichotomous variable were developed, the most common of which is logistic regression.
Logistic regression has applications in various fields, including machine learning, the medical sciences, and the social sciences. It is a type of regression analysis used to evaluate binomial response variables. The dichotomous dependent variable (Y), with the discrete values 0 and 1, is modeled through its probability of success P, a number between 0 and 1. This probability is then transformed into a continuous quantity, the odds P/(1 - P), which lies on the half-line (0, +∞). As a final step, the log of the odds is taken, which completes the mapping by carrying the half-line (0, +∞) onto the entire real line (-∞, +∞). As a rule of thumb, probabilities range from 0 to 1, odds range from 0 to ∞, and log odds range from -∞ to +∞.
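As a brief numeric illustration of these three scales (a minimal sketch in Python; the probabilities are made-up values chosen only to show the ranges):

    import numpy as np

    # Made-up probabilities chosen only to illustrate the three scales.
    p = np.array([0.1, 0.5, 0.9])

    odds = p / (1 - p)        # probability in (0, 1) -> odds in (0, +inf)
    log_odds = np.log(odds)   # odds in (0, +inf) -> log odds in (-inf, +inf)

    print(odds)       # approximately [0.11  1.0  9.0]
    print(log_odds)   # approximately [-2.20  0.0  2.20]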
Logistic regression models the log odds of the event using the following relationship:
Zi = ln(Pi/(1 - Pi)) = B0 + B1x1 + B2x2 + B3x3 + … + Bnxn
where Zi is logit(Pi), Pi is the probability of the event occurring, Bi are the beta coefficients, and xi are the independent variables. To recover the probability of the event from the log odds of the event, the equation can be inverted as follows:
Pi = 1/(1 + e^-(B0 + B1x1 + B2x2 + B3x3 + … + Bnxn))
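A minimal Python sketch of this inverse relationship (the coefficients and input values below are arbitrary, used only for illustration):

    import numpy as np

    def sigmoid(z):
        # Inverse of the logit: maps log odds on (-inf, +inf) back to (0, 1).
        return 1.0 / (1.0 + np.exp(-z))

    # Arbitrary illustrative coefficients B0, B1, B2 and one observation.
    b = np.array([-1.5, 0.8, 0.3])   # B0, B1, B2
    x = np.array([1.0, 2.0, 0.5])    # leading 1 multiplies the intercept B0

    z = b @ x         # log odds: B0 + B1*x1 + B2*x2
    p = sigmoid(z)    # probability of the event, here about 0.56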
Fitting a logistic regression model produces estimates of the beta coefficients together with their p-values, standard errors, the log likelihood, the residual deviance, the null deviance, and the AIC. Although the response curve is sigmoid, the logit transformation turns it into a straight line in the coefficients, whose parameters are estimated by maximum likelihood (or, for grouped data, by least squares on the empirical log odds). The exponentiated beta values (e^Bi) are interpreted with respect to the reference category and represent the multiplicative change in the odds of the outcome for every 1-unit change in the predictor/independent variable. Similarly, the p-value for each independent variable tests the null hypothesis that the variable's coefficient is zero, that is, that the variable has no association with the dependent variable. A low p-value (<0.05) is suggestive of statistical significance and typically leads to rejecting the null hypothesis, which in turn helps determine which terms to keep in the regression model. The standard errors measure the precision of the coefficient estimates; smaller standard errors indicate more precise estimates. Moreover, the z-value (aka z-statistic) is calculated by dividing the regression coefficient by its standard error. A large absolute z-value indicates that the corresponding true regression coefficient is unlikely to be 0 and that the corresponding independent variable matters. Negative coefficients imply an odds ratio smaller than 1, meaning that the odds of the test group are lower than the odds of the reference group.

The null deviance shows how well the response variable is predicted by a model with nothing but an intercept, whereas the residual deviance takes all the predictors into account. A small null deviance suggests that even the intercept-only model explains the data reasonably well. Finally, AIC stands for Akaike information criterion; it estimates how well the model will predict future values by balancing goodness of fit against model complexity, and among models fit to the same response, the model with the lower AIC is the more parsimonious choice.
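To show where these quantities appear in practice, the following sketch fits a logistic regression to a small synthetic dataset with the statsmodels library; the data-generating coefficients are arbitrary and purely illustrative:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))              # two synthetic predictors
    z = -0.5 + 1.2 * X[:, 0] - 0.7 * X[:, 1]   # true log odds (arbitrary)
    y = rng.binomial(1, 1 / (1 + np.exp(-z)))  # binary outcome

    result = sm.Logit(y, sm.add_constant(X)).fit()   # maximum likelihood fit

    print(result.params)          # beta coefficients
    print(np.exp(result.params))  # exponentiated betas = odds ratios
    print(result.bse)             # standard errors
    print(result.tvalues)         # z-values (coefficient / standard error)
    print(result.pvalues)         # p-values for H0: coefficient = 0
    print(-2 * result.llnull)     # null deviance (intercept-only model)
    print(-2 * result.llf)        # residual deviance (full model)
    print(result.aic)             # Akaike information criterion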
The density function associated with the logit model is close in shape to that of a standard normal distribution but has heavier tails, and it is known to produce a better fit in the presence of extreme levels of the independent variables. It is important to note that the logistic regression model itself does not perform statistical classification: it models the probability of an outcome based on the independent variables. Nevertheless, it can be used as a classifier by setting a cutoff value and classifying inputs with predicted probability above the cutoff into one class and those below it into the other.
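Continuing the sketch above, a common (though arbitrary) cutoff of 0.5 converts the fitted probabilities into class labels:

    probs = result.predict(sm.add_constant(X))   # fitted probabilities
    labels = (probs > 0.5).astype(int)           # 1 above the cutoff, 0 below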