AAT Bioquest

Logistic Regression (Logit) Calculator

Logistic regression (also called logit regression or the logit model) is a non-linear statistical method for a categorical response (dependent variable) that takes two values, '0' and '1', representing an outcome such as failure/success. The technique estimates the relationship between this binary response and one or more independent variables, and can therefore be used to predict a qualitative response. Fitting a logistic regression model yields estimates of the beta coefficients, p values, standard errors, log likelihood, residual deviance, null deviance, and AIC; the fit works on the log-odds (logit) scale, where the relationship with the independent variables is linear. Applications of logistic models range from toxicity studies and cancer detection to forecasting and other problems with dichotomous outcomes.

How to use this tool

1. Enter the data into the box on the right. Data can be directly copied from Excel or pasted as values in comma-separated, tab-separated, or space-separated formats. If the data is being entered manually, only place one value per line. The format should be as follows:

Dependent Variable    Independent Variable    …
X1                    Y1                      …
X2                    Y2                      …
X3                    Y3                      …
X4                    Y4                      …
…                     …                       …

Place the dependent variable in the first column. It must be a binary value (1 or 0). The subsequent columns (Data Set 2 and onwards) should hold the independent variables. Up to ten independent variables can be included per analysis. To add a new data set, press the '+' tab above the data entry area. Naming variables is optional; to name one, double-click its tab.

2. Verify your data is accurate in the table that appears.

3. Press the "Calculate Logistic Regression" button to display results. Each dataset generates a summary table comprising beta coefficients, p values, standard errors, log likelihood, and so forth.

4. In the probability equation solver, enter values for the independent variables in the input boxes to obtain the probability of the event occurring. Click the box of the independent variable you want to examine. This plots the predicted probability of the dependent variable as a function of that independent variable while the remaining variables are held fixed.
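The data format described in step 1 can be illustrated with a small parsing sketch. This is not the tool's own parser; the function names `parse_rows` and `split_columns` and the sample values are hypothetical, and the code only assumes what the steps state: values separated by commas, tabs, or spaces, a binary first column, and at most ten independent variables.

```python
import re

def parse_rows(text):
    """Split pasted text into rows of floats (commas, tabs, or spaces)."""
    rows = []
    for line in text.strip().splitlines():
        fields = [f for f in re.split(r"[,\t ]+", line.strip()) if f]
        if fields:  # skip blank lines
            rows.append([float(f) for f in fields])
    return rows

def split_columns(rows):
    """Return (y, X): first column as 0/1 labels, the rest as predictors."""
    y, X = [], []
    for row in rows:
        if row[0] not in (0.0, 1.0):
            raise ValueError(f"dependent variable must be 0 or 1, got {row[0]}")
        if not 1 <= len(row) - 1 <= 10:
            raise ValueError("expected between 1 and 10 independent variables")
        y.append(int(row[0]))
        X.append(row[1:])
    return y, X

# Hypothetical pasted data mixing the three accepted separators.
pasted = "1,2.5,10\n0\t1.0\t12\n1 3.0 9"
y, X = split_columns(parse_rows(pasted))
print(y)  # [1, 0, 1]
print(X)  # [[2.5, 10.0], [1.0, 12.0], [3.0, 9.0]]
```

Rejecting a non-binary first column early mirrors the requirement that the dependent variable be 1 or 0.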

The linear probability model assumes that the conditional probability function is linear in the independent variables, an approach better suited to continuous dependent variables. For binary classification, where the dependent variable takes only two values (0 or 1), this is a major flaw because the predicted response values are not restricted to lie between 0 and 1. To circumvent this difficulty, non-linear approaches to modeling the conditional probability of a dichotomous variable were developed, one of the most common being logistic regression.

Logistic regression has applications in various fields, including machine learning, medical sciences, and social sciences. It is a type of regression analysis used to evaluate binomial response variables. The dichotomous dependent variable (Y), with the discrete values 0 and 1, is first described by the probability P that Y = 1, a number in the interval [0, 1]. This probability is then converted into the odds, P/(1-P), a continuous quantity on the half-line [0, +∞). As a final step, the natural log of the odds is taken, which completes the mapping by stretching the half-line [0, +∞) onto the whole real line (-∞, +∞). In summary, probability ranges from 0 to 1, odds range from 0 to ∞, and log odds range from -∞ to +∞.
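The three scales can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `odds` and `log_odds` and the sample probabilities are chosen for illustration.

```python
import math

def odds(p):
    """Odds of an event with probability p (valid for 0 <= p < 1)."""
    return p / (1.0 - p)

def log_odds(p):
    """Natural log of the odds, i.e. logit(p) (valid for 0 < p < 1)."""
    return math.log(odds(p))

# Illustrative probabilities: below, at, and above one half.
for p in (0.1, 0.5, 0.9):
    print(f"p={p:.1f}  odds={odds(p):.3f}  log odds={log_odds(p):+.3f}")
```

A probability of 0.5 corresponds to odds of 1 and log odds of 0, and probabilities symmetric about 0.5 give log odds of equal size and opposite sign.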

Logistic regression models the log odds of the event using the following relationship:

Zi = ln(Pi/(1-Pi)) = B0 + B1x1 + B2x2 + B3x3 + … + Bnxn

where Zi is logit(Pi), Pi is the probability of the event occurring, B0 is the intercept, B1 through Bn are the beta coefficients, and x1 through xn are the independent variables. To obtain the probability of the event from the log odds of the event, the equation can be inverted as follows:

P = 1/(1+e^-(B0+B1x1+B2x2+B3x3+…+Bnxn))
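As a minimal sketch of this conversion, the function below evaluates the probability for a given set of coefficients. The name `predict_probability` and the coefficient values are hypothetical; the branch on the sign of z is only there to avoid floating-point overflow for large negative log odds.

```python
import math

def predict_probability(coefs, xs):
    """P = 1 / (1 + exp(-(B0 + B1*x1 + ... + Bn*xn))).

    coefs is [B0, B1, ..., Bn]; xs is [x1, ..., xn].
    """
    z = coefs[0] + sum(b * x for b, x in zip(coefs[1:], xs))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # algebraically identical form, safe when z is very negative
    return ez / (1.0 + ez)

# Hypothetical coefficients: B0 = -1.0, B1 = 2.0.
print(predict_probability([-1.0, 2.0], [0.5]))  # log odds of 0 -> 0.5
print(predict_probability([-1.0, 2.0], [2.0]))  # log odds of 3
```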

The estimates of the beta coefficients, p values, standard errors, log likelihood, residual deviance, null deviance, and AIC are generated in the process of fitting the logistic regression model. The logit transformation turns the sigmoid response curve into a straight line on the log-odds scale, and the coefficients are typically estimated by maximum likelihood.

The exponentiated beta values, exp(Bi), are odds ratios interpreted with respect to the reference category: each gives the factor by which the odds of the outcome are multiplied for every 1-unit increase in the corresponding independent variable, with the other variables held fixed. Similarly, the p-value for each independent variable tests the null hypothesis that its coefficient is zero, that is, that the variable has no association with the dependent variable. A low p-value (<0.05) is suggestive of statistical significance and typically leads to rejecting the null hypothesis; this is often used to decide which terms to keep in the regression model.

The standard errors measure the uncertainty in each estimated coefficient: the smaller the standard error, the more precisely the coefficient is estimated. The z-value (aka z-statistic) is calculated by dividing the regression coefficient by its standard error. A z-value that is large in absolute value is evidence that the corresponding true regression coefficient is not 0 and that the corresponding independent variable matters. A negative coefficient means that the odds ratio is smaller than 1, so the odds for the test group are lower than the odds for the reference group.

The null deviance shows how well the response variable is predicted by a model with nothing but an intercept, whereas the residual deviance takes all the predictors into account. A small null deviance suggests that the null model already explains the data well, and a residual deviance much smaller than the null deviance suggests that the predictors improve the fit. Finally, AIC stands for Akaike information criterion, computed as 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. It balances goodness of fit against model complexity: among models fit to the same response data, the one with the lower AIC is preferred.
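The quantities above can be reproduced for a single predictor with a short maximum-likelihood fit. This is a sketch under stated assumptions, not the calculator's implementation: it uses Newton-Raphson for the model logit(P) = B0 + B1x, a fixed number of iterations, a normal approximation for the p-value, and made-up data. The names `fit_logistic` and `score_and_information` are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def score_and_information(b0, b1, x, y):
    """Fitted probabilities, gradient of the log likelihood, and information matrix."""
    p = [sigmoid(b0 + b1 * xi) for xi in x]
    w = [pi * (1.0 - pi) for pi in p]
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
    h00 = sum(w)
    h01 = sum(wi * xi for wi, xi in zip(w, x))
    h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    return p, (g0, g1), (h00, h01, h11)

def fit_logistic(x, y, iterations=25):
    """Fit logit(P) = b0 + b1*x by Newton-Raphson and summarize the fit."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        _, (g0, g1), (h00, h01, h11) = score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    p, _, (h00, h01, h11) = score_and_information(b0, b1, x, y)
    det = h00 * h11 - h01 * h01
    # Standard errors: square roots of the diagonal of the inverse information matrix.
    se_b0, se_b1 = math.sqrt(h11 / det), math.sqrt(h00 / det)
    loglik = sum(math.log(pi if yi == 1 else 1.0 - pi) for yi, pi in zip(y, p))
    ybar = sum(y) / len(y)  # the intercept-only model predicts the mean of y
    loglik_null = sum(math.log(ybar if yi == 1 else 1.0 - ybar) for yi in y)
    z_b1 = b1 / se_b1
    return {
        "b0": b0, "b1": b1, "se_b0": se_b0, "se_b1": se_b1,
        "z_b1": z_b1,
        "p_b1": math.erfc(abs(z_b1) / math.sqrt(2.0)),  # two-sided normal p-value
        "odds_ratio_b1": math.exp(b1),
        "log_likelihood": loglik,
        "null_deviance": -2.0 * loglik_null,
        "residual_deviance": -2.0 * loglik,
        "aic": 2 * 2 - 2.0 * loglik,  # k = 2 parameters (b0 and b1)
    }

# Made-up, non-separable data: the outcome becomes more likely as x grows.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
result = fit_logistic(x, y)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The sketch does not guard against perfectly separated data, for which the maximum-likelihood estimates do not exist and the Newton steps would diverge.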

The density function associated with logit regression (the logistic distribution) is similar in shape to the normal distribution but has heavier tails, which can produce a better fit in the presence of extreme independent variable levels. It is important to note that the logistic regression model itself does not perform statistical classification; it models the probability of an outcome based on the independent variables. Nevertheless, it can be used as a classifier by setting a cutoff value and assigning inputs with probability greater than the cutoff to one group and those below the cutoff to the other.
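The cutoff rule can be sketched in a few lines. The function name `classify`, the default cutoff of 0.5, and the handling of a probability exactly equal to the cutoff (assigned to group 0 here) are assumptions made for illustration.

```python
def classify(probabilities, cutoff=0.5):
    """Label each predicted probability 1 if it exceeds the cutoff, else 0."""
    return [1 if p > cutoff else 0 for p in probabilities]

# Hypothetical predicted probabilities from a fitted model.
predicted = [0.12, 0.50, 0.67, 0.93]
print(classify(predicted))              # [0, 0, 1, 1]
print(classify(predicted, cutoff=0.7))  # [0, 0, 0, 1]
```

Raising the cutoff makes the classifier more conservative about assigning an input to group 1.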

References

MLA

"Quest Graph™ Logistic Regression (Logit) Calculator." AAT Bioquest, Inc., 2 Mar. 2024, https://www.aatbio.com/tools/logistic-regression-logit-calculator.

APA

AAT Bioquest, Inc. (2024, March 2). Quest Graph™ Logistic Regression (Logit) Calculator. AAT Bioquest. https://www.aatbio.com/tools/logistic-regression-logit-calculator.

This online tool has been cited in 1 publication:

A novel, non-invasive method to diagnose active eosinophilic esophagitis, combining clinical data and oral cavity RNA levels
Authors: Sebastian-delaCruz, Maialen and Garcia-Etxebarria, Koldo and Bilbao, Jose Ramón and Lucendo, Alfredo J and Bujanda, Luis and Castellanos-Rubio, Ainara
Journal: Clinical Gastroenterology and Hepatology (2023)