Python implementation of multivariate linear regression. It supports both nominal and categorical variables and implicitly drops null values in the data. Both single-node and distributed modes return JSON with a structure such as:
{
'agegroup_50-59y': {
'coef': 3.2571304466,
'p_values': 0.7387901953,
'std_err': 9.5993224941,
't_values': 0.3393083677
},
'intercept': {
'coef': 1042.2837545842,
'p_values': 0.0,
'std_err': 45.1479998776,
't_values': 23.0859342033
},
...
}
Multinomial logistic regression, implemented as a log-linear model by fitting a logistic regression for each class versus the others. Only single-node mode is supported; for distributed mode, use SGD regression.
The output is JSON in which each category has its own coefficients:
{
'AD': {
'agegroup_50-59y': {
'coef': 3.2571304466,
'p_values': 0.7387901953,
'std_err': 9.5993224941,
't_values': 0.3393083677
},
'intercept': {
'coef': 1042.2837545842,
'p_values': 0.0,
'std_err': 45.1479998776,
't_values': 23.0859342033
},
...
},
'CN': ...
}
Regression coefficients and statistics are calculated using the statsmodels package.
docker run python-linear-regression compute
Aggregation mode pools the local betas and XtX matrices, constructs the normal equations from these blocks, and uses them to calculate the aggregated betas (see original R implementation). The calculated betas are identical to those of the single-node mode; however, standard errors, t-statistics, and p-values are estimated from the local standard errors and might differ from the single-node case. This is because residuals are not available in the aggregation step, so the standard error of the residuals cannot be computed. To do that, we would have to propagate the aggregated betas back to the nodes, recalculate the standard error there, and perform one more aggregation step.
Distributed computation has two modes:
compute --mode intermediate
compute --mode aggregate --job-ids 1 2 3
Intermediate mode returns the same output as single-node mode, and aggregate mode combines these outputs into a single estimate.
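The pooling of betas described above can be illustrated with a small NumPy sketch. It assumes each node reports its local betas together with its XtX matrix; since XtX times the local betas recovers Xty for that node, summing these blocks yields the normal equations for the pooled data. The function names `local_fit` and `aggregate` and the simulated nodes are hypothetical, and the sketch covers only the betas, not the standard errors.

```python
import numpy as np


def local_fit(X, y):
    """Per-node step: return the local betas and the XtX block."""
    xtx = X.T @ X
    beta = np.linalg.solve(xtx, X.T @ y)
    return beta, xtx


def aggregate(local_results):
    """Pool (beta, XtX) pairs into one estimate via the normal equations."""
    # XtX_k @ beta_k equals X_k' y_k, so these sums form the normal
    # equations of the pooled data set.
    xtx_sum = sum(xtx for _, xtx in local_results)
    xty_sum = sum(xtx @ beta for beta, xtx in local_results)
    return np.linalg.solve(xtx_sum, xty_sum)


# Simulated data split across three hypothetical nodes.
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(300), rng.normal(size=(300, 2))])
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=300)
nodes = [(X[:100], y[:100]), (X[100:180], y[100:180]), (X[180:], y[180:])]

pooled = aggregate([local_fit(Xk, yk) for Xk, yk in nodes])
single, _ = local_fit(X, y)  # fit on all data at once, for comparison
```

In this sketch `pooled` and `single` agree up to floating-point error, which matches the statement that aggregated betas are identical to the single-node result.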
Run: ./build.sh
Run: ./tests/test.sh
Run: ./publish.sh