Problem 1. Transformation of factors with splines
Mathematical Problem Statement
Problem 2. Removing factors with small importance (increment) using maximum likelihood in LR
Mathematical Problem Statement
Problem 3. Removing factors by adding a cardinality constraint to the LR optimization problem
Mathematical Problem Statement
Problem 4. Model verification with 4-fold cross-validation
Mathematical Problem Statement
This case study demonstrates binary classification with a large number of factors (independent variables). The classification algorithm is based on logistic regression (LR) and consists of four main steps:
Step 1) Factors are transformed by spline approximation using maximum likelihood in LR.
Step 2) Factors with small importance (increment) are removed using maximum likelihood in LR.
Step 3) Further factors are removed by adding a cardinality constraint to the LR optimization problem.
Step 4) The model is verified with 4-fold cross-validation.
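For reference, all four steps maximize a logistic-regression log-likelihood. In its standard textbook form, with observations j = 1, ..., J, factor values h_{ij} (taking h_{0j} = 1 for an intercept), binary labels y_j in {0, 1}, and coefficients x_i, it reads as below. This is an assumed standard form; the exact PSG definition of logexp_sum (for example, weighting by scenario probabilities) may differ. L(x) plays the role of logexp_sum and p_j(x) the role of logistic.

```latex
L(x) = \sum_{j=1}^{J} \Big( y_j \sum_{i} h_{ij} x_i
       - \ln\big( 1 + e^{\sum_{i} h_{ij} x_i} \big) \Big),
\qquad
p_j(x) = \frac{1}{1 + e^{-\sum_{i} h_{ij} x_i}}
```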
Transformation of factors with splines
Maximize logexp_sum(spline_sum)
Value:
logistic(spline_sum)
where
logexp_sum = log-likelihood function for logistic regression (Logarithms Exponents Sum)
spline_sum = spline sum function; calculates spline values, which depend on the regression variables, for every observation (scenario)
logistic = calculates values of the logistic function for every observation (scenario)
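A minimal Python sketch of this formulation, assuming a piecewise-linear (hinge) spline basis with fixed knots; the spline form used by PSG's spline_sum is not given here, and the helper names hinge_basis, spline_features, scores, and fit_lr are hypothetical. The log-likelihood is maximized by plain gradient ascent, not by the PSG solver.

```python
import math


def logistic(z):
    """Logistic function 1 / (1 + exp(-z)), split by sign to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def logexp_sum(z, y):
    """Log-likelihood of labels y (0/1) given linear scores z: sum of y*z - log(1 + e^z)."""
    return sum(yj * zj - log1pexp(zj) for zj, yj in zip(z, y))


def hinge_basis(x, knots):
    """Hypothetical piecewise-linear spline basis of one factor value: [x, max(0, x - k), ...]."""
    return [x] + [max(0.0, x - k) for k in knots]


def spline_features(X, knots):
    """Replace every factor of every observation by its spline basis values."""
    return [[v for xi in row for v in hinge_basis(xi, knots)] for row in X]


def scores(F, b, w):
    """Linear score b + w.f for every observation (row) of F."""
    return [b + sum(wi * fi for wi, fi in zip(w, row)) for row in F]


def fit_lr(F, y, steps=2000, lr=0.1):
    """Maximize logexp_sum over (b, w) by plain gradient ascent."""
    n = len(F)
    b, w = 0.0, [0.0] * len(F[0])
    for _ in range(steps):
        # Derivative of the log-likelihood w.r.t. each score: y_j - logistic(z_j).
        r = [yj - logistic(zj) for zj, yj in zip(scores(F, b, w), y)]
        b += lr * sum(r) / n
        w = [wi + lr * sum(rj * row[i] for rj, row in zip(r, F)) / n
             for i, wi in enumerate(w)]
    return b, w


# Usage: the label depends on |x|, which one linear term in x cannot capture.
X = [[-3.0], [-2.0], [-0.5], [0.0], [0.5], [2.0], [3.0]]
y = [1, 1, 0, 0, 0, 1, 1]
for knots in ([], [0.0]):
    F = spline_features(X, knots)
    b, w = fit_lr(F, y)
    z = scores(F, b, w)
    print("knots", knots, "log-likelihood", round(logexp_sum(z, y), 3))
```

Because each hinge term lets the score change slope at a knot, the spline-transformed factor can follow a label that depends on |x|, while the untransformed linear factor cannot.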
Mathematical Problem Statement
Solution in MATLAB Environment
Solved with PSG MATLAB function tbpsg_run (PSG Subroutine Interface):
Input files to run the case study (CS):
Removing factors with small importance (increment) using maximum likelihood in Logistic Regression
Maximize logexp_sum
Value:
logistic
where
logexp_sum = log-likelihood function for logistic regression (Logarithms Exponents Sum)
logistic = calculates values of the logistic function for every observation (scenario)
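A minimal sketch of the removal step, assuming that a factor's "increment" is the drop in maximized log-likelihood when that factor is left out, and that factors are removed in a single pass against a threshold; PSG's exact procedure may differ (for example, it may refit after each removal). The helpers max_loglik, increments, and remove_small and the threshold value are hypothetical.

```python
import math


def logistic(z):
    """Logistic function, split by sign to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def max_loglik(X, y, cols, steps=3000, lr=0.1):
    """Fit LR (intercept + factor columns `cols`) by gradient ascent; return the log-likelihood reached."""
    n = len(X)
    b, w = 0.0, [0.0] * len(cols)
    for _ in range(steps):
        z = [b + sum(wk * row[c] for wk, c in zip(w, cols)) for row in X]
        r = [yj - logistic(zj) for zj, yj in zip(z, y)]
        b += lr * sum(r) / n
        w = [wk + lr * sum(rj * row[c] for rj, row in zip(r, X)) / n
             for wk, c in zip(w, cols)]
    z = [b + sum(wk * row[c] for wk, c in zip(w, cols)) for row in X]
    return sum(yj * zj - log1pexp(zj) for zj, yj in zip(z, y))


def increments(X, y, cols):
    """Increment of each factor: max log-likelihood with all `cols` minus max log-likelihood without it."""
    full = max_loglik(X, y, cols)
    return {c: full - max_loglik(X, y, [k for k in cols if k != c]) for c in cols}


def remove_small(X, y, cols, threshold):
    """One pass: keep only the factors whose increment is at least `threshold`."""
    inc = increments(X, y, cols)
    return [c for c in cols if inc[c] >= threshold]


# Usage: factor 0 drives the label; factor 1 is a +/-1 copy that carries no information.
base = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (0.5, 0), (1.0, 1), (2.0, 1), (-1.5, 0), (1.5, 1)]
X = [[x0, s] for x0, _ in base for s in (1.0, -1.0)]
y = [label for _, label in base for s in (1.0, -1.0)]
print(increments(X, y, [0, 1]))
print(remove_small(X, y, [0, 1], threshold=0.5))
```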
Mathematical Problem Statement
Solution in MATLAB Environment
Solved with PSG MATLAB function tbpsg_run (PSG Subroutine Interface):
Input files to run the case study (CS):
Removing factors by adding a cardinality constraint to the Logistic Regression optimization problem
Maximize logexp_sum
Constraint: cardn <= upper_bound
Value:
logistic
where
logexp_sum = log-likelihood function for logistic regression (Logarithms Exponents Sum)
cardn = cardinality function (number of nonzero regression coefficients)
logistic = calculates values of the logistic function for every observation (scenario)
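A minimal sketch of the cardinality-constrained problem for a handful of factors, using exhaustive subset enumeration as a swapped-in exact method: every subset with at most max_card factors is fitted and the one with the highest log-likelihood is kept. This is not how PSG handles cardn (its solver works on the constraint directly), and brute force is only practical for very few factors; the helpers fit and best_subset are hypothetical.

```python
import math
from itertools import combinations


def logistic(z):
    """Logistic function, split by sign to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def fit(X, y, cols, steps=2000, lr=0.1):
    """Gradient-ascent LR on factor columns `cols`; returns (log-likelihood, intercept, weights)."""
    n = len(X)
    b, w = 0.0, [0.0] * len(cols)
    for _ in range(steps):
        z = [b + sum(wk * row[c] for wk, c in zip(w, cols)) for row in X]
        r = [yj - logistic(zj) for zj, yj in zip(z, y)]
        b += lr * sum(r) / n
        w = [wk + lr * sum(rj * row[c] for rj, row in zip(r, X)) / n
             for wk, c in zip(w, cols)]
    z = [b + sum(wk * row[c] for wk, c in zip(w, cols)) for row in X]
    return sum(yj * zj - log1pexp(zj) for zj, yj in zip(z, y)), b, w


def best_subset(X, y, max_card):
    """Maximize log-likelihood subject to (number of factors used) <= max_card, by exhaustive search."""
    p = len(X[0])
    best = None
    for size in range(max_card + 1):
        for cols in combinations(range(p), size):
            ll, b, w = fit(X, y, list(cols))
            if best is None or ll > best[0]:
                best = (ll, cols, b, w)
    return best


# Usage: factor 0 is informative; factors 1 and 2 are balanced +/-1 noise.
base = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (0.5, 0), (1.0, 1), (2.0, 1), (-1.5, 0), (1.5, 1)]
X = [[x0, s1, s2] for x0, _ in base for s1 in (1.0, -1.0) for s2 in (1.0, -1.0)]
y = [label for _, label in base for s1 in (1.0, -1.0) for s2 in (1.0, -1.0)]
ll, cols, b, w = best_subset(X, y, max_card=1)
print("selected factors", cols, "log-likelihood", round(ll, 3))
```

Here max_card plays the role of upper_bound; enumeration guarantees the best subset but grows combinatorially with the number of factors.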
Mathematical Problem Statement
Solution in MATLAB Environment
Solved with PSG MATLAB function tbpsg_run (PSG Subroutine Interface):
Input files to run the case study (CS):
Model verification with 4-fold cross-validation
4-fold cross-validation (crossvalidation with N = 4):
Maximize logexp_sum
Value:
logistic
where
crossvalidation(N,Matrix) = matrix operation that splits the input Matrix into N pairs of complementary sub-matrices (training and test)
logexp_sum = log-likelihood function for logistic regression (Logarithms Exponents Sum)
logistic = calculates values of the logistic function for every observation (scenario)
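A minimal sketch of the verification loop, assuming each row of Matrix holds the factors followed by a 0/1 label and that each fold's test rows form one contiguous block (PSG's actual fold assignment is not described here). The function crossvalidation follows the definition above; fit_lr, holdout_loglik, and cross_validate are hypothetical helpers that fit on each training sub-matrix and score the complementary test sub-matrix.

```python
import math


def logistic(z):
    """Logistic function, split by sign to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def crossvalidation(n_folds, matrix):
    """Split rows into n_folds (train, test) pairs; each test block is contiguous, train is its complement."""
    m = len(matrix)
    bounds = [i * m // n_folds for i in range(n_folds + 1)]
    return [(matrix[:lo] + matrix[hi:], matrix[lo:hi])
            for lo, hi in zip(bounds, bounds[1:])]


def fit_lr(rows, steps=2000, lr=0.1):
    """Gradient-ascent LR; each row is [factor_1, ..., factor_k, label]. Returns (intercept, weights)."""
    n, k = len(rows), len(rows[0]) - 1
    b, w = 0.0, [0.0] * k
    for _ in range(steps):
        # zip(w, row) stops after k factors, so the label column is never used as a factor.
        r = [row[-1] - logistic(b + sum(wi * xi for wi, xi in zip(w, row))) for row in rows]
        b += lr * sum(r) / n
        w = [wi + lr * sum(rj * row[i] for rj, row in zip(r, rows)) / n
             for i, wi in enumerate(w)]
    return b, w


def holdout_loglik(rows, b, w):
    """Out-of-sample log-likelihood of the labels in `rows` under coefficients (b, w)."""
    total = 0.0
    for row in rows:
        z = b + sum(wi * xi for wi, xi in zip(w, row))
        total += row[-1] * z - log1pexp(z)
    return total


def cross_validate(matrix, n_folds=4):
    """Fit on each training sub-matrix and score its complementary test sub-matrix."""
    return [holdout_loglik(test, *fit_lr(train))
            for train, test in crossvalidation(n_folds, matrix)]


# Usage: one factor plus a 0/1 label per row.
data = [[-2.0, 0], [-1.0, 0], [-0.5, 1], [0.5, 0], [1.0, 1], [2.0, 1], [-1.5, 0], [1.5, 1]]
print(cross_validate(data, n_folds=4))
```

Comparing the held-out log-likelihoods across the four folds shows how stable the fitted model is on data it was not trained on.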
Mathematical Problem Statement
Solution in MATLAB Environment
Solved with PSG MATLAB function tbpsg_run (PSG Subroutine Interface):
Input files to run the case study (CS):