Logistic Regression with Factors Transformed by Splines and Approaches for Dimension Reduction

 

Background

Problem 1. Transformation of factors with splines

Simplified Problem Statement

Mathematical Problem Statement

Solution in MATLAB Toolbox

Problem 2. Removing factors with a small importance (increment) using maximum likelihood in LR

Simplified Problem Statement

Mathematical Problem Statement

Solution in MATLAB Toolbox

Problem 3. Removing factors by adding a cardinality constraint to the LR optimization problem

Simplified Problem Statement

Mathematical Problem Statement

Solution in MATLAB Toolbox

Problem 4. Model verification with 4-fold cross-validation

Simplified Problem Statement

Mathematical Problem Statement

Solution in MATLAB Toolbox

 

 

Background

This case study demonstrates binary classification with a large number of factors (independent variables). The classification algorithm is based on Logistic Regression (LR) and includes four main steps:

Step 1) Transforming factors by spline approximation using maximum likelihood in LR.

Step 2) Removing factors with a small importance (increment) using maximum likelihood in LR.

Step 3) Removing factors by adding a cardinality constraint to the LR optimization problem.

Step 4) Model verification with 4-fold cross-validation.

 

Problem 1

Transformation of factors with splines

 

Simplified Problem Statement

 

Maximize logexp_sum(spline_sum)

 

Value: logistic(spline_sum)

 

where

 

logexp_sum = log-likelihood function for logistic regression (Logarithms Exponents Sum)

spline_sum = Spline Sum function; calculates the sum of spline values of the regression variables for every observation (scenario)

logistic = calculates values of logistic function for every observation
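The three quantities defined above can be illustrated with a minimal Python sketch. It is not the PSG implementation: the piecewise-linear (hinge) spline basis, the knot positions, the coefficients, and all function names below are assumptions chosen for illustration, and the actual spline_sum in PSG may use a different spline degree and parametrization.

```python
import math

def linear_spline_basis(x, knots):
    """Basis of a continuous piecewise-linear spline: x plus one hinge per knot."""
    return [x] + [max(0.0, x - k) for k in knots]

def spline_sum(row, knots_per_factor, coefs_per_factor, intercept=0.0):
    """Sum of the spline-transformed factors for one observation (hypothetical form)."""
    total = intercept
    for x, knots, coefs in zip(row, knots_per_factor, coefs_per_factor):
        basis = linear_spline_basis(x, knots)
        total += sum(c * b for c, b in zip(coefs, basis))
    return total

def logistic(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def log_likelihood(scores, labels):
    """Bernoulli log-likelihood: sum over observations of y*z - log(1 + exp(z))."""
    return sum(y * z - log1pexp(z) for z, y in zip(scores, labels))

# Toy data: three observations, one factor, two assumed knots.
rows = [[0.5], [2.0], [3.5]]
labels = [0, 1, 1]
knots = [[1.0, 3.0]]
coefs = [[1.0, -0.5, 2.0]]  # slope, then one coefficient per hinge

scores = [spline_sum(r, knots, coefs, intercept=-1.5) for r in rows]
probs = [logistic(z) for z in scores]
print(scores, probs, log_likelihood(scores, labels))
```

In the case study the spline coefficients are the decision variables: the solver searches for the coefficients that maximize the log-likelihood, whereas the sketch only evaluates the objective for fixed coefficients.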

 

Mathematical Problem Statement

 

Formal Problem Statement
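For orientation, the textbook form of the logistic-regression log-likelihood with spline-transformed factors is sketched below. The symbols are assumptions introduced here, not taken from the formal statement: y_j is the 0/1 outcome of observation j, x_{ji} is the value of factor i in observation j, S_i is the spline applied to factor i with coefficients c_i, and p_j is the reported logistic value. The PSG definition of logexp_sum may additionally include observation weights, scaling, or an intercept.

```latex
\max_{c}\; \sum_{j=1}^{J} \Bigl[\, y_j\, s_j(c) - \ln\bigl(1 + e^{s_j(c)}\bigr) \Bigr],
\qquad
s_j(c) = \sum_{i=1}^{I} S_i\bigl(x_{ji};\, c_i\bigr),
\qquad
p_j = \frac{1}{1 + e^{-s_j(c)}}
```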

 

Solution in MATLAB Environment

 

Solved with PSG MATLAB function tbpsg_run (PSG Subroutine Interface):

 

Description (tbpsg_run)

 

Input Files to run CS:

 MATLAB code (.txt file)
 Data (.zip file with a .mat file)

 

 

Problem 2

Removing factors with a small importance (increment) using maximum likelihood in Logistic Regression

 

Simplified Problem Statement

 

Maximize logexp_sum

 

Value: logistic

 

where

 

logexp_sum = log-likelihood function for logistic regression (Logarithms Exponents Sum)

logistic = calculates values of logistic function for every observation (scenario)
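One plausible reading of "importance (increment)" is the drop in the maximized log-likelihood when a factor is removed and the model is refit. The Python sketch below follows that reading, which is an assumption; the plain gradient-ascent fit, the toy data, and all function names are likewise illustrative and are not the PSG procedure.

```python
import math

def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def score(w, b, x):
    """Linear score b + w.x for one observation."""
    return b + sum(wj * xj for wj, xj in zip(w, x))

def log_likelihood(w, b, X, y):
    """Bernoulli log-likelihood of a linear logistic model."""
    return sum(yi * score(w, b, xi) - log1pexp(score(w, b, xi))
               for xi, yi in zip(X, y))

def fit(X, y, steps=2000, lr=0.1):
    """Plain gradient ascent on the mean log-likelihood; returns (w, b)."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(steps):
        # Residual y - p per observation; tanh form of the logistic avoids overflow.
        res = [yi - 0.5 * (1.0 + math.tanh(0.5 * score(w, b, xi)))
               for xi, yi in zip(X, y)]
        w = [wj + lr * sum(r * xi[j] for r, xi in zip(res, X)) / n
             for j, wj in enumerate(w)]
        b += lr * sum(res) / n
    return w, b

def likelihood_increments(X, y):
    """Log-likelihood lost when each factor is dropped and the model is refit."""
    full = log_likelihood(*fit(X, y), X, y)
    increments = []
    for j in range(len(X[0])):
        reduced = [[v for k, v in enumerate(row) if k != j] for row in X]
        increments.append(full - log_likelihood(*fit(reduced, y), reduced, y))
    return increments

# Toy data: factor 0 is informative, factor 1 is unrelated to the outcome.
X = [[-2.0, 1], [-1.5, -1], [-1.0, -1], [-0.5, 1],
     [0.5, 1], [1.0, -1], [1.5, -1], [2.0, 1]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

increments = likelihood_increments(X, y)
print(increments)
```

Factors whose increment falls below a chosen cutoff would be candidates for removal; the cutoff itself is a modelling decision that the sketch does not fix.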

 

 

Mathematical Problem Statement

 

Formal Problem Statement

 

 

Solution in MATLAB Environment

 

Solved with PSG MATLAB function tbpsg_run (PSG Subroutine Interface):

 

Description (tbpsg_run)

 

Input Files to run CS:

 MATLAB code (.txt file)
 Data (.zip file with a .mat file)

 

 

 

Problem 3

Removing factors by adding a cardinality constraint to the Logistic Regression optimization problem

 

Simplified Problem Statement

 

Maximize logexp_sum

Constraint: cardn <= upper_bound

Value: logistic

 

where

 

logexp_sum = log-likelihood function for logistic regression (Logarithms Exponents Sum)

cardn = cardinality function

logistic = calculates values of logistic function for every observation (scenario)
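The role of the cardinality function can be illustrated with a short Python sketch. It counts the coefficients that are effectively nonzero and shows one simple way to satisfy a bound, namely keeping only the largest-magnitude coefficients. The threshold value, the helper keep_largest, and the example coefficients are assumptions for illustration; PSG solves the constrained problem with its own optimization routines, not with this heuristic.

```python
def cardn(coefs, threshold=1e-8):
    """Number of coefficients whose magnitude exceeds the threshold."""
    return sum(1 for c in coefs if abs(c) > threshold)

def keep_largest(coefs, upper_bound):
    """Zero all but the upper_bound largest-magnitude coefficients (hard thresholding)."""
    order = sorted(range(len(coefs)), key=lambda j: abs(coefs[j]), reverse=True)
    kept = set(order[:upper_bound])
    return [c if j in kept else 0.0 for j, c in enumerate(coefs)]

# Hypothetical regression coefficients for five factors.
coefs = [0.8, -0.02, 0.0, -1.5, 0.3]
sparse = keep_largest(coefs, upper_bound=2)

print(cardn(coefs))   # factors used before the constraint
print(sparse)         # coefficients after hard thresholding
print(cardn(sparse))  # factors used afterwards
```

Hard thresholding only makes a given coefficient vector feasible; it does not guarantee the best log-likelihood among all vectors with at most upper_bound nonzero entries, which is what the constrained optimization problem asks for.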

 

Mathematical Problem Statement

 

Formal Problem Statement
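As a rough guide, a cardinality-constrained logistic regression is commonly written as below. The notation is an assumption introduced here: c is the coefficient vector, s_j(c) is the linear score of observation j, y_j is its 0/1 outcome, and card(c) counts the nonzero coefficients. The PSG formulation may differ in details such as weights, an intercept, or a threshold inside the cardinality function.

```latex
\max_{c}\; \sum_{j=1}^{J} \Bigl[\, y_j\, s_j(c) - \ln\bigl(1 + e^{s_j(c)}\bigr) \Bigr]
\quad \text{subject to} \quad
\operatorname{card}(c) = \bigl|\{\, i : c_i \neq 0 \,\}\bigr| \le \text{upper\_bound},
\qquad
s_j(c) = \sum_{i=1}^{I} c_i\, x_{ji}
```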

 

 

Solution in MATLAB Environment

 

Solved with PSG MATLAB function tbpsg_run (PSG Subroutine Interface):

 

Description (tbpsg_run)

 

Input Files to run CS:

 MATLAB code (.txt file)
 Data (.zip file with a .mat file)

 

 

 

Problem 4

Model verification with 4-fold cross-validation

 

Simplified Problem Statement

 

4-fold crossvalidation

 

Maximize logexp_sum

 

Value: logistic

 

where

 

crossvalidation(N, Matrix) = matrix operation that splits the input Matrix into N pairs of complementary sub-matrices

logexp_sum = log-likelihood function for logistic regression (Logarithms Exponents Sum)

logistic = calculates values of logistic function for every observation (scenario)
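The splitting step described above can be sketched in Python as follows. The function name split_folds and the rule that assigns row i to fold i mod N are assumptions made for illustration; the PSG crossvalidation operation may assign rows to folds differently.

```python
def split_folds(n_folds, matrix):
    """Split rows into n_folds (training, validation) pairs of complementary sub-matrices."""
    pairs = []
    for fold in range(n_folds):
        validation = [row for i, row in enumerate(matrix) if i % n_folds == fold]
        training = [row for i, row in enumerate(matrix) if i % n_folds != fold]
        pairs.append((training, validation))
    return pairs

# Toy matrix with ten observations (rows).
matrix = [[i, i % 2] for i in range(10)]
folds = split_folds(4, matrix)

for training, validation in folds:
    print(len(training), len(validation))
```

In 4-fold cross-validation the model is fitted on each training sub-matrix by maximizing the log-likelihood and then evaluated on the complementary validation sub-matrix, so every observation is used for validation exactly once.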

 

Mathematical Problem Statement

 

Formal Problem Statement

 

 

Solution in MATLAB Environment

 

Solved with PSG MATLAB function tbpsg_run (PSG Subroutine Interface):

 

Description (tbpsg_run)

 

Input Files to run CS:

 MATLAB code (.txt file)
 Data (.zip file with a .mat file)