Financial Accounting Systems, Åbo Akademi 2010 Jaana Aaltonen & Ralf Östermark Financial classification models

Contents
  The classification problem
  Classification models
    Discriminant analysis
    Logistic regression
    Recursive partitioning algorithm (RPA)
    Mathematical programming
      Linear programming models
      Quadratic programming models
    Neural network classifiers
  Case: Bankruptcy prediction of Spanish banks
  Some comments on hypothesis testing
  References

1. The classification problem
In a traditional classification problem, the main purpose is to assign one of k labels (or classes) to each of n objects in a way that is consistent with some observed data, i.e. to determine the class of an observation based on a set of variables known as predictors or input variables. Typical classification problems in finance include financial failure/bankruptcy prediction and credit risk rating.

Discriminant analysis
Discriminant analysis is one of the most common techniques for classifying a set of observations into predefined classes. The model is built from a set of observations for which the classes are known. This set of observations is sometimes referred to as the training set or estimation sample.

Discriminant analysis...
Based on the training set, the technique constructs a set of linear functions of the predictors, known as discriminant functions, of the form

L = b1x1 + b2x2 + … + bnxn + c,

where the b's are discriminant coefficients, the x's are the input variables or predictors, and c is a constant. Here n denotes the number of predictors.

Discriminant functions
The discriminant functions are optimized to provide a classification rule that minimizes the probability of misclassification. To achieve optimal performance, two statistical assumptions about the data must be met: each group must be a sample from a multivariate normal population, and the population covariance matrices must all be equal. In practice, discriminant analysis has been shown to perform fairly well even when these assumptions are violated.

Discriminant functions
For a k-class problem there are k-1 canonical discriminant functions and k Fisher’s functions.

Figure: Distributions of the discriminant scores for two classes. A discriminant function is optimized to minimize the common area (overlap) of the two distributions.

Discriminant analysis...
The discriminant functions are used to predict the class of a new observation whose class is unknown. For a k-class problem, k discriminant functions are constructed. Given a new observation, all k discriminant functions are evaluated, and the observation is assigned to class i if the i-th discriminant function has the highest value.
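The assignment rule above can be sketched in a few lines of Python. This is only an illustration: the class labels, coefficients and constants are invented for the example and are not taken from the slides or from any fitted model.

```python
# Hypothetical discriminant functions for a 3-class problem with 2 predictors.
# Each entry maps a class label to (coefficients b, constant c); all numbers
# are made up for illustration.
DISCRIMINANT_FUNCTIONS = {
    "healthy": ([0.8, 0.5], -1.0),
    "at_risk": ([0.2, 0.1], 0.0),
    "bankrupt": ([-0.6, -0.4], 0.5),
}


def discriminant_score(coefs, const, x):
    """Evaluate L = b1*x1 + ... + bn*xn + c for one observation x."""
    return sum(b * xi for b, xi in zip(coefs, x)) + const


def classify(x, functions=DISCRIMINANT_FUNCTIONS):
    """Evaluate every discriminant function and pick the highest score."""
    scores = {
        label: discriminant_score(coefs, const, x)
        for label, (coefs, const) in functions.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


label, scores = classify([2.0, 1.0])
print(label, scores)
```

In practice the coefficients would come from a fitted model rather than being typed in by hand.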

Variable selection: Analyzing group differences
Although the variables are interrelated and multivariate statistical techniques such as discriminant analysis incorporate these dependencies, it is often helpful to begin analyzing the differences between groups by examining univariate statistics. The first step is to compare the group means of the predictor variables. A significant inequality in group means indicates the predictor variable’s ability to separate the groups. The significance test for the equality of the group means is an F-test with g-1 and n-g degrees of freedom, where n is the number of observations and g is the number of groups (1 and n-2 degrees of freedom in the two-group case). If the observed significance level is less than 0.05, the hypothesis of equal group means is rejected.
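As a rough sketch of this univariate test, the F statistic for one predictor can be computed from the between-group and within-group sums of squares. The function name and the sample values below are assumptions made for the example; the significance level itself would come from the F distribution (for instance via a statistics library), which is not shown here.

```python
def f_statistic(groups):
    """One-way F statistic for the equality of group means.

    groups: one list of values of a single predictor per group.
    Returns (F, df_between, df_within) = (F, g - 1, n - g).
    """
    n = sum(len(grp) for grp in groups)   # total number of observations
    g = len(groups)                       # number of groups
    grand_mean = sum(sum(grp) for grp in groups) / n
    means = [sum(grp) / len(grp) for grp in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(
        len(grp) * (m - grand_mean) ** 2 for grp, m in zip(groups, means)
    )
    ss_within = sum(
        (v - m) ** 2 for grp, m in zip(groups, means) for v in grp
    )

    f = (ss_between / (g - 1)) / (ss_within / (n - g))
    return f, g - 1, n - g


# Invented values of one financial ratio for two groups of firms.
failed = [1.0, 2.0, 3.0]
non_failed = [4.0, 5.0, 6.0]
print(f_statistic([failed, non_failed]))
```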

Analyzing group differences: Wilks’ Lambda
Another statistic used to analyze the univariate equality of group means is Wilks’ Lambda, sometimes called the U-statistic. Lambda is the ratio of the within-groups sum of squares to the total sum of squares, so it takes values between 0 and 1. A lambda of 1 occurs when all observed group means are equal. Values close to 0 occur when within-groups variability is small compared to total variability. Large values of lambda therefore indicate that the group means do not appear to differ, while small values indicate that they do.
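The ratio described above is easy to compute directly. The following is a minimal sketch with invented data; the function name is an assumption made for the example.

```python
def wilks_lambda_univariate(groups):
    """Within-groups sum of squares divided by the total sum of squares."""
    values = [v for grp in groups for v in grp]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = sum(
        (v - sum(grp) / len(grp)) ** 2 for grp in groups for v in grp
    )
    return ss_within / ss_total


# Well-separated group means give a lambda close to 0 ...
print(wilks_lambda_univariate([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
# ... while identical group means give a lambda of 1.
print(wilks_lambda_univariate([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]))
```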

Multivariate Wilks’ Lambda statistic
In the case of several variables {X1, X2, ..., Xp}, the total variability is expressed by the total sum of squares and cross-products (SSCP) matrix T. T is decomposed into the within-group SSCP matrix W and the between-group SSCP matrix B such that T = W + B, or equivalently W = T - B.

Multivariate Wilks’ Lambda statistic...
For the set of the X variables, the multivariate global Wilks’ Lambda is defined as

Lp = |W| / |W + B| = |W| / |T| ~ L(p,m,n)

where
|W| = the determinant of the within-group SSCP matrix
|B| = the determinant of the between-groups SSCP matrix
|T| = the determinant of the total SSCP matrix
L(p,m,n) = Wilks’ Lambda distribution

For large m, Bartlett's (1954) approximation allows Wilks' lambda to be approximated by a Chi-square distribution.
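The definition can be illustrated with a small pure-Python sketch that builds W and T and takes the ratio of their determinants. The helper names and data are invented for the example. The Chi-square statistic uses one commonly quoted form of Bartlett's approximation, -(N - 1 - (p + g)/2) ln(Lambda) with p(g - 1) degrees of freedom, where N is the number of observations and g the number of groups; this form is an assumption here, since the slides do not state it.

```python
import math


def mean_vector(rows):
    """Column means of a list of observation rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sscp(rows, center):
    """Sum of squares and cross-products of rows around a center vector."""
    p = len(center)
    m = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [row[j] - center[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                m[i][j] += d[i] * d[j]
    return m


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    size = len(a)
    result = 1.0
    for c in range(size):
        pivot = max(range(c, size), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, size):
            factor = a[r][c] / a[c][c]
            for k in range(c, size):
                a[r][k] -= factor * a[c][k]
    return result


def wilks_lambda(groups):
    """|W| / |T| for groups given as lists of observation rows."""
    all_rows = [row for grp in groups for row in grp]
    p = len(all_rows[0])
    total = sscp(all_rows, mean_vector(all_rows))       # T
    within = [[0.0] * p for _ in range(p)]               # W
    for grp in groups:
        part = sscp(grp, mean_vector(grp))
        for i in range(p):
            for j in range(p):
                within[i][j] += part[i][j]
    return det(within) / det(total)


def bartlett_chi_square(lam, n_obs, p, g):
    """Assumed Bartlett form: returns (statistic, degrees of freedom)."""
    return -(n_obs - 1 - (p + g) / 2.0) * math.log(lam), p * (g - 1)


# Invented two-variable data for two groups of three observations each.
group_a = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
group_b = [[5.0, 6.0], [6.0, 5.0], [7.0, 7.0]]
lam = wilks_lambda([group_a, group_b])
print(lam, bartlett_chi_square(lam, n_obs=6, p=2, g=2))
```

With only six observations the Chi-square approximation would not be trusted in practice; the numbers only show the mechanics.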

Variable selection: Correlations between predictor variables
Since interdependencies among the variables affect most multivariate analyses, it is worth examining the correlation matrix of the predictor variables. Including highly correlated variables in the analysis should be avoided, as correlations between variables affect the magnitude and the signs of the coefficients. If correlated variables are included in the analysis, care should be exercised when interpreting the individual coefficients.
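One simple way to screen the predictors is to compute all pairwise Pearson correlations and list the pairs above some cut-off. The sketch below does this with invented data; the variable names and the 0.8 threshold are assumptions for illustration, not recommendations from the slides.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(data, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = list(data)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(data[a], data[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Invented predictor values for five firms.
predictors = {
    "roa": [1.0, 2.0, 3.0, 4.0, 5.0],
    "roe": [2.0, 4.0, 6.0, 8.0, 10.5],
    "liquidity": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(highly_correlated_pairs(predictors))
```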

Logistic Regression
Logistic regression belongs to a category of statistical models called generalized linear models. Whereas discriminant analysis can only be used with continuous independent variables, logistic regression allows one to predict a discrete outcome, such as group membership, from a set of variables that may be continuous, discrete, dichotomous, or a mix of any of these. Generally, the dependent or response variable is dichotomous, such as presence/absence or success/failure.

Logistic Regression...
The dependent variable in logistic regression is usually dichotomous: it takes the value 1 with probability of success q, or the value 0 with probability of failure 1-q. Applications of logistic regression have also been extended to cases where the dependent variable has more than two categories.

Logistic Regression...
The independent or predictor variables in logistic regression can take any form, i.e. logistic regression makes no assumption about the distribution of the independent variables. They do not have to be normally distributed, linearly related or of equal variance within each group. The relationship between the predictor and response variables is not linear; instead, the logistic regression function is used, which is based on the logit transformation of the probability q.

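Logistic Regression...
The model:

ln(q / (1 - q)) = a + b1x1 + b2x2 + … + bnxn,

where a is the constant of the equation and the b's are the coefficients of the predictor variables. An alternative form of the logistic regression equation is:

q = exp(a + b1x1 + b2x2 + … + bnxn) / (1 + exp(a + b1x1 + b2x2 + … + bnxn))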
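In the standard formulation, the logit of q equals the linear predictor a + b1x1 + … + bnxn, and solving for q gives the logistic function. The sketch below evaluates both directions; the constant and coefficients are invented for illustration and are not estimates from any data set.

```python
import math


def logit(q):
    """Logit transformation: the log-odds ln(q / (1 - q))."""
    return math.log(q / (1.0 - q))


def predicted_probability(a, b, x):
    """Probability q implied by the linear predictor z = a + sum(b_i * x_i)."""
    z = a + sum(bi * xi for bi, xi in zip(b, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical constant and coefficients for two predictors.
a = -1.0
b = [0.5, 2.0]
x = [2.0, 0.25]

q = predicted_probability(a, b, x)
print(q)          # probability that the response equals 1
print(logit(q))   # recovers the linear predictor a + b1*x1 + b2*x2
```

Applying `logit` to the predicted probability returns the linear predictor, which is why the two forms of the model are equivalent.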

Logistic Regression...
The goal of logistic regression is to correctly predict the category of outcome for individual cases using the most parsimonious model. To accomplish this goal, a model is created that includes all predictor variables that are useful in predicting the response variable. Different methods for model creation include stepwise regression and backward stepwise regression.

Name: 
classification_models
Author: 
jaaltone
Company: 
Åbo Akademi
Created: 
1/7/2010 11:45:08 AM
Slides: 
63