Transfer Learning for Credit Risk Classification
This study introduces transfer learning to improve credit risk classification by leveraging knowledge transfer from related tasks and datasets with financial risk labels. Challenges include divergences in probability distributions and the overemphasis on mature credit business data. Addressing these challenges can enhance credit scoring performance for nascent products.
The 23rd International Conference on Electronic Business
Domain-Adversarial Neural Network with Joint-Distribution Adaption for Credit Risk Classification
Jianshan Pan (1), Yiqiong Wu (2), Yang Lv (1), Qiyou Lin (1), Jinrui Peng (1), Wen Ye (2), Xiaofang Cai (3), Wei (Wayne) Huang (2, 3)
(1) Shenzhen Public Credit Center
(2) School of Management, Xi'an Jiaotong University
(3) National Center of Applied Mathematics-Shenzhen (NCAMS), Southern University of Science and Technology
INTRODUCTION
A credit scoring model is a risk management tool that assesses a borrower's creditworthiness by estimating default probability from historical data. Credit scoring models support decision-making on credit applications and help manage the underlying credit risks. Many companies have joined the trend and are exploring ways of offering novel financial products. In the early stages of such a business, financial institutions normally have few applicants and only small credit datasets. Machine learning models may overfit when training samples are insufficient, which lowers their classification accuracy.
Transfer learning is the process by which a system adapts to new circumstances, tasks, or surroundings by transferring knowledge from related tasks (Pan & Yang, 2010). To address the small-sample problem, this study introduces transfer learning into credit scoring. By leveraging transfer learning and datasets with financial risk labels (which usually come from users of existing products or from data of similar contexts), financial practitioners can improve credit scoring performance for nascent products (Li et al., 2019).
Challenge 1: Knowledge transfer involves divergences in both the marginal probability distribution and the conditional probability distribution.
Marginal probability distribution: Different loan businesses have different credit access requirements, which may lead to discrepancies between their customer groups.
Conditional probability distribution: Credit businesses differ in aspects such as collateral and interest rates. Even for the same (accessible) customer characteristics, disparities in any of these aspects may lead to variations in customer repayment capacity and willingness, and therefore to differences in customer default behavior (Li et al., 2023).
Most existing studies narrow only the gap in marginal probability distribution between domains and largely ignore the differences in conditional probability distributions between the source and target domain data (Iwai et al., 2021).
Challenge 2: The source domain data dominates in quantity.
The volume of mature credit business data is significantly larger than that of nascent credit business data. As a result, the model may overemphasize the tasks related to the mature credit business and generalize poorly to the nascent credit business (Zheng & Yang, 2022). Existing studies overlook this imbalance.
MODEL CONSTRUCTION
In this study, the credit scoring problem is formulated as a binary classification task (Wu et al., 2022). Input vectors and their corresponding labels are denoted as $x \in \mathbb{R}^d$ and $y \in \{0,1\}$, respectively, where $d$ represents the dimension of input vectors. Here, $y = 0$ represents defaulted loan applications, while $y = 1$ represents non-defaulted loan applications.
The training set consists of $n_s$ labeled samples $D_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ from the source domain (mature credit business) and $n_t$ labeled samples $D_t = \{(x_i^t, y_i^t)\}_{i=1}^{n_t}$ from the target domain (new credit business), where $n_s \gg n_t$. Both the source and target domain samples share the same feature space $\mathcal{X}_s = \mathcal{X}_t$ and the same label space $\mathcal{Y}_s = \mathcal{Y}_t$; however, they have different marginal probability distributions, $P(X_s) \neq P(X_t)$, as well as different conditional probability distributions, $P(Y_s \mid X_s) \neq P(Y_t \mid X_t)$.
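As a brief aside on why both divergences matter, the joint distribution of features and labels in each domain factorizes by the chain rule of probability. The block below writes this out using the same P(·) notation as the slide; it is a general identity, not a formula taken from the presentation.

```latex
P(X_s, Y_s) = P(X_s)\, P(Y_s \mid X_s), \qquad
P(X_t, Y_t) = P(X_t)\, P(Y_t \mid X_t)
```

Aligning only $P(X_s)$ with $P(X_t)$ therefore leaves the joint distributions different whenever $P(Y_s \mid X_s) \neq P(Y_t \mid X_t)$, which is the gap that joint-distribution adaption targets.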
To achieve effective knowledge transfer between business domains and address the two challenges above, this study proposes a novel transfer learning model named domain-adversarial neural network with joint-distribution adaption and loss rectification (DANN-JDA-LR). The proposed method consists of two major modules: a domain-adversarial neural network with joint-distribution adaption, and a loss rectification module.
A domain-adversarial neural network with joint-distribution adaption
In DANN, an effective feature representation should possess two key characteristics simultaneously.
- Discriminativeness: The classifier within the model can perform the credit scoring task effectively based on the mapped feature representation.
- Domain-invariance: The domain discriminator within the model cannot identify which domain the data came from based on the mapped feature representation.
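These two requirements pull in opposite directions on the domain discriminator's loss, which the standard DANN formulation expresses as a saddle-point objective. The sketch below follows that standard formulation and is not necessarily the exact objective used in this study. The symbols are assumptions introduced for illustration: $\theta_f$, $\theta_y$, and $\theta_d$ for the parameters of the feature extractor, classifier, and domain discriminator, $L_y$ and $L_d$ for the classification and domain prediction losses, and $\lambda$ for a trade-off weight.

```latex
E(\theta_f, \theta_y, \theta_d) = L_y(\theta_f, \theta_y) - \lambda\, L_d(\theta_f, \theta_d)

(\hat{\theta}_f, \hat{\theta}_y) = \arg\min_{\theta_f, \theta_y} E(\theta_f, \theta_y, \hat{\theta}_d), \qquad
\hat{\theta}_d = \arg\max_{\theta_d} E(\hat{\theta}_f, \hat{\theta}_y, \theta_d)
```

Under this objective the feature extractor lowers the classification loss while raising the discriminator's loss, and the discriminator lowers its own loss. Implementations commonly realize this with a gradient reversal layer between the feature extractor and the discriminator.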
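In the DANN training process, Eq. (1) and Eq. (2) are used to calculate the classification prediction loss $L_y$ and the domain prediction loss $L_d$, respectively.
$$L_y(\theta_f, \theta_y) = \frac{1}{n} \sum_{i=1}^{n} L_y\big(G_y(G_f(x_i; \theta_f); \theta_y),\, y_i\big) = -\frac{1}{n} \sum_{i=1}^{n} \big[\, y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \,\big] \quad (1)$$
$$L_d(\theta_f, \theta_d) = \frac{1}{n} \sum_{i=1}^{n} L_d\big(G_d(G_f(x_i; \theta_f); \theta_d),\, d_i\big) = -\frac{1}{n} \sum_{i=1}^{n} \big[\, d_i \log \hat{d}_i + (1 - d_i) \log(1 - \hat{d}_i) \,\big] \quad (2)$$
Here $G_f$, $G_y$, and $G_d$ denote the feature extractor, the classifier, and the domain discriminator with parameters $\theta_f$, $\theta_y$, and $\theta_d$; $\hat{y}_i$ is the predicted class probability for sample $i$, and $d_i$ and $\hat{d}_i$ are its domain label and predicted domain probability.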
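Both losses are binary cross-entropies, so a single helper covers them. The following is a minimal sketch in plain Python rather than the authors' implementation; the function name `binary_cross_entropy`, the clipping constant `eps`, the toy probabilities, and the convention that domain label 1 means source and 0 means target are all assumptions made for illustration.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        # Clip to avoid log(0); eps is an illustrative choice.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)

# Classification loss: class labels vs. predicted class probabilities (toy values).
class_loss = binary_cross_entropy([1, 0, 1, 1], [0.9, 0.2, 0.7, 0.6])

# Domain loss: hypothetical convention 1 = source, 0 = target.
# A discriminator that cannot tell the domains apart outputs 0.5 everywhere.
domain_loss_at_chance = binary_cross_entropy([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5])

print(f"classification loss: {class_loss:.4f}")
print(f"domain loss at chance: {domain_loss_at_chance:.4f}")
```

When the discriminator is at chance, the domain loss equals ln 2 (about 0.693), which is the value a domain-invariant representation pushes it toward.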
When applying the traditional DANN model to the research context of this study, two key issues need to be considered. First, the conventional DANN model implicitly assumes that the source domain data and the target domain data differ only in their marginal probability distributions. Second, the quantity of labeled target domain data is significantly smaller than that of labeled source domain data. This can easily lead to the classifier exhibiting 'strong supervision' on the source domain data, which in turn harms the generalization performance of the classifier on the target domain data.
The following loss is employed in this study to address the above issues:
$$L_R = L_y^s \exp(-L_y^t) + L_y^t$$
where $L_y^t$ and $L_y^s$ are the cross-entropy losses of the classifier for the target domain and source domain, respectively.
The first term, $L_y^s \exp(-L_y^t)$, primarily serves the purpose of penalizing the prediction loss in the source domain. Simultaneously, this term introduces the exponential factor $\exp(-L_y^t)$ to ensure that when the prediction loss $L_y^t$ in the target domain becomes significantly larger, the model prioritizes penalizing the target domain prediction loss $L_y^t$ over penalizing the source domain prediction loss $L_y^s$.
The second term, $L_y^t$, is mainly utilized to penalize the prediction loss in the target domain.
While the loss rectification strategy proposed in this study cannot guarantee the optimality (minimization) of source domain category prediction losses under all conditions, it can ensure that the classifier exhibits superior category prediction performance for target domain data, in line with the original design intent of the model.
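The effect of the exponential factor is easy to see numerically. The sketch below assumes the rectified loss has the form source loss times exp(-target loss) plus target loss, as the description of its two terms suggests; the function names `rectified_loss` and `source_weight` and the sample loss values are hypothetical.

```python
import math

def source_weight(target_loss):
    """Weight applied to the source-domain loss: exp(-target_loss)."""
    return math.exp(-target_loss)

def rectified_loss(source_loss, target_loss):
    """Rectified loss: source_loss * exp(-target_loss) + target_loss."""
    return source_loss * source_weight(target_loss) + target_loss

# Hold the source loss fixed and let the target loss grow (toy values).
source_loss = 0.6
for target_loss in (0.1, 1.0, 3.0):
    weight = source_weight(target_loss)
    total = rectified_loss(source_loss, target_loss)
    print(f"target loss {target_loss:.1f}: source weight {weight:.3f}, rectified loss {total:.3f}")
```

As the target loss grows, the weight on the source loss shrinks toward zero, so the total is dominated by the target term. Whether gradients are also propagated through the exponential factor in an automatic-differentiation framework is a design choice the slides do not specify.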
RESEARCH DATA
A real-world credit scoring dataset provided by Qianhai Zheng Xin is used. It covers a credit loan business (referred to as Business A) and a cash loan business (referred to as Business B).

Transfer scenario   Source domain training set   Target domain training set   Target domain validation set   Target domain test set
A → B               3560                         249                          107                            1159
B → A               1515                         105                          46                             24913
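The degree to which the source domain dominates can be read directly off these training-set sizes. The short calculation below uses only the counts reported for the two transfer scenarios; the helper name `imbalance` and the dictionary layout are illustrative.

```python
# Training-set sizes as reported for the two transfer scenarios.
sizes = {
    "A -> B": {"source_train": 3560, "target_train": 249},
    "B -> A": {"source_train": 1515, "target_train": 105},
}

def imbalance(source_n, target_n):
    """Return (source-to-target ratio, target share of the pooled training set)."""
    return source_n / target_n, target_n / (source_n + target_n)

summary = {
    name: imbalance(s["source_train"], s["target_train"])
    for name, s in sizes.items()
}

for name, (ratio, share) in summary.items():
    print(f"{name}: source/target = {ratio:.1f}, target share = {share:.1%}")
```

In both scenarios the source training set is roughly 14 times the size of the target training set, so target samples make up only about 6.5% of a pooled training set.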
BENCHMARK MODELS
TO-Models:
- Random forest-TO (RF-TO)
- AdaBoost-TO
- Deep neural network-TO (DNN-TO)
S&T-Models:
- Random forest-S&T (RF-S&T)
- AdaBoost-S&T
- Deep neural network-S&T (DNN-S&T)
- Deep neural network-S&T with loss rectification (DNN-S&T w/ LR)
Transfer Learning Models:
- TrAdaBoost
- DANN
- Domain-adversarial neural network with joint-distribution adaption (DANN-JDA)
RESEARCH RESULTS

Transfer scenario   Model           AUC               KS                G-mean
A → B               RF-TO           0.5039 (0.0346)   0.0949 (0.0318)   0.0701 (0.0363)
A → B               AdaBoost-TO     0.5341 (0.0242)   0.1278 (0.0394)   0.0733 (0.0342)
A → B               DNN-TO          0.5199 (0.0381)   0.1118 (0.0290)   0.0518 (0.0321)
A → B               RF-S&T          0.5218 (0.0363)   0.1025 (0.0264)   0.0728 (0.0244)
A → B               AdaBoost-S&T    0.5441 (0.0215)   0.1301 (0.0325)   0.0615 (0.0089)
A → B               DNN-S&T         0.5410 (0.0311)   0.1230 (0.0343)   0.0795 (0.0306)
A → B               DNN-S&T w/ LR   0.5471 (0.0288)   0.1303 (0.0336)   0.0802 (0.0227)
A → B               TrAdaBoost      0.5113 (0.0272)   0.0485 (0.0333)   0.0956 (0.0395)
A → B               DANN            0.5404 (0.0261)   0.1208 (0.0400)   0.0668 (0.0209)
A → B               DANN-JDA        0.5437 (0.0191)   0.1273 (0.0274)   0.0790 (0.0337)
A → B               DANN-JDA-LR     0.5490 (0.0347)   0.1376 (0.0553)   0.0873 (0.0223)
B → A               RF-TO           0.5533 (0.0434)   0.0911 (0.0519)   0.1606 (0.0412)
B → A               AdaBoost-TO     0.5328 (0.0386)   0.0707 (0.0389)   0.1540 (0.0378)
B → A               DNN-TO          0.5311 (0.0343)   0.0688 (0.0413)   0.1497 (0.0327)
B → A               RF-S&T          0.5475 (0.0165)   0.0754 (0.0218)   0.1575 (0.0234)
B → A               AdaBoost-S&T    0.5410 (0.0168)   0.0748 (0.0195)   0.1573 (0.0151)
B → A               DNN-S&T         0.5278 (0.0195)   0.0532 (0.0290)   0.1511 (0.0348)
B → A               DNN-S&T w/ LR   0.5398 (0.0281)   0.0679 (0.0399)   0.1683 (0.0300)
B → A               TrAdaBoost      0.5123 (0.0068)   0.0253 (0.0122)   0.1788 (0.0618)
B → A               DANN            0.5228 (0.0226)   0.0550 (0.0237)   0.1453 (0.0242)
B → A               DANN-JDA        0.5339 (0.0160)   0.0616 (0.0225)   0.1575 (0.0280)
B → A               DANN-JDA-LR     0.5573 (0.0191)   0.0989 (0.0277)   0.1880 (0.0183)
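For readers less familiar with the three evaluation metrics, the sketch below shows one common way to compute them from labels and predicted scores in plain Python. It is not the authors' evaluation code. It assumes label 1 is treated as the positive class, that scores estimate the probability of label 1, and that G-mean is taken at a 0.5 decision threshold; the function names and the toy data are hypothetical.

```python
from math import sqrt

def auc_score(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks_statistic(labels, scores):
    """Largest gap between true positive rate and false positive rate over all thresholds."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for thr in sorted(set(scores)):
        tpr = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr) / n_pos
        fpr = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= thr) / n_neg
        best = max(best, abs(tpr - fpr))
    return best

def g_mean(labels, scores, threshold=0.5):
    """Geometric mean of the true positive rate and true negative rate at a threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return sqrt((tp / n_pos) * (tn / n_neg))

# Toy data: two positives and two negatives.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(auc_score(labels, scores), ks_statistic(labels, scores), g_mean(labels, scores))
```

The pairwise AUC and per-threshold KS loops are quadratic in the number of samples, which is fine for illustration but would normally be replaced by sorted, library-based implementations on test sets of the size used here.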
[Figure: AUC, KS, and G-mean for the RF, AdaBoost, and DNN models; one panel for transfer scenario A → B and one for B → A]
It is possible to enhance model performance in the target domain by extracting shared knowledge from different but related domains. However, directly mixing source domain and target domain data may lead to an unstable and unreliable model.
The necessity of joint-distribution adaption
In both transfer scenarios, the DANN-JDA model exhibits a certain degree of performance improvement compared to the DANN model.

Transfer scenario   Model      AUC               KS                G-mean
A → B               DANN       0.5404 (0.0261)   0.1208 (0.0400)   0.0668 (0.0209)
A → B               DANN-JDA   0.5437 (0.0191)   0.1273 (0.0274)   0.0790 (0.0337)
B → A               DANN       0.5228 (0.0226)   0.0550 (0.0237)   0.1453 (0.0242)
B → A               DANN-JDA   0.5339 (0.0160)   0.0616 (0.0225)   0.1575 (0.0280)
The necessity of loss rectification
[Figure: AUC, KS, and G-mean for the DNN-S&T models and the DANN-JDA models; one panel for transfer scenario A → B and one for B → A]
CONCLUSION
This study is of particular importance for financial institutions managing credit risk during the early stages of a credit business. The proposed model can help financial institutions identify potential defaulting and non-defaulting customers more accurately, thereby reducing losses from erroneous lending decisions and strengthening their risk management capabilities.
In the future, we will continue to investigate the use of the proposed model and apply it to other fields, such as market risk regulation, to assess and improve its generalizability.
Thank You