mirror of https://github.com/01-edu/Branch-AI.git
b.ghazlane
3 years ago
4 changed files with 215 additions and 0 deletions
@ -0,0 +1,45 @@
# Credit scoring

## Machine learning model

- the model is trained only on the training set
- the AUC on the test set is higher than 75%
- the learning curves of the model show that it is not overfitting
- the training has been stopped early enough to avoid overfitting
- the text document describes the methodology used to train the machine learning model
- predict.py runs without any error and returns:

```prompt
python predict.py

AUC on test set: 0.76
```

This article gives a complete example of a good modelling approach:

https://medium.com/thecyphy/home-credit-default-risk-part-2-84b58c1ab9d5
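
The AUC and early-stopping checks in this section can be illustrated with a small, dependency-free sketch. It assumes binary labels (1 = default) and a list of validation AUC values recorded after each training round. The function names `roc_auc` and `early_stopping_round` and the `patience` parameter are hypothetical and not part of the project; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # Count pairwise wins of positives over negatives; a tie counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def early_stopping_round(val_scores, patience):
    """Return (round, score) of the best validation score seen before
    `patience` rounds in a row fail to improve on it."""
    best_round, best_score = 0, float("-inf")
    for i, score in enumerate(val_scores):
        if score > best_score:
            best_round, best_score = i, score
        elif i - best_round >= patience:
            break  # no improvement for `patience` rounds: stop scanning
    return best_round, best_score


if __name__ == "__main__":
    # Toy data, purely illustrative.
    y_test = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    print(f"AUC on test set: {roc_auc(y_test, y_score):.2f}")

    validation_auc = [0.70, 0.74, 0.76, 0.755, 0.75, 0.74]
    print(early_stopping_round(validation_auc, patience=2))
```

The pairwise AUC is quadratic in the number of rows, so it is only meant to make the definition concrete. Plotting the training and validation scores per round alongside the selected round is one way to produce the learning curves the audit asks for.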

## Model's interpretability

- Feature importance:
  - the importance of every feature used by the model is computed and shown in a visualisation. Be careful to associate each variable with its own importance value: the preprocessing pipeline can remove some features, for instance during the feature selection step, so the features seen by the model may not match the raw columns one to one.

- Descriptive variables:

  These help to understand the client, for example their age. The data may be scaled or otherwise modified in the preprocessing pipeline, but the data visualised here should be "raw". This part is validated if the visualisations are computed for the 3 clients.

  - visualisations that show at least 10 variables describing the client and their loan(s)
  - visualisations that show the comparison between this client and other clients.

- SHAP values on the model:
  - a summary plot shows the important features and their impact on the target. This is optional if you have already computed the feature importance.

- SHAP values on predictions:

  This part is validated if the SHAP visualisations are computed for the 3 clients.

  - a force plot shows which variables contribute the most to the score.
  - **check that the score output by the force plot corresponds to the one output by the model.**
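
The force-plot check relies on the additivity property of SHAP values: the base value plus the sum of the per-feature contributions equals the model output for that client, in the output space used by the explainer (for tree models this is often the raw margin rather than the probability). The sketch below illustrates the check on a hypothetical linear score, where, assuming independent features, the SHAP value of feature i is `w_i * (x_i - mean_i)`. All names and numbers are made up for illustration and no SHAP library is used.

```python
import math


def linear_score(weights, bias, x):
    """Raw score of a linear model for one client."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def linear_shap(weights, bias, background_means, x):
    """Base value and per-feature contributions for a linear model,
    assuming independent features."""
    base_value = linear_score(weights, bias, background_means)
    contributions = [w * (xi - m) for w, xi, m in zip(weights, x, background_means)]
    return base_value, contributions


def check_additivity(base_value, contributions, model_output, tol=1e-9):
    """True if base value + contributions reproduce the model output."""
    return math.isclose(base_value + sum(contributions), model_output, abs_tol=tol)


if __name__ == "__main__":
    weights, bias = [0.8, -0.5, 0.1], -1.0   # hypothetical model
    means = [0.2, 0.5, 3.0]                  # feature means of the background data
    client = [0.6, 0.1, 4.0]                 # one client's features

    base, contribs = linear_shap(weights, bias, means, client)
    output = linear_score(weights, bias, client)
    print(check_additivity(base, contribs, output))
```

If the same comparison fails for a real model, a common cause is comparing a margin-space explanation with a probability output.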
|
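
For the feature-importance check in the interpretability section, the risk is pairing importance values with the wrong column names once a selection step has dropped some columns. A minimal sketch of the alignment, assuming the selection step exposes a boolean mask of kept columns (in scikit-learn, selectors provide one through `get_support()`); the function name `align_importances` and the example feature names are hypothetical.

```python
def align_importances(raw_feature_names, kept_mask, importances):
    """Pair each importance with the name of the feature that survived selection.

    raw_feature_names: column names before feature selection
    kept_mask: one boolean per raw column, True if the column was kept
    importances: one value per kept column, in the model's feature order
    """
    if len(raw_feature_names) != len(kept_mask):
        raise ValueError("mask length must match the number of raw features")
    kept_names = [name for name, kept in zip(raw_feature_names, kept_mask) if kept]
    if len(kept_names) != len(importances):
        raise ValueError("importances must match the number of kept features")
    # Sort from most to least important for plotting.
    return sorted(zip(kept_names, importances), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    names = ["age", "income", "credit_amount", "children"]
    mask = [True, False, True, True]   # "income" was removed by selection
    importances = [0.5, 0.3, 0.2]      # one value per kept feature
    for name, value in align_importances(names, mask, importances):
        print(f"{name}: {value}")
```

The length checks fail loudly when names and importances are out of step, instead of producing a plausible but mislabelled plot.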
@ -0,0 +1,47 @@
# Credit scoring data description

This file describes the available data for the project.

![alt data description](project5_data_description.png "Credit scoring data description")

## application_{train|test}.csv

This is the main table, broken into two files for Train (with TARGET) and Test (without TARGET).
Static data for all applications. One row represents one loan in our data sample.

## bureau.csv

All of the client's previous credits provided by other financial institutions that were reported to the Credit Bureau (for clients who have a loan in our sample).
For every loan in our sample, there are as many rows as the number of credits the client had in the Credit Bureau before the application date.

## bureau_balance.csv

Monthly balances of previous credits in the Credit Bureau.
This table has one row for each month of history of every previous credit reported to the Credit Bureau, i.e. the table has (# of loans in sample * # of relative previous credits * # of months where we have some history observable for the previous credits) rows.
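
Because `bureau.csv` has several rows per loan and `bureau_balance.csv` has several rows per previous credit, both usually need to be aggregated before they can be joined to the main table with one row per loan. The sketch below shows that two-level roll-up on rows held as plain dictionaries. The key names `SK_ID_CURR` (loan) and `SK_ID_BUREAU` (previous credit) are assumptions to verify against `HomeCredit_columns_description.csv`, and the function names are hypothetical; a dataframe library would typically do the same with a group-by.

```python
from collections import defaultdict


def months_per_credit(bureau_balance_rows):
    """Count the months of history available for each previous credit."""
    counts = defaultdict(int)
    for row in bureau_balance_rows:
        counts[row["SK_ID_BUREAU"]] += 1
    return dict(counts)


def aggregate_per_loan(bureau_rows, months_by_credit):
    """Roll previous credits up to one summary record per loan."""
    per_loan = defaultdict(lambda: {"n_credits": 0, "n_months": 0})
    for row in bureau_rows:
        agg = per_loan[row["SK_ID_CURR"]]
        agg["n_credits"] += 1
        # A credit without any monthly history contributes 0 months.
        agg["n_months"] += months_by_credit.get(row["SK_ID_BUREAU"], 0)
    return dict(per_loan)


if __name__ == "__main__":
    # Toy rows, purely illustrative.
    bureau = [
        {"SK_ID_CURR": 1, "SK_ID_BUREAU": 10},
        {"SK_ID_CURR": 1, "SK_ID_BUREAU": 11},
        {"SK_ID_CURR": 2, "SK_ID_BUREAU": 20},
    ]
    bureau_balance = (
        [{"SK_ID_BUREAU": 10}] * 3 + [{"SK_ID_BUREAU": 11}] + [{"SK_ID_BUREAU": 20}] * 2
    )
    print(aggregate_per_loan(bureau, months_per_credit(bureau_balance)))
```

Summing `n_months` over all loans gives the row count of the monthly table, which matches the product described above when read as a sum over loans and their previous credits.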

## POS_CASH_balance.csv

Monthly balance snapshots of previous POS (point of sales) and cash loans that the applicant had with Home Credit.
This table has one row for each month of history of every previous credit in Home Credit (consumer credit and cash loans) related to loans in our sample, i.e. the table has (# of loans in sample * # of relative previous credits * # of months in which we have some history observable for the previous credits) rows.

## credit_card_balance.csv

Monthly balance snapshots of previous credit cards that the applicant has with Home Credit.
This table has one row for each month of history of every previous credit in Home Credit (consumer credit and cash loans) related to loans in our sample, i.e. the table has (# of loans in sample * # of relative previous credit cards * # of months where we have some history observable for the previous credit card) rows.

## previous_application.csv

All previous applications for Home Credit loans of clients who have loans in our sample.
There is one row for each previous application related to loans in our data sample.

## installments_payments.csv

Repayment history for the previously disbursed credits in Home Credit related to the loans in our sample.
There is a) one row for every payment that was made plus b) one row for each missed payment.
One row is equivalent to one payment of one installment OR one installment corresponding to one payment of one previous Home Credit credit related to loans in our sample.
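
Since made and missed payments share one table, a per-credit summary has to classify each row first. The sketch below is only an illustration: it assumes a missed payment appears as a row whose paid amount is missing, which should be checked against the actual data, and the field names `prev_credit_id` and `amount_paid` are hypothetical stand-ins for the real columns listed in `HomeCredit_columns_description.csv`.

```python
from collections import Counter


def payment_status(row):
    """Classify one installment row (assumption: no paid amount means missed)."""
    return "missed" if row.get("amount_paid") is None else "paid"


def summarise_installments(rows):
    """Count paid and missed installments for each previous credit."""
    summary = {}
    for row in rows:
        counter = summary.setdefault(row["prev_credit_id"], Counter())
        counter[payment_status(row)] += 1
    return summary


if __name__ == "__main__":
    # Toy rows, purely illustrative.
    rows = [
        {"prev_credit_id": 100, "amount_paid": 250.0},
        {"prev_credit_id": 100, "amount_paid": None},
        {"prev_credit_id": 100, "amount_paid": 250.0},
        {"prev_credit_id": 200, "amount_paid": 90.0},
    ]
    for credit_id, counts in summarise_installments(rows).items():
        print(credit_id, dict(counts))
```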

## HomeCredit_columns_description.csv

This file contains descriptions for the columns in the various data files.