mirror of https://github.com/01-edu/Branch-AI.git
b.ghazlane
3 years ago
Citation Request:
This dataset is publicly available for research. The details are described in [Cortez et al., 2009].
Please include this citation if you plan to use this database:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining from physicochemical properties.
In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016
              [Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf
              [bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib

1. Title: Wine Quality

2. Sources:
Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009

3. Past Usage:

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining from physicochemical properties.
In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.

In the above reference, two datasets were created, using red and white wine samples.
The inputs include objective tests (e.g. pH values) and the output is based on sensory data
(median of at least 3 evaluations made by wine experts). Each expert graded the wine quality
between 0 (very bad) and 10 (excellent). Several data mining methods were applied to model
these datasets under a regression approach. The support vector machine model achieved the
best results. Several metrics were computed: MAD (mean absolute deviation), a confusion matrix
for a fixed error tolerance (T), etc. We also plotted the relative importance of the input
variables (as measured by a sensitivity analysis procedure).
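The two error measures named above can be illustrated with a short Python sketch. It is not the authors' code: it assumes MAD is the mean absolute difference between observed and predicted quality scores, and that a prediction counts as correct for tolerance T when its absolute error is at most T. The function names mad and accuracy_within_tolerance are hypothetical.

```python
from typing import Sequence


def _check_inputs(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Every observed score must be paired with exactly one prediction.
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")


def mad(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute deviation between observed and predicted quality scores."""
    _check_inputs(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_within_tolerance(
    y_true: Sequence[float], y_pred: Sequence[float], tolerance: float
) -> float:
    """Fraction of predictions whose absolute error is at most `tolerance` (T)."""
    _check_inputs(y_true, y_pred)
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)
```

For example, observed scores [5, 6, 7] against predictions [5.5, 6, 6] give a MAD of 0.5, and 2 of the 3 predictions fall within T = 0.5. Raising T to 1.0 accepts all three, which is why results are usually reported for several tolerances.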

4. Relevant Information:

The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine.
For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009].
Due to privacy and logistical issues, only physicochemical (the inputs) and sensory (the output)
variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

These datasets can be viewed as classification or regression tasks.
The classes are ordered and not balanced (e.g. there are many more normal wines than
excellent or poor ones). Outlier detection algorithms could be used to detect the few excellent
or poor wines. Also, we are not sure if all input variables are relevant, so
it could be interesting to test feature selection methods.
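The class imbalance described above can be checked directly once the quality scores are loaded. The Python sketch below counts how often each quality score occurs and lists the scores whose share falls below a chosen fraction. The function names and the min_fraction cut-off are hypothetical choices for illustration, not part of the dataset description.

```python
from collections import Counter
from typing import Dict, Iterable, List


def class_distribution(scores: Iterable[int]) -> Dict[int, int]:
    """Count how many wines received each quality score, ordered by score."""
    counts = Counter(scores)
    return dict(sorted(counts.items()))


def rare_classes(scores: Iterable[int], min_fraction: float) -> List[int]:
    """Return the scores whose share of all wines is below `min_fraction`."""
    counts = class_distribution(scores)
    total = sum(counts.values())
    if total == 0:
        return []
    return [score for score, n in counts.items() if n / total < min_fraction]
```

Because the classes are ordered, the sorted counts also show whether the rare scores sit at the extremes (poor or excellent wines), which is where the text suggests outlier detection could help.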

5. Number of Instances: red wine - 1599; white wine - 4898.

6. Number of Attributes: 11 + output attribute

Note: several of the attributes may be correlated, so it makes sense to apply some form of
feature selection.
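One simple way to look for correlated attributes before choosing a feature selection method is to compute pairwise Pearson correlations and flag pairs above a threshold. The Python sketch below does this for columns held in a dict keyed by attribute name. The 0.9 default threshold and the function names are assumptions for illustration, and Pearson correlation only captures linear relationships between attributes.

```python
import math
from itertools import combinations
from typing import List, Mapping, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("columns must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Unnormalized sums; the 1/n factors cancel in the ratio below.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return s_xy / math.sqrt(s_xx * s_yy)


def correlated_pairs(
    columns: Mapping[str, Sequence[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """List attribute pairs whose absolute Pearson correlation is >= threshold."""
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs
```

A flagged pair only suggests that one of the two attributes may be redundant; which one to drop, if any, still depends on the model and the selection method being tested.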

7. Attribute Information:

For more information, read [Cortez et al., 2009].

Input variables (based on physicochemical tests):
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
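A loader that checks a data file against the attribute list above might look like the Python sketch below. It assumes the data are stored as delimited text with a header row whose column names match the attribute names listed here, separated by semicolons; both the header spelling and the delimiter are assumptions, so adjust them to the files you actually have. Because section 8 states there are no missing values, the loader raises an error on an empty field rather than silently skipping it. The names load_wine_rows and EXPECTED_COLUMNS are hypothetical.

```python
import csv
from typing import Dict, Iterable, List

# Attribute names as listed above; the exact header spelling in the file is assumed.
INPUT_ATTRIBUTES = [
    "fixed acidity",
    "volatile acidity",
    "citric acid",
    "residual sugar",
    "chlorides",
    "free sulfur dioxide",
    "total sulfur dioxide",
    "density",
    "pH",
    "sulphates",
    "alcohol",
]
OUTPUT_ATTRIBUTE = "quality"
EXPECTED_COLUMNS = INPUT_ATTRIBUTES + [OUTPUT_ATTRIBUTE]


def load_wine_rows(lines: Iterable[str], delimiter: str = ";") -> Dict[str, List[float]]:
    """Parse delimited text into one list of floats per attribute.

    `lines` can be an open text file or any iterable of strings.
    Raises ValueError if the header is not the 11 inputs plus quality,
    or if any field is empty.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    columns: Dict[str, List[float]] = {name: [] for name in EXPECTED_COLUMNS}
    for row_no, row in enumerate(reader, start=1):
        for name in EXPECTED_COLUMNS:
            value = row[name]
            # A short row leaves trailing fields as None; an empty field is "".
            if value is None or value.strip() == "":
                raise ValueError(f"missing value for {name!r} in data row {row_no}")
            columns[name].append(float(value))
    return columns
```

Column-wise lists keep the sketch dependency-free and can be passed straight to a correlation or class-count routine; a dataframe library would be the more usual choice for larger analyses.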

8. Missing Attribute Values: None