mirror of https://github.com/01-edu/Branch-AI.git
The Forest CoverType dataset

1. Title of Database:

    Forest Covertype data

2. Sources:

    (a) Original owners of database:
        Remote Sensing and GIS Program
        Department of Forest Sciences
        College of Natural Resources
        Colorado State University
        Fort Collins, CO 80523
        (contact Jock A. Blackard, jblackard 'at' fs.fed.us
         or Dr. Denis J. Dean, denis.dean 'at' utdallas.edu)

        NOTE: Reuse of this database is unlimited with retention of
              copyright notice for Jock A. Blackard and Colorado
              State University.

    (b) Donors of database:
        Jock A. Blackard (jblackard 'at' fs.fed.us)
        GIS Coordinator
        USFS - Forest Inventory & Analysis
        Rocky Mountain Research Station
        507 25th Street
        Ogden, UT 84401

        Dr. Denis J. Dean (denis.dean 'at' utdallas.edu)
        Professor
        Program in Geography and Geospatial Sciences
        School of Economic, Political and Policy Sciences
        800 West Campbell Rd
        Richardson, TX 75080-3021

        Dr. Charles W. Anderson (anderson 'at' cs.colostate.edu)
        Associate Professor
        Department of Computer Science
        Colorado State University
        Fort Collins, CO 80523 USA

    (c) Date donated: August 1998

3. Past Usage:

    Blackard, Jock A. and Denis J. Dean. 2000. "Comparative
        Accuracies of Artificial Neural Networks and Discriminant
        Analysis in Predicting Forest Cover Types from Cartographic
        Variables." Computers and Electronics in Agriculture
        24(3):131-151.

    Blackard, Jock A. and Denis J. Dean. 1998. "Comparative
        Accuracies of Neural Networks and Discriminant Analysis
        in Predicting Forest Cover Types from Cartographic
        Variables." Second Southern Forestry GIS Conference.
        University of Georgia. Athens, GA. Pages 189-199.

    Blackard, Jock A. 1998. "Comparison of Neural Networks and
        Discriminant Analysis in Predicting Forest Cover Types."
        Ph.D. dissertation. Department of Forest Sciences.
        Colorado State University. Fort Collins, Colorado.
        165 pages.

    Abstract of dissertation:

        Natural resource managers responsible for developing
        ecosystem management strategies require basic descriptive
        information including inventory data for forested lands to
        support their decision-making processes. However, managers
        generally do not have this type of data for inholdings or
        neighboring lands that are outside their immediate
        jurisdiction. One method of obtaining this information is
        through the use of predictive models.

        Two predictive models were examined in this study, a
        feedforward neural network model and a more traditional
        statistical model based on discriminant analysis. The overall
        objectives of this research were to first construct these two
        predictive models, and second to compare and evaluate their
        respective classification accuracies when predicting forest
        cover types in undisturbed forests.

        The study area included four wilderness areas found in
        the Roosevelt National Forest of northern Colorado. A total
        of twelve cartographic measures were utilized as independent
        variables in the predictive models, while seven major forest
        cover types were used as dependent variables. Several subsets
        of these variables were examined to determine the best overall
        predictive model.

        For each subset of cartographic variables examined in
        this study, relative classification accuracies indicate the
        neural network approach outperformed the traditional
        discriminant analysis method in predicting forest cover types.
        The final neural network model had a higher absolute
        classification accuracy (70.58%) than the final corresponding
        linear discriminant analysis model (58.38%). In support of these
        classification results, thirty additional networks with randomly
        selected initial weights were derived. From these networks, the
        overall mean absolute classification accuracy for the neural
        network method was 70.52%, with a 95% confidence interval of
        70.26% to 70.80%. Consequently, natural resource managers may
        utilize an alternative method of predicting forest cover types
        that is both superior to the traditional statistical methods and
        adequate to support their decision-making processes for
        developing ecosystem management strategies.

    -- Classification performance
        -- first 11,340 records used for training data subset
        -- next 3,780 records used for validation data subset
        -- last 565,892 records used for testing data subset
        -- 70% Neural Network (backpropagation)
        -- 58% Linear Discriminant Analysis

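    The positional split listed above can be reproduced with plain
    slicing. The following is a minimal sketch, not the authors'
    code: the function name split_in_file_order and the constant
    names are made up for illustration, and it assumes the rows are
    held in file order in any sliceable sequence.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")

# Subset sizes quoted in the "Classification performance" notes.
N_TRAIN = 11_340
N_VALID = 3_780
N_TEST = 565_892


def split_in_file_order(
    rows: Sequence[T],
) -> Tuple[Sequence[T], Sequence[T], Sequence[T]]:
    """Split rows into (train, validation, test) by position, no shuffling."""
    expected = N_TRAIN + N_VALID + N_TEST  # 581,012 records in total
    if len(rows) != expected:
        raise ValueError(f"expected {expected} rows, got {len(rows)}")
    train = rows[:N_TRAIN]
    valid = rows[N_TRAIN:N_TRAIN + N_VALID]
    test = rows[N_TRAIN + N_VALID:]
    return train, valid, test
```

    Because the split is positional, shuffling the rows before
    calling the function would produce different subsets from the
    ones described above.
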
4. Relevant Information Paragraph:

    The task is to predict forest cover type from cartographic
    variables only (no remotely sensed data). The actual forest cover
    type for a given observation (30 x 30 meter cell) was determined
    from US Forest Service (USFS) Region 2 Resource Information System
    (RIS) data. Independent variables were derived from data
    originally obtained from US Geological Survey (USGS) and
    USFS data. Data is in raw form (not scaled) and contains
    binary (0 or 1) columns of data for qualitative independent
    variables (wilderness areas and soil types).

    This study area includes four wilderness areas located in the
    Roosevelt National Forest of northern Colorado. These areas
    represent forests with minimal human-caused disturbances,
    so that existing forest cover types are more a result of
    ecological processes than of forest management practices.

    Some background information for these four wilderness areas:
    Neota (area 2) probably has the highest mean elevation of
    the 4 wilderness areas. Rawah (area 1) and Comanche Peak (area 3)
    would have a lower mean elevation, while Cache la Poudre
    (area 4) would have the lowest mean elevation.

    As for primary major tree species in these areas, Neota would have
    spruce/fir (type 1), while Rawah and Comanche Peak would probably
    have lodgepole pine (type 2) as their primary species, followed by
    spruce/fir and aspen (type 5). Cache la Poudre would tend to have
    Ponderosa pine (type 3), Douglas-fir (type 6), and
    cottonwood/willow (type 4).

    The Rawah and Comanche Peak areas would tend to be more typical of
    the overall dataset than either the Neota or Cache la Poudre, due
    to their assortment of tree species and range of predictive
    variable values (elevation, etc.). Cache la Poudre would probably
    be the most distinctive of the four, due to its relatively low
    elevation range and species composition.

5. Number of instances (observations): 581,012

6. Number of Attributes: 12 measures, but 54 columns of data
                         (10 quantitative variables, 4 binary
                         wilderness areas and 40 binary
                         soil type variables)

7. Attribute information:

    Given is the attribute name, attribute type, the measurement unit and
    a brief description. The forest cover type is the classification
    problem. The order of this listing corresponds to the order of
    numerals along the rows of the database.

Name                                 Data Type     Measurement                  Description

Elevation                            quantitative  meters                       Elevation in meters
Aspect                               quantitative  azimuth                      Aspect in degrees azimuth
Slope                                quantitative  degrees                      Slope in degrees
Horizontal_Distance_To_Hydrology     quantitative  meters                       Horizontal distance to nearest surface water features
Vertical_Distance_To_Hydrology       quantitative  meters                       Vertical distance to nearest surface water features
Horizontal_Distance_To_Roadways      quantitative  meters                       Horizontal distance to nearest roadway
Hillshade_9am                        quantitative  0 to 255 index               Hillshade index at 9am, summer solstice
Hillshade_Noon                       quantitative  0 to 255 index               Hillshade index at noon, summer solstice
Hillshade_3pm                        quantitative  0 to 255 index               Hillshade index at 3pm, summer solstice
Horizontal_Distance_To_Fire_Points   quantitative  meters                       Horizontal distance to nearest wildfire ignition points
Wilderness_Area (4 binary columns)   qualitative   0 (absence) or 1 (presence)  Wilderness area designation
Soil_Type (40 binary columns)        qualitative   0 (absence) or 1 (presence)  Soil Type designation
Cover_Type (7 types)                 integer       1 to 7                       Forest Cover Type designation

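    The column order in the listing above can be turned into a row
    parser. The following is a sketch under stated assumptions: each
    row is taken to hold 55 integers (the 54 attribute columns
    followed by Cover_Type), and exactly one flag is taken to be set
    within each group of binary columns. The helper names and the
    numbered column names such as Wilderness_Area1 are hypothetical;
    this description does not define per-column names.

```python
from typing import Dict, List, Sequence

# The ten quantitative attributes, in the order of the listing above.
QUANTITATIVE = [
    "Elevation", "Aspect", "Slope",
    "Horizontal_Distance_To_Hydrology", "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways",
    "Hillshade_9am", "Hillshade_Noon", "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points",
]


def column_names() -> List[str]:
    """Build 55 names; the numbered suffixes are a hypothetical convention."""
    names = list(QUANTITATIVE)
    names += [f"Wilderness_Area{i}" for i in range(1, 5)]
    names += [f"Soil_Type{i}" for i in range(1, 41)]
    names.append("Cover_Type")
    return names


def decode_one_hot(flags: Sequence[int]) -> int:
    """Return the 1-based position of the single flag that equals 1."""
    if any(f not in (0, 1) for f in flags) or sum(flags) != 1:
        raise ValueError("expected 0/1 flags with exactly one flag set")
    return list(flags).index(1) + 1


def parse_row(values: Sequence[int]) -> Dict[str, int]:
    """Collapse the binary groups of one row into single integer codes."""
    if len(values) != 55:
        raise ValueError(f"expected 55 values, got {len(values)}")
    record = dict(zip(QUANTITATIVE, values[:10]))
    record["Wilderness_Area"] = decode_one_hot(values[10:14])  # codes 1 to 4
    record["Soil_Type"] = decode_one_hot(values[14:54])        # codes 1 to 40
    record["Cover_Type"] = values[54]                          # codes 1 to 7
    return record
```

    Collapsing the binary groups gives back the 12 measures plus the
    class label, which is convenient for inspection; models that
    expect numeric inputs would normally keep the 0/1 columns.
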
    Code Designations:

    Wilderness Areas:   1 -- Rawah Wilderness Area
                        2 -- Neota Wilderness Area
                        3 -- Comanche Peak Wilderness Area
                        4 -- Cache la Poudre Wilderness Area

    Soil Types:         1 to 40 : based on the USFS Ecological
                        Landtype Units (ELUs) for this study area:

    Study Code  USFS ELU Code  Description
    1           2702           Cathedral family - Rock outcrop complex, extremely stony.
    2           2703           Vanet - Ratake families complex, very stony.
    3           2704           Haploborolls - Rock outcrop complex, rubbly.
    4           2705           Ratake family - Rock outcrop complex, rubbly.
    5           2706           Vanet family - Rock outcrop complex, rubbly.
    6           2717           Vanet - Wetmore families - Rock outcrop complex, stony.
    7           3501           Gothic family.
    8           3502           Supervisor - Limber families complex.
    9           4201           Troutville family, very stony.
    10          4703           Bullwark - Catamount families - Rock outcrop complex, rubbly.
    11          4704           Bullwark - Catamount families - Rock land complex, rubbly.
    12          4744           Legault family - Rock land complex, stony.
    13          4758           Catamount family - Rock land - Bullwark family complex, rubbly.
    14          5101           Pachic Argiborolls - Aquolls complex.
    15          5151           unspecified in the USFS Soil and ELU Survey.
    16          6101           Cryaquolls - Cryoborolls complex.
    17          6102           Gateview family - Cryaquolls complex.
    18          6731           Rogert family, very stony.
    19          7101           Typic Cryaquolls - Borohemists complex.
    20          7102           Typic Cryaquepts - Typic Cryaquolls complex.
    21          7103           Typic Cryaquolls - Leighcan family, till substratum complex.
    22          7201           Leighcan family, till substratum, extremely bouldery.
    23          7202           Leighcan family, till substratum - Typic Cryaquolls complex.
    24          7700           Leighcan family, extremely stony.
    25          7701           Leighcan family, warm, extremely stony.
    26          7702           Granile - Catamount families complex, very stony.
    27          7709           Leighcan family, warm - Rock outcrop complex, extremely stony.
    28          7710           Leighcan family - Rock outcrop complex, extremely stony.
    29          7745           Como - Legault families complex, extremely stony.
    30          7746           Como family - Rock land - Legault family complex, extremely stony.
    31          7755           Leighcan - Catamount families complex, extremely stony.
    32          7756           Catamount family - Rock outcrop - Leighcan family complex, extremely stony.
    33          7757           Leighcan - Catamount families - Rock outcrop complex, extremely stony.
    34          7790           Cryorthents - Rock land complex, extremely stony.
    35          8703           Cryumbrepts - Rock outcrop - Cryaquepts complex.
    36          8707           Bross family - Rock land - Cryumbrepts complex, extremely stony.
    37          8708           Rock outcrop - Cryumbrepts - Cryorthents complex, extremely stony.
    38          8771           Leighcan - Moran families - Cryaquolls complex, extremely stony.
    39          8772           Moran family - Cryorthents - Leighcan family complex, extremely stony.
    40          8776           Moran family - Cryorthents - Rock land complex, extremely stony.

    Note:   First digit:  climatic zone       Second digit:  geologic zones
            1.  lower montane dry             1.  alluvium
            2.  lower montane                 2.  glacial
            3.  montane dry                   3.  shale
            4.  montane                       4.  sandstone
            5.  montane dry and montane       5.  mixed sedimentary
            6.  montane and subalpine         6.  unspecified in the USFS ELU Survey
            7.  subalpine                     7.  igneous and metamorphic
            8.  alpine                        8.  volcanic

    The third and fourth ELU digits are unique to the mapping unit
    and have no special meaning to the climatic or geologic zones.

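    The digit scheme in the note above lends itself to a small
    lookup. The following is an illustrative sketch: the function
    name decode_elu is hypothetical, and it assumes every ELU code
    is a four-digit integer whose first two digits fall in the
    range 1 to 8.

```python
from typing import Tuple

# First ELU digit: climatic zone (from the note above).
CLIMATIC_ZONES = {
    1: "lower montane dry",
    2: "lower montane",
    3: "montane dry",
    4: "montane",
    5: "montane dry and montane",
    6: "montane and subalpine",
    7: "subalpine",
    8: "alpine",
}

# Second ELU digit: geologic zone (from the note above).
GEOLOGIC_ZONES = {
    1: "alluvium",
    2: "glacial",
    3: "shale",
    4: "sandstone",
    5: "mixed sedimentary",
    6: "unspecified in the USFS ELU Survey",
    7: "igneous and metamorphic",
    8: "volcanic",
}


def decode_elu(code: int) -> Tuple[str, str]:
    """Return (climatic zone, geologic zone) for a four-digit ELU code."""
    digits = str(code)
    if len(digits) != 4 or not digits.isdigit():
        raise ValueError(f"expected a four-digit ELU code, got {code!r}")
    first, second = int(digits[0]), int(digits[1])
    if first not in CLIMATIC_ZONES or second not in GEOLOGIC_ZONES:
        raise ValueError(f"ELU code {code} has a zone digit outside 1-8")
    # The third and fourth digits identify the mapping unit only.
    return CLIMATIC_ZONES[first], GEOLOGIC_ZONES[second]
```

    For example, decode_elu(2702) gives ("lower montane", "igneous
    and metamorphic") and decode_elu(7201) gives ("subalpine",
    "glacial").
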
    Forest Cover Type Classes:  1 -- Spruce/Fir
                                2 -- Lodgepole Pine
                                3 -- Ponderosa Pine
                                4 -- Cottonwood/Willow
                                5 -- Aspen
                                6 -- Douglas-fir
                                7 -- Krummholz

8. Basic Summary Statistics for quantitative variables only
   (whole dataset -- thanks to Phil Rennert for the summary values):

Name                                Units               Mean   Std Dev
Elevation                           meters           2959.36    279.98
Aspect                              azimuth           155.65    111.91
Slope                               degrees            14.10      7.49
Horizontal_Distance_To_Hydrology    meters            269.43    212.55
Vertical_Distance_To_Hydrology      meters             46.42     58.30
Horizontal_Distance_To_Roadways     meters           2350.15   1559.25
Hillshade_9am                       0 to 255 index    212.15     26.77
Hillshade_Noon                      0 to 255 index    223.32     19.77
Hillshade_3pm                       0 to 255 index    142.53     38.27
Horizontal_Distance_To_Fire_Points  meters           1980.29   1324.19

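    Since the data is distributed unscaled, the summary values above
    can be used to standardize a quantitative variable as
    z = (x - mean) / std. The following is a minimal sketch of that
    arithmetic; the name standardize is hypothetical, and using
    whole-dataset statistics (rather than training-subset
    statistics) is a simplifying assumption.

```python
from typing import Dict, Tuple

# (mean, standard deviation) per variable, copied from the table above.
SUMMARY: Dict[str, Tuple[float, float]] = {
    "Elevation": (2959.36, 279.98),
    "Aspect": (155.65, 111.91),
    "Slope": (14.10, 7.49),
    "Horizontal_Distance_To_Hydrology": (269.43, 212.55),
    "Vertical_Distance_To_Hydrology": (46.42, 58.30),
    "Horizontal_Distance_To_Roadways": (2350.15, 1559.25),
    "Hillshade_9am": (212.15, 26.77),
    "Hillshade_Noon": (223.32, 19.77),
    "Hillshade_3pm": (142.53, 38.27),
    "Horizontal_Distance_To_Fire_Points": (1980.29, 1324.19),
}


def standardize(name: str, value: float) -> float:
    """Return the z-score of value for the named quantitative variable."""
    mean, std = SUMMARY[name]
    return (value - mean) / std
```

    A value equal to the variable's mean maps to 0, and a value one
    standard deviation above the mean maps to 1. Aspect is an angle,
    so a linear z-score treats 359 and 1 degrees as far apart; that
    caveat is outside the scope of this sketch.
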
9. Missing Attribute Values: None.

10. Class distribution:

    Number of records of Spruce/Fir:          211840
    Number of records of Lodgepole Pine:      283301
    Number of records of Ponderosa Pine:       35754
    Number of records of Cottonwood/Willow:     2747
    Number of records of Aspen:                 9493
    Number of records of Douglas-fir:          17367
    Number of records of Krummholz:            20510
    Number of records of other:                    0

    Total records:                            581012

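    The counts above are strongly imbalanced, which matters when
    reading any accuracy figure. The following sketch computes each
    class's share of the whole dataset and the share of the most
    frequent class, which is the accuracy a predictor would get by
    always guessing that class on the full data. The function names
    are hypothetical.

```python
from typing import Dict, Tuple

# Record counts per cover type, copied from the class distribution above.
CLASS_COUNTS: Dict[str, int] = {
    "Spruce/Fir": 211840,
    "Lodgepole Pine": 283301,
    "Ponderosa Pine": 35754,
    "Cottonwood/Willow": 2747,
    "Aspen": 9493,
    "Douglas-fir": 17367,
    "Krummholz": 20510,
}


def class_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each class's fraction of the total record count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def majority_class(counts: Dict[str, int]) -> Tuple[str, float]:
    """Return the most frequent class and its share of all records."""
    name = max(counts, key=lambda k: counts[k])
    return name, counts[name] / sum(counts.values())
```

    With these counts the most frequent class is Lodgepole Pine at
    283301 / 581012 of the records, a little under half, while
    Cottonwood/Willow makes up less than half a percent.
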
=====================================================================
Jock A. Blackard
08/28/1998 -- original text
12/07/1999 -- updated mailing address, citations, background info
              for study area, added summary statistics.
=====================================================================