Data_analysis_and

Data analysis and reporting; Mon May, 14 -- Fri May 18, 2007

After the field work we have a common data set. All groups do the analysis listed in the page. The data set has

STRS observations - the photo-trees
Field observations - the reference observations

During the calculations, consider having
- a tree-file and
- a plot-file
in which the results of the calculations are presented.

These files have as many records as there were trees and plots in the inventory. You will have many other files, in which your computations and intermediate variables are stored. Document and store everything carefully from the start, as the documentation and files used in the computations will be a part of the report that is given a grade. Use the variable names that are given in this documentation. Your document can be written in the form of a flow chart, with references (links) to files, where appropriate. Consider writing it using HTML or equivalent hypertext documentation language. The structure can follow the sections 1-8 of this document, and it is recommendable to copy the table-templates etc. from this document.

Our data analysis tasks will include:

1) An evalution of the performance of the STRS-system

2) Calculations of timber resources in the area using the sample plots with:
- Field observations
- STRS observations

The analyses can be done with MS-Excel. Some tasks require programming and we use VBA, which is built in Excel. Alternatively you can program in Python or other languages that you find support for. Logit-modeling (of class variables) can not be done inside Excel easily. For that and any statistical analysis you can use the statistical tools that you are familiar with or Excel. SAS, SPSS and others should be available.

Timetable
Monday
- Prepare tree-records in plotnumber.xls files, one file for each plot. Have them sent to course assistant, sent each plot once it is done, so he can join them into one big observation matrix.
+ Check d13-ref x h-ref and d13-ref x h-foto distributions, mark suspicious (gross outliers), put 1/0 to column "Research" in the plotnumber.xls
- Prepare field forms, originals on paper and an xls-file, one per plot. Have papers stored in a folder and the xls's sent to course assistant
- send assistant the file group.csv, an ascii file having all your valid (used) azimuths and distances for the triangulated trees (proper input file for the calculator -program), include center poles with number 200.
Tuesday
Sections 1- 5
Wednesday
Sections 6-7 (-8)
Friday
(8)-9

Tree-records from Hyytiälä (joined plotnumber.xls files)

When we are done with the field work, our data set has per-tree records from the 59 0.04-ha plots:

Plot-id, integer
tree-id, integer [1,199] for STRS-trees, > 201 for new positionings, 200 pole, don't include
STRS-status, classes: correct, omission, commission
Field status, classes: 11,12, 13,..23 (living, dead, crown...)
X, real-valued in KKJ [m]
Y, real, valued in KKJ [m]
SigmaX, real [m] for a tree positioned in the field
SigmaY, real [m] for a tree positioned in the field
Zbutt, real valued in N60 [m]
Ztop-fotogrammetric, real in N60 [m]
Ztop-LiDAR, real in N60 [m]
Sp-foto, Class, Species by visual interpretation
Sp-ref, Class, Reference from the field
h-foto, fotogrammetric height, real valued, by multi-scale TM, [m]
h-LiDAR, real valued, by highest LiDAR-hit, [m]
Dcrm-foto, crown width, real valued by multi-scale TM, [m]
Dcrm-LiDAR, crown width, real by adjustment of crown model [m]
Dcrm-LiDAR-RMSE, the RMS of the residuals in crown modelling [m]
d13-ref, Field mesured reference value in [cm]
Sample-tree, Class variable, 0 = tally tree, 1=sample tree
h-ref, Real valued, Reference height in [m], with hypsometer
hc-ref, Height of the lowest living branch, [m], with hypsometer
Research, Usable for research 1/0 , 1 = OK, 0 = suspicious reference values

1. Calculation of plot-level variables using sparse LiDAR in 2004 and regression functions by Suvanto et al. (2005)

Process the 2004 LiDAR data using the SUVANTOLiDARLASKIN program and calculate estimates for

V-LiDAR (m3/ha),
N-LiDAR (n/ha),
G-LiDAR (m2/ha),
HGM-LiDAR (m) and
DGM-LiDAR (cm).

Filter the data for the 59 plots-of-interest using the IDs. Store these variables in your plot-file. Metsätieteen aikakausikirja (2005) has an article by Suvanto et el. on the regression estimation method that has been implemented in the program.

2. Calculation of tree-level reference values

2.1 Height estimation for tally trees using sample tree information

We have height measurements for every N'th sample tree. We need heights for the tally trees in order to compute their volumes with taper curves.

Stratify the 59 plots in the following 5 strata using the HGM-LiDAR variable as the basis of stratification. We try to have the plots stratifield according to the development class.

1) HGM-LIDAR < 10 m
2) HGM-LIDAR 10-15 m
3) HGM-LIDAR 15-20 m
4) HGM-LIDAR 20-25 m
5) HGM-LIDAR >25 m

In each stratum, estimate separately Näslund's height curves for pine, spruce and broadleaved trees. This gives 5×3 linear regression models of type y = a + b×d13-ref, where y = SQRT(d13-ref²/(h-ref-1.3)) in the LS- estimation of parameters a and b. When you finally apply the model and calculate heights (h-ref-nasl) for the tally trees of certain species in a stratum, use the form

Store these estimated heights in your tree-file, give a separate column / variable name for the reference height that was estimated using height curves: h-ref-nasl. The transformation of the y-variable introduces systematic error: you can correct for it, but it is optional for those who are interested. Juha Lappi's book in forest biometry (Finnish) has examples on this.

2.2 Stem bucking and calculation of reference volumes

For each tree, using the reference values: d13-ref, h-ref or h-ref-nasl height and Sp-ref, calculate the total stem volume and volume in saw logs, pulp logs and treetop using the MARV42_POLKYTTAJA program. Name these variables v-ref, v-saw-ref, v-pulp-ref, v-waste-ref. Store them in your tree-file.

2.3 STRS: Allometric modeling of stem diameters

The following tree-level variables are calculated using the STRS observations.

d13-foto, as a function of Sp-foto, h-foto and Dcrm-foto, [cm]
d13-foto-LiDAR, function of Sp-foto, h-foto, Dcrm-LiDAR, [cm]

Equations for computing d13 are found in the Silva Fennica article by Kalliovirta and Tokola (2005). Use models for the region that Hyytiälä belongs to. Remember to add the bias-corrections as described in the articles. Store the variables in your tree-file.

3. Calculation of plot-level reference variables from field observations

Calculate, for each 0.04-ha plot, the following variables using the reference measurements

Stem-n-ref, Stem number, [n/ha]
G-ref, Basal area, [m2/ha]
Hdom-ref, Dominant height, [m] is the arithmetic mean of the heights of 100 largest trees per hectare. Size is measured with d13-ref, use h-ref and h-ref-nasl heights.
Hg-ref, Basal-area weighted mean height [m]. Multiply each h-ref or h-ref-nasl value with the squared d13-ref, sum these and divide the sum by the sum of squared d13-ref values.(Hg-ref ~ HGM, median)
Dg-ref, [cm] Basal-area weighted mean diameter (Dg-ref ~ DGM)
Vtot-ref, Total volume [m3/ha] sum of v-ref
Vtot-ref-Bit, Total volume [m3, ha], V = FGH using Nyyssönen's function for FH. (Relaskooppitaulukoista, Vol using relascope tables)
Vsaw-ref, Volume in saw logs [m3/ha], sum of v-saw-ref
Vpulp-ref, Volume in pulp logs [m3/ha], sum of v-pulp-ref
Vwaste-ref, Volume in treetop logs [m3/ha], sum of v-waste-ref
Vpine-ref, [m3/ha], sum of v-ref in pines
Vspruce-ref, [m3/ha], sum of v-ref in spruce
Vbroadl-ref, [m3/ha], ...etc.
Vsaw-pine-ref, [m3/ha]
Vsaw-spruce-ref, [m3/ha]
Vsaw-broadl-ref, [m3/ha]
Vpulp-pine-ref, [m3/ha]
Vpulp-spruce-ref, [m3/ha]
Vpulp-broadl-ref, [m3/ha]

Arriving at these variables requires a little programming. Consult tutors and this document for programming help. Visual basic (for applications, VBA, an event-based programming language) can be used from within Excel. Pressing ALT-F11 opens a VBA editor, in which you can write and execute code. Double-click item Sheet1, and a code-window opens. Select Worksheet from the pull-down menu, and program the event "Calculate". You can make an ASCII-file, myfile.txt, with tree-records: e.g. plot-id, tree-id, STRS-status, d13-ref, Sample-tree, h-ref and h-ref-nasl. The example code below calculates the stem number (stem-n-ref), basal area (G-ref), basal area weighted mean height (Hg-ref) for each plot. Make alterations to the code to get the other needed plot-level variables listed above.

First, insert a module for declaring a special/helpful data structure: Menu command Insert | Module (Module1). Under "general declarations" spesify a data structure that is given the name "Plot" and declare an array of this data structure with 224 entries:

Type Plot plotId As Long NtreesIn As Long Dsum As Double Hsum As Double D2Sum As Double HD2Sum As Double End Type Public data(1 To 224) As Plot
In the code window of Sheet1, enter the following code. It reads the tree-records in myfile.txt, stores the needed variables into the array of plot data type named data, and finally loops the array, find plots with observations and calculates the needed variables and writes the answers to a file. Lines that start with the "Rem" keyword are comments.

Private Sub Worksheet_Calculate() Open "c:\data\myfile.txt" for input as 1Rem Read the file line by line, calculate needed sums Rem and store variables to the array
Do until EOF(1) input #1, plotN,tree,status,d13,sample,h_ref,h_ref_n data(plotN).NtreesIn = data(plotN).NtreesIn + 1 data(plotN).Dsum = data(plotN).Dsum+ d13 data(plotN).D2sum = data(plotN).D2sum+ d13^2
Rem select reference height between two, collect sums Rem for computing Hg-ref
if sample = 0 then data(plotN).HD2sum = data(plotN).HD2sum + h_ref_n * d13^2 Else data(plotN).HD2sum = data(plotN).HD2sum + h_ref * d13^2 end if Loop Close(1)Rem Everything is now stored, compute and output
Open "C:\data\MyOutput.txt" for output as 2
For i = 1 to 224 if data(i).NtreesIn > 0 then Rem this is a plot i with trees StemNumber =data(i).NtreesIn / 0.04
BasalAr =data(i).D2Sum * (3.1415/4)
Hg = data(i).HD2sum/data(i).D2sum Rem Print values to file
Print #2, i, StemNumber, BasalAr, Hg end if Next i Close(2) Msgbox("I'm done!")
End SubThe Code executes, when you press F9 in Excel when Sheet1 is active, or enter a formula, that needs calculations as it triggers the event "Calculate".

Nyyssönen's height-form-functions (muotokorkeusfunktiot) for computing volume in Bitterlich-type of plots: Vtot-ref-Bit = F*G*H, are: for pine, spruce and birch stands. Use Hg-ref or HgM-ref for the mean height (H).

FH = 0.4116 - 0.04275*H^1.5 + 0.6359 * H FH = 1.3187 + 0.00099*H^2 + 0.3978 * H FH = 0.4907 - 0.00137*H^2 + 0.4556 * H
4. Calculations of timber resources using the reference data

The area of the forest holding is 56.8 ha. Each 0.04-ha plot represents an area of 0.9635 ha of the total area. Based on this information, calculate the total estimates for the forest holding.

Cross-tabulations:

Table 1. Timber resources: Stem number (#1, n) and volume (#2, m3) by species and stem dbh-classes; living trees.

Sp / dbh	50-100 mm	101-150 mm	151-200 mm	201- 250 mm	251- mm	Total
Pine	#1 / #2	#1 / #2	#1 / #2	#1 / #2	#1 / #2	#1 / #2
Spruce	#1 / #2	#1 / #2	#1 / #2	#1 / #2	#1 / #2	#1 / #2
Birch	#1 / #2	#1 / #2	#1 / #2	#1 / #2	#1 / #2	#1 / #2
Other	#1 / #2	#1 / #2	#1 / #2	#1 / #2	#1 / #2	#1 / #2
Total	#1 / #2	#1 / #2	#1 / #2	#1 / #2	#1 / #2	#1 / #2

Table 2. Timber resources: Volume (#1, m3) by sortiments and species; living trees.

Sp / sortiment	Saw- wood	Pulp- wood	Waste	Total
Pine	#1	#1	#1	#1
Spruce	#1	#1	#1	#1
Birch	#1	#1	#1	#1
Other	#1	#1	#1	#1
Total	#1	#1	#1	#1

5. Computation of relative tree heights

Using Hdom-ref of each plot, compute the relative height for each tree and scale it between 0...1 by simply truncating values > 1 to 1.0.

For sample trees: h-rel-ref = h-ref / Hdom-ref
For tally trees: h-rel-ref = h-ref-nasl / Hdom-ref

6. Assessment of STRS (single tree) estimates

6.1 Correctly found trees (Match-rate), omission and commission error-rates

These three statistics are for the whole data set of 59 plots. They are percentages of stem number and volume, of living trees, see EQs i)-vi) below. These statistics describe the overall potential of the STRS method in finding trees. Commission errors are photo-trees, that were rejected in the field, because no tree (or stump) was found in that location or anywhere near (within a meter) the XY location of the top. Omission errors consist of the trees that were missed by the STRS system, and only the reference measurements "have this tree". In i)-vi) below, N refers to the number and Vol is the total volume.

i) Match-rate-N-% of correctly found trees in stem number =
(N_STRS_trees-N_Commission) / (N_All_trees) * 100%

ii) Match-rate-Vol-% of correctly found trees in stem number =
(Vol_STRS_trees) / (Vol_All_trees) * 100%

iii) Omission-N-% = (N_omission) / (N_All_trees)
iv) Omission-Vol-% = (Vol_omission) / (Vol_All_trees)

v) Commission-N-% = (N_commission) / (N_All_trees)
vi) Commission-Vol-% = (Vol_commission) / (N_All_trees)

6.2 Discernibility statistics

Prepare a table, where you give the percentages of correctly found and missed trees of all trees over classes of relative height.

Relative tree height	Found trees, %	Missed trees, %
< 0.35	#-%	#-%
0.35-0.45	#-%	#-%
0.45-0.55	#-%	#-%
0.55-0.65	#-%	#-%
0.65-0.75	#-%	#-%
0.75-0.85	#-%	#-%
0.85-0.95	#-%	#-%
> 0.95	#-%	#-%

Optional part of 6.2:

Model the probability of measurability of a tree using logistic regression i.e. model the probability (0...1) that a the tree was found = 1, or missed = 0 and
explain the probability with relative tree height. The model formulation is :
Tree is measurable (0/1) = f (h-rel-ref). I.e. apply Binomial logistic regression.

Also, compute for each plot, a measure of relative density of the stand using relationship between the mean height of the stand and the Basal area:

DensityMeasure = G-ref/Hg-ref

Use this variable together with h-rel-ref as covariates for the measurability (0/1): Binomial logistic regression.

6.3 Accuracy of height estimates

Report by filling the tables below. Study the distribution of the residuals:
(h-ref - h-foto) and (h-ref - h-LiDAR). N = N of sample trees.
Give the RMS errors in both absolute and relative values. Use the arithmetic mean of reference heights in transforming the absolute RMSE into relative RMSE in %. The analysis can only be done for the correctly found trees, give their number (N).

Table #. Accuracy of photogrammetric height estimation.

	Mean h-ref, m	N	Mean, m	SD, m	RMSE, m, %
All species					#1, #2
Pine
Spruce
Broadleaved

Table #. Accuracy of LiDAR-based height estimation

	Mean h-ref, m	N	Mean, m	SD, m	RMSE, m, %
All species					#1, #2
Pine
Spruce
Broadleaved

6.4 Accuracy of d13-estimates

Report by filling the tables below. Study the distribution of the residuals:
(d13-ref - d13-foto) and (d13-ref - d13-foto-LiDAR).
Give the RMS both in absolute (cm) and relative values (%). Use the arithmetic mean of reference d13 in transforming the absolute RMSE to the relative RMSE in %. The analysis can only be done for the correctly found trees, give their number (N).

Table #. Accuracy of photogrammetric d13 estimation.

	Mean d13-ref, cm	N	Mean, cm	SD, cm	RMSE, cm, %
All species					#1, #2
Pine
Spruce
Broadleaved

Table #. Accuracy of LiDAR-based d13 estimation

	Mean d13-ref,c m	N	Mean, cm	SD, cm	RMSE, cm, %
All species					#1, #2
Pine
Spruce
Broadleaved

6.5 Accuracy assesment of species recognition

Using variables Sp-foto and Sp-ref for the correctly found photo-trees, prepare a table called the confusion or error matrix of classification.

Sp-foto/ Sp-ref	Pine	Spruce	Birch	Aspen	Other	Dead
Pine	#1.1	#1.2	#1.3	#1.5	#1.5
Spruce	#2.1	#2.2	#2.3	...	...
Broadl			#3.3
Dead						#20.20

The table shows e.g that #1.1 pines were correctly classified as pines, and #2.1 pines were seen as spruces in images. The sum of the diagonal (N correct) is used in computing the overall accuracy-%. Compute it. For that analysis, reclassify Sp-ref into classes of Pine, Spruce, Broadleaved and Dead, i.e. make the matrix to have an equal number of columns and rows. Look for example by Åsa Persson: http://www.isprs.org/commission8/workshop_laser_forest/PERSSON.pdf, Table 1.
Determine the Kappa-coefficient of the classification, which another measure of classification success. See textbook by Tokola et al. Metsän kaukokartoitus?, Cohen's Kappa:

Cohen, J. (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20: 37-46.

http://en.wikipedia.org/wiki/Cohen's_kappa

Optional: For each photo-tree model the class variables identification successfull (0/1) using logistic regression: Success of recognition (0/1) = f(h-rel-ref)

6.6 Accuracy of single-tree volume-estimates

Using MARV42_POLKYTTAJA, compute STRS volume estimates with taper curves

v-STRS-foto = f(Sp-foto, d13-foto, h-foto)
v-STRS-LiDAR = f(Sp-foto, d13-foto-LiDAR, h-foto)

Calculate the differences in single tree volumes: (v-ref - v-STRS-foto) and (v-ref - v-STRS-LiDAR) and calculate RMSE (absolute and %), SD and mean of these errors/differences.

Table #. Accuracy of foto-based single-tree volume estimation

	Mean v-ref, m3	N	Mean, m3	SD, m3	RMSE, m3, %
All species					#1, #2
Pine
Spruce
Broadleaved

Table #. Accuracy of LiDAR-based volume estimation

	Mean v-ref, m3	N	Mean, m3	SD, m3	RMSE, m3, %
All species					#1, #2
Pine
Spruce
Broadleaved

7. Assesment of LiDAR regression estimates of V, N, G, DgM and HgM

Calculate the differences of plot variables V-LiDAR (m3/ha), N-LiDAR (n/ha), G-LiDAR (m2/ha), HGM-LiDAR (m) and DGM-LiDAR (m) with respect to the reference values: Error/difference = (reference - #-LiDAR). N = 59.

Analyze the differences, give overall RMSE, SD and mean for the 5 variables and give them in the form of a table. Assume Dg ~ DGM and Hg ~ HGM and take the references Dg-ref and Hg-ref.

Table #. Accuracy of LiDAR-based volume estimation

	Mean #-ref	N	Mean	SD	RMSE
Volume					#1, #2
Stem num
Basal area
HGM
DGM

Study the residuals further and provide plots (5) : distribution of the 5 variables and the proportion of broadleaved trees of volume. Are the errors associated with the presence of broadleaved trees?

Plot1: Volume errors x Proportion of broadleaved trees
Plot2: Basal area errors x Proportion of broadleaved trees
Plot3: Stem number errors x Proportion of broadleaved trees
Plot4: DGM errors x Proportion of broadleaved trees
Plot5: HGM errors x Proportion of broadleaved trees

8. Calculations of timber resources using STRS estimates

Using Sp-foto, d13-foto-LiDAR and h-foto; calculate for each photo-tree the volume estimates: vtot-STRS-LiDAR, vsaw-STRS-LiDAR, vpulp-STRS-LIDAR and Vwaste-STRS-LiDAR. Using these, compute estimates for the whole area: total volume, saw wood volume, pulpwood volume, volume of pine, volume of spruce and volume of broadleaved and volume of dead standing trees. Report them in a table.

Table #. Timber resources based on STRS

Volume	#### m3
Saw wood	#### m3
Pulp wood	#### m3
Vol Pine	#### m3
Vol spruce	#### m3
Vol Broadl.	#### m3
Vol Dead	#### m3

9. Conclusions and accuracy analysis

Write a 1-page condense report on your findings on the performance STRS and the LiDAR-regression estimation method (Sections 6 and 7).

Perform an analysis on the accuracy of the timber resources by field observations (Section 4): Calculate error estimates using the formulas for random sampling. Provide error estimates for the i) total volume (m3), ii) mean volume (m3/ha), iii-vi) volumes per species (m3) and vii-ix) volumes of sortiments (m3).