Data analysis and reporting
Mon May 14 to Fri May 18, 2007

After the field work we have a common data set. All groups carry out the analyses listed on this page. The data set has a tree file and a plot file, in which the results of the calculations are presented. These files have as many records as there were trees and plots in the inventory, respectively. You will also have many other files in which your computations and intermediate variables are stored. Document and store everything carefully from the start, as the documentation and the files used in the computations will be part of the graded report. Use the variable names given in this documentation. Your document can be written in the form of a flow chart, with references (links) to files where appropriate. Consider writing it in HTML or an equivalent hypertext documentation language. The structure can follow sections 1-8 of this document, and it is recommended to copy the table templates etc. from this document.

Our data analysis tasks will include:
1) An evaluation of the performance of the STRS system
2) Calculations of timber resources in the area using the sample plots with:
- field observations
- STRS observations

The analyses can be done with MS Excel. Some tasks require programming; for these we use VBA, which is built into Excel. Alternatively, you can program in Python or any other language you find support for. Logit modeling (of class variables) cannot easily be done inside Excel. For that, and for any other statistical analysis, you can use Excel or the statistical tools you are familiar with; SAS, SPSS and others should be available.

Timetable
Monday
- Prepare the tree records in plotnumber.xls files, one file for each plot. Send each plot file to the course assistant as soon as it is done, so that he can join them into one big observation matrix.
- Check the d13ref × href and d13ref × hfoto distributions, mark suspicious observations (gross outliers) and put 1/0 in the column "Research" of the plotnumber.xls file.
- Prepare the field forms: originals on paper and an xls file, one per plot.
- Keep the paper forms in a folder and send the xls files to the course assistant.
- Send the assistant the file group.csv, an ASCII file containing all your valid (used) azimuths and distances for the triangulated trees (the proper input file for the calculator program); include the center poles with number 200.
Tuesday: Sections 1-5
Wednesday: Sections 6-7 (8)
Friday: Sections (8) and 9

Tree records from Hyytiälä (joined plotnumber.xls files)
When we are done with the field work, our data set has per-tree records from the 59 0.04-ha plots:
Process the 2004 LiDAR data using the SUVANTOLiDARLASKIN program and calculate estimates for
2. Calculation of tree-level reference values

2.1 Height estimation for tally trees using sample tree information
Heights were measured only for the sample trees (every N'th tree). We also need heights for the tally trees in order to compute their volumes with taper curves. Stratify the 59 plots into the following 5 strata, using the HGMLiDAR variable as the basis of stratification; the aim is to have the plots stratified according to development class.
1) HGMLiDAR < 10 m
2) HGMLiDAR 10-15 m
3) HGMLiDAR 15-20 m
4) HGMLiDAR 20-25 m
5) HGMLiDAR > 25 m
In each stratum, estimate Näslund's height curves separately for pine, spruce and broadleaved trees. This gives 5×3 linear regression models of the type y = a + b×d13ref, where y = SQRT(d13ref^2/(href - 1.3)) is used in the least-squares (LS) estimation of the parameters a and b. When you finally apply the model to calculate the heights (hrefnasl) of the tally trees of a given species in a stratum, use the inverted form:
hrefnasl = d13ref^2 / (a + b×d13ref)^2 + 1.3

2.2 Stem bucking and calculation of reference volumes
For each tree, using the reference values d13ref, href (or hrefnasl) and Spref, calculate the total stem volume and the volumes of saw logs, pulp logs and treetop using the MARV42_POLKYTTAJA program. Name these variables vref, vsawref, vpulpref and vwasteref, and store them in your tree file.

2.3 STRS: Allometric modeling of stem diameters
The following tree-level variables are calculated using the STRS observations.
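Since Python is allowed as an alternative to Excel/VBA, here is a minimal Python sketch of the stratification and of the Näslund fits, one model per stratum × species group. The tuple layout (stratum, species, d13ref, href), the helper names and the treatment of a plot lying exactly on a class limit (here it goes to the higher stratum) are assumptions for illustration, not part of the course instructions.

```python
import math
from collections import defaultdict

# Upper class limits (m) of strata 1-4; everything >= 25 m is stratum 5.
# Assumption: a plot exactly on a limit goes to the higher stratum.
STRATUM_LIMITS = (10.0, 15.0, 20.0, 25.0)


def stratum(hgm_lidar):
    """Return the stratum number 1..5 of a plot from its HGMLiDAR (m)."""
    for k, upper in enumerate(STRATUM_LIMITS, start=1):
        if hgm_lidar < upper:
            return k
    return 5


def fit_naslund(d13, h):
    """OLS fit of y = a + b*d13, where y = sqrt(d13^2 / (h - 1.3))."""
    y = [math.sqrt(d ** 2 / (hh - 1.3)) for d, hh in zip(d13, h)]
    n = len(d13)
    mean_d = sum(d13) / n
    mean_y = sum(y) / n
    sxx = sum((d - mean_d) ** 2 for d in d13)
    sxy = sum((d - mean_d) * (yy - mean_y) for d, yy in zip(d13, y))
    b = sxy / sxx
    a = mean_y - b * mean_d
    return a, b


def naslund_height(d13, a, b):
    """Inverted Näslund model: hrefnasl = d13^2 / (a + b*d13)^2 + 1.3."""
    return d13 ** 2 / (a + b * d13) ** 2 + 1.3


def fit_by_group(sample_trees):
    """Fit one model per (stratum, species) from the sample trees.

    sample_trees: iterable of (stratum, species, d13ref, href) tuples
    (hypothetical layout). Returns {(stratum, species): (a, b)}.
    """
    groups = defaultdict(lambda: ([], []))
    for s, sp, d, h in sample_trees:
        groups[(s, sp)][0].append(d)
        groups[(s, sp)][1].append(h)
    return {key: fit_naslund(ds, hs) for key, (ds, hs) in groups.items()}
```

For a tally tree of species sp in stratum s, look up (a, b) = models[(s, sp)] and call naslund_height(d13ref, a, b) to get hrefnasl.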
3. Calculation of plot-level reference variables from field observations
Calculate, for each 0.04-ha plot, the following variables using the reference measurements:
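A Python alternative to the VBA routine in this section, sketched under assumptions: the tree records are already parsed into tuples (plotid, d13ref in cm, Sampletree flag, href, hrefnasl), Sampletree = 0 marks a tally tree as in the VBA example, and the function name plot_variables is hypothetical. It computes only stemnref (n/ha), Gref (m2/ha) and Hgref (m); the other plot-level variables follow the same accumulate-then-divide pattern.

```python
import math
from collections import defaultdict

PLOT_AREA_HA = 0.04  # plot size used in this inventory


def plot_variables(trees):
    """Compute stemnref (n/ha), Gref (m2/ha) and Hgref (m) per plot.

    trees: iterable of (plotid, d13ref_cm, sampletree, href, hrefnasl);
    for tally trees (sampletree == 0) the Näslund height hrefnasl is used.
    """
    n = defaultdict(int)
    d2_sum = defaultdict(float)   # sum of d13^2 (cm^2)
    hd2_sum = defaultdict(float)  # sum of h * d13^2
    for plot, d13, sample, href, hrefnasl in trees:
        h = hrefnasl if sample == 0 else href
        n[plot] += 1
        d2_sum[plot] += d13 ** 2
        hd2_sum[plot] += h * d13 ** 2
    result = {}
    for plot in n:
        stemnref = n[plot] / PLOT_AREA_HA
        # cm^2 -> m^2 (/10000), per plot -> per hectare (/PLOT_AREA_HA)
        gref = d2_sum[plot] * math.pi / 4 / 10000 / PLOT_AREA_HA
        hgref = hd2_sum[plot] / d2_sum[plot]  # basal-area-weighted mean height
        result[plot] = (stemnref, gref, hgref)
    return result
```

Filtering (e.g. living trees only) is left to the caller, as in the VBA version.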
Arriving at these variables requires a little programming. Consult the tutors and this document for programming help. Visual Basic for Applications (VBA), an event-based programming language, can be used from within Excel. Pressing ALT-F11 opens the VBA editor, in which you can write and execute code. Double-click the item Sheet1 and a code window opens. Select Worksheet from the pull-down menu and program the event "Calculate".

You can make an ASCII file, myfile.txt, with the tree records, e.g. plotid, treeid, STRSstatus, d13ref, Sampletree, href and hrefnasl. The example code below calculates the stem number (stemnref), basal area (Gref) and basal-area-weighted mean height (Hgref) for each plot. Modify the code to obtain the other plot-level variables listed above.

First, insert a module for declaring a helper data structure: menu command Insert - Module (Module1). Under "General Declarations", specify a data structure named "Plot" and declare an array of this data structure with 224 entries:

    Type Plot
        plotId As Long
        NtreesIn As Long
        Dsum As Double
        Hsum As Double
        D2Sum As Double
        HD2Sum As Double
    End Type

    Public data(1 To 224) As Plot

In the code window of Sheet1, enter the following code. It reads the tree records in myfile.txt and accumulates the needed sums in the array data; it then loops over the array, finds the plots with observations, calculates the plot variables and writes them to a file. Lines starting with the keyword Rem are comments.
    Private Sub Worksheet_Calculate()
        Rem Reset the sums so that repeated recalculations do not accumulate
        Erase data
        Open "c:\data\myfile.txt" For Input As #1
        Rem Read the file line by line, calculate the needed sums
        Rem and store them in the array
        Do Until EOF(1)
            Input #1, plotN, tree, status, d13, sample, h_ref, h_ref_n
            data(plotN).NtreesIn = data(plotN).NtreesIn + 1
            data(plotN).Dsum = data(plotN).Dsum + d13
            data(plotN).D2Sum = data(plotN).D2Sum + d13 ^ 2
            Rem Select the reference height (measured height for sample trees,
            Rem height-curve estimate for tally trees) and collect sums for Hgref
            If sample = 0 Then
                data(plotN).HD2Sum = data(plotN).HD2Sum + h_ref_n * d13 ^ 2
            Else
                data(plotN).HD2Sum = data(plotN).HD2Sum + h_ref * d13 ^ 2
            End If
        Loop
        Close #1
        Rem Everything is now stored; compute and write the output
        Open "C:\data\MyOutput.txt" For Output As #2
        For i = 1 To 224
            If data(i).NtreesIn > 0 Then
                Rem Plot i has trees; scale the plot sums to per-hectare values
                StemNumber = data(i).NtreesIn / 0.04
                Rem Atn(1) = pi/4; d13 in cm: cm^2 -> m^2 (/10000), per ha (/0.04)
                BasalAr = data(i).D2Sum * Atn(1) / 10000 / 0.04
                Hg = data(i).HD2Sum / data(i).D2Sum
                Rem Print the values to the file
                Print #2, i, StemNumber, BasalAr, Hg
            End If
        Next i
        Close #2
        MsgBox "I'm done!"
    End Sub

The code executes when you press F9 in Excel while Sheet1 is active, or when you enter a formula that needs calculation, because this triggers the event "Calculate".

Nyyssönen's height-form functions (Finnish: muotokorkeusfunktiot) are used for computing the volume of Bitterlich-type (relascope) plots, VtotrefBit = F*G*H = FH*G. Use Hgref or HgMref as the mean height (H). For pine, spruce and birch stands, respectively:
FH = 0.4116 - 0.04275*H^1.5 + 0.6359*H (pine)
FH = 1.3187 + 0.00099*H^2 + 0.3978*H (spruce)
FH = 0.4907 - 0.00137*H^2 + 0.4556*H (birch)

4. Calculations of timber resources using the reference data
The area of the forest holding is 56.8 ha. Each 0.04-ha plot represents 0.9635 ha of the total area. Based on this information, calculate the total estimates for the forest holding.

Cross-tabulations:
Table 1. Timber resources: stem number (#1, n) and volume (#2, m3) by species and dbh class; living trees.
Table 2. Timber resources: volume (#1, m3) by timber assortment and species; living trees.
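A minimal Python sketch of Nyyssönen's form-height volume and of the plot-to-holding expansion in Section 4. It assumes that FH is applied as VtotrefBit = FH × G with G in m2/ha and H in m, that the stand type (pine, spruce or birch) has already been decided, and that each tree on a 0.04-ha plot represents 0.9635 / 0.04 = 24.0875 trees of the holding; the function names are hypothetical.

```python
AREA_PER_PLOT_HA = 0.9635  # area represented by one plot (Section 4)
PLOT_AREA_HA = 0.04        # plot size


def form_height(h, stand_type):
    """Nyyssönen's height-form function FH (m) for mean height H (m)."""
    if stand_type == "pine":
        return 0.4116 - 0.04275 * h ** 1.5 + 0.6359 * h
    if stand_type == "spruce":
        return 1.3187 + 0.00099 * h ** 2 + 0.3978 * h
    if stand_type == "birch":
        return 0.4907 - 0.00137 * h ** 2 + 0.4556 * h
    raise ValueError(f"unknown stand type: {stand_type!r}")


def volume_bitterlich(g_m2_per_ha, h, stand_type):
    """VtotrefBit = F*G*H = FH * G (m3/ha)."""
    return form_height(h, stand_type) * g_m2_per_ha


def expansion_factor():
    """Number of holding trees represented by one tree on a 0.04-ha plot."""
    return AREA_PER_PLOT_HA / PLOT_AREA_HA


def holding_total(tree_values):
    """Expand a sum of tree-level values (e.g. vref, m3) to the whole holding."""
    return sum(tree_values) * expansion_factor()
```

As a consistency check, 59 plots × 0.9635 ha ≈ 56.85 ha, close to the stated holding area of 56.8 ha. Grouping the tree values by species and dbh class before calling holding_total gives the cells of Table 1.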

5. Computation of relative tree heights
Using Hdomref of each plot, compute the relative height of each tree and scale it to the range 0...1 by simply truncating values > 1 to 1.0.
For sample trees: hrelref = href / Hdomref
For tally trees: hrelref = hrefnasl / Hdomref

6. Assessment of STRS (single-tree) estimates

6.1 Correctly found trees (match rate), omission and commission error rates
These three statistics are computed for the whole data set of 59 plots. They are expressed as percentages of the stem number and volume of living trees; see equations i)-vi) below. They describe the overall potential of the STRS method in finding trees. Commission errors are photo trees that were rejected in the field because no tree (or stump) was found at, or anywhere near (within one meter of), the XY location of the treetop. Omission errors are the trees that were missed by the STRS system, i.e. trees that appear only in the reference measurements. In i)-vi) below, N refers to the number of trees and Vol to their total volume.
i) MatchrateN%, correctly found trees in stem number = (N_STRS_trees - N_commission) / N_All_trees * 100%
ii) MatchrateVol%, correctly found trees in volume = Vol_STRS_trees / Vol_All_trees * 100%
iii) OmissionN% = N_omission / N_All_trees * 100%
iv) OmissionVol% = Vol_omission / Vol_All_trees * 100%
v) CommissionN% = N_commission / N_All_trees * 100%
vi) CommissionVol% = Vol_commission / Vol_All_trees * 100%

6.2 Discernibility statistics
Prepare a table giving the percentages of correctly found and missed trees, out of all trees, by relative-height class.
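The truncation in Section 5, the stem-number rates i), iii) and v) of Section 6.1 and the discernibility table of Section 6.2 can be sketched in Python as follows. The input layouts, the 10 equal-width relative-height classes and the function names are assumptions; the volume rates follow the same pattern with volumes in place of counts.

```python
def relative_height(h, hdomref):
    """hrelref = h / Hdomref, truncated to at most 1.0 (Section 5).

    Use href for sample trees and hrefnasl for tally trees.
    """
    return min(h / hdomref, 1.0)


def stem_number_rates(n_strs_trees, n_commission, n_omission, n_all_trees):
    """Rates i), iii) and v) of Section 6.1, in percent of all (reference) trees."""
    return {
        "MatchrateN%": (n_strs_trees - n_commission) / n_all_trees * 100.0,
        "OmissionN%": n_omission / n_all_trees * 100.0,
        "CommissionN%": n_commission / n_all_trees * 100.0,
    }


def found_share_by_class(trees, n_classes=10):
    """Percentage of trees found by STRS per relative-height class (Section 6.2).

    trees: iterable of (hrelref, found) with found = 1 (found) or 0 (missed).
    Classes are n_classes equal-width intervals on 0...1 (an assumption);
    hrelref = 1.0 falls into the top class.
    """
    counts = {}
    for hrel, found in trees:
        # small epsilon guards against float round-off at class limits
        k = min(int(hrel * n_classes + 1e-9), n_classes - 1)
        n_found, n_all = counts.get(k, (0, 0))
        counts[k] = (n_found + found, n_all + 1)
    return {
        f"{k / n_classes:.2f}-{(k + 1) / n_classes:.2f}": 100.0 * nf / na
        for k, (nf, na) in sorted(counts.items())
    }
```

The percentage of missed trees in each class is 100 minus the returned value.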
Optional part of 6.2: Model the probability of measurability of a tree using logistic regression, i.e. model the probability (0...1) that the tree was found (1) or missed (0), and explain this probability with relative tree height. The model formulation is: tree is measurable (0/1) = f(hrelref), i.e. apply binomial logistic regression. Also compute, for each plot, a measure of the relative density of the stand using the relationship between the mean height and the basal area of the stand: DensityMeasure = Gref/Hgref. Use this variable together with hrelref as covariates for measurability (0/1) in a binomial logistic regression.

6.3 Accuracy of height estimates
Report your results by filling in the tables below. Study the distribution of the residuals (href - hfoto) and (href - hLiDAR). N = number of sample trees. Give the RMS errors in both absolute and relative terms; use the arithmetic mean of the reference heights to transform the absolute RMSE into a relative RMSE in %. The analysis can only be done for the correctly found trees; give their number (N).
Table #. Accuracy of photogrammetric height estimation.
Table #. Accuracy of LiDAR-based height estimation.
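The error statistics requested in Sections 6.3, 6.4, 6.6 and 7 can all be computed with one helper. This Python sketch assumes residuals defined as reference minus estimate (as in the text), RMSE with divisor N, SD with divisor N - 1, and relative RMSE computed against the arithmetic mean of the reference values; the function name accuracy_stats is hypothetical.

```python
import math


def accuracy_stats(reference, estimate):
    """Error statistics for residuals e = reference - estimate.

    Returns N, mean error (bias), SD of the errors, RMSE and RMSE%
    (relative to the arithmetic mean of the reference values).
    """
    e = [r - p for r, p in zip(reference, estimate)]
    n = len(e)
    mean = sum(e) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in e) / (n - 1))
    rmse = math.sqrt(sum(x * x for x in e) / n)
    rmse_pct = 100.0 * rmse / (sum(reference) / n)
    return {"N": n, "mean": mean, "SD": sd, "RMSE": rmse, "RMSE%": rmse_pct}
```

For Section 6.3, call accuracy_stats(href, hfoto) and accuracy_stats(href, hLiDAR) over the correctly found sample trees.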
6.4 Accuracy of d13 estimates
Report your results by filling in the tables below. Study the distribution of the residuals (d13ref - d13foto) and (d13ref - d13fotoLiDAR). Give the RMSE in both absolute (cm) and relative (%) terms; use the arithmetic mean of the reference d13 to transform the absolute RMSE into a relative RMSE in %. The analysis can only be done for the correctly found trees; give their number (N).
Table #. Accuracy of photogrammetric d13 estimation.
Table #. Accuracy of LiDAR-based d13 estimation.
6.5 Accuracy assessment of species recognition
Using the variables Spfoto and Spref of the correctly found photo trees, prepare a table called the confusion matrix (or error matrix) of the classification.
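A Python sketch of the confusion matrix, overall accuracy and Cohen's kappa. It assumes that Spfoto and Spref have already been reclassified to Pine, Spruce, Broadleaved and Dead, and, following the #2.1 example in this section, that rows are image (Spfoto) classes and columns are reference (Spref) classes. Kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the row and column totals; the function names are hypothetical.

```python
CLASSES = ["Pine", "Spruce", "Broadleaved", "Dead"]


def confusion_matrix(sp_foto, sp_ref, classes=CLASSES):
    """Rows: Spfoto (image class); columns: Spref (reference class)."""
    idx = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for f, r in zip(sp_foto, sp_ref):
        m[idx[f]][idx[r]] += 1
    return m


def overall_accuracy(m):
    """Sum of the diagonal (N correct) as a percentage of all trees."""
    n = sum(map(sum, m))
    return 100.0 * sum(m[i][i] for i in range(len(m))) / n


def cohens_kappa(m):
    """Cohen's kappa computed from the confusion matrix counts."""
    n = sum(map(sum, m))
    p_o = sum(m[i][i] for i in range(len(m))) / n
    rows = [sum(r) for r in m]
    cols = [sum(c) for c in zip(*m)]
    p_e = sum(r * c for r, c in zip(rows, cols)) / n ** 2
    return (p_o - p_e) / (1 - p_e)
```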
The table shows, e.g., that #1.1 pines were correctly classified as pines and #2.1 pines were seen as spruces in the images. The sum of the diagonal (N correct) is used in computing the overall accuracy (%); compute it. For this analysis, reclassify Spref into the classes Pine, Spruce, Broadleaved and Dead, so that the matrix has an equal number of columns and rows. For an example, see Åsa Persson: http://www.isprs.org/commission8/workshop_laser_forest/PERSSON.pdf, Table 1. Determine the kappa coefficient of the classification, which is another measure of classification success. See the textbook by Tokola et al., Metsän kaukokartoitus (remote sensing of forests, in Finnish), and for Cohen's kappa: Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20: 37-46.

Optional: For each photo tree, model the class variable "identification successful" (0/1) using logistic regression: success of recognition (0/1) = f(hrelref).

6.6 Accuracy of single-tree volume estimates
Using MARV42_POLKYTTAJA, compute the STRS volume estimates with taper curves:
vSTRSfoto = f(Spfoto, d13foto, hfoto)
vSTRSLiDAR = f(Spfoto, d13fotoLiDAR, hfoto)
Calculate the differences in single-tree volumes, (vref - vSTRSfoto) and (vref - vSTRSLiDAR), and calculate the RMSE (absolute and %), SD and mean of these errors/differences.
Table #. Accuracy of photo-based single-tree volume estimation.
Table #. Accuracy of LiDAR-based volume estimation.
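The optional models in Sections 6.2 and 6.5 are binomial logistic regressions of a 0/1 outcome on hrelref (optionally also DensityMeasure). If no statistics package is at hand, the maximum-likelihood coefficients can be found with Newton-Raphson iterations, as in this minimal Python sketch. It returns point estimates only (no standard errors or tests, which SAS, SPSS or similar tools provide), and the function names are hypothetical.

```python
import math


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logit(X, y, iters=25):
    """Binomial logistic regression by Newton-Raphson.

    X: rows of covariates, e.g. [hrelref] or [hrelref, DensityMeasure];
    an intercept is added. y: 0/1 outcomes. Returns [b0, b1, ...].
    """
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k                  # score: X'(y - p)
        hess = [[0.0] * k for _ in range(k)]  # information: X'WX
        for z, yi in zip(Z, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, z))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * z[i]
                for j in range(k):
                    hess[i][j] += w * z[i] * z[j]
        step = _solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta


def predict(beta, row):
    """Predicted probability of outcome 1 for one covariate row."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
    return 1.0 / (1.0 + math.exp(-eta))
```

If the 0/1 outcomes are perfectly separated by hrelref, the maximum-likelihood estimates do not exist and the iterations diverge; a statistics package will flag this case.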
7. Assessment of LiDAR regression estimates of V, N, G, DgM and HgM
Calculate the differences between the plot variables VLiDAR (m3/ha), NLiDAR (n/ha), GLiDAR (m2/ha), HGMLiDAR (m) and DGMLiDAR (cm) and the corresponding reference values: error/difference = (reference - #LiDAR), N = 59. Analyze the differences and give the overall RMSE, SD and mean for the 5 variables in the form of a table. Assume Dg ~ DGM and Hg ~ HGM, and use Dgref and Hgref as the references.
Table #. Accuracy of LiDAR-based regression estimates of plot variables.
Study the residuals further and provide 5 plots of the errors of the 5 variables against the proportion of broadleaved trees of the volume. Are the errors associated with the presence of broadleaved trees?
Plot 1: volume errors × proportion of broadleaved trees
Plot 2: basal area errors × proportion of broadleaved trees
Plot 3: stem number errors × proportion of broadleaved trees
Plot 4: DGM errors × proportion of broadleaved trees
Plot 5: HGM errors × proportion of broadleaved trees

8. Calculations of timber resources using STRS estimates
Using Spfoto, d13fotoLiDAR and hfoto, calculate for each photo tree the volume estimates vtotSTRSLiDAR, vsawSTRSLiDAR, vpulpSTRSLiDAR and vwasteSTRSLiDAR. Using these, compute estimates for the whole area: total volume, saw wood volume, pulpwood volume, and the volumes of pine, spruce, broadleaved and dead standing trees. Report them in a table.
Table #. Timber resources based on STRS.
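A Python sketch of the Section 8 expansion of tree-level STRS volumes to holding totals. It assumes each photo tree is a dict with the keys Spfoto, vtotSTRSLiDAR, vsawSTRSLiDAR and vpulpSTRSLiDAR (m3), and, as in Section 4, that each tree on a 0.04-ha plot represents 0.9635 / 0.04 trees of the holding; the function name strs_totals is hypothetical.

```python
from collections import defaultdict

AREA_PER_PLOT_HA = 0.9635  # area represented by one plot (Section 4)
PLOT_AREA_HA = 0.04        # plot size


def strs_totals(photo_trees):
    """Holding-level volume totals (m3) from tree-level STRS estimates."""
    f = AREA_PER_PLOT_HA / PLOT_AREA_HA  # holding trees per plot tree
    totals = defaultdict(float)
    for t in photo_trees:
        totals["total"] += t["vtotSTRSLiDAR"] * f
        totals["saw"] += t["vsawSTRSLiDAR"] * f
        totals["pulp"] += t["vpulpSTRSLiDAR"] * f
        # species totals, e.g. "species:Pine", "species:Dead"
        totals["species:" + t["Spfoto"]] += t["vtotSTRSLiDAR"] * f
    return dict(totals)
```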
9. Conclusions and accuracy analysis
Write a condensed 1-page report on your findings on the performance of STRS and of the LiDAR regression estimation method (Sections 6 and 7). Analyze the accuracy of the timber resource estimates based on field observations (Section 4): calculate error estimates using the formulas for random sampling. Provide error estimates for i) total volume (m3), ii) mean volume (m3/ha), iii-vi) volumes per species (m3) and vii-x) volumes by timber assortment (m3).
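A minimal Python sketch of the simple random sampling estimators, treating the 59 plots as a simple random sample: mean = Σy/n, s² = Σ(y - mean)²/(n - 1), SE(mean) = s/√n, total = 56.8 ha × mean and SE(total) = 56.8 × SE(mean). The optional finite population correction (1 - n/N, with N = 56.8/0.04 plots) and the function name are assumptions. Applying the same function to per-plot species or assortment volumes (m3/ha) gives iii)-x).

```python
import math

HOLDING_AREA_HA = 56.8  # area of the forest holding (Section 4)
PLOT_AREA_HA = 0.04     # plot size


def srs_estimates(plot_values_per_ha, fpc=False):
    """Mean, total and their standard errors under simple random sampling.

    plot_values_per_ha: one value per plot, e.g. volume in m3/ha.
    fpc: apply the finite population correction 1 - n/N (optional).
    """
    n = len(plot_values_per_ha)
    mean = sum(plot_values_per_ha) / n
    var = sum((v - mean) ** 2 for v in plot_values_per_ha) / (n - 1)
    se2 = var / n
    if fpc:
        big_n = HOLDING_AREA_HA / PLOT_AREA_HA  # 0.04-ha plots in the holding
        se2 *= 1 - n / big_n
    se = math.sqrt(se2)
    return {
        "mean": mean,                        # m3/ha
        "se_mean": se,
        "se_mean_%": 100.0 * se / mean,
        "total": HOLDING_AREA_HA * mean,     # m3
        "se_total": HOLDING_AREA_HA * se,
    }
```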