GAT2018

Exercise 3668. Points 4, theme: Cluster Analysis

Open exercise
  1. Which four forest stand types does cluster analysis (k-means clustering, k = 4) separate in the data in the attached file? The Latin names of the tree species are in the worksheet Puud.
  2. What proportion of the differences between stands is explained by these clusters?
  3. What is the probability that the clusters would explain the same or a larger proportion of the differences if the trees were distributed randomly among the stands?
  4. Do the clusters depend on the distance function (Euclidean, Block, SQ Euclidean, etc.)? Which distance function yields different clusters?
  5. State the distance function you used to answer the first and second questions.
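
The three distance functions named in question 4 can be written out as a minimal Python sketch. The stand and the two cluster centres below are made-up numbers chosen only to show that the nearest centre can change with the distance function; they are not taken from the attached file, and the helper `nearest` is a hypothetical name. How a particular program (SDC, Statistica) uses the distance in its other steps is not covered here.

```python
import math

def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sq_euclidean(a, b):
    """Squared Euclidean distance (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def block(a, b):
    """City-block (Manhattan) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest(point, centers, dist):
    """Index of the centre closest to point under the given distance."""
    return min(range(len(centers)), key=lambda i: dist(point, centers[i]))

# Made-up species proportions (percent) for one stand and two cluster centres.
stand = [40, 30, 30]
centers = [[60, 20, 20], [40, 48, 12]]

for name, dist in [("Euclidean", euclidean),
                   ("SQ Euclidean", sq_euclidean),
                   ("Block", block)]:
    print(name, "-> nearest centre:", nearest(stand, centers, dist))
```

In this sketch, squaring preserves the order of the Euclidean distances, so the Euclidean and SQ Euclidean functions always pick the same nearest centre, whereas the Block distance can pick a different one. Whether the final clusters differ in the exercise data still has to be checked in the software used.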


  • The number of clusters must be specified for k-means clustering. Here, k = 4.
  • The objects to cluster are the observations in rows; the variables are the tree species proportions in the columns from Kuusk to Muud_puud (the category "other trees"). The answer should be a stand name, not a tree species name. For example, a stand consisting mainly of pines is a pine stand. A stand is called mixed if it has no clearly dominant tree species. A statistical cluster can also include forest stands dominated by different species.
  • If using the SDC Cluster analysis, copy the columns with tree proportions into the input cell. If the tree names are included, check "Variable names are in the first row".
  • Uncheck "Object names are in the first column".
  • The number of iterations can be zero, as significance is not asked.
  • Press Calculate.
    The members of each cluster are listed in the results panel. The proportion of explained variance is shown in the header of the results.
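
For readers who want to see what the proportion of explained variance and the randomization probability mean, here is a rough Python sketch. It is not the SDC or Statistica implementation. It assumes plain k-means with squared Euclidean distance, takes the explained share as 1 minus within-cluster sum of squares divided by total sum of squares, and models "trees distributed randomly among stands" by shuffling each species column independently, which is only one possible reading. All function names and the toy data are hypothetical.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans_labels(rows, k, rng, max_iter=100):
    """Plain k-means (squared Euclidean); returns one cluster index per row."""
    centers = [list(r) for r in rng.sample(rows, k)]  # random starting centres
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centers[c]))
                      for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = mean_point(members)
    return labels

def explained_share(rows, labels):
    """1 - within-cluster sum of squares / total sum of squares."""
    grand = mean_point(rows)
    total = sum(sq_dist(r, grand) for r in rows)
    within = 0.0
    for c in set(labels):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        centre = mean_point(members)
        within += sum(sq_dist(r, centre) for r in members)
    return 1.0 - within / total

def permutation_p(rows, k, n_perm, seed=0):
    """Observed share and the fraction of shuffled data sets reaching it."""
    rng = random.Random(seed)
    observed = explained_share(rows, kmeans_labels(rows, k, rng))
    cols = [list(col) for col in zip(*rows)]
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for col in cols:
            col = col[:]
            rng.shuffle(col)  # assumption: each species column shuffled on its own
            shuffled.append(col)
        perm_rows = [list(r) for r in zip(*shuffled)]
        if explained_share(perm_rows, kmeans_labels(perm_rows, k, rng)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Made-up proportions (percent) for 12 stands and 3 species columns.
toy = [[90, 5, 5], [85, 10, 5], [80, 10, 10],
       [5, 90, 5], [10, 85, 5], [10, 80, 10],
       [5, 5, 90], [5, 10, 85], [10, 10, 80],
       [35, 35, 30], [30, 35, 35], [35, 30, 35]]
share, p = permutation_p(toy, k=4, n_perm=200, seed=1)
print("explained share:", round(share, 3), "p:", round(p, 3))
```

The result depends on the random starting centres, so a real analysis would normally repeat the clustering from several starts. With `n_perm=0` the function only reports the observed share and no meaningful probability.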
If using StatSoft Statistica, you can find the members of each cluster under Advanced > Members of each cluster & distances.
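
Once the cluster membership is known, the stand type of each cluster can be read from its mean species composition. The sketch below is only an illustration: the 50 % threshold for a "clearly dominant" species is an assumption, the function name `describe_clusters` is hypothetical, the column name Kask is a hypothetical example (Kuusk and Muud_puud come from the exercise), and the numbers are made up.

```python
def describe_clusters(rows, labels, species, dominance=50.0):
    """Return {cluster: (mean composition, suggested stand name)}."""
    result = {}
    for c in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        means = [sum(col) / len(members) for col in zip(*members)]
        top = max(range(len(species)), key=lambda i: means[i])
        # Assumption: a species counts as dominant if its mean share >= dominance.
        if means[top] >= dominance:
            name = species[top] + " stand"
        else:
            name = "mixed stand"
        result[c] = (means, name)
    return result

# Made-up proportions (percent); labels as they might come from a clustering run.
species = ["Kuusk", "Kask", "Muud_puud"]
rows = [[80, 10, 10], [70, 20, 10], [40, 30, 30], [30, 40, 30]]
labels = [0, 0, 1, 1]

for cluster, (means, name) in describe_clusters(rows, labels, species).items():
    print(cluster, [round(m, 1) for m in means], name)
```

Because a cluster mean can hide stands dominated by different species, it is worth checking the individual members of a cluster before naming it.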