Exercise 3949. Points 4, theme: Cluster Analysis

Coverage of tree species was estimated for six stands. In the attached file, tree species are in rows and stands (koht1 ... koht6) are in columns.
  1. Which two stands form the most similar pair of stands?
  2. Is their similarity the largest according to all coefficients available in the SDC?
  3. Which two stands form the most dissimilar pair of stands?
  4. How should differing Euclidean distances between pairs of observations that have zero similarity be interpreted?
Data: Yl2195.txt


Note the following when calculating similarity coefficients with the similarity module of the online calculator.
  • Similarity is calculated between the sites given in rows, based on the numerical site characteristics given in columns.
  • The calculator does not allow empty cells (undefined values). Write 0 if a species is absent from a site.
  • Input only raw data, without column headers and totals. Uncheck "Variable names are in the first row".
  • Case numbers must be in the first column.
For this exercise, the empty cells must first be filled with zeros, and then the table must be transposed. In Excel: Copy → Paste special → Transpose.
  • Check that the features of the first stand are in the first row of the transposed table, the features of the second stand in the second row, etc.
  • Check the presence of ID numbers in the first column.
  • Copy the data to the input of the online calculator.
  • Press Calculate.
  • Notice that the first two rows of the results contain ID numbers of the pairs of cases.
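  • Notice also that the block distance and the Euclidean distance are distance metrics, while the others are similarity metrics: a larger distance means less similar stands, whereas a larger similarity value means more similar stands.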
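
For readers who prefer a script to Excel, the preparation steps above (fill empty cells with zeros, transpose, add case numbers) and the two distance metrics can be sketched in Python. This is only a minimal sketch under assumptions: the coverage values are made up and are not the contents of Yl2195.txt, the function names are hypothetical, and the online calculator may compute or scale its results differently.

```python
import math
from itertools import combinations

# Hypothetical coverage table (NOT the data in Yl2195.txt):
# rows are species, columns are stands; None marks an empty cell.
species_by_stand = [
    [40, None, 10],    # species A
    [None, 30, 20],    # species B
    [5, None, None],   # species C
]


def prepare(table):
    """Fill empty cells with 0, transpose, and prepend 1-based case numbers."""
    filled = [[0 if cell is None else cell for cell in row] for row in table]
    transposed = [list(column) for column in zip(*filled)]
    return [[case_id] + row for case_id, row in enumerate(transposed, start=1)]


def block_distance(a, b):
    """Block (city-block) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise(rows):
    """Return {(id1, id2): (block, euclidean)} for every pair of cases."""
    result = {}
    for first, second in combinations(rows, 2):
        key = (first[0], second[0])          # case numbers from the first column
        a, b = first[1:], second[1:]         # coverage values only
        result[key] = (block_distance(a, b), euclidean_distance(a, b))
    return result


stands = prepare(species_by_stand)
distances = pairwise(stands)
for (i, j), (block, euclid) in distances.items():
    print(i, j, block, round(euclid, 3))
```

Each row of `stands` now describes one stand, with its case number in the first column, which is the layout the calculator expects. In the printed output, the pair with the smallest distance is the most similar one under that metric.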
