Exercise 3665. Points 1, theme: Cluster Analysis

Open exercise
Which two tree species grow most separately (from other species and from each other), according to the data in the attached file, when cluster analysis is applied (Euclidean distance, single linkage)?
Data: SAAREMAA.xls


In the given settings, Euclidean distance is the distance function to use and single linkage is the grouping (linkage) rule.
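To make the two settings concrete, here is a minimal sketch in plain Python. The species names and coverage values are hypothetical and are not taken from SAAREMAA.xls; the function names are also just illustrative. Euclidean distance compares two coverage profiles, and single linkage takes the distance between two groups to be the distance between their closest members.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equally long numeric profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(cluster_a, cluster_b):
    """Single linkage: distance between the two closest members of the groups."""
    return min(euclidean(p, q) for p in cluster_a for q in cluster_b)

# Hypothetical coverage points of three species at three sites.
pine = [3, 0, 4]
spruce = [0, 0, 0]
birch = [3, 1, 4]

d_pine_spruce = euclidean(pine, spruce)             # sqrt(9 + 0 + 16)
d_group = single_linkage([pine, birch], [spruce])   # closest pair decides
print(d_pine_spruce, d_group)
```

Under single linkage a group is only as far from another group as its nearest member, which is why this rule tends to leave clearly isolated items hanging on long separate branches.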
Solution using the SDC
  • Open Cluster analysis in the SDC and select the above-mentioned distance function and grouping method.
  • Look at the source data formatting instructions and the example data. Notice that the first column must contain case names or case codes.
  • Check the data format in the attached file. The species coverage points are in columns D-Q.
  • Insert a new column after the existing column C (the species columns shift to E-R) and copy the ID codes into this column.
  • Copy the columns D-R (ID … muu_puuliik) to the SDC input window. You can select whole columns by clicking their headers, because the SDC automatically removes the empty rows at the end.
  • If the first row contains feature names, the corresponding SDC checkbox must be checked.
  • Select Group variables, because this time the task asks for groups of tree species (not sites).
  • The input should look as shown in the attached figure.
  • Press Calculate.
  • Check in the cluster tree which branches are most separate. The Latin names of the tree species are in the worksheet Puud.
  • By default, the tree grows upward. Try other directions.
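The choice between grouping sites and grouping variables can be illustrated with a small sketch, assuming NumPy and SciPy are available. The table below is hypothetical (rows are sites, columns are species) and does not come from the attached file. Grouping variables corresponds to computing distances between the columns, which is the same as transposing the table first.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical coverage table: 4 sites (rows) x 3 species (columns).
coverage = np.array([
    [3, 0, 3],
    [0, 0, 1],
    [4, 0, 4],
    [0, 0, 0],
])

# Grouping cases: Euclidean distances between rows (sites).
between_sites = squareform(pdist(coverage, metric="euclidean"))

# Grouping variables: Euclidean distances between columns (species).
between_species = squareform(pdist(coverage.T, metric="euclidean"))

print(between_sites.shape, between_species.shape)
```

The two distance matrices differ in size (sites by sites versus species by species), so the resulting cluster trees answer different questions.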
Solution in Statistica
  • Import the data into Statistica. Be sure to import the Excel worksheet, not the whole workbook.
  • The variable names must be in the column headers, not in the first row of data values.
  • Keep the data table open in Statistica and select from the menu Statistics → Multivariate Exploratory Analysis → Cluster Analysis.
  • Select Tree clustering and open the Advanced panel (see attached figure).
  • Select all tree species as variables. Start the clustering.
  • Find the first two branches that break away from the rest of the cluster tree.
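As an optional cross-check outside the SDC and Statistica, the same analysis can be sketched in Python with SciPy. This is only an illustration under stated assumptions: the species labels and coverage values are hypothetical stand-ins for the real worksheet, loading the Excel file is left out, and the helper `last_joined_singletons` is a name invented here. It relies on the heuristic that a species that breaks away early in the tree is one that joins the tree alone in one of the last merges.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical coverage table: rows are sites, columns are species A..E.
species = ["A", "B", "C", "D", "E"]
coverage = np.array([
    [5, 4, 5, 0, 9],
    [4, 5, 5, 0, 0],
    [5, 5, 4, 0, 0],
    [4, 4, 4, 9, 0],
])

# Cluster the variables (species), so the table is transposed first.
Z = linkage(coverage.T, method="single", metric="euclidean")

def last_joined_singletons(Z, labels, k=2):
    """Walk the merges from last to first and collect the single items
    (not yet part of any cluster) that were attached in those merges."""
    n = len(labels)
    found = []
    for left, right, _dist, _size in Z[::-1]:
        for child in (int(left), int(right)):
            # Indices below n denote original items, not merged clusters.
            if child < n and len(found) < k:
                found.append(labels[child])
        if len(found) >= k:
            break
    return found

outliers = last_joined_singletons(Z, species)
print(outliers)
```

The heuristic should always be compared with the drawn tree, because the last merges may also join two larger clusters. To draw the tree, `scipy.cluster.hierarchy.dendrogram(Z, labels=species, orientation="right")` can be used together with matplotlib; the `orientation` argument changes the direction in which the tree grows.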
