
Exercise 3666. Points 5, theme: Cluster Analysis

Open exercise
The attached file contains some (somewhat outdated) data about EU member states. Standardize the variables Elanike arv (population) and GDP per Capita in the worksheet Majandus by subtracting the mean and dividing by the standard deviation (SD).
  1. Which groups of member states are formed by cluster analysis? Note: every country must be assigned to a cluster.
  2. Give a name to each group.
  3. Which method and distance function did you use to get these clusters? Why did you prefer these options?
  4. Why were you asked to standardize the variables?
  5. What proportion of the variability is explained by these clusters?
  6. Add the cluster tree if you used tree clustering.
Data: EU.xlsx


  • Copy the country names and the variables needed for this task into adjacent columns.
  • Calculate the mean and standard deviation (SD) of both variables.
  • Standardize the variables by subtracting the mean from each value and dividing the result by the SD. This is easy if the cell addresses of the mean and the SD are fixed with the $ sign (absolute references).
  • Copy the country names and the standardized variables to the input of SDC Cluster analysis.
  • Switch on the option "Variable names are in the first row".
  • Try different clustering methods and keep as the final result the method that yields the most interpretable clusters.
  • You can decide the number of clusters yourself by selecting the pruning level for the cluster tree or by choosing the value of k for k-means clustering. Prefer statistically significant clusters to which you are able to give short names. About 200 iterations are enough for a learning exercise.
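
The steps above can also be checked outside the spreadsheet. The following is a minimal Python sketch, not the SDC implementation. It makes several assumptions: the six rows are made-up toy values (not taken from EU.xlsx), all function names are hypothetical, the SD is the sample SD (divisor n - 1), and plain k-means with squared Euclidean distance, started from the first k rows, stands in for whichever method you finally choose. The share of variability explained is computed as 1 minus the within-cluster sum of squares divided by the total sum of squares.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: (x - mean) / sample SD (divisor n - 1, an assumption)."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(mean(coord) for coord in zip(*points))

def kmeans(points, k, iterations=200):
    """Plain k-means; starts from the first k points so the run is repeatable."""
    centers = list(points[:k])
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = centroid(members)
    return labels, centers

def explained_share(points, labels, centers):
    """1 - (within-cluster sum of squares) / (total sum of squares)."""
    grand = centroid(points)
    total = sum(sq_dist(p, grand) for p in points)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return 1 - within / total

# Made-up toy rows (NOT from EU.xlsx): population in millions, GDP per capita in thousands.
names = ["A", "B", "C", "D", "E", "F"]
population = [80, 2, 60, 3, 65, 1]
gdp_per_capita = [30, 10, 28, 12, 32, 11]

z_pop = standardize(population)
z_gdp = standardize(gdp_per_capita)
points = list(zip(z_pop, z_gdp))

labels, centers = kmeans(points, k=2)
share = explained_share(points, labels, centers)

for name, lab in zip(names, labels):
    print(name, "-> cluster", lab)
print("share of variability explained:", round(share, 3))
```

Without the standardization step the population column, which has far larger numeric values, would dominate the distances; after standardization both variables have mean 0 and SD 1 and contribute on the same scale.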