
**Data**: EU.xlsx

### Instructions

## Exercise 3666. Points 5, theme: Cluster Analysis
Open exercise

The attached file contains some (somewhat outdated) data about EU member states. Standardize the variables *Elanike arv* (population) and *GDP per Capita* in the worksheet *Majandus* by subtracting the mean and dividing by the standard deviation (SD).

- Which groups of member states are formed by cluster analysis? NB! Every country must be assigned to a cluster.
- Give a name to each group.
- Which method and distance function did you use to get these clusters? Why did you prefer these options?
- Why were you asked to standardize the variables?
- What proportion of the variability is explained by these clusters?
- Add the cluster tree if you used tree clustering.
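
For the question about the proportion of variability, one common measure (an assumption here, since the exercise does not define it) is the between-cluster sum of squares divided by the total sum of squares. The sketch below uses made-up toy points and labels, not values from EU.xlsx, and the function name `explained_proportion` is hypothetical.

```python
def explained_proportion(points, labels):
    """Return between-cluster SS / total SS for points with cluster labels."""
    dims = range(len(points[0]))

    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in dims]

    def sq_dist(a, b):
        return sum((a[d] - b[d]) ** 2 for d in dims)

    # Total variability: squared distances to the overall centroid
    grand = centroid(points)
    total_ss = sum(sq_dist(p, grand) for p in points)

    # Within-cluster variability: squared distances to each cluster's centroid
    within_ss = 0.0
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        c = centroid(members)
        within_ss += sum(sq_dist(p, c) for p in members)

    return (total_ss - within_ss) / total_ss


# Made-up toy values for illustration only (not taken from EU.xlsx)
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
toy_labels = [0, 0, 1, 1]
ratio = explained_proportion(toy_points, toy_labels)
```

A ratio close to 1 means the clusters account for most of the spread in the data. The clustering tool used in the course may report this quantity under a different name or compute it differently.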

- Copy the country names and the variables needed for this task into adjacent columns.
- Calculate the mean and standard deviation (SD) of both variables.
- Standardize the variables by subtracting the mean from each value and dividing the result by the SD. This is easy with cell addresses fixed with the $ sign.
- Copy the names and the standardized variables to the input of SDC Cluster analysis.
- Switch on the option *Variable names are in the first row*.
- Try different clustering methods and choose as the final result the method that yields the most interpretable clusters.
- You can decide the number of clusters yourself by selecting the pruning level for the cluster tree or by choosing the value of *k* for k-means clustering. Prefer statistically significant clusters that you are able to give short titles. About 200 iterations are enough for a learning exercise.
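
Outside the spreadsheet, the standardization and k-means steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the numbers are made-up toy values rather than the contents of EU.xlsx, the SD is the sample SD (as Excel's `STDEV` computes it), the distance is Euclidean, and the cap of 200 iterations simply borrows the figure from the hints, although the course tool's iteration setting may mean something else. The helper names `standardize` and `kmeans` are hypothetical.

```python
import random
import statistics


def standardize(values):
    """Subtract the mean and divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def kmeans(points, k, iterations=200, seed=0):
    """Plain Lloyd's k-means with Euclidean distance; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center, so none is left unclassified
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return labels


# Made-up toy values for six hypothetical countries (not taken from EU.xlsx)
population = [1.0, 2.0, 3.0, 60.0, 61.0, 62.0]
gdp = [20.0, 22.0, 21.0, 41.0, 40.0, 42.0]

z_pop = standardize(population)
z_gdp = standardize(gdp)
points = list(zip(z_pop, z_gdp))
labels = kmeans(points, k=2)
```

After standardization both variables have mean 0 and SD 1, so neither dominates the Euclidean distance merely because of its units. K-means is only one of the options mentioned in the hints; tree (hierarchical) clustering would need a different sketch.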
