
Exercise 4051. Points 3, theme: Model validation

Open exercise
The attached file contains presence/absence records of two orchid species together with similarity values (as percentages) to a set of exemplar sites of each species.
  1. Which species' presence is better predicted by site similarity according to the ROC AUC?
  2. What are the TPR and FPR for each species at a threshold level of 0.5?
  3. At which threshold level is the recognition of species presence the highest?
Data: KaksKäppa.txt


A ROC (Receiver Operating Characteristic or Relative Operating Characteristic) curve is a plot of the true positive rate (TPR) against the false positive rate (FPR) at various threshold levels. ROC characterizes a numerical classifier and helps in selecting the threshold level. ROC is used in technical, medical and psychological diagnostics. It can also be applied in predicting species distributions.
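As a rough illustration of how TPR and FPR follow from a chosen threshold, here is a minimal Python sketch. The function name `tpr_fpr`, the variable names and the eight data values are made up for illustration and are not taken from the exercise file. The sketch assumes that a site is predicted as a presence when its similarity is at or above the threshold, and that the threshold is on the same scale as the similarity values (so percentages would first need dividing by 100).

```python
def tpr_fpr(scores, present, threshold):
    """Return (TPR, FPR), predicting presence where score >= threshold."""
    pairs = list(zip(scores, present))
    tp = sum(1 for s, p in pairs if s >= threshold and p == 1)  # true positives
    fp = sum(1 for s, p in pairs if s >= threshold and p == 0)  # false positives
    n_pos = sum(present)               # all recorded presences
    n_neg = len(present) - n_pos       # all recorded absences
    return tp / n_pos, fp / n_neg


# Made-up illustrative values, NOT taken from the exercise data file.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]   # similarity as a proportion
present = [1, 1, 0, 1, 0, 1, 0, 0]                   # 1 = presence, 0 = absence

tpr, fpr = tpr_fpr(scores, present, 0.5)
print(tpr, fpr)

# One (FPR, TPR) point per distinct threshold gives the ROC curve.
roc_points = [tpr_fpr(scores, present, t)[::-1]
              for t in sorted(set(scores), reverse=True)]
```

Lowering the threshold moves the point along the curve towards the upper right corner, where both TPR and FPR equal 1.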
See Wikipedia and other sites, e.g
If you use the ROC function in the SDC, open it and check how the input data should be arranged.
  • Open the attached data in Excel and arrange the values into columns. Check the end of the table: the number of observation records is not equal for the two species.
  • Copy the columns containing the similarity values and the presence/absence records of one species to the input cell.
  • Press Calculate.
  • Save the results.
  • Do the same with the other species.
  • The area under the curve (AUC) is given in the header of the results. As the values on both axes of a ROC graph are proportions (0...1), the AUC also ranges from 0 to 1. AUC = 0.5 indicates a random prediction (the signal is not related to the binary variable being predicted).
  • The most effective threshold level can be found from the positive predictive value (PPV): the proportion of true positives among all cases predicted as positive.
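For readers who want to check the calculator's output by hand, the last two points can be sketched in Python. This is only one possible way to do it, not the SDC's own method. The function names `auc`, `ppv` and `best_threshold_by_ppv` and the data are hypothetical. The AUC is computed here through its rank interpretation: the probability that a randomly chosen presence has a higher similarity than a randomly chosen absence, with ties counted as one half.

```python
def auc(scores, present):
    """AUC as the share of (presence, absence) pairs ranked correctly."""
    pos = [s for s, p in zip(scores, present) if p == 1]
    neg = [s for s, p in zip(scores, present) if p == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def ppv(scores, present, threshold):
    """PPV = TP / (TP + FP); 0.0 if no site reaches the threshold."""
    predicted = [p for s, p in zip(scores, present) if s >= threshold]
    return sum(predicted) / len(predicted) if predicted else 0.0


def best_threshold_by_ppv(scores, present):
    """Lowest observed score value at which PPV is maximal."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: ppv(scores, present, t))


# Made-up illustrative values, NOT taken from the exercise data file.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
present = [1, 1, 0, 1, 0, 1, 0, 0]

print(auc(scores, present))
print(best_threshold_by_ppv(scores, present))
```

When several thresholds share the maximal PPV, this sketch returns the lowest of them, because it keeps the most presences among equally precise choices.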