In archaeology it is common to classify large amounts of data based on numeric or categorical attributes. The volume of data calls for an automated process, and the number and nature of the classes are usually not known beforehand. Clustering, a method from unsupervised machine learning, is therefore a natural choice. A variety of freely available programming libraries and standalone tools for clustering will be presented and evaluated against a mixed multivariate data set describing Minoan and Mycenaean multi-sided seals from the "Corpus der minoischen und mykenischen Siegel" (CMS). The data set contains all kinds of attributes, some of which can even be a bag of values (i.e. hold multiple values at once). Data preparation plays a prominent role, so tools for this task will be presented along with some lessons learned in practice. The talk will end with a short overview of appropriate tools and mechanisms to visualise clustering results. (Martina Trognitz)
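
To make the idea of clustering mixed data with bag-of-values attributes concrete, here is a minimal sketch in plain Python. It is not the method or tooling used in the talk. The records, the attribute names (`sides`, `material`, `motifs`) and the helper functions are hypothetical and invented for illustration. The sketch assumes a Gower-style distance (range-scaled difference for numeric attributes, simple mismatch for categorical ones, Jaccard distance for bags of values) followed by single-linkage agglomerative clustering.

```python
from itertools import combinations

# Hypothetical toy records: attribute names and values are invented
# for illustration and are not taken from the CMS.
seals = [
    {"sides": 3, "material": "steatite", "motifs": {"goat", "vessel"}},
    {"sides": 3, "material": "steatite", "motifs": {"goat", "bird"}},
    {"sides": 4, "material": "jasper", "motifs": {"script"}},
    {"sides": 4, "material": "jasper", "motifs": {"script", "bird"}},
]


def gower_like(a, b, numeric_ranges):
    """Average per-attribute distance between two records, each in [0, 1]."""
    parts = []
    for key in a:
        x, y = a[key], b[key]
        if isinstance(x, (set, frozenset)):
            # Bag of values: Jaccard distance between the two sets.
            union = x | y
            parts.append(1 - len(x & y) / len(union) if union else 0.0)
        elif isinstance(x, (int, float)):
            # Numeric: absolute difference scaled by the observed range.
            rng = numeric_ranges[key]
            parts.append(abs(x - y) / rng if rng else 0.0)
        else:
            # Categorical: 0 if equal, 1 otherwise.
            parts.append(0.0 if x == y else 1.0)
    return sum(parts) / len(parts)


def single_linkage(records, k):
    """Merge the two closest clusters until only k clusters remain."""
    numeric_keys = [key for key, v in records[0].items()
                    if isinstance(v, (int, float))]
    ranges = {key: max(r[key] for r in records) - min(r[key] for r in records)
              for key in numeric_keys}
    dist = {frozenset((i, j)): gower_like(records[i], records[j], ranges)
            for i, j in combinations(range(len(records)), 2)}
    clusters = [{i} for i in range(len(records))]
    while len(clusters) > k:
        # Single linkage: cluster distance is the smallest pairwise distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(dist[frozenset((i, j))]
                              for i in clusters[p[0]]
                              for j in clusters[p[1]]),
        )
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters


clusters = single_linkage(seals, 2)
print([sorted(c) for c in clusters])  # -> [[0, 1], [2, 3]]
```

Treating a multi-valued attribute as a set and comparing sets with Jaccard distance is only one possible encoding. Expanding each possible value into its own binary column is a common alternative, and the better choice depends on the data and on the clustering tool.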
