Master's thesis summary
The aim of the thesis was to compare European cities on the basis of frequent itemsets expressing the adjacency of different land uses in cities. A further objective was to implement a program that prepares the categorical or dichotomous polygon adjacency data needed to generate frequent itemsets. Based on the identified frequent itemsets, the objectives were (1) to describe the character of cities and (2) to compare them and find similar cities. The source land use data were taken from the Copernicus Urban Atlas 2018.
Overall, the thesis applies a non-traditional method that is not commonly used in geoinformatics. The case studies were primarily intended to verify the feasibility of the method, and the primary outcome of the thesis is a set of procedures for applying it.
The theoretical part of the thesis dealt with general concepts of data mining, algorithms for computing frequent itemsets, and studies that apply the frequent itemset method to spatial data to varying degrees.
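To make the notion of a frequent itemset concrete, the sketch below implements a minimal version of the classic Apriori algorithm in plain Python. It is an illustration only, not the implementation used in the thesis; the land use labels in the example are hypothetical, and support is taken to be the fraction of transactions containing an itemset.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    """
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        return sum(1 for t in sets if itemset <= t) / n

    # Level 1: frequent single items.
    current = {}
    for item in {i for t in sets for i in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-subset (Apriori property).
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical adjacency transactions (one per polygon).
transactions = [
    {"roads", "continuous_urban", "green_urban"},
    {"roads", "industrial"},
    {"roads", "continuous_urban"},
    {"roads", "continuous_urban", "green_urban"},
]
frequent = apriori(transactions, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a threshold of 0.5, "industrial" (present in one of four transactions) is discarded, while the combination of roads, continuous urban fabric and green urban areas survives at support 0.5.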
The first intermediate result of the thesis was a tool for generating transactional land use adjacency data, SearchDistinctLanduse_SpatialJoin, introduced in more detail in Section 4.3.3. The tool calculates the adjacency data based on a user-defined distance and produces two outputs: a text file containing categorical data and an MS Excel file containing dichotomous data (for use in Orange).
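As a rough illustration of what such adjacency data look like, the sketch below builds one transaction per polygon from the distinct land use classes found within a distance threshold, then produces a categorical and a dichotomous form. It is not the SearchDistinctLanduse_SpatialJoin tool: polygons are simplified to axis-aligned rectangles, the class labels are hypothetical, a polygon's own class is assumed to belong to its transaction, and the dichotomous table is returned as rows instead of being written to MS Excel.

```python
import math


def rect_distance(a, b):
    """Minimum Euclidean distance between rectangles (xmin, ymin, xmax, ymax); 0 if they touch or overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return math.hypot(dx, dy)


def adjacency_transactions(polygons, max_dist):
    """polygons: list of (land_use, rect).

    Returns one sorted list of distinct land uses per polygon: its own class
    plus the classes of all other polygons within max_dist (assumed rule).
    """
    transactions = []
    for i, (lu_i, r_i) in enumerate(polygons):
        classes = {lu_i}
        for j, (lu_j, r_j) in enumerate(polygons):
            if i != j and rect_distance(r_i, r_j) <= max_dist:
                classes.add(lu_j)
        transactions.append(sorted(classes))
    return transactions


def to_categorical(transactions):
    """Categorical form: one line per transaction, items separated by commas."""
    return "\n".join(",".join(t) for t in transactions)


def to_dichotomous(transactions):
    """Dichotomous form: a header of all classes, then one 0/1 row per transaction."""
    classes = sorted({c for t in transactions for c in t})
    rows = [classes]
    for t in transactions:
        present = set(t)
        rows.append([1 if c in present else 0 for c in classes])
    return rows


# Hypothetical polygons: the first two touch, the third lies 5 units from the second.
polygons = [
    ("continuous_urban", (0, 0, 10, 10)),
    ("green_urban", (10, 0, 20, 10)),
    ("roads", (25, 0, 30, 10)),
]
transactions = adjacency_transactions(polygons, max_dist=5)
print(to_categorical(transactions))
for row in to_dichotomous(transactions):
    print(row)
```

The distance threshold plays the role of the user-defined distance: raising it adds more distant classes to each transaction and tends to increase the support of larger itemsets.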
Section 5.1 (Description of selected cities) presented a possible interpretation of the calculated frequent itemsets using the example of the cities of Cheltenham (UK) and Prešov (SK). Subsequently, three case studies were developed to test the applicability of the method to spatial data.
The aim of Case Study 1 (5.2 Case study - Czech cities) was to provide a summary description of the Czech cities within the UA 2018 dataset, which includes a total of 15 cities. This group comprises the regional cities of the Czech Republic as well as the cities of Most and Chomutov. Significant frequent itemsets were identified for these cities, revealing land use combinations that are typical for cities in the Czech Republic. Conducting similar analyses for each country within the dataset would allow the countries to be compared and their similarities and differences identified.
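One simple way to express "typical for a group of cities" is to count in how many of the cities an itemset is frequent. The sketch below does this under an assumed definition (an itemset is typical if it is frequent in at least a given share of the cities); the thesis does not specify this rule, and the city names and itemsets are hypothetical.

```python
from collections import Counter


def typical_itemsets(city_itemsets, min_share):
    """city_itemsets: {city: set of frozenset itemsets frequent in that city}.

    Returns {itemset: share of cities} for itemsets frequent in at least
    min_share of the cities (an assumed definition of 'typical').
    """
    counts = Counter(s for sets in city_itemsets.values() for s in set(sets))
    n = len(city_itemsets)
    return {s: c / n for s, c in counts.items() if c / n >= min_share}


# Hypothetical frequent itemsets for three cities of one country.
city_itemsets = {
    "City A": {frozenset({"roads"}), frozenset({"roads", "continuous_urban"})},
    "City B": {frozenset({"roads"}), frozenset({"roads", "green_urban"})},
    "City C": {frozenset({"roads"}), frozenset({"roads", "continuous_urban"})},
}
typical = typical_itemsets(city_itemsets, min_share=0.6)
for itemset, share in typical.items():
    print(sorted(itemset), round(share, 2))
```

Running the same function per country would give one set of typical combinations per country, which is the kind of summary the case study suggests comparing.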
The objective of Case Study 2 (5.3 Case Study - European Cities) was to examine 100 selected European cities and find similar cities based on frequent itemsets of neighbouring land uses. Hierarchical clustering produced 12 groups of cities, whose character is described in Subsections 5.3.1 to 5.3.12.
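The clustering step can be sketched as follows: each city becomes a vector of itemset supports (0 where an itemset is not frequent in that city), and cities are merged agglomeratively until a chosen number of groups remains. This is a minimal pure-Python sketch, not the thesis workflow; the support-vector encoding, Euclidean distance and average linkage are assumptions, and the cities and support values are made up for illustration.

```python
import math
from itertools import combinations


def feature_vectors(city_itemsets):
    """city_itemsets: {city: {frozenset itemset: support}}.

    Returns {city: vector}, one component per itemset seen in any city,
    using support 0.0 where the itemset is not frequent in that city.
    """
    keys = sorted({s for sets in city_itemsets.values() for s in sets},
                  key=lambda s: tuple(sorted(s)))
    return {city: [sets.get(s, 0.0) for s in keys]
            for city, sets in city_itemsets.items()}


def average_linkage(vectors, n_clusters):
    """Agglomerative clustering (average linkage, Euclidean distance) down to n_clusters groups."""
    clusters = [[name] for name in sorted(vectors)]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(math.dist(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical supports: two "dense urban" cities and two "rural fringe" cities.
city_itemsets = {
    "City A": {frozenset({"roads"}): 0.9, frozenset({"roads", "continuous_urban"}): 0.6},
    "City B": {frozenset({"roads"}): 0.85, frozenset({"roads", "continuous_urban"}): 0.55},
    "City C": {frozenset({"roads"}): 0.7, frozenset({"arable", "forest"}): 0.5},
    "City D": {frozenset({"roads"}): 0.75, frozenset({"arable", "forest"}): 0.45},
}
clusters = average_linkage(feature_vectors(city_itemsets), n_clusters=2)
print(clusters)
```

The number of groups is an input here; in a dendrogram-based workflow it corresponds to where the tree is cut.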
Case Study 3 (5.4 Comparison with the results of the study (Dobesova 2020)) compared the method with the results of an earlier study. Here, 22 cities taken from the study (Dobesova, 2020) were used as input data. That study identified pairs of similar cities in the Urban Atlas dataset by applying the k-nearest neighbour method to feature vectors obtained from the Painters neural network (Kaggle, 2016). Whereas the original study used the complete dataset of approximately 800 cities, this thesis used an input set of 100 cities, and this reduction may have affected the similarities found. The comparison aimed to check whether the similarities reported in the original study could be confirmed. The similarity was confirmed for the pairs Le Mans - Enschede and České Budějovice - Hradec Králové.
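A pair reported as similar by another study can be checked against the itemset-based feature space with a nearest-neighbour query. The sketch below treats a pair as confirmed when each city is among the other's k nearest neighbours; this mutual-neighbour criterion, the Euclidean distance and the example vectors are assumptions for illustration, not the comparison procedure of the thesis.

```python
import math


def k_nearest(vectors, city, k):
    """Return the k cities closest to `city` by Euclidean distance, excluding itself."""
    others = [c for c in vectors if c != city]
    # Ties are broken by city name so the result is deterministic.
    others.sort(key=lambda c: (math.dist(vectors[city], vectors[c]), c))
    return others[:k]


def pair_confirmed(vectors, a, b, k=1):
    """Assumed criterion: the pair is confirmed if each city is among the other's k nearest neighbours."""
    return b in k_nearest(vectors, a, k) and a in k_nearest(vectors, b, k)


# Hypothetical itemset-support vectors for four cities.
vectors = {
    "City A": [0.0, 0.6, 0.9],
    "City B": [0.0, 0.55, 0.85],
    "City C": [0.5, 0.0, 0.7],
    "City D": [0.45, 0.0, 0.75],
}
print(k_nearest(vectors, "City A", 1))
print(pair_confirmed(vectors, "City A", "City B"))
print(pair_confirmed(vectors, "City A", "City C"))
```

A stricter or looser check is obtained by changing k; with a smaller input set, as in the thesis, neighbour relations can shift simply because some candidate neighbours are absent.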