This website presents a diploma thesis on the analysis of railway infrastructure utilization data, focusing on the time component of the data and on their categorization. The thesis was written at the Department of Geoinformatics at Palacký University in Olomouc in cooperation with the state organization Railway Administration, and its supervisor is doc. Ing. Zdena Dobešová, Ph.D.
The main goal of this diploma thesis was to analyze the time component of railway infrastructure occupancy data. A further goal was to evaluate the data on railway infrastructure and to categorize them. The data, provided by the state organization Railway Administration for individual monitoring points and the sections between them, cover the years 2016 to 2019.
Before the practical part of this work, it was necessary to establish a theoretical basis for the individual areas. The first area was the legislative framework for the development of railways in the Czech Republic and for the administrator of the railway line, which had to be understood mainly for the later interpretation and naming of individual functions related to railways in the Czech Republic. The second area was the railway network of the Czech Republic, with emphasis on the division of railway lines and the location of important passenger and freight corridors. The theoretical part then continued with a survey of professional publications and articles that address a similar topic or work with data of a similar nature. It concluded with a literature review of clustering and time series, the two methods that are key to the submitted thesis.
As part of the practical solution, before the analyses themselves, it was necessary to become familiar with the data and to preprocess them. The data as provided were not suitable for immediate use and had to be adjusted. Preprocessing followed the requirements of the time series analysis and the categorizations: the data were brought into a suitable form by filtering, supplementing, removing duplicates, and building contingency tables. Familiarization with the data was based on a description of the structure and attributes of the data for individual monitoring points and the sections between them. In addition, simple visualizations in the form of basic map outputs and graphs were created to give a closer understanding of the load on the railway network in the Czech Republic.
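The preprocessing steps named above (filtering, duplicate removal, and building a contingency table) can be illustrated with a minimal sketch. The record layout, field names, and values below are hypothetical and are not taken from the thesis data; the thesis does not state which tools were used, so plain Python is assumed here only for illustration.

```python
from collections import defaultdict

# Hypothetical raw records: (monitoring_point, "YYYY-MM", train_type, train_count).
# Field names and values are illustrative, not taken from the thesis data.
raw = [
    ("A", "2016-01", "passenger", 120),
    ("A", "2016-01", "passenger", 120),   # exact duplicate
    ("A", "2016-01", "freight", 40),
    ("A", "2016-02", "passenger", 110),
    ("B", "2016-01", "freight", 75),
    ("B", "2015-12", "freight", 70),      # outside the studied period
]


def preprocess(records, first_year=2016, last_year=2019):
    """Filter to the studied years, drop exact duplicates, pivot to a table."""
    seen = set()
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        point, month, train_type, count = rec
        year = int(month[:4])
        if not first_year <= year <= last_year:
            continue  # filtering: keep only the studied period
        if rec in seen:
            continue  # duplicate removal: skip records already seen
        seen.add(rec)
        # Contingency table: one row per (point, month), one column per train type.
        table[(point, month)][train_type] += count
    return {key: dict(cols) for key, cols in table.items()}


pivot = preprocess(raw)
for key in sorted(pivot):
    print(key, pivot[key])
```

The sketch does not cover the supplementing step, because the source does not say what was added to the data.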
The first of the two analyses performed on the provided data was a time series analysis of the section data for the period 2016-2019. Each time series was additively decomposed into components, which were then used to describe it. Because of the large amount of input data, three categories were defined, and representatives of each were analyzed: corridor sections, border crossings, and sections of interest. The results of this analysis are interpretations based on visual analysis of graphs of the individual components of the decomposed time series.
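To make the idea of additive decomposition concrete, the following is a minimal sketch of the classical variant, in which a series is written as trend + seasonal + residual. It assumes monthly data with a period of 12 and uses a centered moving average for the trend; the thesis does not specify which decomposition procedure or software was used, and the input series here is synthetic.

```python
def additive_decompose(series, period=12):
    """Classical additive decomposition: series = trend + seasonal + residual."""
    n = len(series)
    half = period // 2

    # Trend: centered moving average; undefined (None) at both ends.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half: t + half + 1]
        if period % 2 == 0:
            # Even period: the two end points get half weight (2 x period MA).
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / period

    # Seasonal: average detrended value for each position in the period.
    sums, counts = [0.0] * period, [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += series[t] - trend[t]
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    mean_raw = sum(raw) / period
    seasonal = [r - mean_raw for r in raw]  # centered so the component sums to zero

    # Residual: what trend and seasonality do not explain.
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t % period]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Synthetic example: 48 months (four years) of a linear trend plus a fixed pattern.
pattern = [5, 3, 0, -2, -4, -6, -5, -1, 2, 3, 4, 1]
series = [100 + 0.5 * t + pattern[t % 12] for t in range(48)]
trend, seasonal, residual = additive_decompose(series)
```

With four years of monthly data the trend is defined for the middle 36 months only, which is worth keeping in mind when reading the component graphs.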
The second analysis was the categorization of railway infrastructure data for individual monitoring points. At the beginning of the analysis, several approaches were defined, and the data were then categorized according to them. Each approach was defined by a categorization method and a combination of input attributes. The first method expressed the ratio of passenger to freight transport as a percentage, in terms of the number and weight of trainsets. This was a relative expression, independent of the absolute values. The second method was categorization by clustering. Based on testing and the literature review, hierarchical clustering by the Ward method with the Euclidean metric was selected. The input attributes were annual totals of the number or weight of passenger and freight trainsets and monthly totals of total train composition weights. These inputs did not give the desired results, so other variants were tested to make the results reflect the relative fluctuations of the time series as closely as possible. The result of this part is a data set of categorized data, map outputs of all categorizations, and interpretations with an evaluation of the individual clusters.
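Both categorization methods can be sketched briefly. The first function computes the percentage share of passenger traffic; the second performs agglomerative clustering with Ward's criterion on Euclidean distances, merging at each step the two clusters whose union increases the within-cluster sum of squares the least. This is a plain-Python illustration under stated assumptions, not the implementation used in the thesis: the input values are invented, the number of clusters is chosen arbitrarily, and in practice a library routine would normally be used instead.

```python
def passenger_share(passenger, freight):
    """Percentage of passenger traffic in the total (counts or weights)."""
    total = passenger + freight
    return 100.0 * passenger / total if total else float("nan")


def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion (Euclidean distance).

    Repeatedly merges the pair of clusters whose merge increases the
    within-cluster sum of squares the least, until k clusters remain.
    Returns one cluster label (0..k-1) per input point.
    """
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                na, nb = len(ca["members"]), len(cb["members"])
                d2 = sum((x - y) ** 2 for x, y in zip(ca["centroid"], cb["centroid"]))
                cost = na * nb / (na + nb) * d2  # increase in within-cluster SS
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ca, cb = clusters[a], clusters[b]
        na, nb = len(ca["members"]), len(cb["members"])
        merged = {
            "members": ca["members"] + cb["members"],
            "centroid": [
                (na * x + nb * y) / (na + nb)
                for x, y in zip(ca["centroid"], cb["centroid"])
            ],
        }
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]

    labels = [0] * len(points)
    for label, cluster in enumerate(clusters):
        for i in cluster["members"]:
            labels[i] = label
    return labels


# Hypothetical annual totals per monitoring point: (passenger trains, freight trains).
points = [(1000, 50), (1100, 60), (950, 40), (100, 900), (120, 1000), (90, 950)]
shares = [passenger_share(p, f) for p, f in points]
labels = ward_clusters(points, k=2)
```

Because Ward's criterion works on absolute Euclidean distances, points with large totals dominate the result. This is consistent with the observation above that annual totals did not reflect relative fluctuations well, although the specific alternative variants tested in the thesis are not reproduced here.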
This work is a pilot example of processing this unique data on the utilization of railway infrastructure, which was released in this format for the very first time. The resulting analyses offer an interesting view of railway transport on the SŽ network in the Czech Republic. The categorizations give a complete view of the variability of the monitoring points; several variants are offered within the work, each with commentary. The time series analysis gives a closer look at the temporal variability of selected categories of stations over four consecutive years. It describes their behavior in this period at the resolution of individual months, with emphasis on the main transit corridors, border crossings, and traffic monitoring points of interest. The data sets created as a result of the submitted diploma thesis will be available for further processing at the Department of Geoinformatics, so that the agreement with the provider of the original data is not violated. The results of the work will be presented by SŽ.