Frequent Itemset Mining and Multi-Layer Network-Based Analysis of RDF Databases
Triplestores or resource description framework (RDF) stores are purpose-built databases used to organise, store and share data with context. Knowledge extraction from a large amount of interconnected data requires effective tools and methods to address the complexity and the underlying structure of semantic information. We propose a method that generates an interpretable multilayered network from an RDF database. The method utilises frequent itemset mining (FIM) of the subjects, predicates and the objects of the RDF data, and automatically extracts informative subsets of the database for the analysis. The results are used to form layers in an analysable multidimensional network. The methodology enables a consistent, transparent, multi-aspect-oriented knowledge extraction from the linked dataset. To demonstrate the usability and effectiveness of the methodology, we analyse how the science of sustainability and climate change are structured using the Microsoft Academic Knowledge Graph. In the case study, the FIM forms networks of disciplines to reveal the significant interdisciplinary science communities in sustainability and climate change. The constructed multilayer network then enables an analysis of the significant disciplines and interdisciplinary scientific areas. To demonstrate the proposed knowledge extraction process, we search for interdisciplinary science communities and then measure and rank their multidisciplinary effects. The analysis identifies discipline similarities, pinpointing the similarity between atmospheric science and meteorology as well as between geomorphology and oceanography. The results confirm that frequent itemset mining provides informative sampled subsets of RDF databases which can be simultaneously analysed as layers of a multilayer network.
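The idea of turning frequent co-occurrences in RDF triples into network layers can be illustrated with a minimal sketch. It is not the authors' implementation: the triples, the predicate names and the `min_support` threshold are hypothetical, and only frequent pairs (itemsets of size two) are counted by brute force rather than with a dedicated FIM algorithm.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy triples (subject, predicate, object); not taken from any real graph.
triples = [
    ("paper1", "hasField", "meteorology"),
    ("paper1", "hasField", "atmospheric science"),
    ("paper2", "hasField", "meteorology"),
    ("paper2", "hasField", "atmospheric science"),
    ("paper2", "hasField", "oceanography"),
    ("paper3", "hasField", "oceanography"),
    ("paper3", "hasField", "geomorphology"),
    ("paper1", "publishedIn", "journalA"),
    ("paper2", "publishedIn", "journalA"),
]

def layers_from_triples(triples, min_support=2):
    """Build one weighted edge list per predicate from frequent object pairs."""
    # One transaction per (predicate, subject): the set of objects it links to.
    transactions = defaultdict(set)
    for s, p, o in triples:
        transactions[(p, s)].add(o)
    # Count co-occurring object pairs separately for each predicate (layer).
    pair_counts = defaultdict(Counter)
    for (p, _s), objects in transactions.items():
        for a, b in combinations(sorted(objects), 2):
            pair_counts[p][(a, b)] += 1
    # Keep only pairs that reach the (assumed) minimum support.
    return {
        p: {pair: n for pair, n in counts.items() if n >= min_support}
        for p, counts in pair_counts.items()
    }

layers = layers_from_triples(triples)
print(layers)
```

In this toy input only the atmospheric science and meteorology pair reaches the support threshold, so the `hasField` layer contains a single weighted edge.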
Analytic Hierarchy Process and Multilayer Network-Based Method for Assembly Line Balancing
Assembly line balancing improves the efficiency of production systems by the optimal assignment of tasks to operators. The optimisation of this assignment requires models that provide information about the activity times, constraints and costs of the assignments. A multilayer network-based representation of the assembly line-balancing problem is proposed, in which the layers of the network represent the skills of the operators, the tools required for their activities and the precedence constraints of their activities. The activity–operator network layer is designed by a multi-objective optimisation algorithm in which the training and equipment costs as well as the precedence of the activities are also taken into account. As these costs are difficult to evaluate, the analytic hierarchy process (AHP) technique is used to quantify the importance of the criteria. The optimisation problem is solved by a multi-level simulated annealing algorithm (SA) that efficiently handles the precedence constraints. The efficiency of the method is demonstrated by a case study from wire harness manufacturing.
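How AHP turns pairwise judgments into criterion weights can be sketched as follows. The pairwise comparison values are hypothetical, and the row geometric mean is used here as a common approximation of the AHP priority vector; the source does not state which variant the authors applied.

```python
import math

def ahp_weights(pairwise):
    """Approximate the AHP priority vector by normalised row geometric means."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical judgments; criteria order: training cost, equipment cost, precedence.
# Entry [i][j] states how much more important criterion i is than criterion j.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]

weights = ahp_weights(pairwise)
print([round(w, 3) for w in weights])
```

The resulting weights sum to one and could serve as coefficients of a weighted multi-objective cost function.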
Multilayer network based comparative document analysis (MUNCoDA)
The proposed multilayer network-based comparative document analysis (MUNCoDA) method supports the identification of the common points of a set of documents that deal with the same subject area. As documents are transformed into networks of informative word-pairs, the collection of documents forms a multilayer network that allows the comparative evaluation of the texts. The multilayer network can be visualized and analyzed to highlight how the texts are structured. The topics of the documents can be clustered based on the developed similarity measures. By exploring the network centralities, topic importance values can be assigned. The method is fully automated by KNIME preprocessing tools and MATLAB/Octave code.
•Networks can be formed based on informative word pairs of multiple documents
•The analysis of the proposed multilayer networks provides information for multi-document summarization
•Words and documents can be clustered based on node similarity and edge overlap measures
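The edge overlap between two document layers can be illustrated with a short sketch. It makes two assumptions that are not in the source: word pairs are taken from a sliding co-occurrence window (the original selection of informative word pairs is not specified here), and the overlap is measured with the Jaccard index. The example sentences are hypothetical.

```python
from itertools import combinations

def word_pair_edges(text, window=3):
    """Undirected word pairs that co-occur within a sliding window (assumed rule)."""
    words = text.lower().split()
    edges = set()
    for i in range(len(words)):
        for a, b in combinations(words[i:i + window], 2):
            if a != b:
                edges.add(frozenset((a, b)))
    return edges

def edge_overlap(edges_a, edges_b):
    """Jaccard overlap between the edge sets of two layers."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0

# Hypothetical documents, each forming one layer of the multilayer network.
doc_a = "climate policy reduces emissions"
doc_b = "climate policy increases resilience"

overlap = edge_overlap(word_pair_edges(doc_a), word_pair_edges(doc_b))
print(round(overlap, 3))
```

A matrix of such pairwise overlaps could then be passed to any standard clustering routine to group similar documents.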
Focal points for sustainable development strategies—Text mining-based comparative analysis of voluntary national reviews
Countries have to work out and follow tailored strategies for the achievement of their Sustainable Development Goals. At the end of 2018, more than 100 voluntary national reviews (VNRs) were published. The reviews are transformed by text mining algorithms into networks of keywords to identify country-specific thematic areas of the strategies and to cluster countries that face similar problems and follow similar development strategies. The analysis of the 75 VNRs has shown that SDG5 (gender equality) is the most discussed goal worldwide, as it is discussed in 77% of the analysed reviews. SDG8 (decent work and economic growth) is the second most studied goal, with 76%, while SDG1 (no poverty) is the least discussed goal, mentioned in only 48% of the documents, followed by SDG10 (reduced inequalities) with 49%. The results demonstrate that the proposed benchmark tool is capable of highlighting what kind of activities can make significant contributions to achieving sustainable development.
A multilayer and spatial description of the Erasmus mobility network
The Erasmus Programme is the biggest collaboration network consisting of European Higher Education Institutions (HEIs). The flows of students, teachers and staff form directed and weighted networks that connect institutions, regions and countries. Here, we present a linked and manually verified dataset of this multiplex, multipartite, multi-labelled, spatial network. We enriched the network with institutional socio-economic data from the European Tertiary Education Register (ETER) and the Global Research Identifier Database (GRID). We geocoded the headquarters of institutions and characterised the attractiveness and quality of their environments based on Points of Interest (POI) data. The linked datasets provide relevant information to grasp a more comprehensive understanding of the mobility patterns and attractiveness of the institutions.
Review and structural analysis of system dynamics models in sustainability science
As the complexity of sustainability-related problems increases, it is more and more difficult to understand the related models. Although a tremendous number of models have been published recently, their automated structural analysis is still missing. This study provides a methodology to structure and visualise the information content of these models. The novelty of the present approach is the development of a network analysis-based tool for modellers to measure the importance of variables, identify structural modules in the models and measure the complexity of the created model, thus enabling the comparison of different models. An overview of 130 system dynamics models from the past five years is provided. The typical topics and complexity of these models highlight the need for tools that support the automated structural analysis of sustainability problems. For practising engineers and analysts, nine models from the field of sustainability science, including the World3 model, are studied in detail. The results highlight that with the help of the developed method the experts can identify the most critical variables of sustainability problems (like arable land in the World3 model) and can determine how these variables are clustered and interconnected (e.g. the population and fertility are key drivers of global processes). The developed software tools and the resulting networks are all available online.
Data-driven multilayer complex networks of sustainable development goals
This data article presents the formulation of a multilayer network for modelling the interconnections among the sustainable development goals (SDGs) and their targets, and includes the correlation-based linking of the sustainable development indicators with the available long-term datasets of The World Bank, 2018. The spatial distribution of the time series data allows creating country-specific sustainability assessments. In the related research article “Network Model-Based Analysis of the Goals, Targets and Indicators of Sustainable Development for Strategic Environmental Assessment” the similarities of SDGs for ten regions have been modelled in order to improve the quality of strategic environmental assessments. The datasets of the multilayer networks are available on Mendeley.
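Correlation-based linking of indicators can be sketched in a few lines. The series values, the indicator names and the 0.8 threshold are hypothetical, and the Pearson coefficient is an assumption; the source does not state which correlation measure or cut-off was used.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def correlation_edges(series, threshold=0.8):
    """Link two indicators when the absolute correlation reaches the threshold."""
    edges = {}
    for a, b in combinations(sorted(series), 2):
        r = pearson(series[a], series[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges

# Hypothetical yearly values of three indicators for one country.
series = {
    "indicator_A": [1.0, 2.0, 3.0, 4.0],
    "indicator_B": [2.0, 4.1, 5.9, 8.0],
    "indicator_C": [3.0, 1.0, 4.0, 2.0],
}

edges = correlation_edges(series)
print(edges)
```

Repeating the calculation per country or region would give one edge list per spatial unit, which is one way to read the country-specific assessments mentioned above.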
Network Model-Based Analysis of the Goals, Targets and Indicators of Sustainable Development for Strategic Environmental Assessment
Strategic environmental assessment is a decision support technique that evaluates policies, plans and programs in addition to identifying the most appropriate interventions in different scenarios. This work develops a network-based model to study interlinked ecological, economic, environmental and social problems to highlight the synergies between policies, plans, and programs in environmental strategic planning. Our primary goal is to propose a methodology for the data-driven verification and extension of expert knowledge concerning the interconnectedness of the sustainable development goals and their related targets. A multilayer network model based on the time-series indicators of the World Bank open data over the last 55 years was assembled. The results illustrate that, by providing an objective and data-driven view of the correlated variables of the World Bank, the proposed layered multipartite network model highlights previously undiscussed interconnections, node centrality measures evaluate the importance of the targets, and network community detection algorithms reveal their strongly connected groups. The results confirm that the proposed methodology can serve as a data-driven decision support tool for the preparation and monitoring of long-term environmental policies. The developed data-driven network model enables the multi-level analysis of sustainability (goals, targets, indicators) and supports long-term environmental strategic planning. Through relationships among indicators, relationships among targets and goals can be modelled. The results show that sustainable development goals are strongly interconnected, while the 5th goal (gender equality) is linked mostly to the 17th goal (partnerships for the goals). The analysis has also highlighted the importance of the 4th goal (quality education).
Sebestyén V., Bulla M., Rédey Á., Abonyi J.: "Network Model-Based Analysis of the Goals, Targets and Indicators of Sustainable Development for Strategic Environmental Assessment", Journal of Environmental Management, 2019, 238, 126-135
Frequent pattern mining in multidimensional organizational networks
Network analysis can be applied to understand organizations based on patterns of communication, knowledge flows, trust, and the proximity of employees. A multidimensional organizational network was designed, and association rule mining of the edge labels was applied to reveal how relationships, motivations, and perceptions determine each other in different scopes of activities and types of organizations. Frequent itemset-based similarity analysis of the nodes provides the opportunity to characterize typical roles in organizations and clusters of co-workers. A survey was designed to define 15 layers of the organizational network and demonstrate the applicability of the method in three companies. The novelty of our approach resides in the evaluation of people in organizations as frequent multidimensional patterns of multilayer networks. The results illustrate that the overlapping edges of the proposed multilayer network can be used to highlight the motivation and managerial capabilities of the leaders and to find similarly perceived key persons.
The Settlement Structure Is Reflected in Personal Investments: Distance-Dependent Network Modularity-Based Measurement of Regional Attractiveness
How are ownership relationships distributed in the geographical space? Is physical proximity a significant factor in investment decisions? What is the impact of the capital city? How can the structure of investment patterns characterize the attractiveness and development of economic regions? To explore these issues, we analyze the network of company ownership in Hungary and determine how connections are distributed in geographical space. Based on the calculation of the internal and external linking probabilities, we propose several measures to evaluate the attractiveness of towns and geographic regions. Community detection based on several null models indicates that modules of the network coincide with administrative regions, in which Budapest is the absolute centre, and where county centres function as hubs. Gravity model-based modularity analysis highlights that, besides the strong attraction of Budapest, geographical distance has a significant influence over the frequency of connections and the target nodes play the most significant role in link formation, which confirms that the analysis of the directed company-ownership network gives a good indication of regional attractiveness.
Gadar Laszlo, Kosztyan Zsolt T., Abonyi Janos: "The Settlement Structure Is Reflected in Personal Investments: Distance-Dependent Network Modularity-Based Measurement of Regional Attractiveness", Complexity, 2018, Article ID 1306704, 16 pages
Evaluating the Interconnectedness of the Sustainable Development Goals Based on the Causality Analysis of Sustainability Indicators
Policymaking requires an in-depth understanding of the cause-and-effect relationships between the sustainable development goals. However, due to the complex nature of socio-economic and environmental systems, this is still a challenging task. In the present article, the interconnectedness of the United Nations (UN) sustainability goals is measured using the Granger causality analysis of their indicators. The applicability of the causality analysis is validated through the predictions of the World3 model. The causal relationships are represented as a network of sustainability indicators providing the opportunity for the application of network analysis techniques. Based on the analysis of 801 UN indicator types in 283 geographical regions, approximately 4000 causal relationships were identified and the most important global connections were represented in a causal loop network. The results highlight the drastic deficiency of the analysed datasets, the strong interconnectedness of the sustainability targets and the applicability of the extracted causal loop network. The analysis of the causal loop networks emphasised the problems of poverty, proper sanitation and economic support in sustainable development.
Dörgő Gy., Sebestyén V., Abonyi J.: "Evaluating the Interconnectedness of the Sustainable Development Goals Based on the Causality Analysis of Sustainability Indicators", Sustainability, 2018, 10(10), 3766, doi:10.3390/su10103766
Automated Analysis of the Interactions Between Sustainable Development Goals Extracted from Models and Texts of Sustainability Science
The design and monitoring of sustainable policies should rely on models that can handle complex and interconnected variables and subsystems of sustainability issues. Structuring knowledge has been identified as an essential first step in building models of sustainability science. Although it is known that all models yield a reduced view of the examined topic and no model can include all the variables that would make the representation closed and comprehensive, in the case of sustainability issues it is critical to synthesize as many critical aspects as possible that could have an impact on the studied problem. The key idea of our research is that strategic plans, sustainability reports and scientific studies reflect these variables; therefore, with the tools of text mining, the most important focus points and interactions can be determined. These key aspects and their connections can be represented by a network structure and compared to the subsystems of the dynamic models of sustainability to explore the deficiencies of the models or the lack of focus of the related policies and documentation. In the present work, the proposed methodology is demonstrated through the analysis of five strategic documents, and the determined aspects are compared with the structure of the famous World3 system dynamics model. The comparison highlighted the incomplete view of the original World3 model, since certain topics were not critical issues while the World3 model was in development.
Dorgo, Gyula, Gergely Honti, and János Abonyi. "Automated Analysis of the Interactions Between Sustainable Development Goals Extracted from Models and Texts of Sustainability Science." Chemical Engineering Transactions 70 (2018): 781-786.
Graph configuration model based evaluation of the education-occupation match
To study education-occupation matchings, we developed a bipartite network model of the education-to-work transition and a graph configuration model-based metric. We studied the career paths of 15 thousand Hungarian students based on the integrated database of the National Tax Administration, the National Health Insurance Fund, and the higher education information system of the Hungarian Government. A brief analysis of the gender pay gap and the spatial distribution of over-education is presented to demonstrate the background of the research and the resulting open dataset. We highlighted the hierarchical and clustered structure of the career paths based on the multi-resolution analysis of the graph modularity. The results of the cluster analysis can support policymakers in fine-tuning the fragmented program structure of higher education.
All the files and the R code are available at: https://github.com/abonyilab/Edu_Mine_Graph
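The intuition behind a configuration model-based match metric can be shown with a small sketch. It is written in Python for illustration and is not taken from the linked R code: the records are hypothetical, and the ratio of observed to expected edge counts is only one plausible form of such a metric, where the expected count of an education-occupation pair is the product of the two degrees divided by the total number of edges.

```python
from collections import Counter

def match_ratios(pairs):
    """Observed / expected edge counts under a bipartite configuration-model null."""
    observed = Counter(pairs)
    edu_degree = Counter(edu for edu, _ in pairs)
    occ_degree = Counter(occ for _, occ in pairs)
    m = len(pairs)  # total number of education-occupation edges
    return {
        (edu, occ): n / (edu_degree[edu] * occ_degree[occ] / m)
        for (edu, occ), n in observed.items()
    }

# Hypothetical graduate records: (degree programme, occupation).
pairs = [
    ("engineering", "engineer"),
    ("engineering", "engineer"),
    ("engineering", "manager"),
    ("economics", "manager"),
]

ratios = match_ratios(pairs)
for pair, ratio in sorted(ratios.items()):
    print(pair, round(ratio, 3))
```

Ratios above one mark pairs that occur more often than the degree-preserving null model would suggest, and ratios below one mark pairs that occur less often.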
Multilayer Network-Based Production Flow Analysis
A multilayer network model for the exploratory analysis of production technologies is proposed. To represent the relationship between products, parts, machines, resources, operators, and skills, standardized production and product-relevant data are transformed into a set of bi- and multipartite networks. This representation is beneficial in production flow analysis (PFA) that is used to identify improvement opportunities by grouping similar groups of products, components, and machines. It is demonstrated that the goal-oriented mapping and modularity-based clustering of multilayer networks can serve as a readily applicable and interpretable decision support tool for PFA, and the analysis of the degrees and correlations of a node can identify critically important skills and resources. The applicability of the proposed methodology is demonstrated by a well-documented benchmark problem of a wire-harness production process. The results confirm that the proposed multilayer network can support the standardized integration of production-relevant data and exploratory analysis of strongly interconnected production systems.
Scalable co-Clustering using a Crossing Minimization ‒ Application to Production Flow Analysis
Production flow analysis includes various families of components and groups of machines. Machine-part cell formation means the optimal design of manufacturing cells consisting of similar machines producing similar products from a similar set of components. Most of the algorithms rely on reordering the machine-part incidence matrix. We generalize this classical concept to handle more than two elements of the production process (e.g. machine - part - product - resource - operator). The application of this extended concept requires an efficient optimization algorithm for the simultaneous grouping of these elements. For this purpose, we propose a novel co-clustering technique based on crossing minimization of layered bipartite graphs. The present method has been implemented as a MATLAB toolbox. The efficiency of the proposed approach and developed tools is demonstrated by realistic case studies. The log-linear scalability of the algorithm is proven theoretically and experimentally.
Pigler, Csaba, Ágnes Fogarassy-Vathy, and János Abonyi. "Scalable co-Clustering using a Crossing Minimization‒Application to Production Flow Analysis." Acta Polytechnica Hungarica 13.2 (2016): 209-228.
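Crossing minimization in a two-layer bipartite graph can be illustrated with the classical barycenter heuristic. This is a generic textbook heuristic used here as a stand-in, not the algorithm of the paper, and the machine and part names are hypothetical. The sketch is in Python rather than MATLAB.

```python
def barycenter_order(free_nodes, fixed_order, edges):
    """Sort free_nodes by the mean position of their neighbours in fixed_order."""
    position = {node: i for i, node in enumerate(fixed_order)}

    def barycenter(node):
        neighbours = [position[f] for n, f in edges if n == node]
        # Nodes without neighbours are pushed to the end.
        return sum(neighbours) / len(neighbours) if neighbours else float("inf")

    return sorted(free_nodes, key=barycenter)

def count_crossings(top, bottom, edges):
    """Count pairs of edges that cross for the given layer orderings."""
    top_pos = {n: i for i, n in enumerate(top)}
    bottom_pos = {n: i for i, n in enumerate(bottom)}
    points = [(top_pos[t], bottom_pos[b]) for t, b in edges]
    return sum(
        1
        for i in range(len(points))
        for j in range(i + 1, len(points))
        if (points[i][0] - points[j][0]) * (points[i][1] - points[j][1]) < 0
    )

# Hypothetical machine-part incidences, given as (machine, part) edges.
machines = ["M1", "M2", "M3"]
parts = ["P1", "P2", "P3"]
edges = [("M1", "P3"), ("M2", "P1"), ("M3", "P2")]

before = count_crossings(machines, parts, edges)
new_order = barycenter_order(machines, parts, edges)
after = count_crossings(new_order, parts, edges)
print(before, new_order, after)
```

With fewer crossings, connected machines and parts sit next to each other in the two orderings, which is what makes block-like groups visible in the reordered incidence matrix.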
Node Similarity Based Graph Clustering and Visualization
The basis of the presented methods for the visualization and clustering of graphs is a novel similarity and distance metric, and the matrix describing the similarity of the nodes in the graph. This matrix represents the type of connections between the nodes in the graph in a compact form, and thus provides a very good starting point for both the clustering and visualization algorithms. Visualization is done with the MDS (multidimensional scaling) dimensionality reduction technique, which obtains the spectral decomposition of this matrix, while the partitioning builds on the results of this step to generate a hierarchical representation. A detailed example is shown to justify the capability of the described algorithms for clustering and visualization of the link structure of Web sites.