Text Mining

Multilayer network based comparative document analysis (MUNCoDA)

The proposed multilayer network-based comparative document analysis (MUNCoDA) method supports the identification of the common points of a set of documents that deal with the same subject area. As documents are transformed into networks of informative word pairs, the collection of documents forms a multilayer network that allows the comparative evaluation of the texts. The multilayer network can be visualized and analyzed to highlight how the texts are structured. The topics of the documents can be clustered based on the developed similarity measures. By exploring the network centralities, topic importance values can be assigned. The method is fully automated using KNIME preprocessing tools and MATLAB/Octave code.
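The transformation of documents into layers can be illustrated with a minimal Python sketch. It is not the authors' KNIME and MATLAB/Octave implementation. It assumes that a word pair is two adjacent words left after stop-word removal, which is only one possible reading of "informative word pairs". The names `STOP_WORDS`, `word_pair_layer`, and `build_multilayer`, as well as the example texts, are hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list; the original method relies on KNIME preprocessing instead.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to", "for"}


def word_pair_layer(text, min_count=1):
    """Build one layer: a Counter mapping undirected word pairs to their counts.

    Assumption: a pair is formed by two adjacent tokens after stop-word removal.
    """
    tokens = [w.strip(".,;:()").lower() for w in text.split()]
    tokens = [w for w in tokens if w and w not in STOP_WORDS]
    pairs = Counter()
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            # Sort the pair so that (a, b) and (b, a) denote the same edge.
            pairs[tuple(sorted((left, right)))] += 1
    # Keep only pairs that occur at least min_count times.
    return Counter({p: c for p, c in pairs.items() if c >= min_count})


def build_multilayer(documents):
    """Map each document name to its word-pair layer."""
    return {name: word_pair_layer(text) for name, text in documents.items()}


# Made-up example texts, for illustration only.
docs = {
    "doc_a": "Water quality monitoring supports water management.",
    "doc_b": "Air quality monitoring supports health protection.",
}
layers = build_multilayer(docs)
shared_edges = set(layers["doc_a"]) & set(layers["doc_b"])
```

In this toy example, `shared_edges` holds the word pairs that appear in both layers, which is the kind of common point the comparative analysis looks for.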

•Networks can be formed based on informative word pairs of multiple documents

•The analysis of the proposed multilayer networks provides information for multi-document summarization

•Words and documents can be clustered based on node similarity and edge overlap measures
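As a rough illustration of the edge overlap and centrality ideas listed above, the following Python sketch compares layers by the share of word pairs they have in common and ranks words by their degree summed over all layers. The Jaccard index and the summed degree are stand-ins chosen for this sketch; the similarity and importance measures actually developed for MUNCoDA are not specified here. The layer contents and the function names `edge_overlap` and `aggregated_degree` are hypothetical.

```python
from collections import Counter

# Hypothetical layers: each document is a set of undirected word-pair edges.
layers = {
    "doc_a": {("quality", "water"), ("monitoring", "quality"), ("monitoring", "supports")},
    "doc_b": {("air", "quality"), ("monitoring", "quality"), ("monitoring", "supports")},
    "doc_c": {("health", "protection"), ("monitoring", "quality")},
}


def edge_overlap(edges_a, edges_b):
    """Jaccard index of two edge sets (an assumed overlap measure)."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0


def aggregated_degree(layers):
    """Sum each word's degree over all layers as a simple importance score."""
    degree = Counter()
    for edges in layers.values():
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
    return degree


overlap_ab = edge_overlap(layers["doc_a"], layers["doc_b"])
overlap_ac = edge_overlap(layers["doc_a"], layers["doc_c"])
degree = aggregated_degree(layers)
```

A pairwise overlap matrix built this way could be passed to any standard clustering routine to group documents, and the degree scores give a first ranking of words across the collection.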

V. Sebestyén, E. Domokos, J. Abonyi: Multilayer network based comparative document analysis (MUNCoDA), MethodsX, Volume 7, 2020