Overview of the Architecture

This section expands upon Section 4.3, explaining in more details what is done, and how everything fits into the process described there. We refer to Chapter 4 for a fuller conceptual description of the overall process.

Figure A.1 presents the architecture, the modules and how they interact. This figure is based on Figure 4.6. Differences are that a few additional modules are included, and that the directed ar- rows represent not the flow of data, as in Figure 4.6, but the interconnections between modules, i.e. references between modules.

The experiments module is not quite like the others. First, it is not a functional module, but rather the glue that keeps everything together. This is the module that utilize the other modules in order to perform the various experiments. Second, it is not really one module, but rather a collection of * experiments.py files, containing experiments concerning the various representations. Of the connections from experiments to the other moduels in Figure A.1, only the most important ones are shown to avoid cluttering the diagram. The contents of this “module” are described in Section A.3 below.

The util module is used by many of the modules. Also here have we left out the dependencies in order do avoid cluttering. The module contains miscellaneous utility functions that does not naturally fit into any of the other modules.

The three leftmost modules, preprocess, data, and report data, are responsible for reading and doing textual preprocessing of the cases. data handles all file I/O, and utilize preprocess to make the nec- essary changes to the text. All preprocessing tasks are done by preprocess itself, except dependency parsing which is handled by stanford parser. The report data is used by data to retrieve textual cases from HTML-formatted documents in the AIR dataset.

The middle column of modules handles representation of documents as feature vectors. freq representation represents documents as TF and TF-IDF vectors, while graph representation build networks from the text and create vectors based on node centrality. The actual graph data structures and functions are contained in graph. The plotter module is used to visualize the networks.

The three modules on the right are used to evaluate the feature vectors created by the above modules. The two evaluation methods described in Section 4.4 are implemented in classify and retrieval, and evaluate provide an interface to these.

Previous topic

Welcome to TextNet’s documentation!

Next topic

Test

This Page