Construct graph representations from text.
The module contains functions for creating networks based on text documents, and for converting the networks into feature vectors. Feature vectors are created based on node centrality in the text networks.
The following text representations are supported:
| Representation | Description |
|---|---|
| random | Creates a network with all distinct terms in the provided document as nodes. Edges are created at random between the nodes, based on provided probabilities. |
| co-occurrence | Distinct terms in the document are used as nodes. Edges are created between any terms that occur close together in the text. |
| dependency | Words are used as nodes. Edges represent dependencies extracted from the text using the Stanford dependency parser (see the ‘stanford_parser’ module). |
The module makes heavy use of the graph module.
Author: Kjetil Valle <kjetilva@stud.ntnu.no>
Construct co-occurrence network from text.
direction must be ‘forward’, ‘backward’ or ‘undirected’, while context can be ‘window’ or ‘sentence’.
If context is ‘window’, already_preprocessed indicates whether doc has already been preprocessed. Sentence contexts require an unpreprocessed doc.
Any value for window_size is ignored if context is ‘sentence’.
A DiGraph is created regardless of the direction parameter; with ‘undirected’, each edge is created in both directions.
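The window context described above can be illustrated with a minimal sketch. It is not the module's implementation: the helper name `cooccurrence_edges` is hypothetical, a plain dict of directed edges stands in for the DiGraph, and two points are assumptions made for illustration, namely that ‘forward’ means an edge from a term to the terms that follow it, and that the edge weight counts co-occurrences.

```python
from collections import defaultdict


def cooccurrence_edges(tokens, window_size=2, direction="undirected"):
    """Return {(source, target): weight} for terms co-occurring within a window.

    Hypothetical helper: a dict of directed edges stands in for the DiGraph.
    """
    if direction not in ("forward", "backward", "undirected"):
        raise ValueError("direction must be 'forward', 'backward' or 'undirected'")
    edges = defaultdict(float)
    for i, term in enumerate(tokens):
        # Pair each term with the next `window_size` terms in the text.
        for other in tokens[i + 1 : i + 1 + window_size]:
            if other == term:
                continue  # no self-loops
            # Assumption: 'forward' points from the earlier term to the later one.
            if direction in ("forward", "undirected"):
                edges[(term, other)] += 1.0
            # 'undirected' is represented by adding the edge in both directions.
            if direction in ("backward", "undirected"):
                edges[(other, term)] += 1.0
    return dict(edges)


tokens = ["a", "b", "c", "a"]
undirected = cooccurrence_edges(tokens, window_size=1, direction="undirected")
forward = cooccurrence_edges(tokens, window_size=1, direction="forward")
```

With ‘undirected’, every co-occurring pair appears twice, once per direction, which mirrors the DiGraph behaviour described above.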
Construct a dependency network from doc.
Creates a network from doc with distinct words used as nodes, and all dependency types defined by the Stanford parser, except those listed in exclude, used as edges.
direction must be ‘undirected’, ‘forward’ or ‘backward’. Forward direction means head-dependent, while backward gives dependent-head relations.
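As a rough sketch of how direction and exclude interact, the example below builds edges from dependency triples. The triples are hand-written stand-ins for parser output, and the helper name `dependency_edges` and the `(head, relation, dependent)` input format are assumptions for illustration, not the module's API.

```python
def dependency_edges(triples, direction="undirected", exclude=()):
    """Return a set of directed (source, target) edges.

    Hypothetical helper; `triples` is an iterable of
    (head, relation, dependent) tuples standing in for parser output.
    """
    if direction not in ("forward", "backward", "undirected"):
        raise ValueError("direction must be 'undirected', 'forward' or 'backward'")
    edges = set()
    for head, relation, dependent in triples:
        if relation in exclude:
            continue  # drop excluded dependency types
        if direction in ("forward", "undirected"):
            edges.add((head, dependent))  # head -> dependent
        if direction in ("backward", "undirected"):
            edges.add((dependent, head))  # dependent -> head
    return edges


triples = [("sat", "nsubj", "cat"), ("cat", "det", "the"), ("sat", "prep", "on")]
forward = dependency_edges(triples, direction="forward", exclude=["det"])
backward = dependency_edges(triples, direction="backward", exclude=["det"])
```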
Construct random network for use as baseline.
Create a random network based on doc, with words used for nodes. Edges are created between any given pair of nodes (a,b) with probability p.
All edges will have weight = 1.0.
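The edge sampling described above can be sketched as follows. The helper name `random_edges` and the `seed` parameter are hypothetical, and treating each unordered pair once is an assumption; the actual function may handle ordered pairs or direction differently.

```python
import itertools
import random


def random_edges(tokens, p=0.5, seed=None):
    """Return (nodes, edges), where edges maps (a, b) to weight 1.0.

    Hypothetical helper: each unordered pair of distinct terms gets an
    edge with probability p.
    """
    rng = random.Random(seed)  # seed is only here to make runs repeatable
    nodes = sorted(set(tokens))
    edges = {}
    for a, b in itertools.combinations(nodes, 2):
        if rng.random() < p:
            edges[(a, b)] = 1.0  # all edges share the same weight
    return nodes, edges


nodes, all_edges = random_edges(["b", "a", "b", "c"], p=1.0)
_, no_edges = random_edges(["b", "a", "b", "c"], p=0.0)
```

With p = 1.0 every pair is connected and with p = 0.0 none is, which gives two easy sanity checks for a baseline network.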
Return list of graph node evaluation metrics.
If weighted is not specified, or None, all metrics are returned. Otherwise, only metrics suited for (un)weighted networks are returned.
Return node values as a dictionary.
If icc is provided, the values are TC-ICC; otherwise TC is calculated.
Create feature vector from a single graph.
The list all_tokens is used as the basis for the feature vector: for each word, its value in graph g according to metric is calculated.
Create centrality-based feature vectors from graph representations.
Takes a list of graphs and returns a numpy nd-matrix of feature vectors, based on the provided metric.
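A minimal sketch of this mapping, using degree centrality as the metric, is shown below. Both helper names are hypothetical, the graph is given as a plain edge list rather than a graph object, and assigning 0.0 to tokens that do not occur in the graph is an assumption.

```python
def degree_centrality(edges, nodes):
    """Fraction of the other nodes each node is connected to (direction ignored)."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    denom = max(len(nodes) - 1, 1)  # avoid division by zero for a single node
    return {n: len(neighbours[n]) / denom for n in nodes}


def graph_to_vector(node_values, all_tokens):
    """Order node values by all_tokens; tokens missing from the graph get 0.0 (assumption)."""
    return [node_values.get(token, 0.0) for token in all_tokens]


nodes = ["cat", "mat", "sat"]
edges = [("cat", "sat"), ("sat", "mat")]
values = degree_centrality(edges, nodes)
vector = graph_to_vector(values, ["cat", "dog", "mat", "sat"])
```

Because every graph is projected onto the same all_tokens ordering, the vectors for different documents have the same length and can be stacked row by row into a matrix.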
Converts similarity matrix to weighted graph.
Author: Gleb Sizov <sizov@idi.ntnu.no>
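One way such a conversion could look is sketched below. The helper name `similarity_matrix_to_edges`, the `labels` and `threshold` parameters, and the choice to skip the diagonal and non-positive similarities are all assumptions for illustration; the original function may differ on each point.

```python
def similarity_matrix_to_edges(matrix, labels=None, threshold=0.0):
    """Return {(source, target): weight} from a square similarity matrix.

    Hypothetical helper: entry [i][j] becomes the weight of edge i -> j.
    """
    n = len(matrix)
    if labels is None:
        labels = list(range(n))  # fall back to row indices as node names
    edges = {}
    for i in range(n):
        for j in range(n):
            # Assumption: skip self-similarity and similarities at or below threshold.
            if i != j and matrix[i][j] > threshold:
                edges[(labels[i], labels[j])] = matrix[i][j]
    return edges


matrix = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.3],
    [0.0, 0.3, 1.0],
]
edges = similarity_matrix_to_edges(matrix)
```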