report_data

Helper module for data, used to extract the problem description and solution parts of cases from reports.

Parses HTML-formatted reports, structured like those in the AIR dataset, and splits them into the problem description part and the solution part of textual case-based reasoning (CBR) cases. Solutions are identified by the section titles in the reports: sections whose titles match words such as ‘finding’ or ‘conclusion’ are considered part of the solution.

By default, the remaining parts of the report form the problem description.

Author: Gleb Sizov <sizov@idi.ntnu.no>
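The title-based split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual code: the keyword list, the substring matching, and the helper names `is_solution_title` and `split_sections` are hypothetical, and each section is simplified to a flat `(title, text)` pair.

```python
# Hypothetical keyword list: the module docs name 'finding' and 'conclusion'
# as examples; the real module may use other or additional words.
SOLUTION_KEYWORDS = ("finding", "conclusion")


def is_solution_title(title):
    """Return True if a section title marks part of the solution (assumed rule)."""
    lowered = title.lower()
    # Substring match, so 'Findings' and 'Conclusions' match as well.
    return any(keyword in lowered for keyword in SOLUTION_KEYWORDS)


def split_sections(sections):
    """Split (title, text) pairs into (description, solution) strings.

    Sections with a solution-like title go to the solution; every other
    section falls back to the problem description by default.
    """
    description, solution = [], []
    for title, text in sections:
        target = solution if is_solution_title(title) else description
        target.append(text)
    return "\n".join(description), "\n".join(solution)


# Illustrative input only, not taken from the AIR dataset.
report = [
    ("Summary", "Summary text."),
    ("Findings as to Causes", "Cause text."),
    ("Analysis", "Analysis text."),
    ("Conclusions", "Conclusion text."),
]
description, solution = split_sections(report)
print(description)
print(solution)
```

Lowercasing before the substring test keeps the rule insensitive to capitalization and plural forms, at the cost of occasionally matching a word that merely contains a keyword.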
class report_data.ReportCase(report)
Splits a report into problem description and solution parts based on its section titles.
class report_data.ReportParser(path, raw=None)

Parser for Canadian HTML reports.

It’s quite a hack, so be careful when modifying it.
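To give a rough idea of what such a parser has to do, the sketch below collects heading titles and their paragraphs from HTML using only the standard library's `html.parser`. It is a hypothetical stand-in, not `ReportParser`: it assumes headings are `<h1>` to `<h6>`, body text sits in well-formed `<p>` elements, and the output is a flat list of `[title, paragraphs]` pairs rather than a section tree. The real reports may use different markup.

```python
from html.parser import HTMLParser


class SectionCollector(HTMLParser):
    """Collect [heading, paragraphs] pairs from simple, well-formed HTML.

    Hypothetical sketch: assumes <h1>-<h6> headings and <p> body text.
    """

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        # Text before the first heading is stored under an empty title.
        self.sections = [["", []]]
        self._current_tag = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS or tag == "p":
            self._current_tag = tag
            self._buffer = []

    def handle_data(self, data):
        # Inline markup such as <b> is ignored; only its text is kept.
        if self._current_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current_tag:
            return
        # Collapse runs of whitespace left over from the HTML layout.
        text = " ".join("".join(self._buffer).split())
        if tag in self.HEADINGS:
            self.sections.append([text, []])
        elif text:
            self.sections[-1][1].append(text)
        self._current_tag = None


html = (
    "<h2>Summary</h2><p>Engine failure.</p>"
    "<h2>Findings</h2><p>Fuel was <b>contaminated</b>.</p>"
)
parser = SectionCollector()
parser.feed(html)
parser.close()
print(parser.sections)
```

Unclosed `<p>` tags, which are common in older HTML, would make this sketch drop text; handling such irregularities is presumably where most of the "hack" in a real parser lives.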

class report_data.Section(title='', parent=None)

Section branch (tree structure) of a report. At the top level, it represents the report itself.

Contains the following information:

  • report title: string
  • section level: number
  • contained (sub)sections: list of Section instances
  • section paragraphs: list of strings
  • meta information: list of strings
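A tree with the fields listed above might look like the minimal sketch below. The attribute names `sections`, `paragraphs`, and `meta`, the helper methods `add_subsection` and `walk`, and deriving `level` from the parent chain are all assumptions made for illustration; they are not taken from the real `report_data.Section`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Section:
    """Hypothetical stand-in for report_data.Section; field names are assumed."""

    title: str = ""
    # Excluded from repr/eq so child -> parent -> child links cannot recurse.
    parent: Optional["Section"] = field(default=None, repr=False, compare=False)
    sections: List["Section"] = field(default_factory=list)
    paragraphs: List[str] = field(default_factory=list)
    meta: List[str] = field(default_factory=list)

    @property
    def level(self) -> int:
        # Assumption: the root (the report itself) is level 0.
        return 0 if self.parent is None else self.parent.level + 1

    def add_subsection(self, title: str) -> "Section":
        """Create a child section, attach it, and return it."""
        child = Section(title=title, parent=self)
        self.sections.append(child)
        return child

    def walk(self):
        """Yield this section and all descendants in document (pre-)order."""
        yield self
        for child in self.sections:
            yield from child.walk()


report = Section(title="Report")
summary = report.add_subsection("Summary")
summary.paragraphs.append("Summary text.")
findings = report.add_subsection("Findings")
findings.add_subsection("Findings as to Causes")
print([(s.title, s.level) for s in report.walk()])
```

Computing the level from the parent chain keeps it consistent when subsections are attached; a stored number, as the list above suggests, is cheaper to read but must be kept in sync by the parser.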
