Data helper module used to extract the problem description and solution parts from cases.
Parses reports formatted in HTML, structured like those in the AIR dataset, and splits them into the problem description part and the solution part of textual CBR cases. Solutions are identified based on section titles in the reports. Sections whose titles match words such as ‘finding’ or ‘conclusion’ are considered part of the solution.
The remaining parts of the report are, by default, the problem description.
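The title-based split described above can be sketched as follows. This is a minimal illustration and not the module's actual code: the names `SOLUTION_KEYWORDS`, `is_solution_title` and `split_case` are hypothetical, the keyword list contains only the two words named above, and sections are assumed to be already extracted from the HTML as (title, text) pairs.

```python
# Hypothetical keyword list: the description names only 'finding' and
# 'conclusion' as examples; the real module may match more words.
SOLUTION_KEYWORDS = ("finding", "conclusion")


def is_solution_title(title):
    """Return True if the section title contains a solution keyword."""
    lowered = title.lower()
    return any(keyword in lowered for keyword in SOLUTION_KEYWORDS)


def split_case(sections):
    """Split (title, text) pairs into problem and solution text.

    Sections whose title matches a solution keyword go to the solution;
    everything else is treated as problem description by default.
    """
    problem, solution = [], []
    for title, text in sections:
        if is_solution_title(title):
            solution.append(text)
        else:
            problem.append(text)
    return "\n".join(problem), "\n".join(solution)


if __name__ == "__main__":
    # Made-up section titles and texts, for illustration only.
    report = [
        ("Summary", "The aircraft departed on a routine flight."),
        ("Findings as to Causes", "The fuel selector was mispositioned."),
    ]
    description, solution = split_case(report)
    print("PROBLEM:", description)
    print("SOLUTION:", solution)
```

Substring matching on the lowercased title means plural and capitalised forms such as "Findings" or "CONCLUSIONS" also match, which is one plausible reading of "titles matching words such as".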
Author: Gleb Sizov <sizov@idi.ntnu.no>
Parser for Canadian HTML reports.
It’s quite a hack, so be careful when modifying it.
Section branch (tree structure) of a report. At the top level, it represents the report itself.
Contains the following information: