java.lang.Object
  dataloader.XMLParseren
public class XMLParseren
The class reads config.xml and obtains the URL or directory containing the HTML files to be tokenized. These are then passed to HTMLStripper. Once HTMLStripper has tokenized the documents, XMLParseren generates the file tokenized.xml.
Constructor Summary | |
---|---|
XMLParseren(java.io.File file,
boolean fromUrl)
The constructor to the class. |
|
XMLParseren(java.lang.String fromUrl,
java.lang.String toUrl)
Constructor, reads the XML file that is to be parsed. |
Method Summary | |
---|---|
java.lang.String |
getNumberOfTexts()
Returns the number of texts (news items) in the collection. |
static void |
main(java.lang.String[] args)
Main method for testing. |
void |
makeXmlTokenizer(java.lang.String title,
java.lang.String url,
java.lang.String body)
The method creates the tokenized.xml file. |
void |
makeXmlTokenizer2(java.lang.String text,
java.lang.String url)
The method creates tokenized.xml, the file of news to be loaded into the system |
java.util.ArrayList |
parseUri(java.lang.String urls)
The method finds all URLs that were listed in the configuration file. |
java.lang.String |
readFile(java.io.File filename)
The method reads the contents of the file and stores them in a string. |
java.util.ArrayList |
readXML(java.io.File file)
The method reads the content in the configuration file (the XML file) |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public XMLParseren(java.io.File file, boolean fromUrl)
file
- The config file
fromUrl
- Whether the input files come from a file or from a URL.
public XMLParseren(java.lang.String fromUrl, java.lang.String toUrl)
fromUrl
- The config file
toUrl
- The dataset file
Method Detail |
---|
public java.util.ArrayList readXML(java.io.File file)
file
- The input file
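The page does not document the layout of config.xml or what the returned list holds. As a minimal sketch only, assuming a hypothetical layout in which each source is listed in a `<url>` element, reading such a file with the standard DOM parser could look like the following. The class name `ReadXmlSketch` and the `<url>` element name are illustrative assumptions, not part of `dataloader.XMLParseren`.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class ReadXmlSketch {

    // Hypothetical: collects the trimmed text of every <url> element in the file.
    static ArrayList<String> readXML(File file)
            throws IOException, ParserConfigurationException, SAXException {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(file);
        NodeList nodes = doc.getElementsByTagName("url");
        ArrayList<String> urls = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            urls.add(nodes.item(i).getTextContent().trim());
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        // Write a small made-up config file, then read it back.
        File config = File.createTempFile("config", ".xml");
        config.deleteOnExit();
        String xml = "<config><url>http://example.com/a.html</url>"
                + "<url>http://example.com/b.html</url></config>";
        Files.write(config.toPath(), xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(readXML(config));
    }
}
```

The sketch uses a typed `ArrayList<String>`; the documented method returns a raw `java.util.ArrayList`, so its element type is not known from this page.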
public java.util.ArrayList parseUri(java.lang.String urls)
urls
- The string of URLs.
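How the URLs are separated inside the input string is not stated here. The sketch below assumes, purely for illustration, that they are separated by whitespace or commas; `ParseUriSketch` and the delimiter rule are hypothetical.

```java
import java.util.ArrayList;

class ParseUriSketch {

    // Hypothetical delimiter rule: one or more whitespace characters or commas.
    static ArrayList<String> parseUri(String urls) {
        ArrayList<String> result = new ArrayList<>();
        if (urls == null) {
            return result;
        }
        for (String part : urls.split("[\\s,]+")) {
            // A leading delimiter produces an empty first element; skip it.
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseUri("http://a.example/x.html, http://b.example/y.html"));
        // prints [http://a.example/x.html, http://b.example/y.html]
    }
}
```

A comma can legally appear inside a URL, so a real configuration format would need a delimiter that cannot occur in its entries.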
public void makeXmlTokenizer(java.lang.String title, java.lang.String url, java.lang.String body) throws java.io.FileNotFoundException, java.lang.SecurityException
title
-
url
-
body
-
java.io.FileNotFoundException
java.lang.SecurityException
public void makeXmlTokenizer2(java.lang.String text, java.lang.String url) throws java.io.FileNotFoundException, java.lang.SecurityException
text
- The text
url
- The URL
java.io.FileNotFoundException
java.lang.SecurityException
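The structure of tokenized.xml is not described on this page. As a rough sketch under that caveat, the following writes one text and its URL to an XML file, escaping characters that may not appear verbatim in XML text. The `<document>`, `<url>`, and `<text>` element names and the class `TokenizedXmlSketch` are assumptions made for illustration. Opening the file with `FileOutputStream` can raise `FileNotFoundException`, which is consistent with the exceptions declared above.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

class TokenizedXmlSketch {

    // Escapes the characters that must not appear verbatim in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    // Hypothetical layout: one <document> with <url> and <text> children.
    static void writeTokenized(File out, String text, String url)
            throws FileNotFoundException {
        try (PrintWriter w = new PrintWriter(new OutputStreamWriter(
                new FileOutputStream(out), StandardCharsets.UTF_8))) {
            w.println("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
            w.println("<document>");
            w.println("  <url>" + escape(url) + "</url>");
            w.println("  <text>" + escape(text) + "</text>");
            w.println("</document>");
        }
    }

    public static void main(String[] args) throws Exception {
        File out = File.createTempFile("tokenized", ".xml");
        out.deleteOnExit();
        writeTokenized(out, "fish & chips", "http://example.com/a.html");
        System.out.println("Wrote " + out.getAbsolutePath());
    }
}
```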
public java.lang.String readFile(java.io.File filename) throws java.io.IOException
filename
- File
java.io.IOException
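Reading a whole file into a string, as this method is described as doing, can be sketched with `java.nio.file.Files`. This is not the class's actual implementation; `ReadFileSketch` is a hypothetical name and the UTF-8 encoding is an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class ReadFileSketch {

    // Reads the entire file into one String; the UTF-8 encoding is an assumption.
    static String readFile(File filename) throws IOException {
        byte[] bytes = Files.readAllBytes(filename.toPath());
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Create a small temporary file and read it back.
        File tmp = File.createTempFile("sample", ".html");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "<p>Hei</p>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFile(tmp));
    }
}
```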
public java.lang.String getNumberOfTexts()
public static void main(java.lang.String[] args)
args
-