java.lang.Object
  dataloader.HTMLStripper
public class HTMLStripper
This class loads an HTML page from a file or from a URL and strips the HTML tags from it.
Constructor Summary | |
---|---|
HTMLStripper(java.lang.String url,
java.lang.String sBoundary,
java.lang.String pBoundary,
java.lang.String allsmall,
XMLParseren xmlP)
Constructor that loads the page from a URL and takes its parameters from the configuration file. |
|
HTMLStripper(java.lang.String filename,
XMLParseren xmlP,
java.lang.String sBoundary,
java.lang.String pBoundary)
Constructor that loads an HTML page from a file. |
Method Summary | |
---|---|
java.lang.String |
findSource(java.lang.String urlForFrame)
Finds the source code for a URL. |
java.util.ArrayList |
getMeta(java.lang.String file)
Method that collects metadata from file. |
java.util.ArrayList |
getNewsList()
Returns the ArrayList of news items. |
java.lang.String |
getOriginalFile()
Method that gets the original file. |
java.lang.String |
letterStripping(java.lang.String file)
Converts letters that are written with special HTML character codes into regular letters. |
static void |
main(java.lang.String[] args)
The main method that starts everything. |
void |
parseFile(java.lang.String file)
This method calls all the different methods that parse file. |
java.lang.String |
parseFilen(java.lang.String file)
Another method that calls all the different methods that parse the file. |
java.lang.String |
readFile(java.io.File filename)
The method reads the file and puts it in a string variable. |
java.util.ArrayList |
searchFrame(java.lang.String file)
Method that finds frames in the HTML file. |
java.lang.String |
searchNews(java.lang.String file)
Finds a news item in the HTML page. |
java.lang.String |
searchTitle(java.lang.String file)
Finds the title of the HTML page. |
java.lang.String |
strip(java.lang.String file)
The method removes most HTML tags and spaces. |
java.lang.String |
stripSpecialChar(java.lang.String file)
The method removes special characters that might occur in HTML. |
java.lang.String |
stripWhiteSpace(java.lang.String file)
Removes all superfluous whitespace. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public HTMLStripper(java.lang.String url, java.lang.String sBoundary, java.lang.String pBoundary, java.lang.String allsmall, XMLParseren xmlP)
url
- String
sBoundary
- String
pBoundary
- String
allsmall
- String
xmlP
- XMLParser
public HTMLStripper(java.lang.String filename, XMLParseren xmlP, java.lang.String sBoundary, java.lang.String pBoundary)
filename
-
xmlP
-
sBoundary
-
pBoundary
-
Method Detail |
---|
public java.lang.String getOriginalFile()
public java.util.ArrayList getMeta(java.lang.String file)
file
- The input file.
public void parseFile(java.lang.String file)
file
- String
public java.lang.String parseFilen(java.lang.String file)
file
- String
public java.lang.String readFile(java.io.File filename) throws java.io.IOException
filename
- File
java.io.IOException
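The summary describes readFile as reading the file into a string variable. The following is a minimal sketch of that behavior, not the class's actual code: the class name `ReadFileSketch` is hypothetical, and UTF-8 is an assumed encoding that the original documentation does not specify.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical stand-in for HTMLStripper.readFile. */
class ReadFileSketch {

    /** Reads the whole file into one String (assumes the file is UTF-8 encoded). */
    static String readFile(File filename) throws IOException {
        byte[] bytes = Files.readAllBytes(filename.toPath());
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of an HTML file as the only argument to print its content.
        if (args.length == 1) {
            System.out.print(readFile(new File(args[0])));
        }
    }
}
```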
public java.lang.String searchTitle(java.lang.String file)
file
- String
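One plausible way to find the page title, as the summary describes for searchTitle, is a regular expression over the `<title>` element. This sketch is an assumption about the approach; `SearchTitleSketch` is a hypothetical name, and returning an empty string when no title exists is an illustrative choice, not documented behavior.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for HTMLStripper.searchTitle. */
class SearchTitleSketch {

    // Case-insensitive so <TITLE> matches; DOTALL so a title may span lines.
    private static final Pattern TITLE = Pattern.compile(
        "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed text of the first title element, or "" if none is found. */
    static String searchTitle(String file) {
        Matcher m = TITLE.matcher(file);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        System.out.println(searchTitle("<html><head><title> News today </title></head></html>"));
    }
}
```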
public java.lang.String searchNews(java.lang.String file)
file
- String
public java.util.ArrayList getNewsList()
public java.util.ArrayList searchFrame(java.lang.String file)
file
- String
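searchFrame returns an ArrayList of frames found in the HTML. The documentation does not say what each entry holds; the sketch below assumes the list contains the `src` attribute of every `frame` or `iframe` tag. `SearchFrameSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for HTMLStripper.searchFrame. */
class SearchFrameSketch {

    // Matches <frame ...> and <iframe ...> (but not <frameset>) and captures a quoted src value.
    private static final Pattern FRAME_SRC = Pattern.compile(
        "<i?frame\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']*)[\"']",
        Pattern.CASE_INSENSITIVE);

    /** Collects the src attribute of each frame tag, in document order. */
    static ArrayList<String> searchFrame(String file) {
        ArrayList<String> sources = new ArrayList<>();
        Matcher m = FRAME_SRC.matcher(file);
        while (m.find()) {
            sources.add(m.group(1));
        }
        return sources;
    }

    public static void main(String[] args) {
        String html = "<frameset><frame name=\"a\" src=\"menu.html\"><frame src='news.html'></frameset>";
        System.out.println(searchFrame(html));
    }
}
```

Each returned source could then be passed to a method such as findSource to load the frame's content.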
public java.lang.String findSource(java.lang.String urlForFrame) throws java.net.MalformedURLException, java.io.IOException
urlForFrame
- String
java.net.MalformedURLException
java.io.IOException
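The declared exceptions suggest that findSource opens the URL and reads the response. A minimal sketch under that assumption follows; `FindSourceSketch` is a hypothetical name, UTF-8 decoding is assumed, and a syntactically invalid URL string would surface here as an `IllegalArgumentException` from `URI.create` rather than a `MalformedURLException`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for HTMLStripper.findSource. */
class FindSourceSketch {

    /** Downloads the document at the given URL and returns it as one String. */
    static String findSource(String urlForFrame) throws MalformedURLException, IOException {
        URL url = URI.create(urlForFrame).toURL();
        StringBuilder source = new StringBuilder();
        // Assumes the response is UTF-8; a fuller version would honor the declared charset.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                source.append(line).append('\n');
            }
        }
        return source.toString();
    }

    public static void main(String[] args) throws IOException {
        // Pass a URL as the only argument to print its source.
        if (args.length == 1) {
            System.out.print(findSource(args[0]));
        }
    }
}
```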
public java.lang.String strip(java.lang.String file)
file
- String
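The summary says strip removes most HTML tags and spaces. One common way to approximate this is with regular expressions, sketched below. This is an illustration, not the documented implementation: `StripSketch` is a hypothetical name, and dropping comments, scripts, and style blocks is an assumption. Regular expressions are not a full HTML parser and will mishandle unusual markup.

```java
/** Hypothetical stand-in for HTMLStripper.strip. */
class StripSketch {

    /** Removes comments, script and style blocks, and remaining tags, then collapses whitespace. */
    static String strip(String file) {
        String text = file
            .replaceAll("(?is)<!--.*?-->", " ")               // HTML comments
            .replaceAll("(?is)<script\\b.*?</script>", " ")   // script blocks with their content
            .replaceAll("(?is)<style\\b.*?</style>", " ")     // style blocks with their content
            .replaceAll("(?s)<[^>]+>", " ");                  // any remaining tag
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(strip("<h1>Top  story</h1><script>var a = 1;</script><p>Text</p>"));
    }
}
```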
public java.lang.String stripSpecialChar(java.lang.String file)
file
- String
public java.lang.String stripWhiteSpace(java.lang.String file)
file
- String
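What counts as superfluous whitespace is not specified. The sketch below assumes it means runs of spaces and tabs, spaces around line breaks, and blank lines, while single line breaks are kept. `StripWhiteSpaceSketch` is a hypothetical name.

```java
/** Hypothetical stand-in for HTMLStripper.stripWhiteSpace. */
class StripWhiteSpaceSketch {

    /** Collapses horizontal whitespace and blank lines, keeping single line breaks. */
    static String stripWhiteSpace(String file) {
        return file
            .replaceAll("[ \\t\\x0B\\f\\r]+", " ")  // runs of spaces, tabs, carriage returns
            .replaceAll(" ?\\n ?", "\n")            // spaces next to a line break
            .replaceAll("\\n{2,}", "\n")            // blank lines
            .trim();
    }

    public static void main(String[] args) {
        System.out.println(stripWhiteSpace("  Top   story \n\n\n  Second\tline  "));
    }
}
```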
public java.lang.String letterStripping(java.lang.String file)
file
- String
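The summary for letterStripping suggests that letters written as HTML character entities are converted back to plain letters. The sketch below assumes a small lookup table of named entities; the entity list, the choice of replacement characters, and the name `LetterStrippingSketch` are all illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for HTMLStripper.letterStripping. */
class LetterStrippingSketch {

    // Named HTML entities mapped to the letters they stand for (an assumed, partial table).
    private static final Map<String, String> LETTERS = new LinkedHashMap<>();
    static {
        LETTERS.put("&aelig;", "\u00e6");   // ae ligature
        LETTERS.put("&oslash;", "\u00f8");  // o with stroke
        LETTERS.put("&aring;", "\u00e5");   // a with ring
        LETTERS.put("&AElig;", "\u00c6");
        LETTERS.put("&Oslash;", "\u00d8");
        LETTERS.put("&Aring;", "\u00c5");
        LETTERS.put("&eacute;", "\u00e9");  // e with acute accent
    }

    /** Replaces each known entity with its letter; unknown entities are left untouched. */
    static String letterStripping(String file) {
        String result = file;
        for (Map.Entry<String, String> entry : LETTERS.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(letterStripping("Bl&aring;b&aelig;r og r&oslash;mme"));
    }
}
```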
public static void main(java.lang.String[] args)
args
- String