java.lang.Object
  java.lang.Thread
    newsloader.Extractor
public class Extractor
extends java.lang.Thread
A class that reads the contents of a file and extracts news items.
Nested Class Summary |
---|
Nested classes/interfaces inherited from class java.lang.Thread |
---|
java.lang.Thread.State, java.lang.Thread.UncaughtExceptionHandler |
Field Summary | |
---|---|
int | items |
Fields inherited from class java.lang.Thread |
---|
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY |
Constructor Summary | |
---|---|
Extractor(java.lang.String url, java.lang.String output, java.lang.String dataSetName) | Creates an extractor that reads the directory url and stores the result for the dataset dataSetName in the file output. |
Method Summary | | |
---|---|---|
java.lang.String | anchorTextItem(java.lang.String content) | Method for finding anchor-text items |
java.lang.String | cleanWebpage(java.lang.String content) | Method for removing scripts from a webpage |
int | getNoOfFiles() | Method for getting the number of files |
java.lang.String | prepareBbcNews(java.lang.String contents) | |
java.lang.String | prepareFinancialTimes(java.lang.String contents) | |
void | readFile(java.io.File file) | Method for reading a file |
java.lang.String | removeTags(java.lang.String string) | Method for removing HTML tags from a string |
void | run() | |
void | setCharset(java.io.File file) | |
java.lang.String | textBasedItem(java.lang.String content) | Method for finding text-based items |
void | writeXML() | Method for writing the result to an XML file |
Methods inherited from class java.lang.Thread |
---|
activeCount, checkAccess, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, toString, yield |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public int items
Constructor Detail |
---|
public Extractor(java.lang.String url, java.lang.String output, java.lang.String dataSetName)
Parameters:
url - Location of the directory
output - File to store the result in
dataSetName - Name of the dataset
Method Detail |
---|
public void run()
Specified by:
run in interface java.lang.Runnable
Overrides:
run in class java.lang.Thread
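Because Extractor extends java.lang.Thread and overrides run(), callers are expected to call start() and then join() before reading the public items field. The sketch below illustrates that pattern with a hypothetical stand-in class, ExtractorThreadSketch; its constructor, its getItems() accessor and the rule that a page containing "<item>" counts as one news item are assumptions made for illustration, not the behavior of the real Extractor.

```java
import java.util.List;

// Hypothetical stand-in for Extractor (names and logic are assumptions):
// the extraction work lives in run(), so each instance processes its own
// input on a separate thread.
class ExtractorThreadSketch extends Thread {
    private final List<String> pages; // assumption: pages already loaded as strings
    private int items;                // mirrors Extractor's public "items" counter

    ExtractorThreadSketch(List<String> pages) {
        this.pages = pages;
    }

    @Override
    public void run() {
        int count = 0;
        for (String page : pages) {
            // Assumption for illustration: a page with an <item> tag is one news item.
            if (page.contains("<item>")) {
                count++;
            }
        }
        items = count; // guaranteed visible to a caller once join() has returned
    }

    int getItems() {
        return items;
    }
}
```

Calling run() directly would do the work on the caller's thread; start() is what creates the new thread, and join() both waits for it and makes the final value of items visible to the caller.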
public void setCharset(java.io.File file)
public java.lang.String prepareFinancialTimes(java.lang.String contents)
public java.lang.String prepareBbcNews(java.lang.String contents)
public void readFile(java.io.File file)
public int getNoOfFiles()
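readFile(java.io.File) and getNoOfFiles() are documented only as "reading a file" and "getting the number of files". The following is a minimal sketch of one plausible reading, assuming the extractor reads every regular file in the directory given as url and counts the files it has read. The class DirectoryReaderSketch and its readDirectory and getContents methods are hypothetical, and the charset is passed in explicitly because the behavior of setCharset(java.io.File) is not documented here.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: reads every regular file in a directory into memory.
// The charset is passed in explicitly (an assumption; Extractor uses setCharset).
final class DirectoryReaderSketch {
    private final List<String> contents = new ArrayList<>();

    void readDirectory(File dir, Charset charset) throws IOException {
        // listFiles returns null if dir is not a directory or cannot be read.
        File[] files = dir.listFiles(File::isFile);
        if (files == null) {
            throw new IOException("Not a readable directory: " + dir);
        }
        for (File file : files) {
            readFile(file, charset);
        }
    }

    void readFile(File file, Charset charset) throws IOException {
        byte[] bytes = Files.readAllBytes(file.toPath());
        contents.add(new String(bytes, charset));
    }

    // Assumption: "number of files" means the number of files read so far.
    int getNoOfFiles() {
        return contents.size();
    }

    List<String> getContents() {
        return contents;
    }
}
```

Decoding with an explicit charset avoids depending on the platform default encoding, which matters when news pages from different sites use different encodings.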
public java.lang.String textBasedItem(java.lang.String content)
Parameters:
content - The content of the webpage
public java.lang.String anchorTextItem(java.lang.String content)
Parameters:
content - The content of a webpage
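The page does not say how anchorTextItem finds anchor-text items. A minimal sketch, assuming an anchor-text item is the visible text of an a element, could use a case-insensitive regular expression as below. The class AnchorTextSketch and the choice to return a List rather than a single String are hypothetical, and a regular expression is only an approximation of a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of anchor-text extraction; not the original implementation.
final class AnchorTextSketch {
    // Matches <a ...>...</a>, case-insensitively and across line breaks;
    // the lazy .*? stops at the first closing </a>.
    private static final Pattern ANCHOR =
            Pattern.compile("<a\\b[^>]*>(.*?)</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> anchorTexts(String content) {
        List<String> texts = new ArrayList<>();
        Matcher m = ANCHOR.matcher(content);
        while (m.find()) {
            // Strip nested tags (e.g. <b>) and collapse whitespace inside the link text.
            String text = m.group(1).replaceAll("<[^>]*>", "").replaceAll("\\s+", " ").trim();
            if (!text.isEmpty()) {
                texts.add(text);
            }
        }
        return texts;
    }
}
```

For example, a link written as `<A href="/n1">Rates <b>rise</b></A>` yields the item "Rates rise", while links with no visible text are skipped.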
public java.lang.String cleanWebpage(java.lang.String content)
Parameters:
content - The content of a webpage
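cleanWebpage is described as removing scripts from a webpage. One common way to do this is to delete every script element, including its body, before any other processing; the sketch below assumes that approach. The class CleanWebpageSketch is hypothetical, and the pattern will not cope with a ">" inside a quoted attribute of the opening script tag.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of script removal; not the original implementation.
final class CleanWebpageSketch {
    // <script ...> ... </script>, any case, spanning multiple lines. The lazy .*?
    // removes two scripts separately instead of also deleting the text between them.
    private static final Pattern SCRIPT =
            Pattern.compile("<script\\b[^>]*>.*?</script\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String cleanWebpage(String content) {
        return SCRIPT.matcher(content).replaceAll("");
    }
}
```

Removing scripts before stripping tags matters because script bodies often contain markup-like strings (such as `'<b>'`) that a plain tag stripper would otherwise leave behind as text.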
public java.lang.String removeTags(java.lang.String string)
Parameters:
string - The string to remove tags from
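removeTags removes HTML tags from a string. A minimal sketch, assuming a simple regular-expression approach, replaces each tag with a space and then collapses whitespace. The class RemoveTagsSketch is hypothetical, and character entities such as &amp;amp; are deliberately left untouched.

```java
// Hypothetical sketch of tag removal; not the original implementation.
final class RemoveTagsSketch {
    static String removeTags(String string) {
        // Replace anything between '<' and the next '>' with a space, so that text
        // from adjacent block elements (e.g. </h1><p>) does not run together.
        String withoutTags = string.replaceAll("<[^>]*>", " ");
        // Collapse the runs of whitespace left behind and trim the ends.
        return withoutTags.replaceAll("\\s+", " ").trim();
    }
}
```

Replacing tags with a space rather than an empty string keeps headlines and paragraphs apart, at the cost of inserting a space inside words that are split by inline tags such as `foo<b>bar</b>`.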
public void writeXML()
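writeXML() writes the result to an XML file, but the page does not document the file's structure. The sketch below assumes a root element named after the dataset with one child element per news item, and uses the standard StAX API (javax.xml.stream) so that characters such as & and < in news text are escaped automatically. The class WriteXmlSketch, its parameters and the element names dataset, name and item are hypothetical.

```java
import java.io.Writer;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch; element and attribute names are assumptions.
final class WriteXmlSketch {
    static void writeXML(Writer out, String dataSetName, List<String> items) throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument("UTF-8", "1.0");
        xml.writeStartElement("dataset");
        xml.writeAttribute("name", dataSetName);
        for (String item : items) {
            xml.writeStartElement("item");
            xml.writeCharacters(item); // escapes markup characters such as & and <
            xml.writeEndElement();
        }
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.flush();
        xml.close(); // per the StAX contract, this does not close the underlying Writer
    }
}
```

In the real class the Writer would presumably wrap the file named by the constructor's output parameter; the sketch takes it as an argument so that it can also write to memory.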