| Modifier and Type | Field and Description |
|---|---|
| `protected WebURL` | `Page.url`<br>The URL of this page. |

| Modifier and Type | Method and Description |
|---|---|
| `WebURL` | `Page.getWebURL()` |
| `protected WebURL` | `WebCrawler.handleUrlBeforeProcess(WebURL curURL)`<br>This function is called before the page's URL is processed. Subclasses can override it to tweak the URL before processing. |

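
`handleUrlBeforeProcess` is the hook for adjusting a URL before it is processed. Below is a minimal sketch, assuming the desired tweak is to drop the `#fragment` and lowercase the scheme and host. The class `UrlTweaks` and its `normalize` method are hypothetical and deliberately independent of crawler4j, so the string logic can run on its own.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper (not part of crawler4j): the kind of rewrite a
// handleUrlBeforeProcess override might apply to curURL.getURL().
final class UrlTweaks {
    private UrlTweaks() {}

    /** Drops any #fragment and lowercases scheme and host; path and query keep their raw encoding. */
    static String normalize(String url) {
        int hash = url.indexOf('#');
        String noFragment = hash >= 0 ? url.substring(0, hash) : url;
        URI uri;
        try {
            uri = new URI(noFragment);
        } catch (URISyntaxException e) {
            return url; // unparseable: leave the URL untouched
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return noFragment; // relative URL or no parseable host: only the fragment is removed
        }
        StringBuilder sb = new StringBuilder()
                .append(scheme.toLowerCase(Locale.ROOT)).append("://");
        if (uri.getRawUserInfo() != null) {
            sb.append(uri.getRawUserInfo()).append('@');
        }
        sb.append(host.toLowerCase(Locale.ROOT));
        if (uri.getPort() != -1) {
            sb.append(':').append(uri.getPort());
        }
        String path = uri.getRawPath();
        sb.append(path == null || path.isEmpty() ? "/" : path); // empty path becomes "/"
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        return sb.toString();
    }
}
```

Inside a `WebCrawler` subclass, an override could apply it with `curURL.setURL(UrlTweaks.normalize(curURL.getURL()));` followed by `return curURL;`, assuming the `getURL()`/`setURL(String)` accessors on `WebURL`.
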
| Modifier and Type | Method and Description |
|---|---|
| `protected void` | `WebCrawler.handlePageStatusCode(WebURL webUrl, int statusCode, String statusDescription)`<br>This function is called once the header of a page is fetched. |
| `protected WebURL` | `WebCrawler.handleUrlBeforeProcess(WebURL curURL)`<br>This function is called before the page's URL is processed. Subclasses can override it to tweak the URL before processing. |
| `protected void` | `WebCrawler.onContentFetchError(WebURL webUrl)`<br>This function is called if the content of a URL could not be fetched. |
| `protected void` | `WebCrawler.onParseError(WebURL webUrl)`<br>This function is called if an error occurred while parsing the content. |
| `protected void` | `WebCrawler.onUnhandledException(WebURL webUrl, Throwable e)`<br>This function is called when an unhandled exception is encountered during fetching. |
| `void` | `Page.setWebURL(WebURL url)` |
| `boolean` | `WebCrawler.shouldVisit(Page referringPage, WebURL url)`<br>Classes that extend `WebCrawler` should override this function to tell the crawler whether the given URL should be crawled. |

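
A `shouldVisit` override commonly combines a crawl-scope check with a file-extension filter. The sketch below keeps that decision in a hypothetical, crawler4j-independent class `VisitPolicy` that operates on the URL string; the allowed prefix `https://www.example.com/` and the extension list are assumptions to adjust per crawl.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical stand-alone version of the check a shouldVisit override might make.
final class VisitPolicy {
    private VisitPolicy() {}

    // Assumed list: skip static assets and archives by file extension.
    private static final Pattern SKIPPED_EXTENSIONS =
            Pattern.compile(".*\\.(css|js|gif|jpe?g|png|mp3|mp4|zip|gz)$");

    // Assumed crawl scope: only URLs under this prefix are followed.
    private static final String ALLOWED_PREFIX = "https://www.example.com/";

    static boolean shouldVisit(String url) {
        String href = url.toLowerCase(Locale.ROOT);
        return href.startsWith(ALLOWED_PREFIX) && !SKIPPED_EXTENSIONS.matcher(href).matches();
    }
}
```

In a `WebCrawler` subclass, `shouldVisit(Page referringPage, WebURL url)` could then return `VisitPolicy.shouldVisit(url.getURL())`. Lowercasing first keeps the extension filter case-insensitive, so `logo.PNG` is skipped just like `logo.png`.
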
| Constructor and Description |
|---|
| `Page(WebURL url)` |

| Modifier and Type | Method and Description |
|---|---|
| `PageFetchResult` | `PageFetcher.fetchPage(WebURL webUrl)` |

| Modifier and Type | Method and Description |
|---|---|
| `WebURL` | `WebURLTupleBinding.entryToObject(com.sleepycat.bind.tuple.TupleInput input)` |

| Modifier and Type | Method and Description |
|---|---|
| `List<WebURL>` | `WorkQueues.get(int max)` |

| Modifier and Type | Method and Description |
|---|---|
| `protected static com.sleepycat.je.DatabaseEntry` | `WorkQueues.getDatabaseEntryKey(WebURL url)` |
| `void` | `WebURLTupleBinding.objectToEntry(WebURL url, com.sleepycat.bind.tuple.TupleOutput output)` |
| `void` | `WorkQueues.put(WebURL url)` |
| `boolean` | `InProcessPagesDB.removeURL(WebURL webUrl)` |
| `void` | `Frontier.schedule(WebURL url)` |
| `void` | `Frontier.setProcessed(WebURL webURL)` |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `Frontier.getNextURLs(int max, List<WebURL> result)` |
| `void` | `Frontier.scheduleAll(List<WebURL> urls)` |

| Modifier and Type | Method and Description |
|---|---|
| `Set<WebURL>` | `TextParseData.getOutgoingUrls()` |
| `Set<WebURL>` | `ParseData.getOutgoingUrls()` |
| `Set<WebURL>` | `HtmlParseData.getOutgoingUrls()` |
| `Set<WebURL>` | `BinaryParseData.getOutgoingUrls()` |

| Modifier and Type | Method and Description |
|---|---|
| `void` | `TextParseData.setOutgoingUrls(Set<WebURL> outgoingUrls)` |
| `void` | `ParseData.setOutgoingUrls(Set<WebURL> outgoingUrls)` |
| `void` | `HtmlParseData.setOutgoingUrls(Set<WebURL> outgoingUrls)` |
| `void` | `BinaryParseData.setOutgoingUrls(Set<WebURL> outgoingUrls)` |

| Modifier and Type | Method and Description |
|---|---|
| `boolean` | `RobotstxtServer.allows(WebURL webURL)`<br>Note that for a bad URL, this method returns `true`. |

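
The rule on `RobotstxtServer.allows` means a bad URL is never blocked: the check fails open. The sketch below illustrates that rule with a hypothetical `FailOpenRobots` class that tests a URL's path against a pre-parsed list of `Disallow` prefixes. It is not crawler4j's implementation, and it ignores real robots.txt details such as user-agent groups, `Allow` lines, and wildcards.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not crawler4j code): a robots.txt-style check
// that fails open, i.e. returns true when the URL cannot be parsed.
final class FailOpenRobots {
    private final List<String> disallowedPrefixes;

    FailOpenRobots(List<String> disallowedPrefixes) {
        this.disallowedPrefixes = new ArrayList<>(disallowedPrefixes);
    }

    boolean allows(String url) {
        String path;
        try {
            URI uri = new URI(url);
            if (uri.getScheme() == null || uri.getRawPath() == null) {
                return true; // not an absolute hierarchical URL: treated as bad, so allowed
            }
            path = uri.getRawPath().isEmpty() ? "/" : uri.getRawPath();
        } catch (URISyntaxException e) {
            return true; // malformed URL: fail open
        }
        for (String prefix : disallowedPrefixes) {
            if (path.startsWith(prefix)) {
                return false; // path falls under a Disallow prefix
            }
        }
        return true;
    }
}
```

Failing open favors crawl coverage over strict compliance; a crawler that must never risk a disallowed fetch could reject unparseable URLs instead.
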
| Modifier and Type | Method and Description |
|---|---|
| `static Set<WebURL>` | `Net.extractUrls(String input)` |

Copyright © 2015. All rights reserved.