| Package | Description |
|---|---|
| us.codecraft.webmagic | Main class "Spider" and models. |
| us.codecraft.webmagic.downloader | The downloader fetches web pages and stores them in a Page object. |
| us.codecraft.webmagic.pipeline | The pipeline handles persistence and offline processing in the crawler. |
| us.codecraft.webmagic.scheduler | The scheduler manages URLs. |
| us.codecraft.webmagic.scheduler.component | Components of the scheduler. |

| Modifier and Type | Class and Description |
|---|---|
| class | Spider: Entry point of a crawler. |

| Modifier and Type | Method and Description |
|---|---|
| Task | Site.toTask() |

| Modifier and Type | Method and Description |
|---|---|
| Page | HttpClientDownloader.download(Request request, Task task) |
| Page | Downloader.download(Request request, Task task): Downloads a web page and stores it in a Page object. |
| protected Page | HttpClientDownloader.handleResponse(Request request, String charset, org.apache.http.HttpResponse httpResponse, Task task) |

| Modifier and Type | Method and Description |
|---|---|
| void | ResultItemsCollectorPipeline.process(ResultItems resultItems, Task task) |
| void | Pipeline.process(ResultItems resultItems, Task task): Processes extracted results. |
| void | FilePipeline.process(ResultItems resultItems, Task task) |
| void | ConsolePipeline.process(ResultItems resultItems, Task task) |
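
The `process(ResultItems resultItems, Task task)` shape listed above can be illustrated with a minimal, JDK-only sketch. The `Task`, `ResultItems`, and `Pipeline` types below are simplified stand-ins declared inside the example, not the library's classes; `getUUID()`, `put`, `getAll`, and the `LineCollectingPipeline` class are assumptions introduced for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PipelineSketch {
    /** Stand-in for Task; getUUID() is an assumed per-crawl identifier. */
    interface Task {
        String getUUID();
    }

    /** Stand-in for ResultItems: an ordered bag of extracted fields. */
    static class ResultItems {
        private final Map<String, Object> fields = new LinkedHashMap<>();

        ResultItems put(String key, Object value) {
            fields.put(key, value);
            return this;
        }

        Map<String, Object> getAll() {
            return fields;
        }
    }

    /** Same method shape as the process(...) entries in the table. */
    interface Pipeline {
        void process(ResultItems resultItems, Task task);
    }

    /** Hypothetical pipeline: stores one line per field, tagged with the task id. */
    static class LineCollectingPipeline implements Pipeline {
        private final List<String> lines = new ArrayList<>();

        @Override
        public void process(ResultItems resultItems, Task task) {
            for (Map.Entry<String, Object> e : resultItems.getAll().entrySet()) {
                lines.add(task.getUUID() + "\t" + e.getKey() + "=" + e.getValue());
            }
        }

        List<String> getLines() {
            return lines;
        }
    }

    public static void main(String[] args) {
        Task task = () -> "example.com";
        LineCollectingPipeline pipeline = new LineCollectingPipeline();
        pipeline.process(new ResultItems().put("title", "Hello").put("views", 42), task);
        pipeline.getLines().forEach(System.out::println);
    }
}
```

Passing the task alongside the results lets one pipeline instance tell apart output from several crawls, which is presumably why every `process` variant in the table takes a `Task` argument.
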

| Modifier and Type | Method and Description |
|---|---|
| int | QueueScheduler.getLeftRequestsCount(Task task) |
| int | PriorityScheduler.getLeftRequestsCount(Task task) |
| int | MonitorableScheduler.getLeftRequestsCount(Task task) |
| int | QueueScheduler.getTotalRequestsCount(Task task) |
| int | PriorityScheduler.getTotalRequestsCount(Task task) |
| int | MonitorableScheduler.getTotalRequestsCount(Task task) |
| Request | Scheduler.poll(Task task): Gets a URL to crawl. |
| Request | QueueScheduler.poll(Task task) |
| Request | PriorityScheduler.poll(Task task) |
| void | Scheduler.push(Request request, Task task): Adds a URL to fetch. |
| void | DuplicateRemovedScheduler.push(Request request, Task task) |
| void | QueueScheduler.pushWhenNoDuplicate(Request request, Task task) |
| void | PriorityScheduler.pushWhenNoDuplicate(Request request, Task task) |
| protected void | DuplicateRemovedScheduler.pushWhenNoDuplicate(Request request, Task task) |
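
The method names above suggest a split in which `push` filters duplicates and `pushWhenNoDuplicate` stores only new requests. The following JDK-only sketch shows one plausible reading of that split; it is not the library's implementation. All types are stand-ins declared inside the example, and `DedupingScheduler`, `FifoScheduler`, `getUUID()`, and `getUrl()` are hypothetical names. In this sketch the total count means the number of distinct URLs accepted, which is an assumption.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

class SchedulerSketch {
    /** Stand-in for Task; getUUID() is an assumed identifier. */
    interface Task {
        String getUUID();
    }

    /** Stand-in for Request: just a URL. */
    static class Request {
        private final String url;

        Request(String url) {
            this.url = url;
        }

        String getUrl() {
            return url;
        }
    }

    /** Same push/poll shape as the Scheduler entries in the table. */
    interface Scheduler {
        void push(Request request, Task task);

        Request poll(Task task);
    }

    /** Filters duplicates in push, then hands new requests to the subclass. */
    abstract static class DedupingScheduler implements Scheduler {
        private final Set<String> seenUrls = new HashSet<>();

        @Override
        public synchronized void push(Request request, Task task) {
            // Set.add returns true only the first time a URL is seen.
            if (seenUrls.add(request.getUrl())) {
                pushWhenNoDuplicate(request, task);
            }
        }

        protected abstract void pushWhenNoDuplicate(Request request, Task task);

        public synchronized int getTotalRequestsCount(Task task) {
            return seenUrls.size();
        }
    }

    /** First-in, first-out storage for requests that passed the duplicate check. */
    static class FifoScheduler extends DedupingScheduler {
        private final Queue<Request> queue = new ArrayDeque<>();

        @Override
        protected void pushWhenNoDuplicate(Request request, Task task) {
            queue.add(request);
        }

        @Override
        public synchronized Request poll(Task task) {
            return queue.poll(); // null when nothing is left
        }

        public synchronized int getLeftRequestsCount(Task task) {
            return queue.size();
        }
    }

    public static void main(String[] args) {
        Task task = () -> "demo";
        FifoScheduler scheduler = new FifoScheduler();
        scheduler.push(new Request("http://example.com/a"), task);
        scheduler.push(new Request("http://example.com/a"), task); // ignored as a duplicate
        System.out.println(scheduler.getLeftRequestsCount(task));
    }
}
```

The sketch serves a single task per scheduler instance, so the `task` argument goes unused; a scheduler shared across crawls would need to key its queue and seen-set by task.
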

| Modifier and Type | Method and Description |
|---|---|
| int | HashSetDuplicateRemover.getTotalRequestsCount(Task task) |
| int | DuplicateRemover.getTotalRequestsCount(Task task): Gets the total request count for monitoring. |
| int | BloomFilterDuplicateRemover.getTotalRequestsCount(Task task) |
| boolean | HashSetDuplicateRemover.isDuplicate(Request request, Task task) |
| boolean | DuplicateRemover.isDuplicate(Request request, Task task): Checks whether the request is a duplicate. |
| boolean | BloomFilterDuplicateRemover.isDuplicate(Request request, Task task) |
| void | HashSetDuplicateRemover.resetDuplicateCheck(Task task) |
| void | DuplicateRemover.resetDuplicateCheck(Task task): Resets the duplicate check. |
| void | BloomFilterDuplicateRemover.resetDuplicateCheck(Task task) |
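
As a rough illustration of the three `DuplicateRemover` methods listed above, here is a JDK-only sketch that keeps a separate set of seen URLs per task. The types are stand-ins declared inside the example. `PerTaskSetDuplicateRemover`, `getUUID()`, and `getUrl()` are hypothetical, and keying the state by task is a design choice of this sketch, not a statement about how the library's removers behave.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class DuplicateRemoverSketch {
    /** Stand-in for Task; getUUID() is an assumed identifier. */
    interface Task {
        String getUUID();
    }

    /** Stand-in for Request: just a URL. */
    static class Request {
        private final String url;

        Request(String url) {
            this.url = url;
        }

        String getUrl() {
            return url;
        }
    }

    /** Same three-method shape as the DuplicateRemover entries in the table. */
    interface DuplicateRemover {
        boolean isDuplicate(Request request, Task task);

        void resetDuplicateCheck(Task task);

        int getTotalRequestsCount(Task task);
    }

    /** Hypothetical remover that tracks seen URLs separately for each task. */
    static class PerTaskSetDuplicateRemover implements DuplicateRemover {
        private final Map<String, Set<String>> seenByTask = new ConcurrentHashMap<>();

        private Set<String> seen(Task task) {
            return seenByTask.computeIfAbsent(
                    task.getUUID(), id -> ConcurrentHashMap.<String>newKeySet());
        }

        @Override
        public boolean isDuplicate(Request request, Task task) {
            // add() returns false when the URL was already recorded for this task.
            return !seen(task).add(request.getUrl());
        }

        @Override
        public void resetDuplicateCheck(Task task) {
            seen(task).clear();
        }

        @Override
        public int getTotalRequestsCount(Task task) {
            return seen(task).size();
        }
    }

    public static void main(String[] args) {
        Task task = () -> "site-a";
        DuplicateRemover remover = new PerTaskSetDuplicateRemover();
        Request request = new Request("http://example.com/page");
        System.out.println(remover.isDuplicate(request, task));
        System.out.println(remover.isDuplicate(request, task));
    }
}
```

Note that `isDuplicate` both checks and records the URL in a single call, so a caller should invoke it once per request. A Bloom-filter variant would trade exactness for lower memory use at the cost of occasional false positives.
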
Copyright © 2016. All rights reserved.