| Package | Description |
|---|---|
| us.codecraft.webmagic | Main class "Spider" and models. |
| us.codecraft.webmagic.downloader | Downloader is the part that downloads web pages and stores them in Page objects. |
| us.codecraft.webmagic.processor | PageProcessor is the customizable part of a crawler for a specific site. |
| us.codecraft.webmagic.processor.example | |

| Modifier and Type | Method and Description |
|---|---|
| Page | Page.setRawText(String rawText) |
| Page | Page.setSkip(boolean skip) |
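
Both setters above are listed with return type Page, which suggests they can be chained. The sketch below illustrates that pattern with a simplified stand-in class. `PageSketch`, its fields, and its getters are hypothetical and are not WebMagic's actual Page; returning the same instance from each setter is an assumption, not something this page confirms.

```java
// Simplified stand-in for illustration only; this is NOT WebMagic's Page class.
final class PageSketch {
    private String rawText;
    private boolean skip;

    // Returns this instance so calls can be chained (assumed behaviour).
    PageSketch setRawText(String rawText) {
        this.rawText = rawText;
        return this;
    }

    // Same fluent style as setRawText.
    PageSketch setSkip(boolean skip) {
        this.skip = skip;
        return this;
    }

    String getRawText() { return rawText; }

    boolean isSkip() { return skip; }
}

final class FluentSetterDemo {
    public static void main(String[] args) {
        // Both setters are applied in a single expression.
        PageSketch page = new PageSketch()
                .setRawText("<html><body>hello</body></html>")
                .setSkip(true);
        System.out.println("skip=" + page.isSkip());
    }
}
```
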

| Modifier and Type | Method and Description |
|---|---|
| protected void | Spider.extractAndAddRequests(Page page, boolean spawnUrl) |

| Modifier and Type | Method and Description |
|---|---|
| protected Page | AbstractDownloader.addToCycleRetry(Request request, Site site) |
| Page | HttpClientDownloader.download(Request request, Task task) |
| Page | Downloader.download(Request request, Task task): Downloads a web page and stores it in a Page object. |
| protected Page | HttpClientDownloader.handleResponse(Request request, String charset, org.apache.http.HttpResponse httpResponse, Task task) |
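
The `download(Request request, Task task)` contract listed above takes a request and a task and returns a Page. As a rough illustration of that shape, the sketch below implements an in-memory downloader that serves canned HTML, which can be useful for exercising a processor without network access. Every type here (`RequestStub`, `TaskStub`, `PageStub`, `DownloaderStub`, `InMemoryDownloader`) is a hypothetical stand-in, not a WebMagic class, and returning `null` for an unknown URL is a choice made for this sketch only.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-ins for illustration only; these are NOT WebMagic's Request, Task, Page, or Downloader.
final class RequestStub {
    final String url;
    RequestStub(String url) { this.url = url; }
}

final class TaskStub {
    final String name;
    TaskStub(String name) { this.name = name; }
}

final class PageStub {
    final String url;
    final String rawText;
    PageStub(String url, String rawText) {
        this.url = url;
        this.rawText = rawText;
    }
}

interface DownloaderStub {
    // Mirrors the documented shape: a request plus a task in, a page out.
    PageStub download(RequestStub request, TaskStub task);
}

// Serves canned HTML from memory instead of performing HTTP requests.
final class InMemoryDownloader implements DownloaderStub {
    private final Map<String, String> cannedPages = new HashMap<>();

    InMemoryDownloader register(String url, String html) {
        cannedPages.put(url, html);
        return this;
    }

    @Override
    public PageStub download(RequestStub request, TaskStub task) {
        String html = cannedPages.get(request.url);
        // Unknown URLs yield null in this sketch.
        return html == null ? null : new PageStub(request.url, html);
    }
}
```
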

| Modifier and Type | Method and Description |
|---|---|
| void | SimplePageProcessor.process(Page page) |
| void | PageProcessor.process(Page page): Processes the page: extracts URLs to fetch, then extracts the data and stores it. |
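
The `process(Page page)` description above names two jobs: finding URLs to fetch next, and extracting data to store. A minimal sketch of that division of labour follows, using only the JDK. `CrawledPage`, `ProcessorSketch`, and `TitleAndLinksProcessor` are hypothetical stand-ins, not WebMagic's Page or PageProcessor, and the regular expressions are deliberately naive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-ins for illustration only; NOT WebMagic's Page or PageProcessor.
final class CrawledPage {
    final String rawText;
    final List<String> targetUrls = new ArrayList<>();        // URLs to fetch next
    final Map<String, String> fields = new LinkedHashMap<>(); // extracted data

    CrawledPage(String rawText) { this.rawText = rawText; }
}

interface ProcessorSketch {
    void process(CrawledPage page);
}

final class TitleAndLinksProcessor implements ProcessorSketch {
    // Naive patterns: absolute http(s) links in href attributes, and the <title> element.
    private static final Pattern HREF = Pattern.compile("href=\"(https?://[^\"]+)\"");
    private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");

    @Override
    public void process(CrawledPage page) {
        // Step 1: collect absolute links as candidates for later requests.
        Matcher links = HREF.matcher(page.rawText);
        while (links.find()) {
            page.targetUrls.add(links.group(1));
        }
        // Step 2: extract data and store it on the page.
        Matcher title = TITLE.matcher(page.rawText);
        if (title.find()) {
            page.fields.put("title", title.group(1).trim());
        }
    }
}
```

Keeping both steps inside one `process` call means the site-specific logic lives in a single class, which matches the one-processor-per-site pattern of the example classes listed on this page.
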

| Modifier and Type | Method and Description |
|---|---|
| void | OschinaBlogPageProcessor.process(Page page) |
| void | GithubRepoPageProcessor.process(Page page) |
| void | BaiduBaikePageProcessor.process(Page page) |

Copyright © 2016. All rights reserved.