Packages that use Site

| Package | Description |
|---|---|
| us.codecraft.webmagic | Main class "Spider" and models. |
| us.codecraft.webmagic.downloader | Downloader is the part that downloads web pages and stores them in a Page object. |
| us.codecraft.webmagic.processor | PageProcessor is the custom part of a crawler for a specific site. |
| us.codecraft.webmagic.processor.example | |

Fields in us.codecraft.webmagic declared as Site

| Modifier and Type | Field and Description |
|---|---|
| protected Site | `Spider.site` |

Methods in us.codecraft.webmagic that return Site

| Modifier and Type | Method and Description |
|---|---|
| Site | `Site.addCookie(String name, String value)` Add a cookie with the domain `getDomain()`. |
| Site | `Site.addCookie(String domain, String name, String value)` Add a cookie with a specific domain. |
| Site | `Site.addHeader(String key, String value)` Put an HTTP header for the downloader. |
| Site | `Site.addStartRequest(Request startRequest)` Deprecated. |
| Site | `Site.addStartUrl(String startUrl)` Deprecated. |
| Site | `Site.enableHttpProxyPool()` |
| Site | `Task.getSite()` The site of a task. |
| Site | `Spider.getSite()` |
| static Site | `Site.me()` Create a new Site. |
| Site | `Site.setAcceptStatCode(Set<Integer> acceptStatCode)` Set acceptStatCode, the set of HTTP status codes accepted for a downloaded page. |
| Site | `Site.setCharset(String charset)` Set the charset of the page manually. |
| Site | `Site.setCycleRetryTimes(int cycleRetryTimes)` Set the number of cycle retries (cycleRetryTimes) when a download fails; 0 by default. |
| Site | `Site.setDomain(String domain)` Set the domain of the site. |
| Site | `Site.setHttpProxy(org.apache.http.HttpHost httpProxy)` Set up an HTTP proxy for this site. |
| Site | `Site.setHttpProxyPool(List<String[]> httpProxyList)` Set the HTTP proxy pool; in each entry, String[0] is the IP and String[1] is the port. |
| Site | `Site.setProxyReuseInterval(int reuseInterval)` |
| Site | `Site.setRetrySleepTime(int retrySleepTime)` Set the sleep time before retrying a failed download, in ms; 1000 by default. |
| Site | `Site.setRetryTimes(int retryTimes)` Set the number of retries when a download fails; 0 by default. |
| Site | `Site.setSleepTime(int sleepTime)` Set the interval between processing two pages. |
| Site | `Site.setTimeOut(int timeOut)` Set the downloader timeout, in ms. |
| Site | `Site.setUseGzip(boolean useGzip)` Set whether to use gzip. |
| Site | `Site.setUserAgent(String userAgent)` Set the user agent. |
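Because every setter above returns a Site, the settings can be chained starting from `Site.me()`. The sketch below assumes webmagic-core from the release this listing documents is on the classpath; the class name `SiteConfigExample` and all values (domain, user agent, header, cookie) are hypothetical, chosen only for illustration, and are not library defaults.

```java
import us.codecraft.webmagic.Site;

// Hypothetical helper class showing chained Site configuration.
public class SiteConfigExample {

    // Builds a Site using only setters listed in the table above; each call
    // returns a Site, so the calls can be chained.
    public static Site buildSite() {
        return Site.me()
                .setDomain("example.com")                      // hypothetical target domain
                .setCharset("UTF-8")                           // force the page charset
                .setUserAgent("Mozilla/5.0 (example crawler)") // hypothetical user agent
                .setTimeOut(10000)                             // downloader timeout in ms
                .setRetryTimes(3)                              // retries when a download fails
                .setCycleRetryTimes(2)                         // cycle retries when a download fails
                .setRetrySleepTime(1000)                       // sleep before a retry
                .setSleepTime(500)                             // interval between two pages
                .setUseGzip(true)                              // request gzip-compressed responses
                .addHeader("Accept-Language", "en-US")         // extra HTTP header
                .addCookie("session", "abc123");               // cookie for getDomain()
    }

    public static void main(String[] args) {
        Site site = buildSite();
        // Read back the configured domain.
        System.out.println(site.getDomain());
    }
}
```

Calling `setDomain` before `addCookie(name, value)` keeps the example readable, since that overload is documented as adding the cookie with the domain returned by `getDomain()`.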

Methods in us.codecraft.webmagic.downloader with parameters of type Site

| Modifier and Type | Method and Description |
|---|---|
| protected Page | `AbstractDownloader.addToCycleRetry(Request request, Site site)` |
| org.apache.http.impl.client.CloseableHttpClient | `HttpClientGenerator.getClient(Site site)` |
| protected org.apache.http.client.methods.HttpUriRequest | `HttpClientDownloader.getHttpUriRequest(Request request, Site site, Map<String,String> headers)` |

Methods in us.codecraft.webmagic.processor that return Site

| Modifier and Type | Method and Description |
|---|---|
| Site | `SimplePageProcessor.getSite()` |
| Site | `PageProcessor.getSite()` Get the site settings. |
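`PageProcessor.getSite()` is where a crawler hands its Site settings to the framework. A minimal sketch, assuming webmagic-core is on the classpath and that `PageProcessor` also declares `process(Page)`, which WebMagic's example processors implement; `TitlePageProcessor`, `example.com` and the XPath expression are hypothetical. Running `main` performs a real crawl of that placeholder URL.

```java
import us.codecraft.webmagic.Page;
import us.codecraft.webmagic.Site;
import us.codecraft.webmagic.Spider;
import us.codecraft.webmagic.processor.PageProcessor;

// Hypothetical processor that keeps one Site instance and returns it from getSite().
public class TitlePageProcessor implements PageProcessor {

    // Build the Site once so every getSite() call returns the same settings.
    private final Site site = Site.me()
            .setDomain("example.com")
            .setRetryTimes(3)
            .setSleepTime(1000);

    @Override
    public void process(Page page) {
        // Extract the <title> text with an XPath selector and store it as a result field.
        page.putField("title", page.getHtml().xpath("//title/text()").toString());
    }

    @Override
    public Site getSite() {
        return site;
    }

    public static void main(String[] args) {
        // The Spider takes its crawl settings from the processor's getSite().
        Spider.create(new TitlePageProcessor())
                .addUrl("http://example.com/")
                .thread(1)
                .run();
    }
}
```

Keeping the Site in a field, rather than calling `Site.me()` inside `getSite()`, means the framework and the processor always see one consistent configuration.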
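
Methods in us.codecraft.webmagic.processor.example that return Site

| Modifier and Type | Method and Description |
|---|---|
| Site | `OschinaBlogPageProcessor.getSite()` |
| Site | `GithubRepoPageProcessor.getSite()` |
| Site | `BaiduBaikePageProcessor.getSite()` |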
Copyright © 2016. All rights reserved.