| Package | Description |
|---|---|
| us.codecraft.webmagic | Main class "Spider" and models. |
| us.codecraft.webmagic.downloader | The Downloader is the part that downloads web pages and stores them in a Page object. |
| us.codecraft.webmagic.selector | Selectors for page extraction. |

| Class | Description |
|---|---|
| Html | Selectable HTML. |
| Json | Parses JSON. |
| Selectable | Selectable text. |

| Class | Description |
|---|---|
| Html | Selectable HTML. |

| Class | Description |
|---|---|
| AbstractSelectable | |
| AndSelector | All selectors are arranged as a pipeline. |
| BaseElementSelector | |
| CssSelector | CSS selector. |
| ElementSelector | Selector (extractor) for HTML elements. |
| Html | Selectable HTML. |
| HtmlNode | |
| Json | Parses JSON. |
| OrSelector | Each extractor runs separately, and the results of all extractors are combined into the final result. |
| PlainText | Selectable plain text. |
| RegexSelector | Selector based on regular expressions. |
| Selectable | Selectable text. |
| Selector | Selector (extractor) for text. |
| SmartContentSelector | Borrowed from https://code.google.com/p/cx-extractor/ |
| XpathSelector | XPath selector based on Xsoup. |

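
The descriptions of AndSelector (a pipeline) and OrSelector (separate extraction, combined results) can be illustrated with a small standalone sketch. It deliberately does not use WebMagic: `TextSelector`, `regex`, `and`, and `or` are hypothetical names invented for this example, and the library's own classes may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SelectorCompositionSketch {

    /** Hypothetical stand-in for a text selector; NOT WebMagic's Selector interface. */
    interface TextSelector {
        List<String> selectAll(String text);
    }

    /** Regex-backed selector that returns capture group 1 of every match. */
    static TextSelector regex(String pattern) {
        Pattern compiled = Pattern.compile(pattern);
        return text -> {
            List<String> out = new ArrayList<>();
            Matcher m = compiled.matcher(text);
            while (m.find()) {
                out.add(m.group(1));
            }
            return out;
        };
    }

    /** "And" composition: a pipeline where each selector runs on the previous results. */
    static TextSelector and(TextSelector... selectors) {
        return text -> {
            List<String> current = new ArrayList<>();
            current.add(text);
            for (TextSelector s : selectors) {
                List<String> next = new ArrayList<>();
                for (String piece : current) {
                    next.addAll(s.selectAll(piece));
                }
                current = next;
            }
            return current;
        };
    }

    /** "Or" composition: every selector runs on the original text; results are concatenated. */
    static TextSelector or(TextSelector... selectors) {
        return text -> {
            List<String> out = new ArrayList<>();
            for (TextSelector s : selectors) {
                out.addAll(s.selectAll(text));
            }
            return out;
        };
    }

    public static void main(String[] args) {
        String html = "<h1>Title</h1><p>id=42</p><p>id=7</p>";
        TextSelector paragraphs = regex("<p>(.*?)</p>");
        TextSelector ids = regex("id=(\\d+)");
        TextSelector heading = regex("<h1>(.*?)</h1>");

        // Pipeline: first take the paragraphs, then pull the ids out of each one.
        System.out.println(and(paragraphs, ids).selectAll(html));   // prints [42, 7]
        // Separate extraction: heading matches followed by paragraph matches.
        System.out.println(or(heading, paragraphs).selectAll(html)); // prints [Title, id=42, id=7]
    }
}
```

In the pipeline case the order of the selectors matters, because each stage only sees what the previous one produced. In the combined case every selector sees the full input, so the order only affects the order of the results.
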
Copyright © 2016. All rights reserved.