| Package | Description |
|---|---|
| us.codecraft.webmagic | Main class "Spider" and the model classes. |
| us.codecraft.webmagic.downloader | Downloader is the part that downloads web pages and stores them in Page objects. |
| us.codecraft.webmagic.pipeline | Pipeline is the part of a crawler that handles persistence and offline processing. |
| us.codecraft.webmagic.processor | PageProcessor is the customizable part of a crawler, written for a specific site. |
| us.codecraft.webmagic.processor.example | |
| us.codecraft.webmagic.scheduler | Scheduler is the part that manages URLs. |
| us.codecraft.webmagic.scheduler.component | Components of the scheduler. |
| us.codecraft.webmagic.utils | Static utilities for WebMagic. |
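
The division of roles in the package table can be sketched as a single crawl loop. The following is a minimal, self-contained illustration and not WebMagic's own code: every name in it (`CrawlerSketch`, `SketchPage`, `SketchScheduler`, and so on) is a hypothetical stand-in, the "site" is an in-memory map rather than real HTTP, and the loop is single-threaded. It only assumes what the descriptions state: a scheduler manages URLs, a downloader produces pages, a processor extracts results and further URLs, and a pipeline consumes the results.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

final class CrawlerSketch {
    /** Stand-in for a page: raw content plus whatever the processor extracts from it. */
    static final class SketchPage {
        final String url;
        final String content;
        final Map<String, String> fields = new LinkedHashMap<>();
        final List<String> targetUrls = new ArrayList<>();

        SketchPage(String url, String content) {
            this.url = url;
            this.content = content;
        }
    }

    /** Downloader role: turn a URL into a page (null if it cannot be fetched). */
    interface SketchDownloader { SketchPage download(String url); }

    /** Processor role: site-specific extraction of fields and follow-up URLs. */
    interface SketchProcessor { void process(SketchPage page); }

    /** Pipeline role: consume the extracted fields (print, store, ...). */
    interface SketchPipeline { void process(Map<String, String> fields); }

    /** Scheduler role: queue of pending URLs that skips URLs already seen. */
    static final class SketchScheduler {
        private final Queue<String> queue = new ArrayDeque<>();
        private final Set<String> seen = new HashSet<>();

        void push(String url) {
            if (seen.add(url)) {
                queue.add(url);
            }
        }

        String poll() {
            return queue.poll();
        }
    }

    /** Runs the loop until the scheduler is empty; returns the number of pages processed. */
    public static int crawl(String startUrl, SketchDownloader downloader,
                            SketchProcessor processor, SketchPipeline pipeline) {
        SketchScheduler scheduler = new SketchScheduler();
        scheduler.push(startUrl);
        int processed = 0;
        String url;
        while ((url = scheduler.poll()) != null) {
            SketchPage page = downloader.download(url);
            if (page == null) {
                continue; // nothing downloaded for this URL
            }
            processor.process(page);
            page.targetUrls.forEach(scheduler::push);
            pipeline.process(page.fields);
            processed++;
        }
        return processed;
    }

    /** Crawls a tiny in-memory site whose entries have the form "title|link,link". */
    public static List<String> demo() {
        Map<String, String> site = new LinkedHashMap<>();
        site.put("/a", "Title A|/b,/c");
        site.put("/b", "Title B|/a");
        site.put("/c", "Title C|");
        List<String> titles = new ArrayList<>();
        crawl("/a",
            url -> site.containsKey(url) ? new SketchPage(url, site.get(url)) : null,
            page -> {
                String[] parts = page.content.split("\\|", -1);
                page.fields.put("title", parts[0]);
                for (String link : parts[1].split(",")) {
                    if (!link.isEmpty()) {
                        page.targetUrls.add(link);
                    }
                }
            },
            fields -> titles.add(fields.get("title")));
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [Title A, Title B, Title C]
    }
}
```

Keeping the four roles behind separate interfaces means that, in this sketch, the extraction logic can change per site while the loop, queue, and output handling stay the same. The link from `/b` back to `/a` is dropped by the scheduler stand-in, so each page is processed once.
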

| Class | Description |
|---|---|
| Page | Object storing the extracted results and the URLs to fetch. |
| Request | Object containing the URL to crawl. |
| ResultItems | Object containing the extracted results. |
| Site | Object containing the settings for a crawler. |
| Spider | Entry point of a crawler. |
| Spider.Status | |
| SpiderListener | Listener for Spider page processing. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| Page | Object storing the extracted results and the URLs to fetch. |
| Request | Object containing the URL to crawl. |
| Site | Object containing the settings for a crawler. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| ResultItems | Object containing the extracted results. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| Page | Object storing the extracted results and the URLs to fetch. |
| Site | Object containing the settings for a crawler. |

| Class | Description |
|---|---|
| Page | Object storing the extracted results and the URLs to fetch. |
| Site | Object containing the settings for a crawler. |

| Class | Description |
|---|---|
| Request | Object containing the URL to crawl. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| Request | Object containing the URL to crawl. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| Request | Object containing the URL to crawl. |

Copyright © 2016. All rights reserved.