KEYWORDS: Databases, Telecommunications, Network architectures, Internet, Data storage, Data communications, Networks, Data centers, Image processing, Data processing
To monitor cutting-edge technologies using the large volume of data available on the internet, the pyspider framework is used to crawl a large number of websites on a regular schedule, and Python scripts periodically import the crawled data into the business application system. In the initial stage of the project, the distributed architecture officially recommended by pyspider was used. It was later reworked into a clustered architecture to suit the actual network environment and the characteristics of the crawler tasks. In testing, the improved architecture showed substantially better efficiency and stability and achieved the expected results.
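The periodic import step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the actual scripts, so everything here is an assumption made for illustration: the table names `crawl_results` and `app_articles`, their columns, the JSON `title` field, and the use of SQLite are all hypothetical. The sketch shows one common approach, an incremental import driven by a timestamp watermark, so that each scheduled run only copies records it has not seen before.

```python
import json
import sqlite3


def import_new_results(conn, last_seen):
    """Copy crawl results newer than `last_seen` into the application table.

    Returns the newest `updatetime` processed, to be kept as the next watermark.
    Table and column names are hypothetical, not taken from the paper.
    """
    rows = conn.execute(
        "SELECT url, result, updatetime FROM crawl_results "
        "WHERE updatetime > ? ORDER BY updatetime",
        (last_seen,),
    ).fetchall()
    for url, result_json, updatetime in rows:
        record = json.loads(result_json)
        # Keyed on URL, so a re-crawled page replaces its earlier copy.
        conn.execute(
            "INSERT OR REPLACE INTO app_articles (url, title, fetched_at) "
            "VALUES (?, ?, ?)",
            (url, record.get("title", ""), updatetime),
        )
        last_seen = updatetime
    conn.commit()
    return last_seen


def make_demo_db():
    """Create an in-memory database with the two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE crawl_results (url TEXT, result TEXT, updatetime REAL)"
    )
    conn.execute(
        "CREATE TABLE app_articles "
        "(url TEXT PRIMARY KEY, title TEXT, fetched_at REAL)"
    )
    return conn
```

A scheduler (for example cron) would call `import_new_results` at a fixed interval and persist the returned watermark between runs. Calling it again with an unchanged watermark copies nothing, which keeps repeated runs safe.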