Claim 1 of patent US20060282494A1, "Interactive web crawling", reads as follows:
1. A method for crawling a web site, the method comprising:
* providing a crawler having a plurality of modes of operation, including an automatic mode and an interactive mode;
* operating the crawler in the automatic mode to crawl the web site;
* detecting a structure on the web site that requires human interaction;
* switching the crawler to the interactive mode;
* prompting a user to interact with the structure;
* receiving input from the user;
* and continuing to crawl the web site in the interactive mode based on the input from the user.
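The steps of Claim 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's implementation: the in-memory `SITE` dictionary, the `needs_human` and `gated` fields, and the `prompt_user` callback are all hypothetical names introduced here, and a real crawler would fetch pages over HTTP and detect such structures itself.

```python
from collections import deque
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Mode(Enum):
    AUTOMATIC = "automatic"
    INTERACTIVE = "interactive"


# Hypothetical stand-in for a web site: URL -> page record.
# "needs_human" marks a structure that requires human interaction;
# "gated" lists links that only become reachable after the user responds.
SITE: Dict[str, dict] = {
    "/": {"links": ["/about", "/login"], "needs_human": False},
    "/about": {"links": [], "needs_human": False},
    "/login": {"links": [], "needs_human": True, "gated": ["/account"]},
    "/account": {"links": ["/orders"], "needs_human": False},
    "/orders": {"links": [], "needs_human": False},
}


def crawl(
    site: Dict[str, dict],
    start: str,
    prompt_user: Callable[[str], str],
) -> Tuple[List[str], Mode]:
    """Breadth-first crawl that starts in automatic mode and switches to
    interactive mode at the first page that needs human interaction."""
    mode = Mode.AUTOMATIC
    queue = deque([start])
    seen = {start}
    visited: List[str] = []

    while queue:
        url = queue.popleft()
        page = site[url]
        visited.append(url)
        next_links = list(page["links"])

        if page["needs_human"]:
            # Detect the structure, switch modes, and prompt the user.
            mode = Mode.INTERACTIVE
            answer = prompt_user(url)
            # Assumption: any non-empty input unlocks the gated links.
            if answer:
                next_links += page.get("gated", [])

        for link in next_links:
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)

    return visited, mode
```

Passing the prompt in as a callback keeps the sketch easy to test; in practice `prompt_user` might simply wrap `input()`, for example `crawl(SITE, "/", lambda url: input(f"Input needed at {url}: "))`.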
The method described in Claim 1 lets a user take part in the crawl. When the crawler detects a structure that it cannot handle on its own, it switches to the interactive mode, prompts the user, and continues crawling based on the user's input. For example, the user can specify which pages should be crawled, and can pause or resume the crawl. This is useful for web sites that are large or complex, and for web sites that cannot be crawled without human interaction.
The method improves on earlier approaches to crawling, which were typically fully automatic and left no room for user interaction. As a result, a crawler could spend time on pages that were of no interest to the user, or attempt pages that the user could not access. Claim 1 addresses both problems by letting the user steer the crawl.
Claim 2 of patent US20060282494A1, "Interactive web crawling", reads as follows:
2. The method of claim 1, wherein the structure on the web site that requires human interaction is a dynamic structure.
Claim 2 narrows Claim 1 to the case where the structure that requires human interaction is dynamic, meaning that it changes over time, as a search results page or a news feed does. Dynamic structures can be difficult to crawl automatically because the crawler may not be able to keep up with the changes. Under Claim 2, the user interacts with the crawl at such a structure, for example by specifying which pages should be crawled or by pausing and resuming the crawl.
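To make "changes over time" concrete, here is a minimal sketch of one possible heuristic: fetch the same URL more than once and compare content fingerprints. This is an assumption made for illustration, not something the claim describes; `looks_dynamic`, `fingerprint`, and the two fake fetchers are hypothetical names, and the `fetch` callable stands in for a real HTTP request.

```python
import hashlib
import itertools
from typing import Callable


def fingerprint(html: str) -> str:
    """Return a stable hash of the page content."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def looks_dynamic(fetch: Callable[[str], str], url: str, samples: int = 2) -> bool:
    """Treat a page as dynamic if repeated fetches return different content."""
    prints = {fingerprint(fetch(url)) for _ in range(samples)}
    return len(prints) > 1


# Hypothetical fetchers standing in for real HTTP requests.
_story_ids = itertools.count()


def fake_feed(url: str) -> str:
    # Returns a different story on every call, like a news feed.
    return f"<ul><li>story {next(_story_ids)}</li></ul>"


def fake_static(url: str) -> str:
    # Returns the same content on every call.
    return "<p>About us</p>"
```

With these fetchers, `looks_dynamic(fake_feed, "/news")` reports the feed as dynamic, while `looks_dynamic(fake_static, "/about")` does not. A real page can differ between fetches for trivial reasons such as timestamps or ad slots, so a crawler would likely need a more tolerant comparison before handing control to the user.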
As with Claim 1, the improvement is over fully automatic crawling, here applied to dynamic structures. Without user interaction, a crawler may crawl pages that are of no interest to the user, or attempt pages that the user cannot access. Claim 2 addresses this by letting the user interact with the crawling process.
Here are some examples of dynamic structures:
* Search results pages
* News feeds
* Product catalogs
* Social media feeds
* eCommerce websites
All of these structures change over time, which is what makes them difficult to crawl automatically. In each case, Claim 2 lets the user step in and direct the crawl.