
Google patent US20130144858A1 - Scheduling resource crawls

 

Google patents about crawling and indexing

Here is claim 1 of the Google patent US20130144858A1 - Scheduling resource crawls:

  1. A method for scheduling resource crawls, the method comprising:

    • providing a crawl scheduler;
    • receiving a resource crawl request from a user;
    • determining a crawl interval for the resource crawl request based on a plurality of factors, including:
      • the popularity of the resource;
      • the frequency of changes to the resource;
      • the bandwidth available to the crawl scheduler;
    • scheduling the resource crawl request to be performed at the crawl interval; and
    • performing the resource crawl request at the crawl interval.

The method described in Claim 1 allows a user to request that a resource be crawled. The crawl scheduler then determines a crawl interval for the resource based on a number of factors, including the popularity of the resource, the frequency of changes to the resource, and the bandwidth available to the crawl scheduler. The resource is then scheduled to be crawled at the crawl interval.
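
The flow in Claim 1 (receive a request, schedule it at an interval, perform it) can be sketched with a small priority queue. This is only an illustration under stated assumptions, not the patent's implementation: the class name `CrawlScheduler`, its methods, and the simulated clock are hypothetical and chosen for the example.

```python
import heapq


class CrawlScheduler:
    """Hypothetical scheduler: keeps crawl requests ordered by due time."""

    def __init__(self):
        self._queue = []      # min-heap of (due_time, url)
        self._intervals = {}  # url -> crawl interval, in arbitrary time units

    def request_crawl(self, url, interval, now=0.0):
        """Register a crawl request; the first crawl is due one interval from now."""
        if interval <= 0:
            raise ValueError("interval must be positive")
        self._intervals[url] = interval
        heapq.heappush(self._queue, (now + interval, url))

    def run_until(self, end_time, crawl):
        """Perform every crawl due by end_time, rescheduling each at its interval."""
        performed = []
        while self._queue and self._queue[0][0] <= end_time:
            due, url = heapq.heappop(self._queue)
            crawl(url)  # caller-supplied function that does the actual fetch
            performed.append((due, url))
            heapq.heappush(self._queue, (due + self._intervals[url], url))
        return performed
```

With a simulated clock, a resource requested at interval 2 and another at interval 5 would be crawled at times 2, 4, 5, and 6 when `run_until(6.0, ...)` is called. A real scheduler would use wall-clock time and a fetch function in place of the `crawl` callback.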

The method described in Claim 1 is presented as an improvement over manual scheduling, in which the user has to specify the crawl interval for each resource. Manual scheduling can be time-consuming and error-prone. The method described in Claim 1 automates the choice of crawl interval, which makes scheduling more efficient and consistent.

Here are some of the factors that the crawl scheduler can take into account when determining the crawl interval for a resource:

  • Popularity: The more popular a resource is, the more frequently it should be crawled. Popular resources are often updated more actively, and more users are likely to be interested in their latest changes.
  • Frequency of changes: Resources that change often should be crawled more often than resources that rarely change, so that the crawled copy does not fall far behind the live resource.
  • Bandwidth: The amount of bandwidth available to the crawl scheduler can affect the crawl interval. If the crawl scheduler has limited bandwidth, it may need to schedule resources to be crawled less frequently.
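
As a rough sketch of how these three factors could be combined into one interval, the function below starts from the observed change rate, shortens the interval for popular resources, and lengthens it when bandwidth is scarce. The formula, the constants, and the names `ResourceStats` and `crawl_interval_hours` are assumptions made for illustration; the patent text quoted here does not specify a formula.

```python
from dataclasses import dataclass

MIN_INTERVAL_HOURS = 1.0          # assumed lower bound
MAX_INTERVAL_HOURS = 24.0 * 30    # assumed upper bound (about a month)


@dataclass
class ResourceStats:
    popularity: float       # hypothetical score in [0, 1]
    changes_per_day: float  # observed average number of changes per day


def crawl_interval_hours(stats, bandwidth_fraction):
    """Return a crawl interval in hours.

    bandwidth_fraction is the share of crawl bandwidth currently
    available, in (0, 1]; smaller values stretch the interval.
    """
    # Base interval: roughly one crawl per expected change.
    if stats.changes_per_day > 0:
        base = 24.0 / stats.changes_per_day
    else:
        base = MAX_INTERVAL_HOURS
    # Popularity shortens the interval (a score of 1.0 halves it).
    base /= 1.0 + stats.popularity
    # Scarce bandwidth lengthens the interval.
    base /= max(bandwidth_fraction, 0.01)
    # Keep the result inside the assumed bounds.
    return min(max(base, MIN_INTERVAL_HOURS), MAX_INTERVAL_HOURS)
```

For example, a resource with popularity 1.0 that changes four times a day gets a 3-hour interval at full bandwidth and a 6-hour interval when only half the bandwidth is available.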

The crawl scheduler can use a variety of techniques to determine the crawl interval for a resource. These techniques can include:

  • Heuristics: Heuristics are rules of thumb that can be used to make decisions. The crawl scheduler can use heuristics to determine the crawl interval for a resource based on factors such as the popularity of the resource and the frequency of changes to the resource.
  • Machine learning: Machine learning is a type of artificial intelligence that can be used to learn from data. The crawl scheduler can use machine learning to learn how to determine the crawl interval for resources based on historical data.

The crawl scheduler can use a combination of heuristics and machine learning to determine the crawl interval for a resource. This can help to ensure that resources are crawled frequently enough to meet the needs of users, while also avoiding overloading the crawl scheduler with too many requests.
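
One simple way to picture such a combination is a data-driven estimate of the change rate that falls back to a rule of thumb when there is too little history. The sketch below uses a plain average over past crawls as a stand-in for a learned model; it is not a real machine learning method, and the function name, the default rate, and the sample threshold are all assumptions.

```python
def estimate_changes_per_day(history, default_rate=1.0, min_samples=5):
    """Estimate how often a resource changes.

    history is a list of (days_since_previous_crawl, changed) tuples,
    where changed is True if the crawl found new content.
    """
    # Heuristic fallback: too little data to trust an estimate.
    if len(history) < min_samples:
        return default_rate
    total_days = sum(days for days, _ in history)
    if total_days <= 0:
        return default_rate
    # Data-driven estimate: observed changes per day of elapsed time.
    changes = sum(1 for _, changed in history if changed)
    return changes / total_days
```

Because each crawl can only detect that at least one change happened since the previous crawl, this average tends to underestimate resources that change several times between crawls.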

Claim 2 of the patent US20130144858A1 - Scheduling resource crawls is as follows:

  2. The method of claim 1, wherein the crawl scheduler is configured to:

    • adjust the crawl interval for the resource crawl request based on a plurality of factors, including:
      • the number of times the resource has been crawled;
      • the number of errors that have occurred when crawling the resource;
      • the feedback received from users regarding the resource.

In other words, the crawl interval is not fixed once it has been determined. The crawl scheduler keeps adjusting it based on the number of times the resource has been crawled, the number of errors that have occurred when crawling the resource, and the feedback received from users regarding the resource.

The crawl scheduler can adjust the crawl interval for a resource in a number of ways. For example, it can increase the crawl interval (crawl less often) if the resource has not changed recently, or if there have been no errors when crawling the resource. It can also decrease the crawl interval (crawl more often) if the resource has changed frequently, or if there have been a number of errors when crawling the resource.
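
The change-based and error-based rules above can be sketched as a multiplicative update applied after each crawl. This is a minimal sketch, not the patent's method: the function name `adjust_interval`, the factor of 2, and the bounds are assumptions, and user feedback is left out to keep the example short.

```python
def adjust_interval(interval_hours, changed, error_count,
                    factor=2.0, min_hours=1.0, max_hours=720.0):
    """Return the next crawl interval after one crawl.

    changed: True if the crawl found that the resource had changed.
    error_count: number of errors seen while crawling the resource.
    """
    if changed or error_count > 0:
        # Changes or errors: decrease the interval (crawl more often).
        new_interval = interval_hours / factor
    else:
        # No change and no errors: increase the interval (crawl less often).
        new_interval = interval_hours * factor
    # Clamp to the assumed bounds so the interval never runs away.
    return min(max(new_interval, min_hours), max_hours)
```

Starting from a 24-hour interval, a crawl that finds a change moves the interval to 12 hours, while a clean crawl with no change moves it to 48 hours.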

The crawl scheduler can also adjust the crawl interval based on feedback received from users. For example, it can decrease the crawl interval (crawl more often) if users have reported that they are not seeing the latest changes to a resource. It can also decrease the crawl interval if users have reported that they are seeing too many errors when trying to access a resource.

The crawl scheduler can use a variety of techniques to adjust the crawl interval for a resource. These techniques can include:

  • Heuristics: Heuristics are rules of thumb that can be used to make decisions. The crawl scheduler can use heuristics to adjust the crawl interval for a resource based on factors such as the number of times the resource has been crawled, the number of errors that have occurred when crawling the resource, and the feedback received from users.
  • Machine learning: Machine learning is a type of artificial intelligence that can be used to learn from data. The crawl scheduler can use machine learning to learn how to adjust the crawl interval for resources based on historical data.

As with the initial interval, the crawl scheduler can combine heuristics and machine learning when adjusting the crawl interval for a resource. The goal is the same: crawl each resource often enough to meet the needs of users without overloading the crawl scheduler with too many requests.
