Refurbishing search tools for webmasters
Google has recently refurbished its entire set of crawl tools. Issues that Googlebot encounters while crawling a site are known as crawl errors.
The main changes are as follows.
Website and URL-based errors
Crawl errors have been split into two categories: website errors and URL-based errors. Website errors affect the entire site rather than being specific to a single URL.
Website errors have been classified into:
Errors based on DNS – DNS lookup failures, domain-name failures, and other DNS errors (though these do not exist as distinct error types in the report).
Server connectivity – network unreachable, no response, and connection timed out are the common errors in this category.
Fetching robots.txt – an error specific to the robots.txt file. When Googlebot cannot fetch this file, it has no way of knowing whether the file exists or which URLs it blocks, so it postpones crawling until the file can be retrieved. (A quick diagnostic sketch for all three website-error checks follows below.)
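To make these three categories concrete, here is a minimal Python sketch that probes DNS resolution, server connectivity, and the robots.txt fetch for a site. The hostname example.com is a placeholder, and the checks only loosely approximate what Googlebot actually does:

    import socket
    import urllib.error
    import urllib.request

    HOST = "example.com"  # placeholder; substitute your own hostname

    # 1. DNS: can the domain name be resolved at all?
    try:
        ip = socket.gethostbyname(HOST)
        print("DNS OK:", HOST, "->", ip)
    except socket.gaierror as err:
        print("DNS error (lookup/domain failure):", err)

    # 2. Server connectivity: can a TCP connection be opened?
    try:
        with socket.create_connection((HOST, 80), timeout=10):
            print("Server connectivity OK")
    except OSError as err:
        print("Connectivity error (unreachable / timed out / no response):", err)

    # 3. robots.txt fetch: a crawler needs this file before crawling.
    #    A 404 simply means "no restrictions", but a server error or network
    #    failure leaves the crawler unable to tell what is blocked, so the
    #    crawl is postponed.
    try:
        with urllib.request.urlopen("http://%s/robots.txt" % HOST, timeout=10) as resp:
            print("robots.txt fetched, status", resp.status)
    except urllib.error.HTTPError as err:
        print("robots.txt returned HTTP", err.code)
    except urllib.error.URLError as err:
        print("robots.txt fetch failed:", err.reason)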
URL errors, by contrast, are specific to individual pages.
URL errors have been classified into:
Server errors – 5xx responses; a 503 returned during server maintenance is a typical example.
Soft 404 errors – pages that are effectively missing but do not return a proper 404 status code, typically responding with 200 instead. These make Googlebot crawl pages that contain no useful data, which greatly reduces crawl efficiency. Worse, such pages can appear in search results and hamper the user experience.
Access denied errors – URLs returning 401 errors fall into this category. Often this indicates a page that legitimately requires authentication rather than an actual error. Even so, blocking such URLs from being crawled will improve crawl efficiency.
Not found errors – URLs returning 404 or 410 belong to this category.
Not followed errors – these usually involve 301 and 302 status codes. The report lists URLs that Googlebot could not follow to completion, for example because of overly long redirect chains or redirect loops. A properly implemented 301 is followed normally, so "not followed" is a misleading tag for it.
Other errors – a catch-all category that includes all 403s. (A rough status-code classification sketch follows below.)
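As a rough illustration of how these buckets map onto HTTP status codes, here is a hedged Python sketch. The bucket labels, the redirect limit, and the treatment of soft 404s are simplifications of my own, not Google's actual logic:

    import urllib.error
    import urllib.parse
    import urllib.request

    MAX_REDIRECTS = 10  # assumed threshold; Googlebot's real limit is not published here


    class NoRedirect(urllib.request.HTTPRedirectHandler):
        """Surface 3xx responses as HTTPError instead of following them."""
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            return None


    def classify(url):
        """Map a URL to one of the crawl-error buckets described above (simplified)."""
        opener = urllib.request.build_opener(NoRedirect())
        for _ in range(MAX_REDIRECTS):
            try:
                resp = opener.open(urllib.request.Request(url, method="HEAD"), timeout=10)
            except urllib.error.HTTPError as err:
                if err.code in (301, 302, 303, 307, 308):
                    target = err.headers.get("Location")
                    if not target:
                        return "not followed (redirect with no target)"
                    url = urllib.parse.urljoin(url, target)
                    continue  # follow the hop manually so the hops can be counted
                if 500 <= err.code < 600:
                    return "server error (5xx)"
                if err.code == 401:
                    return "access denied"
                if err.code in (404, 410):
                    return "not found"
                return "other (e.g. 403)"
            # A 200 whose body reads like an error page would be a soft 404;
            # detecting that needs a GET plus a content heuristic, omitted here.
            return "ok (status %d)" % resp.status
        return "not followed (too many redirects or a redirect loop)"

    print(classify("http://example.com/some-page"))  # placeholder URL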
Evolution over time
Google now shows trends for these errors over the past 90 days. The counts aggregate all the URLs Google knows about, not just the pages Google happened to crawl that day. If a page is re-crawled without an error, the URL is removed from the list and its count is decremented.
Additionally, Google still lists the date each URL's error was first encountered and the last time the URL was visited.
Fixing errors and priorities
Google now lists URLs in priority order, based on factors such as whether a URL appears in the Sitemap, how much traffic it gets, and how many links point to it. A URL can be marked as fixed and removed from the list, but if Google revisits the page and still finds errors, it adds the URL back to the list.
Google recommends the Fetch as Googlebot feature for testing fixes; there is a button on the right side of the report for this. The limiting factor is a quota of 500 fetches per account (not per site), so the feature should be used judiciously.
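Because the quota is shared across a whole account, it can help to sanity-check a fix locally before spending a fetch. Here is a minimal sketch that requests a page with Googlebot's published user-agent string; the URL is a placeholder, and this only approximates the real tool, which fetches from Google's own infrastructure:

    import urllib.request

    # Googlebot's published desktop user-agent string.
    GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                    "+http://www.google.com/bot.html)")

    def fetch_as_googlebot(url):
        """Request a page identifying as Googlebot: a local approximation
        of Fetch as Googlebot, which really fetches from Google's servers."""
        req = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(resp.status, resp.reason)
            print(resp.read(200))  # peek at the first 200 bytes of the body

    fetch_as_googlebot("http://example.com/fixed-page")  # placeholder URL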
There is a feeling that most of these updates were made with small-site developers in mind rather than big enterprise-level (agency-level) sites that handle large and diverse data. For the latter, more data is better, since they have the systems to parse and crunch it.
It is a bit sad to see features that took hard work during the initial launch stages of Webmaster Tools dismantled or reduced in functionality. But for a frequent user who wants usable features, this is more of a trade-off than a serious handicap.