Just wanted to post and maybe clarify some things about Bigdaddy and the bots (the AdSense MediaBot and Google's search crawler, GoogleBot) as I understand them.
There are a LOT of stories going around about how Google is using the AdSense MediaBot to index websites. I think the main concern is that people who use AdSense get some advantage in getting their content into Google's search index faster and keeping it fresher. This is not the case, although I can see how people would think that, as I was confused myself.
Here is how it works:
When you have AdSense on a page, MediaBot will come and index it just like GoogleBot would, and contribute the result to a joint crawl cache. Then, when your site is due to be spidered by GoogleBot, it will first check whether MediaBot has already been there; if it has, GoogleBot will pull that cached data rather than waste your bandwidth. If MediaBot has not been there, GoogleBot is sent to fetch a fresh copy. I am guessing there is some timestamp they use to determine freshness.
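The check-the-cache-first behavior described above can be sketched roughly as follows. This is purely a hypothetical illustration of the logic, not Google's actual implementation; the function names, the shared dictionary, and the one-day freshness window are all my own assumptions.

```python
import time

# Shared crawl cache that both bots read and write: url -> (fetched_at, content).
# Purely illustrative -- Google's real cache is obviously not a Python dict.
crawl_cache = {}

FRESHNESS_WINDOW = 24 * 3600  # assumed: a cached copy counts as "fresh" for a day


def mediabot_fetch(url, content, now):
    """MediaBot crawls the page (to target AdSense ads) and stores the
    result in the shared cache for other bots to reuse."""
    crawl_cache[url] = (now, content)
    return content


def googlebot_fetch(url, fetch_from_web, now):
    """GoogleBot first checks whether MediaBot left a fresh copy in the
    cache; only if not does it spend the site's bandwidth on a live fetch."""
    cached = crawl_cache.get(url)
    if cached and now - cached[0] < FRESHNESS_WINDOW:
        return cached[1]           # reuse MediaBot's copy, no new request
    content = fetch_from_web(url)  # cache miss or stale: crawl the live page
    crawl_cache[url] = (now, content)
    return content
```

So if MediaBot visited an hour ago, `googlebot_fetch` returns the cached copy and your server never sees a second request; only once the cached copy goes stale does a real fetch happen.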
I tested this by launching 10 new websites, each on its own domain and IP, all with AdSense. Each had the noindex/nofollow robots directive targeted at ONLY GOOGLEBOT. They were all indexed by MediaBot and showed relevant AdSense ads within a few minutes. It's been 3 days now and none of them has a single page indexed by GoogleBot. The only visit by GoogleBot was to retrieve the robots.txt.
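For anyone wanting to reproduce a setup like this: Google does document a robots meta tag that applies to Googlebot specifically rather than to all crawlers. Something along these lines (this is my guess at the test pages' markup, not a copy of them):

```html
<!-- Blocks only Googlebot; other bots still see the default (index, follow) -->
<meta name="googlebot" content="noindex, nofollow">
```

The generic form `<meta name="robots" …>` would block all well-behaved crawlers, which would not isolate GoogleBot's behavior the way this test needed.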
Here are the actual bullet points from Matt Cutts' slide on Bigdaddy:
– Software infrastructure upgrade to crawling & indexing
– Live everywhere
– Lays the groundwork for future improvements
– Fresher indexing
– Smarter crawling with less bandwidth
– Better support for gzip encoding
– Crawl cache
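On the "better support for gzip encoding" point: HTML compresses extremely well, so a crawler that asks for gzip-encoded responses transfers a fraction of the raw bytes, which fits the "smarter crawling with less bandwidth" theme. A quick local illustration (the sample page is made up, but the compression math is real):

```python
import gzip

# Repetitive markup like real-world HTML shrinks dramatically under gzip,
# which is why a gzip-aware crawler saves so much transfer bandwidth.
html = ("<html><body>" + "<p>Hello, crawler!</p>" * 500 + "</body></html>").encode()
compressed = gzip.compress(html)

print(len(html), "raw bytes vs", len(compressed), "gzipped")
assert gzip.decompress(compressed) == html  # lossless round trip
```

In HTTP terms, this is what happens when a crawler sends `Accept-Encoding: gzip` and the server responds with `Content-Encoding: gzip`.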
I am going to go over my logs and hopefully make a post later tonight on exactly what changes I am seeing from GoogleBot with respect to search/index traffic vs. other bots.
Also, Greg has posted a follow-up here.