What Happens To 10,000 Images When You Nofollow, Noindex

For about six months now, my gallery has been set to nofollow/noindex for Google. I had a big problem: there were over 50k results from the gallery alone in Google, and they totally diluted my site's PR value as a whole. (At one time the entire shoemoney.com domain was supplemental.)

Recently I have heard a lot of people say that using Google Sitemaps on a blog with a gallery is the fastest way to go supplemental. It did happen to me, but rather than being quick to blame Google, I looked into the issue.

The problem was that I was using Google's Sitemap Generator script to rip through my access log and generate my sitemaps. This was really bad because it fed Google every URL that any bot had visited (basically every followed link everywhere on my site). It is also bad because in WordPress there is a feed for every post and another feed for the comments on every post. Basically there is an RSS feed generated for every frickin' thing in WordPress, and if you link to those items, the bots will follow them and boom, they're in your sitemap. So your sitemap ends up full of worthless crap.
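To see why a log-driven sitemap fills up with feeds, here is a minimal sketch of a URL collector that works the way the paragraph describes. It is not the Google Sitemap Generator's actual code; it assumes an Apache-style access log, a placeholder domain, and a hypothetical `urls_from_access_log` helper, and it keeps every path any client requested with no filtering at all.

```python
import re

# Matches the quoted request in an Apache/NCSA-style access log line,
# e.g. "GET /2007/05/some-post/feed/ HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')


def urls_from_access_log(lines, base="http://www.example.com"):
    """Return every distinct URL any client requested, in first-seen order.

    Hypothetical helper: it applies no filtering, which is the behavior
    that lets feed URLs slip into a log-driven sitemap.
    """
    seen = {}  # dicts preserve insertion order (Python 3.7+)
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            seen[base + match.group(1)] = None
    return list(seen)


# Made-up log lines: one post, plus the post feed and comments feed
# that WordPress links to and that crawlers happily follow.
sample_log = [
    '192.0.2.10 - - [17/May/2007:10:00:00 -0500] "GET /2007/05/some-post/ HTTP/1.1" 200 5120',
    '192.0.2.10 - - [17/May/2007:10:00:05 -0500] "GET /2007/05/some-post/feed/ HTTP/1.1" 200 900',
    '192.0.2.10 - - [17/May/2007:10:00:09 -0500] "GET /comments/feed/ HTTP/1.1" 200 1400',
    '192.0.2.10 - - [17/May/2007:10:00:12 -0500] "GET /2007/05/some-post/ HTTP/1.1" 304 0',
]

for url in urls_from_access_log(sample_log):
    print(url)
```

One real post produces three sitemap entries here, two of them feeds; on a site with thousands of posts and comment threads, that ratio is how the sitemap ends up dominated by feed URLs.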

Now, this is all my fault, not Google Sitemaps' or WordPress's. They make great tools; I just did not realize what the "out of the box" configuration of the Google Sitemaps Python script would do.

I have had TONS of success with Google Sitemaps on other sites with millions of pages indexed, but those were all sites we had written from the ground up. No prefab website software was involved.

So what happened when I nofollowed and noindexed all that content? Well, you can see here on Google that the pages are listed with whatever anchor text was used as their title, but none of their content was indexed. Some pretty interesting results ;)

So now I have built a new sitemap script that does not include any links to feeds or crappy gallery stuff, so we will see how it works out.
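The new script itself isn't published here, so the following is only a hypothetical sketch of the same idea: filter out feed and gallery URLs, then write what is left in the sitemaps.org XML format. The exclusion patterns, the `/gallery/` prefix, and the helper names are assumptions; adjust them to your own URL layout.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical exclusion rules; adjust the patterns to your own URL layout.
EXCLUDE_PATTERNS = [
    re.compile(r"/feed/?$"),    # /feed/, /comments/feed/, /2007/05/post/feed/
    re.compile(r"[?&]feed="),   # query-string feeds such as /?feed=rss2
    re.compile(r"^/gallery/"),  # assumed prefix for the gallery section
]


def is_excluded(url):
    """True if the URL's path (plus query string) matches any exclusion rule."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return any(pattern.search(target) for pattern in EXCLUDE_PATTERNS)


def build_sitemap(urls):
    """Serialize the non-excluded URLs as a sitemaps.org <urlset> document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        if is_excluded(url):
            continue
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


candidates = [
    "http://www.example.com/",
    "http://www.example.com/2007/05/some-post/",
    "http://www.example.com/2007/05/some-post/feed/",
    "http://www.example.com/comments/feed/",
    "http://www.example.com/?feed=rss2",
    "http://www.example.com/gallery/album-1/photo-123/",
]
print(build_sitemap(candidates))
```

Only the home page and the post survive. Filtering on an explicit exclude list keeps the script simple, though an allow list (for example, only permalinks pulled from the WordPress database) would be stricter if the site keeps growing new kinds of URLs.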

48 thoughts on “What Happens To 10,000 Images When You Nofollow, Noindex”

  1. Bush Mackel

    Yesterday I took a look at the sitemap being generated by one of my WordPress plugins. Thankfully, nothing from my gallery has been indexed, but other things have been indexed that I wish weren't. Better for everyone to take the time now and again to check those automated sitemaps.

  2. jim

    So it’s still being indexed, just no information about them is being cached? I wonder if what’s there are just artifacts from when you did let it index and those results will just fall out the next time they refresh?

  3. Bush Mackel

    From my experience it just takes time… I read that if your site is constantly being updated, has a good PR, yada yada yada, the process may be expedited.

  4. ShoeMoney

    Actually, Matt Cutts has commented on this. It was something to the effect that the URL will remain in the index; it just won't be indexed or followed.

  5. Paul

    Any chance that we could get our hands on the sitemap script? :)

    I was thinking about adding a gallery but heard similar comments made on the forums as well.

  6. Pingback: Shoemoney on Google Sitemaps | Paul Bradish: Internet Business for the Masses.

  7. Jay Harper

    I’ve heard of a number of cases where bad sitemaps got people in trouble… I’m personally of the opinion that if you didn’t code the sitemap yourself (by hand or with a script you wrote), you probably shouldn’t use a sitemap. For a site like this, your RSS/Atom feed would serve the same purpose or possibly be even better, since it gets you into stuff like Google Blog Search.

    Regardless, it’s old news that Google has stuff “in their index” that actually isn’t “indexed” – it’s just how they work. Any URL they’re aware of is “in the index”, but only URLs that are “indexed” show up in ‘real’ SERPs.

    What I don’t understand is how your gallery could have “totally diluted [your] site pr value as a whole”… Are you saying having a bunch of supplemental pages lowered the PageRank of your home page or other pages on your site? I’m having a hard time believing that…

    Even if they’re supplemental, I’d think they’d enhance the PR of the overall site or at least do nothing to it, not lower it. AFAIK, site PR isn’t an average; it’s cumulative (unless you’re spamming or something, in which case you can get negatives).

    So I guess the question is what did you see that led you to believe the gallery was hurting your site PR?

  8. Pingback: Speedlinking 18 May 2007 : Graphics and Innovation

  9. Charity Hippy

    Jay wrote: “What I don’t understand is how your gallery could have “totally diluted [your] site pr value as a whole”… ”

    I don’t understand this either. Can anyone explain this further for us please?

  10. Daniel

    I am very careful when using sitemaps; in fact, despite some SEO advice, I would use them only if you have indexation problems.

  11. corey

    lots of pages that Google doesn’t think are very important–low page rank or few links pointing at them–can dilute the domain’s authority overall.

    total juice / amount of pages = juice per page

  12. corey

    i thought aaron wall’s suggestion was to use robots.txt to exclude content from google, not noindex,nofollow?

    i’d like to learn more about the differences between using these two approaches on a site with 30,000+ pages.

    one difference i can think of would be when using robots.txt to exclude content, google won’t attribute the page content to my site, thus decreasing the total amount of content i have to support my topic.

  13. Jay Harper

    What’s your source for that?

    It doesn’t sound right to me that well-established pages on a domain with a bunch of low-PR pages would do less well than if the domain didn’t have the low-PR pages (assuming the junky pages aren’t getting penalized for some reason).

    Also, the formula makes it sound like every page on the site has the same ‘juice’/PR… That just isn’t true. There’s a lot more to it – even assuming no external links – the internal linking structure would have an effect.

    And remember – the particular case is for URLs Google is aware of, but have been excluded from the index. If it were an issue of something like cleaning up duplicate content, that’s one thing, but that’s not the case here…

  14. Wealth Junkie

    Shoe,
    You said: “the url will remain in the index it just won’t be indexed or followed”

    Can you explain this a little more? I’m confused.

  15. Wealth Junkie

    I think it has to do with your site’s total “Trust Rank” versus the total number of pages on the site. 10,000 extra gallery pages totally dilute the juice that the other content-filled pages might have, and that could affect how those other pages rank for more competitive terms.

  16. Brian Mark

    They still find links to it, so they know it exists, but they don’t use any of the information on the page for ranking. Occasionally something like this can still get Google-bombed to a top ranking, but that’s pretty rare these days.

  17. Brian Mark

    Agreed. Our rankings have done better since we quit updating our sitemap and just included our main categories for MSN’s sake (their bot still stinks IMO).

  18. corey

    ok sorry. this post and my previous comment are specifically about google supplemental results, and that’s maybe why i confused you.

    noindex, nofollow doesn’t always get the job done (for all SEs), as covered here… http://www.mattcutts.com/blog/handling-noindex-meta-tags/

    so i opt to use robots.txt exclusion instead of “noindex, nofollow”

    what do you think of that?

    and

    will a noindex, nofollow page still (or sometimes) contribute to the set of keywords that are related/used on my site?

  19. corey

    source for that = matt cutts
    “Having urls in the supplemental results doesn’t mean that you have some sort of penalty at all; the main determinant of whether a url is in our main web index or in the supplemental index is PageRank. If you used to have pages in our main web index and now they’re in the supplemental results, a good hypothesis is that we might not be counting links to your pages with the same weight as we have in the past.”
    http://www.mattcutts.com/blog/infrastructure-status-january-2007/

    “It doesn’t sound right to me that…”

    well ok.

    “Also, the formula makes it sound like every page on the site has the same ‘juice’/PR… That just isn’t true.”

    well of course it’s not. i could have included a variable in my formula for weights, but i didn’t plan on that mathematical wonder getting interpreted so literally.

    “And remember – the particular case is for URLs Google is aware of, but have been excluded from the index. If it were an issue of something like cleaning up duplicate content, that’s one thing, but that’s not the case here”

    ya, i rtfa.

  20. ritchie

    I’m curious about the noindex results. I also use a plugin sitemap generator. Do you think it’s a good idea to index tag pages, or should they be excluded?

  21. Jay Harper

    The way I interpret Matt Cutts’ comments is particular to a single page, not the site. I don’t see anything in his comments about supplemental affecting the authority (or PageRank) of the site as a whole.

  22. Jay Harper

    “Trust” has to do with whether trustworthy/reputable sites are linking to your site, the age of your domain, and whether you’ve ever done spammy/black hat things with the domain. Trust has nothing to do with the average quality of your pages…

  23. Gecko Tales

    On some of my sites, I have a problem with my images getting more searches than some of my pages. It’s a pain, but it’s my fault for being so good with my image attribute tags.
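Corey's robots.txt versus noindex question comes down to one mechanical difference: a robots.txt Disallow stops a compliant crawler from fetching the page at all, while a meta robots tag only takes effect after the crawler has fetched the page and read it. A page blocked in robots.txt therefore never gets to show its noindex tag, and its URL can still surface from links elsewhere, much as the gallery URLs did. The sketch below checks both mechanisms with Python's standard `urllib.robotparser` and `html.parser`; the robots.txt rules, the `/gallery/` path, and the helper names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps every crawler out of the gallery.
ROBOTS_TXT = """\
User-agent: *
Disallow: /gallery/
"""


def blocked_by_robots_txt(url, user_agent="Googlebot"):
    """True if ROBOTS_TXT forbids user_agent from fetching url at all."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for directive in content.split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def meta_robots_directives(html):
    """Return the set of robots directives declared in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# A gallery page carrying the meta tag described in the post.
page = '<html><head><meta name="robots" content="noindex,nofollow"></head><body></body></html>'

print(blocked_by_robots_txt("http://www.example.com/gallery/photo-1/"))  # True
print(blocked_by_robots_txt("http://www.example.com/2007/05/some-post/"))  # False
print(sorted(meta_robots_directives(page)))  # ['nofollow', 'noindex']
```

If the same gallery page were both disallowed in robots.txt and tagged noindex, a compliant crawler would honor the first check and never read the second, which is why the two approaches are usually treated as alternatives rather than combined.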
