What Happens To 10,000 Images When You Nofollow, Noindex

by Jeremy Schoemaker on May 17, 2007 · 48 comments

For about six months now my gallery page has been set to nofollow/noindex for Google. I had a big problem: there were over 50k results from the gallery alone in Google, and it totally diluted my site's PR value as a whole. (At one time the entire shoemoney.com domain was supplemental.)

Recently I have heard a lot of people say that using Google Sitemaps on your blog along with a gallery is the fastest way to go supplemental. It did happen to me, but I was not so quick to blame Google for it; instead I looked into the issue.

The problem was that I was using Google's Sitemap generator script to rip through my access log and generate my sitemaps. This was really bad because it was feeding Google every URL that every bot had gone to (basically every followed link everywhere on my site). It is also really bad because in WordPress there is a feed for every post and a feed for every post's comments. Basically there is an RSS feed generated for every frickin thing done in WordPress, and if you have links to those items then the bots will follow them and then boom, they're in your sitemap. So your sitemap ends up with a bunch of worthless crap.

Now this is all my fault and not the fault of Google Sitemaps or WordPress. They make great tools; I just did not understand the “out of the box” configuration of the Google Sitemaps Python script.

I have had TONS of success with Google Sitemaps on other sites with millions of pages indexed but those were all sites we had written from the ground up. There was no prefab website software being used.

So what happened when I nofollowed and noindexed all that content to Google? Well, you can see here on Google that the pages are listed with whatever anchor text was used to link to them as the title, but no content was indexed. Some pretty interesting results ;)
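
For anyone unfamiliar with the mechanism: noindex/nofollow is usually delivered as a robots meta tag in each page's head. Here is a minimal Python sketch of how a gallery template might emit it. The `/gallery/` prefix and the `robots_meta` helper are assumptions for illustration, not the actual gallery configuration.

```python
from html import escape

# Assumed URL prefix for the gallery section (hypothetical).
GALLERY_PREFIX = "/gallery/"


def robots_meta(path, crawler="robots"):
    """Return a noindex/nofollow meta tag for gallery paths, else an empty string.

    `crawler` is the meta name: "robots" addresses all crawlers,
    "googlebot" addresses Google's crawler only.
    """
    if path.startswith(GALLERY_PREFIX):
        name = escape(crawler, quote=True)
        return '<meta name="%s" content="noindex, nofollow">' % name
    return ""


if __name__ == "__main__":
    print(robots_meta("/gallery/album1/"))
    print(repr(robots_meta("/2007/05/some-post/")))
```

Passing `crawler="googlebot"` would target Google only and leave other search engines free to index the gallery.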

So now I have built a new sitemap script that leaves out links to feeds and the crappy gallery stuff, so we will see how it works out.
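
The idea behind a filtered sitemap can be sketched roughly like this. This is not the actual script, just a minimal Python illustration; the `/gallery/` prefix, the feed-detection rules, and the function names are all assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Assumed section to leave out of the sitemap (hypothetical).
EXCLUDED_PREFIXES = ("/gallery/",)


def is_wanted(url):
    """Return True if the URL should be listed in the sitemap."""
    parsed = urlparse(url)
    # Skip everything under the gallery.
    if parsed.path.startswith(EXCLUDED_PREFIXES):
        return False
    # Skip pretty-permalink feeds such as /some-post/feed/ or /comments/feed/.
    if "feed" in parsed.path.strip("/").split("/"):
        return False
    # Skip query-string feeds such as /?feed=rss2.
    if "feed" in parse_qs(parsed.query):
        return False
    return True


def build_sitemap(urls):
    """Build sitemap XML from an iterable of URLs, dropping unwanted ones."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(set(filter(is_wanted, urls))):
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        "http://www.example.com/2007/05/some-post/",
        "http://www.example.com/2007/05/some-post/feed/",
        "http://www.example.com/gallery/album1/img1.jpg",
    ]))
```

The point is that the list of URLs is filtered by rules you control, rather than being whatever happens to show up in the access log.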


About the author...

– who has written 2858 posts on ShoeMoney.com.

Jeremy "ShoeMoney" Schoemaker is the founder & CEO of the ShoeMoney Blog, Elite Retreat Internet Conference, & the PAR Program. In 2013 Jeremy released his #1 Amazon Best selling Autobiography titled "Nothing's Changed But My Change" - The ShoeMoney Story. Jeremy currently lives in Lincoln Nebraska with his wife and 2 daughters.



{ 46 comments }

1 Bush Mackel

I just took a look at my sitemap yesterday that was being generated by one of my Wordpress plugins. Thankfully, nothing has been indexed in terms of my gallery, but I have had other things that have been indexed that I wish weren’t. Better for everyone to take the time now and again and check those automated sitemaps.

2 jim

So it’s still being indexed, just no information about them is being cached? I wonder if what’s there are just artifacts from when you did let it index and those results will just fall out the next time they refresh?

3 jim

By “it’s still being indexed,” I meant they still appear in the index.

4 fivecentnickel.com

Yeah, it seems odd that they would persist in the results when flagged as nofollow, noindex.

5 Bush Mackel

From my experience it just takes time… I read that if your site is constantly being updated and has good PR, yada yada yada, this process may be expedited.

6 ShoeMoney

Actually, Matt Cutts has commented on this… It was something to the effect that the URL will remain in the index; it just won’t be indexed or followed.

7 Paul

Any chance that we could get our hands on the sitemap script? :)

I was thinking about adding a gallery but heard similar comments made on the forums as well.

8 Link Snitch

Would definitely love to get my hands on that script too!

9 David Harrison

What did Matt Cutts say? Do you have a link?

10 Glen

Jeremy, this was first on the advice of Aaron Wall, if I’m correct?

11 coopreme

you could use webmaster tools to delist it though..

12 ShoeMoney

Right, there was just no need to once it was not counted against me ;)

13 ShoeMoney

yea part of it ;)

14 CPA Affiliates

Man, that’s an interesting result..

15 Ken Savage

robots.txt is my best friend.

http://www.kensavage.com/robots.txt

16 Ken Savage

just add /gallery/ to your robots.txt and install away.
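
A quick way to sanity-check a rule like that before relying on it is Python's standard `urllib.robotparser`. This is a minimal sketch; the robots.txt contents and the example.com URLs are assumptions, not anyone's real file.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents blocking the gallery for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /gallery/
"""


def is_crawlable(url, user_agent="Googlebot"):
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_crawlable("http://www.example.com/gallery/album1/"))
    print(is_crawlable("http://www.example.com/2007/05/some-post/"))
```

Keep in mind that a crawler blocked by robots.txt never fetches the page, so it cannot see a noindex meta tag placed on it.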

17 Jay Harper

I’ve heard a number of cases where bad sitemaps got people in trouble… I’m personally of the opinion that if you didn’t code the sitemap yourself (by hand or with a script you wrote), you probably shouldn’t use a sitemap. For a site like this your RSS/Atom feed would serve the same purpose or possibly be even better since it gets you into stuff like Google Blog Search.

Regardless, it’s old news that Google has stuff “in their index” that actually isn’t “indexed” – it’s just how they work. Any URL they’re aware of is “in the index”, but only URLs that are “indexed” show up in ‘real’ SERPs.

What I don’t understand is how your gallery could “totally diluted [your] site pr value as a whole”… Are you saying having a bunch of supplemental pages lowered the PageRank of your home page or other pages on your site? I’m having a hard time believing that…

Even if they’re supplemental I’d think they’d enhance the PR of the overall site or at least do nothing to the site PR, not lower it. AFAIK, site PR isn’t an average – it’s cumulative (unless you’re spamming or something – then you can get negatives).

So I guess the question is what did you see that led you to believe the gallery was hurting your site PR?

18 ShoeMoney

totally… and not the little green bar.. just domain authority

19 Fashion Industry

I use google sitemap on my site…is it bad all around or what exactly did you screw up?

20 Charity Hippy

Jay wrote: “What I don’t understand is how your gallery could “totally diluted [your] site pr value as a whole”… ”

I don’t understand this either. Can anyone explain this further for us please?

21 suray

Thanks a lot for your information, Shoe… Man, I’m totally shocked by the money you have earned from AdSense… Remarkable, I thought that no one could earn as much as you. Congratulations! If you have time, visit my blog at http://surayblog.blogspot.com

22 Al Davies

Totally. After shoe posted about using robots.txt to your advantage I went around and looked at others like Aaron Wall The Android and Quadzzzhad for robots and pretty much sliced and diced theirs into mine..

23 Mark

I third that… how could more pages indexed be a bad thing??

24 Daniel

I am very careful when using sitemaps, actually despite some SEO advice I would use them only if you have indexation problems.

25 John

Indeed, I have observed it too in some cases.

26 corey

lots of pages that Google doesn’t think are very important–low page rank or few links pointing at them–can dilute the domain’s authority overall.

total juice / amount of pages = juice per page
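
Taken literally, that back-of-the-envelope formula works out as in the tiny Python sketch below. The numbers are made up purely for illustration, and the even split is this comment's simplification, not how PageRank is actually distributed.

```python
def juice_per_page(total_juice, page_count):
    """Evenly divide a site's total 'juice' across its pages (a simplification)."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    return total_juice / page_count


if __name__ == "__main__":
    # Hypothetical site: 1,000 units of juice, 500 content pages.
    print(juice_per_page(1000, 500))
    # Same site after 10,000 gallery pages are added to the count.
    print(juice_per_page(1000, 10500))
```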

27 corey

i thought aaron wall’s suggestion was to use robots.txt to exclude content from google, not noindex,nofollow?

i’d like to learn more about the differences between using these two approaches on a site with 30,000+ pages.

one difference i can think of would be when using robots.txt to exclude content, google won’t attribute the page content to my site, thus decreasing the total amount of content i have to support my topic.

28 Lee Bandoni

WP plugins are great, but I always like to tweak them for personal use :)

29 Jay Harper

What’s your source for that?

It doesn’t sound right to me that well-established pages on a domain with a bunch of low-PR pages would do less well than if the domain didn’t have the low-PR pages (assuming the junky pages aren’t getting penalized for some reason).

Also, the formula makes it sound like every page on the site has the same ‘juice’/PR… That just isn’t true. There’s a lot more to it – even assuming no external links – the internal linking structure would have an effect.

And remember – the particular case is for URLs Google is aware of, but have been excluded from the index. If it were an issue of something like cleaning up duplicate content, that’s one thing, but that’s not the case here…

30 fivecentnickel.com

I don’t know what you’re asking… Doesn’t noindex/nofollow exclude it?

31 JeffPosaka

Using robots.txt to block gallery access saves on bandwidth too.

32 Wealth Junkie

Shoe,
You said: “the url will remain in the index it just won’t be indexed or followed”

Can you explain this a little more? I’m confused.

33 Wealth Junkie

I think it has to do with your site’s total “Trust Rank” versus the total number of pages on the site. 10,000 extra gallery pages totally dilutes the juice that the other content-filled pages might have and could affect how those other pages will rank for more competitive terms.

34 The Dino

I am not using automatic sitemaps, and after this post, if I do use one, I will do it carefully.

35 Ali

Yeah would love to get a hand on the script, PUHHHLLEEAASEE!!!

36 Brian Mark

They still find links to it, so they know it exists, but they don’t use any information that’s on the page for ranking. Occasionally, something like this can still get Google bombed to a top ranking, but that’s pretty rare anymore.

37 Brian Mark

Agreed. Our rankings have done better since we quit updating our sitemap and just included our main categories for MSN’s sake (their bot still stinks IMO).

38 Chui - Turn Off Nofollow on Wordpress

I noticed an anecdotal improvement after turning off nofollow.

39 Chui - Turn Off Nofollow on Wordpress

Check my sig for instructions on how to turn off nofollow on RSS feeds.

40 corey

ok sorry. this post and my previous comment are specifically about google supplemental results, and that’s maybe why i confused you.

noindex, nofollow doesn’t always get the job done (for all SEs), as covered here… http://www.mattcutts.com/blog/handling-noindex-meta-tags/

so i opt to use robots.txt exclusion instead of “noindex, nofollow”

what do you think of that?

and

will a noindex, nofollow page still (or sometimes) contribute to the set of keywords that are related/used on my site?

41 corey

source for that = matt cutts
“Having urls in the supplemental results doesn’t mean that you have some sort of penalty at all; the main determinant of whether a url is in our main web index or in the supplemental index is PageRank. If you used to have pages in our main web index and now they’re in the supplemental results, a good hypothesis is that we might not be counting links to your pages with the same weight as we have in the past.”
http://www.mattcutts.com/blog/infrastructure-status-january-2007/

“It doesn’t sound right to me that…”

well ok.

“Also, the formula makes it sound like every page on the site has the same ‘juice’/PR… That just isn’t true.”

well of course it’s not. i could have included a variable in my formula for weights, but i didn’t plan on that mathematical wonder getting interpreted so literally.

“And remember – the particular case is for URLs Google is aware of, but have been excluded from the index. If it were an issue of something like cleaning up duplicate content, that’s one thing, but that’s not the case here”

ya, i rtfa.

42 ritchie

On the other hand, I do blame google – they do index pictures, so it’s a fault in the algorithm in my opinion.

43 ritchie

I’m curious about the noindex results. I also use a plugin sitemap generator. Do you think it’s a good idea to index tag pages? Or should they rather be excluded?

44 Jay Harper

The way I interpret Matt Cutts’ comments is particular to a single page, not the site. I don’t see anything in his comments about supplemental affecting the authority (or PageRank) of the site as a whole.

45 Jay Harper

“Trust” has to do with whether trustworthy/reputable sites are linking to your site, the age of your domain, and whether you’ve ever done spammy/black hat things with the domain. Trust has nothing to do with the average quality of your pages…

46 Gecko Tales

I have a problem with my images getting more searches than some of my pages – on some of my sites. It’s a pain, but it’s my fault for being so good with my attribute image tags.
