WordPress robots.txt tips for avoiding duplicate content

Been getting some questions about my robots.txt file and what certain things do.

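Thankfully, robots.txt supports a limited form of pattern matching. Major crawlers such as Googlebot understand the * wildcard (any run of characters) and the $ end anchor, but not full regular expressions.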

$ marks the end of the URL. So if you write /*.php$ in your robots.txt, the rule matches any URL that ends in .php (but not /page.php?x=1, because the query string comes after .php).

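This is really handy when you want to block all .exe, .php, or other file types. Note that paths are matched case-sensitively, so write the extension exactly as it appears in your URLs. For example:

Disallow: /*.pdf$
Disallow: /*.jpeg$
Disallow: /*.exe$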
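To see what * and $ actually do, here is a minimal Python sketch that turns a robots.txt path pattern into a regular expression. It assumes Google-style semantics (case-sensitive prefix matching, * for any run of characters, a trailing $ pinning the end of the URL path); `pattern_to_regex` and `matches` are hypothetical helper names, not part of any library. Python's own `urllib.robotparser` does plain prefix matching and does not understand these wildcards in paths, which is why the sketch rolls its own.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Hypothetical helper assuming Google-style rules: '*' matches any run of
    characters (including none) and a trailing '$' pins the end of the path.
    Everything else, including '.' and '?', is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' wherever '*' appeared.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # re.match anchors at the start, so without '$' a rule is a prefix match.
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern, path):
    """Return True if the robots.txt pattern applies to the URL path."""
    return pattern_to_regex(pattern).match(path) is not None


for path in ["/files/report.pdf", "/files/report.pdf?dl=1", "/files/report.PDF"]:
    print(path, matches("/*.pdf$", path))
```

Only the first path matches: the second has characters after .pdf, so $ rejects it, and the third fails because matching is case-sensitive.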

Specifically, here are some of the rules I use in my robots.txt:

Disallow: /*? – blocks every URL containing a ?. A good way to avoid duplicate content issues with WordPress blogs. Obviously, only use this if you have switched from WordPress's default query-string permalinks (like /?p=123) to pretty permalinks; otherwise you will block your own posts.

Disallow: /*.php$ – blocks every URL ending in .php. Another good way to avoid duplicate content with a WordPress blog.

Disallow: /*.inc$ – you should not be showing .inc (include) files to bots (Google Code Search will eat you alive).

Disallow: /*.css$ – letting CSS files get indexed seems silly. The wildcard matches every CSS file, whatever its name or directory.

Disallow: */feed/ – indexed feeds dilute your site's equity. The leading * wildcard catches feed URLs with characters before /feed/, such as /my-post/feed/.

Disallow: */trackback/ – there is no reason a trackback URL should be indexed. The leading * wildcard catches URLs with characters before /trackback/, such as /my-post/trackback/.

Disallow: /page/ – assloads of duplicate content in WordPress's paginated archive pages.

Disallow: /tag/ – more duplicate content.

Disallow: /category/ – even more duplicate content.

So what if you want to allow a page? For instance, my SERPs tool is serps.php, and under the rules above it would be blocked.

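Allow: /serps.php – this does the trick! Google resolves conflicts in favor of the most specific (longest) matching rule, so this Allow beats Disallow: /*.php$.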
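To check how the full rule set above plays out, including the Allow override, here is a minimal Python sketch. It assumes Google's documented precedence (the longest matching pattern wins, and Allow wins a tie); other crawlers may resolve conflicts differently. `RULES`, `compile_pattern`, and `is_allowed` are hypothetical names, and the sample URLs are made-up WordPress-style paths.

```python
import re

# The rules from this post as (directive, pattern) pairs.
RULES = [
    ("disallow", "/*?"),
    ("disallow", "/*.php$"),
    ("disallow", "/*.inc$"),
    ("disallow", "/*.css$"),
    ("disallow", "*/feed/"),
    ("disallow", "*/trackback/"),
    ("disallow", "/page/"),
    ("disallow", "/tag/"),
    ("disallow", "/category/"),
    ("allow", "/serps.php"),
]


def compile_pattern(pattern):
    """'*' matches any characters; a trailing '$' anchors the end of the path."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path, rules=RULES):
    """Assumed Google-style resolution: longest matching pattern wins,
    Allow wins a tie, and a path no rule matches is allowed."""
    best = None  # (pattern length, is_allow) of the strongest match so far
    for directive, pattern in rules:
        if compile_pattern(pattern).match(path):
            candidate = (len(pattern), directive == "allow")
            # Tuples compare by length first, then True (allow) beats False.
            if best is None or candidate > best:
                best = candidate
    return best is None or best[1]


for path in ["/serps.php", "/wp-login.php", "/?p=123",
             "/my-post/feed/", "/tag/seo/", "/my-post/"]:
    print(path, "allowed" if is_allowed(path) else "blocked")
```

The /serps.php line shows the override: both Disallow: /*.php$ (7 characters) and Allow: /serps.php (10 characters) match, and the longer Allow rule wins, while a normal pretty-permalink post like /my-post/ matches nothing and stays crawlable.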

Keep in mind I am not an SEO, but I have picked up a few tricks along the way.
