With the rollout of the Panda algorithm updates, Google has become far more effective at finding and penalizing low-quality sites with thin content. That is especially true of minimal websites where images abound and text is hardly to be found. Unfortunately, sites that offer users little content above the fold are also grouped with such minimalist sites by the search engine. Websites whose content duplicates material found elsewhere on the web, or repeated across their own pages, are likewise penalized on an ongoing basis.
The last point is the alarming one, and webmasters who want to keep their sites clear of content duplication can benefit from a few tips in this respect. The way content is presented and promoted on a website also plays a part, so tips in those areas are covered as well.
As everyone can guess, the best way to avoid duplicate content is not to create it in the first place. Yet in some cases duplicate content is genuinely useful to visitors. Because such cases are frequent, the most valuable advice is on keeping that duplication out of sight of the search engines. So where do we start?
Writing Robots.txt Rules
The Robots Disallow method works, though it should be used cautiously, as a careless rule can remove an entire website from the index. Only the pages that contain duplicate content should be disallowed. Webmasters who want to keep robots away from specific pages list those pages in the robots.txt file, taking care that the addresses of other pages on the same site do not begin with the same path fragments used in the Disallow rules; robots.txt rules match URLs by prefix, so such pages would otherwise be dropped from the index as well. Readers who want more detail on robots.txt files will find excellent sources devoted to the topic.
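As an illustration, a robots.txt file blocking two hypothetical duplicate pages (the paths are placeholders, not recommendations for any particular site) might look like this:

  User-agent: *
  Disallow: /printable/summer-offer.html
  Disallow: /archive/2012/summer-offer.html

Because Disallow matches by prefix, a shorter rule such as Disallow: /print would also block a /printers/ section, which is exactly the pitfall mentioned above.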
URL Parameters Option in Google Webmaster Tools
The URL Parameters section of Google Webmaster Tools achieves much the same result as robots.txt rules. Through that section, webmasters can tell Google manually which URL parameters to track and which to ignore when crawling specific pages, for instance those used for pagination or translation. Here again the submission must be done carefully; a mistake can cause not only the intended pages but other pages of the site to be skipped as well. Less experienced users should heed the warning that appears on the first screen and learn the details before they start using the tool.
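For instance, one product page can be reachable under several parameterised addresses; the parameter names below are made-up examples:

  http://www.website.com/shoes
  http://www.website.com/shoes?sort=price
  http://www.website.com/shoes?sessionid=73AB2

In the URL Parameters tool, parameters such as sort or sessionid can be flagged as not changing the page content, so that Google treats these addresses as one page rather than three duplicates.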
Resubmission of your .xml Sitemap
Once the two steps above are in place, webmasters can go on to remove the pages they do not want indexed from the site's sitemap.xml file. When the links have been removed from the .xml file, the file can be resubmitted through Google Webmaster Tools.
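To illustrate, a trimmed sitemap.xml simply leaves out the url entries for the pages that should stay out of the index; the addresses below are placeholders:

  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
      <loc>http://www.website.com/original-article.html</loc>
    </url>
    <!-- the entry for the duplicate /printable/original-article.html page has been removed -->
  </urlset>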
Internal Links with ‘Nofollow’
There is an older approach that reinforces the methods above. Adding the ‘nofollow’ attribute to links that sit on indexable pages but point to pages that should not be indexed is a useful signal for search engines: it tells them that the duplicate content behind those links is published for the benefit of users, not to gain anything from search rankings.
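A hypothetical link of that kind, leading from an indexable page to a duplicate printer-friendly version, could be marked up like this:

  <a href="/printable/original-article.html" rel="nofollow">Printer-friendly version</a>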
The Power of Canonicalization
There are different reasons for which canonicalization can be employed. A list of such reasons is given below:
- Web pages that can be reached from two or more locations (owing to the site structure)
- Web pages where Session Ids are used; common examples are shopping carts and booking pages
- Web pages that remain the same after login but become secured (reached over http before login and over https afterwards)
- Web pages that can be reached from affiliate links
- Web pages whose URLs change as form fields change; example: a travel site incorporating a calendar feature
In the cases listed above, a canonical link can be added to the duplicate pages to tell search engines that their content is indeed duplicate and that the original source can be found at the page the canonical link points to.
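In practice the duplicate page carries a canonical link element in its head section pointing to the preferred URL; the address below is, again, only an example:

  <link rel="canonical" href="http://www.website.com/original-article.html" />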
Using Noindex / Nofollow in Meta
This approach is simple and easy to implement, and is commonly used on pages such as blog categories, tags, or archive pages, where the content exists mainly to help users find the pages they are looking for. Adding noindex and nofollow to the robots meta tag removes such pages from the index. The values can also be mixed and matched: noindex combined with follow, for example, lets search engine robots crawl through the duplicate content without indexing it.
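On a blog category or tag page, for instance, the robots meta tag with that mixed combination could read as follows:

  <meta name="robots" content="noindex, follow">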
Choose between www. and non-www Version of Your Site
Another simple approach concerns addresses of the http://www.website.com type versus http://website.com. When one of the two address versions is permanently redirected to the other, there is no longer any risk of the two being indexed as duplicate pages. The redirect is set up at server level with a URL rewrite that returns a 301 status.
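On an Apache server with mod_rewrite enabled, for example, a few lines in the .htaccess file can send the non-www addresses to their www counterparts with a 301 redirect (website.com is a placeholder, and the same can be done the other way round):

  RewriteEngine On
  RewriteCond %{HTTP_HOST} ^website\.com$ [NC]
  RewriteRule ^(.*)$ http://www.website.com/$1 [R=301,L]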
Don’t Forget to Use Rel=Prev and Rel=Next
This approach can pose some difficulties, and it is mostly used on the component pages of paginated series. The method is very similar to the canonicalization method outlined above: it lets search engines see the relationship between the URLs in a paginated content series. More information on the matter can be found in Google’s Webmaster Central Blog.
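On page 2 of a hypothetical three-page article series, for example, the link elements in the head section might look like this:

  <link rel="prev" href="http://www.website.com/article?page=1" />
  <link rel="next" href="http://www.website.com/article?page=3" />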
Language Related Duplicate Issues
In many cases duplicate website versions are created deliberately for audiences in different countries, each version written in the language or spelling its users expect. Common examples are websites designed for UK audiences alongside versions of the same site aimed at US or Australian users. To avoid penalties for duplicate content, such sites can be handled in much the same way as outlined above.
There are also methods that help prevent search engines from classifying content spread across different TLDs as duplicate. They are simple to implement and are listed below.
- From the Settings tab in Google Webmaster Tools, the geographic target should be set to the country the site's audience is in
- Websites aimed at specific countries should, where possible, be hosted on servers in those countries
- All information such as addresses, phone numbers, as well as currencies, should be relevant for the specific country
- Adding geo-meta location tags to pages is also helpful. Local business profiles should be created for the specific country and should link to the relevant website version
- Links should preferably come from websites in the countries where the intended audiences reside; too many links from other countries should be avoided
- Adding hreflang annotations is yet another useful method; a short example follows this list
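As a sketch, hreflang link elements in the head section of each country version tell Google which address is meant for which audience; the domains below are invented for the example:

  <link rel="alternate" hreflang="en-gb" href="http://www.website.co.uk/" />
  <link rel="alternate" hreflang="en-us" href="http://www.website.com/" />
  <link rel="alternate" hreflang="en-au" href="http://www.website.com.au/" />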
Don’t Forget About the Demo Sites
Demo sites should preferably not be left live; they belong on demo servers that are not reachable from the Internet. When they sit on a subdomain of the live site, however, keeping them offline is not always possible. In that case, remember to disallow them in robots.txt and to set the relevant parameters in Webmaster Tools so that robots do not index them. It can also be worth adding canonical links on the test versions that point to the corresponding live pages.
Finally, demo sites should be protected with passwords, so that random visitors who mistype a URL do not wander into them by accident.
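One common way to do that on an Apache server is HTTP basic authentication in the demo site's .htaccess file; the realm name and the .htpasswd path below are placeholders:

  AuthType Basic
  AuthName "Demo site"
  AuthUserFile /path/to/.htpasswd
  Require valid-user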