A major problem for search engines is determining the original source of documents that are available at multiple URLs. Content duplication can arise in several ways, such as:
- GET parameters appended to a URL
- Multiple URLs generated for the same page by a CMS
- Accessibility on different hosts or protocols
- Print versions of web pages
Duplicate content problems occur when the same content is accessible from multiple URLs. For example, search engines would treat http://www.example.com/page.html as a completely different page from http://www.example.com/page.html?parameter=1, even though both URLs may reference the same content.
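To make the example above concrete, here is a minimal Python sketch that compares the two URLs first as raw strings and then with the query string removed. The helper name `strip_query` is hypothetical, and dropping every parameter is a simplifying assumption: on a real site some parameters do change the content, so this is an illustration rather than a recommended normalization rule.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL without its query string and fragment (illustrative only)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


urls = [
    "http://www.example.com/page.html",
    "http://www.example.com/page.html?parameter=1",
]

# Compared as raw strings, the two URLs are different addresses.
distinct_raw = len(set(urls))

# With the query string dropped, both collapse to the same address.
distinct_stripped = len({strip_query(u) for u in urls})
```

A crawler cannot safely apply a rule like this on its own, because it does not know which parameters matter. The canonical link element lets the site owner state the preferred URL explicitly.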
In February 2009, Google, Yahoo, and Microsoft announced support for the canonical link element, which can be inserted into the <head> section of a web page to let webmasters prevent these issues. The canonical link element helps webmasters make clear to search engines which page should be credited as the original.
How search engines handle the canonical link
Search engines try to use canonical link definitions as an output filter for their search results. If more than one URL in the result set contains the same content, the canonical link definitions will likely be used to determine the original source of that content. For example, when Google finds several instances of identical content, it decides to show only one of them. Which source it displays in the search results depends on the search query.
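The output-filter idea can be sketched in Python as follows. This is not how any search engine is known to implement it: the `Result` type, the `content_hash` field standing in for duplicate detection, and the `collapse_duplicates` function are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Result:
    url: str
    content_hash: str          # hypothetical stand-in for "same content" detection
    canonical: Optional[str]   # URL declared via rel="canonical", if any


def collapse_duplicates(results: List[Result]) -> List[Result]:
    """Keep one result per content group, preferring a declared canonical URL."""
    groups: Dict[str, List[Result]] = {}
    for r in results:
        groups.setdefault(r.content_hash, []).append(r)

    kept = []
    for group in groups.values():
        # Canonical URLs declared by any page in this duplicate group.
        declared = {r.canonical for r in group if r.canonical}
        # Prefer a result whose own URL was declared canonical;
        # otherwise fall back to the first result found.
        preferred = next((r for r in group if r.url in declared), group[0])
        kept.append(preferred)
    return kept


results = [
    Result("https://example.com/GM.php?parameter=1", "h1", "https://example.com/GM.php"),
    Result("https://example.com/GM.php", "h1", None),
    Result("https://example.com/other", "h2", None),
]
shown = [r.url for r in collapse_duplicates(results)]
```

The fallback to the first result when no declared canonical URL is present in the group is an arbitrary choice in this sketch.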
According to Google, the canonical link element isn’t considered to be a directive, but rather a hint that the ranking algorithm will honor strongly.
While the canonical link element has its benefits, Matt Cutts, then the head of Google’s webspam team, has said that the search engine prefers the use of 301 redirects. Google’s crawlers can choose to ignore a canonical link if they consider it beneficial to do so.
The canonical link can be specified in two ways:
- Within the HTML <head> section
- Sent in the HTTP header of the document. For non-HTML files, the HTTP header is an alternative way to set a canonical URL.
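For the HTTP header variant, the canonical URL is carried in a `Link` response header with `rel="canonical"`. The Python sketch below only builds that header as a name/value pair. The function name `canonical_link_header` and the PDF URL are hypothetical, and how the pair is attached to a response depends on the web server or framework in use.

```python
from typing import Tuple


def canonical_link_header(canonical_url: str) -> Tuple[str, str]:
    """Build a (name, value) pair for a Link header declaring the canonical URL."""
    # The target URL goes in angle brackets, followed by the relation type.
    return ("Link", f'<{canonical_url}>; rel="canonical"')


# Hypothetical example: a PDF file, which has no <head> section to carry the element.
name, value = canonical_link_header("https://example.com/GM.pdf")
header_line = f"{name}: {value}"
```

Here `header_line` holds the text that would appear among the response headers for the PDF.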
As per the HTML5 standard, the <link rel="canonical" href="http://example.com/"> element must be within the <head> section of the document.
The following HTML code uses rel=canonical inside the <head> tags. It would be placed on a page such as https://example.com/GM.php?parameter=1 to tell search engines that https://example.com/GM.php is the preferred version of the webpage.
    <!DOCTYPE html>
    <html>
      <head>
        <link rel="canonical" href="https://example.com/GM.php" />
      </head>
      <body>
        ...
      </body>
    </html>
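To check that a page declares the canonical URL you expect, the element can be read back out of the markup. The sketch below uses Python's standard `html.parser` module. The names `CanonicalFinder` and `find_canonical` are hypothetical, and the rule of only accepting a link found inside <head> is an assumption based on the placement requirement described above, not a description of any particular crawler.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Record the href of the first rel="canonical" link found inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head and self.canonical is None:
            attr = dict(attrs)
            # rel may hold several space-separated values.
            rel_values = (attr.get("rel") or "").lower().split()
            if "canonical" in rel_values:
                self.canonical = attr.get("href")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonical(html: str) -> Optional[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = (
    '<!DOCTYPE html><html><head>'
    '<link rel="canonical" href="https://example.com/GM.php" />'
    '</head><body>...</body></html>'
)
found = find_canonical(page)
```

For the sample page, `found` holds the canonical URL. A link placed in the <body> instead is ignored by this sketch and yields `None`.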
A canonical link element helps webmasters to prevent duplicate content issues in search engine optimization.