Wikipedia:Link rot

Like most large websites, Wikipedia suffers from the phenomenon known as link rot, where external links, often used as references and citations, gradually become irrelevant or broken (also called dead links) as the linked websites disappear, change their content, or move. This presents a significant threat to Wikipedia's reliability policy and its source citation guideline.

The effort required to prevent link rot is significantly less than the effort required to repair or mitigate a rotten link. Therefore, prevention of link rot strengthens the encyclopedia. This guide provides strategies for preventing link rot before it happens. These include the use of web archiving services and the judicious use of citation templates.

Editors are encouraged to add an archive link as a part of each citation, or at least submit the referenced URL for archiving,[note 1] at the same time that a citation is created or updated.

However, link rot cannot always be prevented, so this guide also explains how to mitigate link rot by finding previously archived links and other sources. These strategies should be implemented in accordance with Wikipedia:Citing sources#Preventing and repairing dead links, which describes the steps to take when a link cannot be repaired.

Do not delete cited information solely because the URL to the source no longer works; the only exception is a URL in the External links section that has not been used to support any article content. Recovery and repair options and tools are available. Verifiability does not require that all information be supported by a working link, nor does it require the source to be published online.

Preventing link rot

As you write articles, you can help prevent link rot in several ways. The first is to avoid bare URLs: record as much of the source's exact title, author, publisher, and date as possible, and optionally add the access date. If the link goes bad, this added information can help a future Wikipedian, whether editor or reader, locate another copy of the source, online or in print. That may be impossible with only an isolated bare URL that no longer works. Local and school libraries are a good resource for locating such offline sources; many have in-house subscriptions to digital databases or inter-library loan agreements, making it easier to retrieve hard-to-find sources.
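For example, a bare URL gives later editors nothing to search for, while a filled-out citation template such as {{cite web}} records the details needed to relocate the source. In this sketch the address, title, and author are hypothetical placeholders, used purely for illustration:

    A bare URL: <ref>http://www.example.com/news/2008/0124/story.html</ref>
    A full citation: <ref>{{cite web |url=http://www.example.com/news/2008/0124/story.html |title=Example Story Headline |author=Doe, Jane |publisher=Example News |date=January 24, 2008 |accessdate=2009-10-28}}</ref>

Even if the page were later removed, the title, author, publisher, and date in the second form would remain available as search terms.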

As you edit, if an article has bare URLs in its citations, fix them or at least tag the References section with {{linkrot}} as a reminder to complete citation details as above, and to categorize the article as needing cleanup.
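A sketch of a tagged References section (the date value is illustrative, and {{reflist}} is assumed to be how the article lists its footnotes):

    ==References==
    {{linkrot|date=January 2012}}
    {{reflist}}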

Web archive services

A second way to prevent link rot is to use a web archiving service. The two most popular services are the Wayback Machine, which crawls and archives many web pages and also offers a form for suggesting a URL to be archived,[note 1] and WebCite, which provides on-demand web archiving. These services collect and preserve web pages for future use even if the original page is moved, changed, deleted, or placed behind a paywall. Web archiving is especially important when citing web pages that are unstable or prone to change, such as time-sensitive news articles or pages hosted by financially distressed organizations. Once you have the URL for the archived version of the page, use the archiveurl= and archivedate= parameters in the citation template you are using; the template will automatically incorporate the archived link into the reference, as in the wikitext sketch after the example citations below.

  • Dubner, Stephen J. (January 24, 2008). "Wall Street Journal Paywall Sturdier Than Suspected". The New York Times Company. Retrieved 2009-10-28.
  • Dubner, Stephen J. (January 24, 2008). "Wall Street Journal Paywall Sturdier Than Suspected". The New York Times Company. Archived from the original on 2011-08-15.
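A sketch of the wikitext behind a citation like the second example above; the article URL and the archive URL are illustrative placeholders rather than real addresses:

    <ref>{{cite web |url=http://www.example.com/2008/01/24/paywall/ |title=Wall Street Journal Paywall Sturdier Than Suspected |author=Dubner, Stephen J. |date=January 24, 2008 |publisher=The New York Times Company |archiveurl=https://web.archive.org/web/20110815000000/http://www.example.com/2008/01/24/paywall/ |archivedate=2011-08-15}}</ref>

If no archived copy exists yet, the Wayback Machine can be asked to make one by visiting https://web.archive.org/save/ followed by the page's address; the resulting snapshot URL can then be used as the archiveurl.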

However, not every web page can be archived. Webmasters and publishers may use the Robots Exclusion Standard (a robots.txt file) on their domain to disallow archiving, or may rely on complicated JavaScript, Flash, or other code that is not easily copied. In these cases, alternate methods of preserving the data may be available.
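For illustration, a site can ask the Wayback Machine's crawler, which identifies itself with the user agent ia_archiver, to stay away with two lines in its robots.txt:

    User-agent: ia_archiver
    Disallow: /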

Robots.txt

A quirk in the way the Wayback Machine operates means archived copies of sites sometimes become unavailable. For example, the Freakonomics blog was previously hosted at freakonomics.blogs.nytimes.com; those URLs were later excluded from archiving by the New York Times' robots.txt file, which also made the previously archived content unavailable. A later robots.txt change can just as easily restore access to content that an earlier change hid, so do not delete an archiveurl solely because the archived content is currently unavailable. In this case, not only can the content be found on a new site that is still open to archiving, but the site's robots.txt later changed to allow archiving again, so the old archives are now unhidden (example).

Alternative methods

Most citation templates have a quote= parameter that can be used to store a limited amount of text from the source within the citation. This is especially useful for sources that cannot be archived with web archiving services, and it also provides insurance against failure of the chosen web archiving service.

When using the quote parameter, choose the most succinct and relevant material that preserves the context of the reference. Storing the entire text of the source is not appropriate under fair use policies, so select only the portions that most directly support the assertions in the Wikipedia article.
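A sketch of a citation carrying a short excerpt; the source, URL, and quoted sentence are all hypothetical:

    <ref>{{cite web |url=http://www.example.com/report.html |title=Example Report |publisher=Example Organization |date=June 1, 2010 |accessdate=2010-06-15 |quote=The committee approved the proposal by a vote of 7 to 2.}}</ref>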

A quote also makes it easier to search for other online copies of the source in the event that the original is discontinued.

Where applicable, public domain materials can be copied to Wikisource.
