Information:Link rot

Like most large websites, BattleTechWiki suffers from the phenomenon known as link rot: external links stop working because the linked web pages or entire websites disappear, change their content, or move without an HTTP redirect. URLs have a median lifespan of about 1 year.

Link rot is a significant threat to BattleTechWiki because it undermines the reliability policy and the source citation guideline: a citation whose link no longer works can no longer be checked by readers.

In general, do not delete cited information solely because the URL to the source no longer works. This page outlines the tools, procedures, and processes available for preventing, repairing, and mitigating dead links.

Preventing link rot

Manual archiving

Suggestions for archiving sources manually:

  • Avoid bare URLs. Use citation templates such as {{cite web}} for citations, and {{webarchive}} for external links sections.
  • Use a web archiving service such as Internet Archive or Archive.today. Within citation templates, put the archive URL in |archive-url= and add an |archive-date=. If the original link still works, set |url-status=live; otherwise set |url-status=dead.
  • To add more than one archive URL, as extra insurance against a provider outage, use {{webarchive}}, which accepts up to 10 archive provider URLs. Its |format=addlarchives option produces output suited to trailing a CS1|2 template. For example, {{cite web|archive-url=..}}{{webarchive|format=addlarchives|url1=..|url2=..|url3=..}} shows four archive URLs (one from {{cite web}} and three from {{webarchive}}).
  • If the link is still live but not yet archived, visit the web site of the archive service of your choice and request that the page be archived.
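The template parameters above can also be assembled programmatically, for example by a script that prepares citations in bulk. The following JavaScript sketch uses hypothetical helper names (citeWeb and webarchiveTrailer are not part of any wiki tool) and placeholder archive URLs. It only produces wikitext; it does not check whether any URL works, and it inserts values as-is without escaping.

```javascript
// Hypothetical helper: builds {{cite web}} wikitext with the parameter names
// described on this page. Values are inserted as-is (no escaping).
function citeWeb({ url, title, accessDate, archiveUrl, archiveDate, live }) {
  const parts = ["url=" + url, "title=" + title];
  if (accessDate) parts.push("access-date=" + accessDate);
  if (archiveUrl) {
    if (!archiveDate) throw new Error("|archive-url= needs an |archive-date=");
    parts.push("archive-url=" + archiveUrl, "archive-date=" + archiveDate);
    // |url-status=live if the original URL still works, otherwise dead.
    parts.push("url-status=" + (live ? "live" : "dead"));
  }
  return "{{cite web|" + parts.join("|") + "}}";
}

// Hypothetical helper: builds a {{webarchive}} trailer listing additional
// archive URLs, formatted to follow a CS1|2 citation template.
function webarchiveTrailer(extraUrls) {
  if (extraUrls.length < 1 || extraUrls.length > 10) {
    throw new RangeError("{{webarchive}} accepts between 1 and 10 archive URLs");
  }
  const parts = ["format=addlarchives"];
  extraUrls.forEach((u, i) => parts.push("url" + (i + 1) + "=" + u));
  return "{{webarchive|" + parts.join("|") + "}}";
}

// Example: one archive in the citation plus two more in the trailer.
const cite = citeWeb({
  url: "https://example.com/a",
  title: "Example page",
  archiveUrl: "https://web.archive.org/web/20200101000000/https://example.com/a",
  archiveDate: "2020-01-01",
  live: true,
});
console.log(cite + webarchiveTrailer(["https://archive.example/1", "https://archive.example/2"]));
```

Keeping the trailer separate from the citation mirrors the template usage above: {{cite web}} holds one archive, and {{webarchive}} only adds extras after it.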

Alternative methods

Most citation templates have a |quote= parameter that can store a limited amount of text from the source material within the citation. This is especially useful for sources that cannot be archived with web archiving services, and it also provides insurance against the failure of the chosen archiving service. Storing the entire text of the source is not appropriate under fair use policies, so quote only the portions that most directly support the assertions in the BattleTechWiki article.

Repairing a dead link

There are several ways to try to repair a dead link, detailed below. In general, avoid removing citations (or cited material) simply because a URL no longer works, especially if the citation is formatted with other information (like a title, author, date and publication name) that could alternatively be used to find the source.

Searching

If the dead link includes enough information (article title, names, etc.) it is often possible to use it to find the web page at a different location, either on the same site or elsewhere.

Web pages often simply move within the same site. A site index or site-specific search feature is a useful place to look for the moved page, using the title or other details from the citation. If neither is available, many Internet search engines can restrict a search to one site: with Google, for example, adding site:example.com to the search string returns only pages on example.com. Occasionally, simply changing http:// to https:// in the URL is enough.
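The two quick checks above, trying the https:// form of the URL and searching the original site for the page title, can be sketched in JavaScript as follows. The helper names are hypothetical, and Google's https://www.google.com/search?q= address is used only as an example search engine; any candidate found this way still has to be opened and checked by hand.

```javascript
// Hypothetical helper: returns the https:// form of an http:// URL,
// or null if the URL does not use http://.
function httpsVariant(deadUrl) {
  const u = new URL(deadUrl);
  if (u.protocol !== "http:") return null;
  u.protocol = "https:";
  return u.href;
}

// Hypothetical helper: builds a search restricted to the dead link's own site,
// looking for the exact page title.
function siteSearchUrl(deadUrl, title) {
  const host = new URL(deadUrl).hostname;
  const query = "site:" + host + ' "' + title + '"';
  return "https://www.google.com/search?q=" + encodeURIComponent(query);
}

console.log(httpsVariant("http://example.com/page1.html"));
console.log(siteSearchUrl("http://example.com/page1.html", "Page One"));
```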

Failing this, searching the Web for the page title may find the same page on another site, and searching for the data the citation supports may find a different source.

If you find a suitable new URL, then you can edit the parameters within the citation. If the citation uses one of the common citation templates (e.g. {{cite web}}, {{cite news}}, {{Citation}}), you can:

  • Change the |url= to point to the new URL;
  • Change or add |access-date= to refer to the current date.
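For editors preparing such changes with a script, the two steps above can be sketched in JavaScript. The helpers below are hypothetical and deliberately simple: they work on a single citation template string and assume no nested templates or "|" characters inside parameter values.

```javascript
// Hypothetical helper: sets |name=value in one citation template string.
// The pattern requires "|" right before the name, so "url" does not match
// |archive-url= or |url-status=.
function setParam(template, name, value) {
  const re = new RegExp("\\|\\s*" + name + "\\s*=[^|}]*");
  const param = "|" + name + "=" + value;
  if (re.test(template)) return template.replace(re, () => param);
  // Parameter absent: append it before the closing braces.
  return template.replace(/\}\}\s*$/, () => param + "}}");
}

// Points |url= at the new address and sets |access-date= to today (UTC).
function updateCitationUrl(template, newUrl, today = new Date()) {
  const isoDate = today.toISOString().slice(0, 10); // YYYY-MM-DD
  return setParam(setParam(template, "url", newUrl), "access-date", isoDate);
}

console.log(updateCitationUrl(
  "{{cite web|url=http://old.example.com/a|title=A|access-date=2019-05-01}}",
  "https://new.example.com/a"
));
```

The replacement is passed as a function so that characters such as "$" in a URL are not treated as special replacement patterns.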

Internet archives

Check for archived versions at one of the many web archive services. The "Big 3" archive services are web.archive.org, webcitation.org and archive.today. These account for over 90% of all archives on BattleTechWiki, with web.archive.org alone accounting for over 80% of all archive links. The Memento interface searches multiple archiving services with a single query. Its database is cached, so results return quickly, but the cache goes out of date and often reports incorrectly that no archives are available. Treat Memento as a quick first check rather than the final word, and check individual archive sites if it finds nothing.
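For the Internet Archive specifically, a single lookup can also be scripted. The JavaScript sketch below assumes the Internet Archive's public availability endpoint (https://archive.org/wayback/available), which returns JSON describing the snapshot closest to an optional timestamp. Only URL building and response parsing are shown, so the helpers can be tried offline; the helper names are hypothetical.

```javascript
// Builds a Wayback Machine availability query. accessDate is optional and
// given as YYYY-MM-DD; the endpoint expects YYYYMMDD.
function availabilityUrl(pageUrl, accessDate) {
  const params = new URLSearchParams({ url: pageUrl });
  if (accessDate) params.set("timestamp", accessDate.replace(/-/g, ""));
  return "https://archive.org/wayback/available?" + params.toString();
}

// Extracts the closest snapshot from the JSON response, or null.
// A null result is not proof that no archive exists.
function closestSnapshot(response) {
  const closest = response && response.archived_snapshots && response.archived_snapshots.closest;
  if (!closest || !closest.available) return null;
  return { url: closest.url, timestamp: closest.timestamp };
}

console.log(availabilityUrl("https://example.com/a", "2019-05-01"));
```

In an environment with a global fetch (such as Node 18 or later), the lookup could be run as fetch(availabilityUrl(url)).then((r) => r.json()).then(closestSnapshot). As with Memento, an empty answer is a reason to check the archive site directly, not a final verdict.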

Bookmarklets to check common archive sites for archives of the current page (all open in a new tab or window):

  • Archive.org: javascript:void(window.open('https://web.archive.org/web/*/'+location.href))
  • UKGWA: javascript:void(window.open('https://webarchive.nationalarchives.gov.uk/ukgwa/*/'+location.href))

If several archive dates are available, choose the snapshot most likely to match the page as the editor who added the reference saw it on the |access-date=. If that parameter is not set, search the article's revision history to find when the link was added.

View the archive to verify that it contains the expected page content. Snapshots taken close to the date the link was added to the BattleTechWiki article, or earlier, are usually the most likely to show it.
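When an archive offers several snapshots, the preference described above can be sketched as a small function: take the latest snapshot on or before the |access-date=, and fall back to the earliest later one. It assumes Wayback-style 14-digit YYYYMMDDhhmmss timestamps; the function name is hypothetical, and the chosen snapshot still has to be viewed to confirm its content.

```javascript
// Picks the snapshot to prefer for a citation: the latest one taken on or
// before the access date, otherwise the earliest one taken after it.
// 14-digit timestamps sort as strings in the same order as in time.
function pickSnapshot(timestamps, accessDate) {
  const cutoff = accessDate.replace(/-/g, "") + "235959"; // end of the access day
  const sorted = [...timestamps].sort();
  const onOrBefore = sorted.filter((t) => t <= cutoff);
  if (onOrBefore.length > 0) return onOrBefore[onOrBefore.length - 1];
  return sorted.length > 0 ? sorted[0] : null;
}

console.log(pickSnapshot(["20200101000000", "20180101000000", "20190420120000"], "2019-05-01"));
```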

If you find a suitable archive URL, then you can add it to the citation. If the citation uses one of the common templates (e.g. {{cite web}}, {{cite news}}, {{Citation}}), then you can edit as follows:

  • Leave the |url= unchanged, pointing to the source URL.
  • Add |archive-url=, pointing to the archive URL.
  • Add |archive-date=, specifying the date when the archived copy was saved. YYYY-MM-DD format is usually easiest but any format can be used.
  • Add or change |url-status=. Use |url-status=dead if the old URL does not work. Use |url-status=unfit or |url-status=usurped if the old URL has been usurped for the purposes of spam, advertising, or is otherwise unsuitable. Use |url-status=live if |url= still works and still gives the correct information, but you want to preemptively add an |archive-url=.
  • Leave the |access-date= unchanged; it records when a previous editor last accessed the |url=. Some editors remove |access-date= once a working |archive-url= is in place, reasoning that the |url= is no longer available and the date is therefore redundant clutter.
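The steps above can be sketched for the common case of a Wayback Machine snapshot, whose URL embeds the capture time in the form https://web.archive.org/web/YYYYMMDDhhmmss/original-url. The hypothetical helper below derives |archive-date= in YYYY-MM-DD form from that timestamp and appends the archive parameters, leaving |url= and |access-date= untouched. It assumes the citation has no archive parameters yet.

```javascript
// Accepted |url-status= values, as listed in the steps above.
const STATUSES = new Set(["dead", "live", "unfit", "usurped"]);

// Extracts YYYY-MM-DD from a Wayback Machine snapshot URL, or null.
function archiveDateFromWayback(archiveUrl) {
  const m = /web\.archive\.org\/web\/(\d{4})(\d{2})(\d{2})\d*/.exec(archiveUrl);
  return m ? m[1] + "-" + m[2] + "-" + m[3] : null;
}

// Hypothetical helper: appends archive parameters to one citation template.
// |url= and |access-date= are left as they are.
function addArchive(template, archiveUrl, status) {
  if (!STATUSES.has(status)) throw new RangeError("unexpected url-status: " + status);
  const date = archiveDateFromWayback(archiveUrl);
  if (!date) throw new Error("not a Wayback Machine snapshot URL");
  const extra = "|archive-url=" + archiveUrl + "|archive-date=" + date + "|url-status=" + status;
  return template.replace(/\}\}\s*$/, () => extra + "}}");
}

console.log(addArchive(
  "{{cite web|url=http://example.com/a|title=A|access-date=2019-05-01}}",
  "https://web.archive.org/web/20190420120000/http://example.com/a",
  "dead"
));
```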

Mitigating a dead link

At times, all attempts to repair the link will fail. In that event, consider finding an alternative source so that the loss of the original does not harm the verifiability of the article. Alternative sources about broad topics are usually easy to find. A simple search engine query might locate an appropriate alternative, but take great care not to cite mirrors or forks of BattleTechWiki itself, which would violate BattleTechWiki:Verifiability. Sometimes a link is dead only because the website moved to a new address (e.g. http://example.com moved to http://example.co.uk); in that case, the citation can point to the new address.
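A moved site like the one in the example above can be handled with a simple transformation rule. The JavaScript sketch below applies a hypothetical table of old-to-new host names, seeded only with the example.com to example.co.uk move, to produce a candidate URL. A candidate is only a guess: confirm that it works and shows the cited content before changing the citation.

```javascript
// Hypothetical rule table: old host -> new host. Each rule should be based on
// evidence that the site really moved.
const HOST_RULES = new Map([["example.com", "example.co.uk"]]);

// Returns a candidate URL on the new host, or null if no rule applies.
// Path, query and fragment are kept unchanged.
function applyHostRule(deadUrl, rules = HOST_RULES) {
  const u = new URL(deadUrl);
  const newHost = rules.get(u.hostname);
  if (!newHost) return null;
  u.hostname = newHost;
  return u.href;
}

console.log(applyHostRule("http://example.com/page1.html"));
```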

In general, the fact that a URL is broken does not mean that a source has ceased to exist entirely, and a broken URL in a citation does not mean it must be removed. See the guidance at BTW:DEADREF for when it is appropriate to remove citations with dead links. Crucially, books, magazines, journals and other print sources exist offline, and continue to do so even if websites go down or change locations; the lack of a functioning URL for a book does nothing to decrease its value as a source for BattleTechWiki content. Permanently inaccessible convenience links for print sources can be removed, but the reference should be retained. Before removing a citation with a dead URL, consider whether it would be possible to track down the source without using the URL at all; if so, it should probably be kept.

Keeping dead links

A dead, unarchived source URL may still be useful. It indicates that the information was (probably) verifiable in the past, and it may give another editor with more resources or expertise enough information to find the reference. The link could also come back to life. A dead link also makes it possible to check whether the source has been cited elsewhere, or to contact the person originally responsible for it. For example, if http://www.cs.yale.edu/~EliYale/Defense-in-Depth-PhD-thesis.pdf were dead, one could contact the Yale Computer Science department.

Place {{dead link|date=January 2025}} after the dead citation, immediately before the </ref> tag if there is one, leaving the original link intact. Marking dead links signals to editors and to link rot bots that the link needs to be replaced with an archive link. Placing {{dead link}} also adds the article to the "Articles with dead external links" project category and to a monthly category based on the |date= parameter. Do not delete a citation just because it has been tagged with {{dead link}} for a long time.
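The placement rule above can be sketched in JavaScript for scripted edits. The helper names are hypothetical; the code assumes the input holds a single citation and uses the "Month YYYY" date format shown in the example tag.

```javascript
const MONTHS = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

// Builds the tag with a "Month YYYY" date (UTC), e.g. {{dead link|date=January 2025}}.
function deadLinkTag(date = new Date()) {
  return "{{dead link|date=" + MONTHS[date.getUTCMonth()] + " " + date.getUTCFullYear() + "}}";
}

// Inserts the tag immediately before </ref>, or after the citation if there is
// no </ref>. The original link is left intact; already-tagged text is unchanged.
function tagDeadRef(refWikitext, date = new Date()) {
  if (/\{\{\s*dead link/i.test(refWikitext)) return refWikitext;
  const tag = deadLinkTag(date);
  if (/<\/ref>/i.test(refWikitext)) {
    return refWikitext.replace(/<\/ref>/i, (closing) => tag + closing);
  }
  return refWikitext + tag;
}

console.log(tagDeadRef("<ref>{{cite web|url=http://example.com/a|title=A}}</ref>"));
```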

Glossary

Glossary of terms and concepts.

  • Beyond-404. Conceptually and ideally, every dead link would return the HTTP status code 404. In the wilds of the Internet, many "dead" pages return other codes. This is the realm beyond-404, which often requires special tools and foreknowledge to detect and fix; it might account for 30% or more of all inoperable links. Some of the beyond-404 types are described in this glossary. A link can be a combination of types, for example a URL that goes: Soft-redirect --> Soft-404 --> Redirect --> Destination.
  • Bot Blocker. Any kind of mechanism that prevents automated tools from detecting the status of a page. The most common are Cloudflare, rate limiters and IP blockers. Bot blockers can cause false 404s.
  • Hard-404 or Dead link. A page that returns 404 status code, a dead link.
  • Soft-404. A URL that redirects to a page with different content from the original, for example https://example.com/page1.html redirecting to https://example.com/home.html (the home page). Soft-404s can be domain name squatters, blank pages, spam sites, bot blockers, rate limiters; the possibilities are endless. This is the most common type of beyond-404 dead link. The page does not return 404, but it does not return the intended content either; it is in effect a 404, hence "soft". Methods of soft-404 detection include foreknowledge, the redirect URL, the page title, and the content on the page.
  • Crunchy-404. A URL that falls somewhere between a Soft-404 and a Hard-404: the content differs from the original page but is still relevant to it. Depending on what information the reader seeks, it could be considered a dead link or a live link.
  • Redirect. A URL that automatically redirects to another page.
  • Soft-redirect. A URL that appears to be inoperable (404) but exists on the live web at a different URL, i.e. it is missing a redirect. This is the corollary of a soft-404.
  • Ruled and inferred soft-redirect. Two types of soft-redirect. Ruled soft-redirects can be resolved by transformation rules, e.g. a rule to change ".co.uk" -> ".com". Inferred soft-redirects guess what the new URL might be from information in other sources, such as the |title= or |date= of a citation. An inferred redirect may produce several guesses, which are added to an inference table and checked until one is found to work.
  • Ghost redirect. Redirect link rot: for example, a 301 redirect was removed and the URL now returns 404, but the old 301 information is still preserved in the Wayback Machine. Useful for discovering redirect information that is no longer on the live web. See also Ghostredir repo.
  • Soft-200 or False 404. A URL that appears to be dead but is actually live. This can be caused by bot blockers or a misconfiguration.
  • URL Move (or Migration). When URLs are moved from one scheme to another, for example migrating https://example.com/main.html to https://arthur.com/main.html because the remote site changed domain names. Sites usually leave some of the old URLs behind rather than migrating all of them, and those typically turn into 404s and soft-redirects. When making a URL move on BattleTechWiki, it is therefore imperative to verify that the new URL works. When verification is not possible (for example because of a Bot Blocker), the change is called a "Blind URL Move".
  • Content drift. When the content at a static URL changes over time. For example, the team rankings at https://espn.com/mlb/rangers/standings.html change weekly; weather and financial data are other classic examples. Even though the URL may be live, it is functionally dead because the page no longer displays the intended content, which makes content drift a variety of soft-404.
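Several of these categories can be told apart, roughly, from a link check's HTTP status and final URL. The JavaScript sketch below is a heuristic illustration, not a link-checking tool: the input shape, the category names, the set of "home page" paths, and the treatment of 403, 429 and 503 as hints of a bot blocker are all assumptions. It assumes redirects were already followed, and it cannot detect crunchy-404s or content drift, which require looking at the page content.

```javascript
// Assumed set of paths treated as a site's home page.
const HOME_PATHS = new Set(["/", "/index.html", "/home.html"]);

// check = { originalUrl, finalUrl, status }; redirects are assumed to have
// been followed already, so status belongs to the final page.
function classifyLink({ originalUrl, finalUrl, status }) {
  if (status === 404) return "hard-404";
  // Assumed hints of a bot blocker or rate limiter: possibly a false 404.
  if (status === 403 || status === 429 || status === 503) return "possible-bot-blocker";
  if (status < 200 || status >= 300) return "other-error";
  const from = new URL(originalUrl);
  const to = new URL(finalUrl);
  if (from.href === to.href) return "live"; // content may still have drifted
  // A deep page redirected to a home page: the glossary's soft-404 example.
  if (HOME_PATHS.has(to.pathname) && !HOME_PATHS.has(from.pathname)) return "soft-404";
  return "redirect"; // the destination still needs checking by hand
}

console.log(classifyLink({
  originalUrl: "https://example.com/page1.html",
  finalUrl: "https://example.com/home.html",
  status: 200,
}));
```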
