Copyright violations (WP:COPYVIO)

Caution: Although we have a system in place to automatically detect copyright violations, it misses a large number of them. Never rely entirely on bots, which can also produce false positives. Copyright infringement is a pervasive problem: not only must we avoid hosting such material, but infringements that are not caught early often create significant additional work. Accordingly, unless there are very good reasons to believe a copyright violation is unlikely, please check all new pages for copying from pre-existing material. Articles about organizations and music groups are especially prone to 'borrowing' content from other sources.

It is not a copyright violation to copy material that is in the public domain or has a compatible license if the material is properly attributed. (Templates are available but are not required.)

Remember that any text that is a copyright violation should be removed from the article and the affected revisions deleted, even if the page does not qualify for G12 speedy deletion.

Hallmarks of copying include:

  • The addition of a large portion of text in a single edit or a few edits, especially when coupled with the other indicators listed below
  • Single-reference articles, or articles with large sections of text lacking inline references
  • Articles with text that seems 'too perfect to be true'
  • Articles whose text resembles a news article, press release, blog, or book; reads like writing that rarely occurs outside a specific, invariably copyrighted use; or has a strange tone of voice, such as an overly informal one
  • First person pronouns and possessives (I, we, my, our), and contractions (I'm, we're, they're, can't, didn't, aren't, won't, etc.)
  • A slanted marketing voice with weasel words and other puffery; explicit or implicit claims of ownership of the added text or of insider status regarding the topic (the presence of intellectual property symbols such as ©, ™ and ® is highly correlated)
  • Out-of-context and out-of-place words or phrases that smack of an existing source or of the navigation structure of an original website ("this site/page/book/whitepaper"; "top", "go to top", "next page", "click here", etc.), and non-standard characters (e.g., Microsoft "smart quotes")
  • Articles whose style of referencing appears to be that of a book or other pre-existing source and does not correspond to the article's actual references, such as reference numbers or author names in the text, including inline footnote markers such as "[1]", especially when no footnotes are given

Methods to check for copyright violations:

  1. Use the filters in the Page Curation feed to see whether any edits to a particular page have been flagged as copyright violations.
  2. To see whether content has been copied from pre-existing writing, paste a short but distinctive passage of text from the page into a search engine such as Google (enclosed in quotation marks), and try a few such snippets from each paragraph.
  3. Compare the article's content with the references and external links and look for copy/pastes or close paraphrasing.
  4. Even if it is not given as a reference or link, check whether the person or organization has a dedicated website (once you find it, it is often fruitful to look for an "about", "history" or other narrative section, which will not necessarily appear in Google). If you have access to them, Facebook and LinkedIn are also widespread sources of copying.
  5. Earwig's Copyvio Detector and the Duplication detector are useful tools for finding copyright violations. However, do not treat a negative result from either as conclusive: both are hit-and-miss, cannot read some web content, and are poor at finding closely paraphrased text. Positive results must also be checked by a human, including to see whether the source is in the public domain or carries a suitable free license. A user script is available that adds a link to your tools menu for running the current page through Earwig's tool.
  6. Some copyright violations are from PDF files. To read them you will need to open them in your browser or download them.
  7. Although less likely to be relevant for new pages than for existing articles, it is important to understand "backwards copyvios": Wikipedia content is quickly picked up and duplicated by outside sources, so searches may produce false positives by finding content that was copied from the Wikipedia article. The Wayback Machine is an invaluable tool for sorting these out.
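The searching and comparing in steps 2 and 3 can be partly scripted. The following is a minimal Python sketch under stated assumptions, not a tool used on Wikipedia: `pick_snippets`, `search_url`, `shingles` and `overlap` are hypothetical helper names, the Google `/search?q=` URL is simply one way to open the results, and the overlap score is a basic word n-gram ("shingle") measure chosen for illustration, not the method used by Earwig's Copyvio Detector or the Duplication detector. Like those tools, it catches verbatim copying far better than close paraphrasing, so every result still needs a human check.

```python
import re
from urllib.parse import quote_plus


def pick_snippets(paragraph: str, words_per_snippet: int = 8, max_snippets: int = 3) -> list[str]:
    """Pick up to max_snippets runs of consecutive words, spread evenly across the paragraph."""
    words = paragraph.split()
    if len(words) <= words_per_snippet:
        # Short paragraph: search for all of it (or nothing if it is empty).
        return [" ".join(words)] if words else []
    last_start = len(words) - words_per_snippet
    if max_snippets <= 1:
        starts = [0]
    else:
        # Evenly spaced start positions from the beginning to the end of the paragraph.
        starts = sorted({i * last_start // (max_snippets - 1) for i in range(max_snippets)})
    return [" ".join(words[s:s + words_per_snippet]) for s in starts]


def search_url(snippet: str) -> str:
    """Build a Google search URL that looks for the snippet as an exact phrase (in quotation marks)."""
    return "https://www.google.com/search?q=" + quote_plus(f'"{snippet}"')


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of n-word sequences in the text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(article: str, source: str, n: int = 5) -> float:
    """Fraction of the article's n-word sequences that also appear in the source."""
    article_shingles = shingles(article, n)
    if not article_shingles:
        return 0.0
    return len(article_shingles & shingles(source, n)) / len(article_shingles)


if __name__ == "__main__":
    # Hypothetical placeholder texts, for illustration only.
    article = ("Acme Widgets was founded in 1999 by two engineers who wanted to make "
               "better widgets for everyone, and it now ships to forty countries.")
    about_page = ("Founded in 1999 by two engineers who wanted to make better widgets "
                  "for everyone, Acme is a family business.")
    for snippet in pick_snippets(article):
        print(search_url(snippet))
    print(f"overlap with the 'about' page: {overlap(article, about_page):.0%}")
```

Opening each printed URL reproduces step 2 by hand; a high overlap with a page found that way is a cue to compare the texts closely, not proof of infringement.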

What to do if you find a copyright violation:

  • If substantially the entire page is an unambiguous copyright violation, and there is no non-infringing revision to revert to (which will usually, but not always, be true for new articles), tag the page for speedy deletion under CSD G12. Don't forget to warn the user with the notice template provided in the text of the speedy deletion tag. (Page Curation does this for you; if you are examining an older page that has already been reviewed, Twinkle will also do it.)
  • Note: for copyright violations where the content is copied from multiple sources, you can enter more than one URL in Twinkle, but Page Curation has only a single field. To get around this, simply put a space and the word "and" between the URLs and enter them all in that single field.
  • Where you have not marked the page for speedy deletion – for example, because removing the infringement found would still leave substantial content – then:
  1. remove all of the infringing material from the page, noting in your edit summary where it came from ("Remove copyright violation of http://www...."). Where the copying is from more than one source, it is often easiest to remove each infringement in a separate edit.
  2. post a notice to the article's talk page using the relevant template; just place a space between the URLs if there is more than one (note: this template signs automatically, so do not add tildes).
  3. if you are an administrator, revision-delete the span of edits containing the copyright violations; if you are not, mark the revisions in the page history (typically the first edit and the second-to-last edit) for redaction by an administrator by placing the revision deletion request template at the top of the page and saving. Be careful to use the oldid, not the diff number, when requesting revision deletion.
  • If you are not an administrator, User:Enterprisey/cv-revdel and User:Primefac/revdel are both user scripts that semi-automate requesting revision deletion and help speed up the process.
  • Where you have not marked the page for speedy deletion, and you cannot clean it up yourself or believe your suspicion of copying warrants further investigation, send the page to Wikipedia:Copyright problems by marking it with the relevant template, and then follow the instructions in the copyright investigation notice to list the page on "today's" copyright problems page and to warn the user.
