Wiktionary:Criteria for inclusion

This is a Wiktionary policy, guideline or common practices page.
It should not be modified without discussion and consensus. Any substantial or contested changes require a VOTE. [1]

As an international dictionary, Wiktionary is intended to include “all words in all languages”.

General rule

A term should be included if it's likely that someone would run across it and want to know what it means. This in turn leads to the somewhat more formal guideline of including a term if it is attested and, when that is met, if it is a single word or it is idiomatic. [2]

Terms

A term need not be limited to a single word in the usual sense. Any of these are also acceptable:

Attestation

“Attested” means verified through [3]

  1. clearly widespread use, or
  2. use in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year (different requirements apply for certain languages). [4]

Where possible, it is better to cite sources that are likely to remain easily accessible over time, so that someone referring to Wiktionary years from now is likely to be able to find the original source. As Wiktionary is an online dictionary, this naturally favors media such as Usenet groups, which are durably archived by Google. Print media such as books and magazines will also do, particularly if their contents are indexed online. Other recorded media such as audio and video are also acceptable, provided they are of verifiable origin and are durably archived. We do not quote other Wikimedia sites [5] [6] (such as Wikipedia), but we may use quotations found on them (such as quotations from books available on Wikisource). When citing a quotation from a book, please include the ISBN.

Conveying meaning

See use-mention distinction.

This filters out appearance in raw word lists, commentary on the form of a word, such as “The word ‘foo’ has three letters,” lone definitions, and made-up examples of how a word might be used. For example, an appearance in someone’s online dictionary is suggestive, but it does not show the word actually used to convey meaning. On the other hand, a sentence like “They raised the jib (a small sail forward of the mainsail) in order to get the most out of the light wind,” appearing in an account of a sailboat race, would be fine. It happens to contain a definition, but the word is also used for its meaning.

Number of citations

For languages well documented on the Internet, three citations in which a term is used is the minimum number for inclusion in Wiktionary. For terms in extinct languages, one use in a contemporaneous source is the minimum, or one mention is adequate subject to the below requirements. For all other spoken languages that are living, only one use or mention is adequate, subject to the following requirements:

  • the community of editors for that language should maintain a list of materials deemed appropriate as the only sources for entries based on a single mention,
  • each entry should have its source(s) listed on the entry or citation page, and
  • a box explaining that a low number of citations were used should be included on the entry page (such as by using the {{LDL}} template). [7]
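The citation minimums above can be sketched as a small check. This is only one reading of the rule, not an official tool: the category names, the `Evidence` fields and the function `meets_citation_minimum` are all hypothetical, and the sketch ignores the separate independence and time-span requirements as well as whether a source is contemporaneous.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of what editors have found for one term."""
    uses: int = 0                       # citations where the term conveys meaning
    mentions: int = 0                   # citations that only mention the term
    approved_source_list: bool = False  # community keeps a list of accepted sources
    sources_listed: bool = False        # sources appear on the entry or citation page
    low_citation_notice: bool = False   # entry carries a notice such as {{LDL}}


def meets_citation_minimum(category: str, ev: Evidence) -> bool:
    """Apply the minimum-citation rule for an assumed language category."""
    if category == "well_documented":
        # Three uses are required; mentions do not count.
        return ev.uses >= 3

    requirements_met = (
        ev.approved_source_list and ev.sources_listed and ev.low_citation_notice
    )
    if category == "extinct":
        # One use suffices; a lone mention needs the listed requirements.
        return ev.uses >= 1 or (ev.mentions >= 1 and requirements_met)
    if category == "other_living":
        # Assumption: the requirements apply to a single use as well as a mention.
        return (ev.uses + ev.mentions) >= 1 and requirements_met
    raise ValueError(f"unknown category: {category}")
```

Whether the three requirements also bind a single use in a living language is a judgment the policy text leaves open; the sketch takes the stricter reading.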

Independent

This serves to prevent double-counting of usages that are not truly distinct. Roughly speaking, we generally consider two uses of a term to be "independent" if they are in different sentences by different people, and to be non-independent if:

  • one is a verbatim or near-verbatim quotation of the other; or
  • both are verbatim or near-verbatim quotations or translations of a single original source; or
  • both are by the same author.

If two or more usages are not independent of each other, then only one of them can be used for purposes of attestation. [8]
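As a rough illustration of the double-counting rule, the sketch below keeps at most one citation per author and per original source. It assumes a human editor has already decided which citations quote or translate the same original and recorded that in a hypothetical `source_id` field; detecting near-verbatim quotation automatically is outside its scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    author: str
    source_id: str  # hypothetical editor-assigned label for the original source
    text: str


def independent_subset(citations):
    """Greedily keep citations that share no author or original source."""
    seen_authors = set()
    seen_sources = set()
    kept = []
    for c in citations:
        # Same author, or a quotation/translation of an already counted source.
        if c.author in seen_authors or c.source_id in seen_sources:
            continue
        seen_authors.add(c.author)
        seen_sources.add(c.source_id)
        kept.append(c)
    return kept
```

The greedy pass depends on input order and may keep fewer citations than the best possible selection, so a result below three is a prompt for closer review rather than a verdict.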

Spanning at least a year

This is meant to filter out words that may appear and see brief use, but then never be used again. The one-year threshold is somewhat arbitrary, but appears to work well in practice.
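The time-span requirement reduces to comparing the earliest and latest citation dates. The following is a minimal sketch, assuming "a year" means the same calendar day one year later (with 29 February mapped to 28 February); the function names are illustrative only.

```python
from datetime import date


def one_year_after(d: date) -> date:
    """Return the same calendar day one year later."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # 29 February has no counterpart in a non-leap year.
        return d.replace(year=d.year + 1, day=28)


def spans_at_least_a_year(dates) -> bool:
    """True if the earliest and latest citation dates are a year or more apart."""
    dates = list(dates)
    if len(dates) < 2:
        return False
    return max(dates) >= one_year_after(min(dates))
```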

Idiomaticity

An expression is idiomatic if its full meaning cannot be easily derived from the meaning of its separate components. Non-idiomatic expressions are called sum-of-parts (SOP).

For example, “this is a door” is not idiomatic, but “shut up” and “red herring” are.

This criterion is sometimes referred to as the fried egg test, as a fried egg generally means an egg (and generally a chicken egg or similar) fried in a particular way. It generally doesn't denote a scrambled egg, which is nonetheless cooked by frying.

See Wiktionary:Idioms that survived RFD for other examples. However, many idioms are clearly idiomatic, for example red herring. These tests are invoked only in discussion of unclear cases.

Phrasebook entries are very common expressions that are considered useful to non-native speakers. Although these are included as entries in the dictionary (in the main namespace), they are not usually considered in these terms. For instance, What's your name? is clearly a summation of its parts.

Unidiomatic terms made up of multiple words are included if they are significantly more common than single-word spellings that meet criteria for inclusion; for example, coalmine meets criteria for inclusion, so its more common form coal mine is also included. [9]

In rare cases, a phrase that is arguably unidiomatic may be included by the consensus of the community, based on the determination of editors that inclusion of the term is likely to be useful to readers.

Spellings

Misspellings, common misspellings and variant spellings: [10] Rare misspellings should be excluded while common misspellings should be included. [11] There is no simple hard and fast rule, particularly in English, for determining whether a particular spelling is “correct”. Published grammars and style guides can be useful in that regard, as can statistics concerning the prevalence of various forms.

Most simple typos are much rarer than the most frequent spellings. Some words, however, are frequently misspelled. For example, occurred is often spelled with only one c or only one r, but only occurred is considered correct.

It is important to remember that most languages, including English, do not have an academy to establish rules of usage, and thus may be prone to uncertain spellings. This problem is less frequent, though not unknown, in languages such as Spanish where spelling may have legal support in some countries.

Regional or historical variations are not misspellings. For example, there are well-known differences between British and American spelling. A spelling considered incorrect in one region may not occur at all in another, and may even dominate in yet another.

Combining characters (like this) should exist as main-namespace redirects to their non-combining forms (like this) if the latter exist. [12]

Formatting

Once it is decided that a misspelling is of sufficient importance to merit its own page, the formatting of such a page should not be particularly problematical. The usual language and part of speech headings can be used, followed by a simple definition using the following format:

#

An additional section explaining why the term is a misspelling should be considered optional.

Inflections

The entries for such inflected forms as cameras, geese, asked, and were should indicate what form they are, and link to the main entry for the word (camera, goose, ask, or be, respectively, for the preceding examples). Except with multi-word idioms, they should not merely redirect.

At entries for inflected forms with idiomatic senses, such as blues and smitten, predictable meanings should be distinguished from idiomatic ones. [13]

Repetitions

Attested repetitive words formed by repeating letters or syllables in other attested words for emphasis, and having no other meaning in any language shall be treated as follows: [14]

  1. Each attested repetitive form that has no more than three repetitions shall have an entry.
  2. Each attested repetitive form that has more than three repetitions shall be hard-redirected to the entry having three repetitions. The three-repetition entry shall have a usage note indicating that additional instances of the letter or syllable may be added for the purely literary effect of indicating emphasis.

An example of repeated letters is the repeated "e" in "pleeease" and "pleeeeeease" compared to "please".

An example of repeated syllables is the repeated "ha" in "hahahahaha" compared to "hahaha".

As for repetition counting: "hahaha" is considered to have three repetitions, while "haha" has two repetitions.

To hard-redirect is to use "#REDIRECT", which immediately takes the reader to the target page.
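The counting and redirect rules can be sketched with a regular expression that finds the longest run of a repeated letter or syllable. This is an illustrative approximation, not Wiktionary tooling: the function names are hypothetical, the input is assumed to be already attested, and the pattern cannot tell emphatic repetition from doubled letters that belong to the base word.

```python
import re

MAX_ENTRY_REPETITIONS = 3  # forms with more repetitions become redirects


def longest_repetition(word):
    """Return (start, end, unit, count) for the run with the most repetitions."""
    best = None
    # (.+?)\1+ matches a shortest unit followed by one or more copies of itself.
    for m in re.finditer(r"(.+?)\1+", word):
        unit = m.group(1)
        count = len(m.group(0)) // len(unit)
        if best is None or count > best[3]:
            best = (m.start(), m.end(), unit, count)
    return best


def treatment(word):
    """Return ("entry", word) or ("redirect", three-repetition form)."""
    run = longest_repetition(word)
    if run is None or run[3] <= MAX_ENTRY_REPETITIONS:
        return ("entry", word)
    start, end, unit, _ = run
    target = word[:start] + unit * MAX_ENTRY_REPETITIONS + word[end:]
    return ("redirect", target)


print(treatment("hahaha"))      # ('entry', 'hahaha')
print(treatment("hahahahaha"))  # ('redirect', 'hahaha')
```

A page flagged as a redirect would then consist of a "#REDIRECT" line pointing at the returned target.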

The above treatment may be overridden by consensus, for example where a variation having four repetitions is more common, or where an additional repetition would cause the word to shift to a different pronunciation or intonation.

Other Languages