Wiktionary:Criteria for inclusion
|This is a |
It should not be modified without discussion and consensus. Any substantial or contested changes require a
As an international dictionary, Wiktionary is intended to include “all words in all languages”, subject to the following criteria.
A term should be included if it's likely that someone would
A term need not be limited to a single word in the usual sense. Any of these are also acceptable:
Where possible, it is better to cite sources that are likely to remain easily accessible over time, so that someone referring to Wiktionary years from now is likely to be able to find the original source. As Wiktionary is an online dictionary, this naturally favors media such as Usenet groups, which are durably archived by Google. Print media such as books and magazines will also do, particularly if their contents are indexed online. Other recorded media such as audio and video are also acceptable, provided they are of verifiable origin and are durably archived. We do not quote other Wikimedia sites (such as Wikipedia), but we may use quotations found on them (such as quotations from books available on Wikisource). When citing a quotation from a book, please include the ISBN.
This filters out appearances in raw word lists; commentary on the form of a word, such as “The word ‘foo’ has three letters”; lone definitions; and made-up examples of how a word might be used. For example, an appearance in someone’s online dictionary is suggestive, but it does not show the word actually used to convey meaning. On the other hand, a sentence like “They raised the jib (a small sail forward of the mainsail) in order to get the most out of the light wind,” appearing in an account of a sailboat race, would be fine. It happens to contain a definition, but the word is also used for its meaning.
This serves to prevent double-counting of usages that are not truly distinct. Roughly speaking, we generally consider two uses of a term to be "independent" if they are in different sentences by different people, and to be non-independent if:
If two or more usages are not independent of each other, then only one of them can be used for purposes of attestation.
This is meant to filter out words that may appear and see brief use, but then never be used again. The one-year threshold is somewhat arbitrary, but appears to work well in practice.
An expression is idiomatic if its full meaning cannot be easily derived from the meaning of its separate components. Non-idiomatic expressions are called sum-of-parts (SOP).
This criterion is sometimes referred to as the fried egg test, as a
Phrasebook entries are very common expressions that are considered useful to non-native speakers. Although these are included as entries in the dictionary (in the main namespace), they are not usually considered in these terms. For instance,
Unidiomatic terms made up of multiple words are included if they are significantly more common than single-word spellings that meet criteria for inclusion; for example,
An attested integer word (such as
In rare cases, a phrase that is arguably unidiomatic may be included by the consensus of the community, based on the determination of editors that inclusion of the term is likely to be useful to readers.
A translation hub (translation target) is a common English multi-word term or collocation that is useful for hosting translations. Some attested translation hubs should be included despite being non-idiomatic and some excluded, but there is no agreement on precise, all-encompassing rules for deciding which are which. Therefore, the following criteria for inclusion of attested non-idiomatic translation hubs are tentative:
Numbers, numerals, and ordinals over 100 that are not single words or are sequences of digits should not be included in the dictionary, unless the number, numeral, or ordinal in question has a separate idiomatic sense that meets the CFI.
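As an illustration only, the rule above can be read as a small decision procedure. The sketch below is one possible reading, not part of the policy: the function name `excluded_as_number`, its parameters, and the choice to treat any form containing whitespace as "not a single word" are all assumptions made for the example. It checks this one rule and says nothing about attestation or any other criterion.

```python
def excluded_as_number(value: int, written_form: str,
                       has_idiomatic_sense: bool = False) -> bool:
    """Return True if this rule alone would exclude the number entry.

    Hypothetical helper: the name, parameters, and the whitespace test
    for "single word" are assumptions made for illustration.
    """
    # A separate idiomatic sense that meets the CFI exempts the entry.
    if has_idiomatic_sense:
        return False
    # The rule only concerns numbers, numerals, and ordinals over 100.
    if value <= 100:
        return False
    # Excluded if written as a sequence of digits ...
    is_digit_sequence = written_form.isdigit()
    # ... or if the written form is not a single word (assumed: contains whitespace).
    is_multiword = len(written_form.split()) > 1
    return is_digit_sequence or is_multiword
```

Under this reading, `excluded_as_number(1000, "thousand")` is false because the form is a single word, while `excluded_as_number(101, "one hundred one")` and `excluded_as_number(1000, "1000")` are both true.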
Misspellings, common misspellings and variant spellings: Rare misspellings should be excluded while common misspellings should be included. There is no simple hard and fast rule, particularly in English, for determining whether a particular spelling is “correct”. Published grammars and style guides can be useful in that regard, as can statistics concerning the prevalence of various forms.
Most simple typos are much rarer than the most frequent spellings. Some words, however, are frequently misspelled. For example, occurred is often spelled with only one c or only one r, but only occurred is considered correct.
It is important to remember that most languages, including English, do not have an academy to establish rules of usage, and thus may be prone to uncertain spellings. This problem is less frequent, though not unknown, in languages such as Spanish where spelling may have legal support in some countries.
Regional or historical variations are not misspellings. For example, there are well-known differences between British and American spelling. A spelling considered incorrect in one region may not occur at all in another, and may even dominate in yet another.
The entries for such inflected forms as
Attested repetitive words formed by repeating letters or syllables in other attested words for emphasis, and having no other meaning in any language shall be treated as follows:
An example of repeated letters is the repeated "e" in "pleeease" and "pleeeeeease" compared to "please".
An example of repeated syllables is the repeated "ha" in "hahahahaha" compared to "hahaha".
As for repetition counting: "hahaha" is considered to have three repetitions, while "haha" has two repetitions.
To hard-redirect is to use "#REDIRECT", which immediately takes the reader to the target page.
The above treatment may be overridden by consensus, for example where a variation having four repetitions is more common, or where an additional repetition would cause the word to shift to a different pronunciation or intonation.
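The repetition counting described above ("hahaha" has three repetitions, "haha" has two) can be sketched as a short function. This is a minimal illustration, not an official tool: the name `count_repetitions` is hypothetical, and the repeated letter or syllable is assumed to be supplied by the caller rather than detected automatically.

```python
def count_repetitions(word: str, unit: str) -> int:
    """Return the length of the longest run of consecutive copies of `unit` in `word`.

    Hypothetical helper for illustration; `unit` is the repeated letter
    or syllable and must be chosen by the caller.
    """
    if not unit:
        raise ValueError("unit must be a non-empty string")
    best = 0
    # Try every starting position and measure the run of `unit` from there.
    for start in range(len(word)):
        run = 0
        pos = start
        while word.startswith(unit, pos):
            run += 1
            pos += len(unit)
        best = max(best, run)
    return best
```

For instance, `count_repetitions("hahaha", "ha")` gives 3 and `count_repetitions("haha", "ha")` gives 2, matching the counts stated above. For repeated letters, `count_repetitions("pleeease", "e")` gives 3, compared with 1 for "please".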