Language production consists of several interdependent processes which transform a nonlinguistic message into a spoken or signed linguistic signal. Linguists debate whether the process of language production occurs in a series of stages (serial processing) or whether production processes occur in parallel. After identifying a message to be linguistically encoded, a speaker must select the individual words—known as lexical items—to represent that message in a process called lexical selection. The words are selected based on their meaning, which in linguistics is called semantic information. Lexical selection activates the word's lemma, which contains both semantic and grammatical information about the word.[a]
After an utterance has been planned,[b] it then goes through phonological encoding. In this stage of language production, the mental representation of the words are assigned their phonological content as a sequence of phonemes to be produced. The phonemes are specified for articulatory features which denote particular goals such as closed lips or the tongue in a particular location. These phonemes are then coordinated into a sequence of muscle commands that can be sent to the muscles, and when these commands are executed properly the intended sounds are produced. Thus the process of production from message to sound can be summarized as the following sequence:[c]
- Message planning
- Lemma selection
- Retrieval and assignment of phonological word forms
- Articulatory specification
- Muscle commands
- Speech sounds
Place of articulation
Sounds which are made by a full or partial construction of the vocal tract are called consonants. Consonants are pronounced in the vocal tract, usually in the mouth, and the location of this construction affects the resulting sound. Because of the close connection between the position of the tongue and the resulting sound, the place of articulation is an important concept in many subdisciplines of phonetics.
Sounds are partly categorized by the location of a construction as well as the part of the body doing the constricting. For example, in English the words fought and thought are a minimal pair differing only in the organ making the construction rather than the location of the construction. The "f" in fought is a labiodental articulation made with the bottom lip against the teeth. The "th" in thought is a linguodental articulation made with the tongue against the teeth. Constrictions made by the lips are called labials while those made with the tongue are called lingual.
Constrictions made with the tongue can be made in several parts of the vocal tract, broadly classified into coronal, dorsal and radical places of articulation. Coronal articulations are made with the front of the tongue, dorsal articulations are made with the back of the tongue, and radical articulations are made in the pharynx. These divisions are not sufficient for distinguishing and describing all speech sounds. For example, in English the sounds [s] and [ʃ] are both coronal, but they are produced in different places of the mouth. To account for this, more detailed places of articulation are needed based upon the area of the mouth in which the constriction occurs.
Articulations involving the lips can be made in three different ways: with both lips (bilabial), with one lip and the teeth (labiodental), and with the tongue and the upper lip (linguolabial). Depending on the definition used, some or all of these kinds of articulations may be categorized into the class of labial articulations. Bilabial consonants are made with both lips. In producing these sounds the lower lip moves farthest to meet the upper lip, which also moves down slightly, though in some cases the force from air moving through the aperture (opening between the lips) may cause the lips to separate faster than they can come together. Unlike most other articulations, both articulators are made from soft tissue, and so bilabial stops are more likely to be produced with incomplete closures than articulations involving hard surfaces like the teeth or palate. Bilabial stops are also unusual in that an articulator in the upper section of the vocal tract actively moves downwards, as the upper lip shows some active downward movement. Linguolabial consonants are made with the blade of the tongue approaching or contacting the upper lip. Like in bilabial articulations, the upper lip moves slightly towards the more active articulator. Articulations in this group do not have their own symbols in the International Phonetic Alphabet, rather, they are formed by combining an apical symbol with a diacritic implicitly placing them in the coronal category. They exist in a number of languages indigenous to Vanuatu such as Tangoa.
Labiodental consonants are made by the lower lip rising to the upper teeth. Labiodental consonants are most often fricatives while labiodental nasals are also typologically common. There is debate as to whether true labiodental plosives occur in any natural language, though a number of languages are reported to have labiodental plosives including Zulu, Tonga, and Shubi.
Coronal consonants are made with the tip or blade of the tongue and, because of the agility of the front of the tongue, represent a variety not only in place but in the posture of the tongue. The coronal places of articulation represent the areas of the mouth where the tongue contacts or makes a constriction, and include dental, alveolar, and post-alveolar locations. Tongue postures using the tip of the tongue can be apical if using the top of the tongue tip, laminal if made with the blade of the tongue, or sub-apical if the tongue tip is curled back and the bottom of the tongue is used. Coronals are unique as a group in that every manner of articulation is attested. Australian languages are well known for the large number of coronal contrasts exhibited within and across languages in the region. Dental consonants are made with the tip or blade of the tongue and the upper teeth. They are divided into two groups based upon the part of the tongue used to produce them: apical dental consonants are produced with the tongue tip touching the teeth; interdental consonants are produced with the blade of the tongue as the tip of the tongue sticks out in front of the teeth. No language is known to use both contrastively though they may exist allophonically. Alveolar consonants are made with the tip or blade of the tongue at the alveolar ridge just behind the teeth and can similarly be apical or laminal.
Crosslinguistically, dental consonants and alveolar consonants are frequently contrasted leading to a number of generalizations of crosslinguistic patterns. The different places of articulation tend to also be contrasted in the part of the tongue used to produce them: most languages with dental stops have laminal dentals, while languages with apical stops usually have apical stops. Languages rarely have two consonants in the same place with a contrast in laminality, though Taa (ǃXóõ) is a counterexample to this pattern. If a language has only one of a dental stop or an alveolar stop, it will usually be laminal if it is a dental stop, and the stop will usually be apical if it is an alveolar stop, though for example Temne and Bulgarian do not follow this pattern. If a language has both an apical and laminal stop, then the laminal stop is more likely to be affricated like in Isoko, though Dahalo show the opposite pattern with alveolar stops being more affricated.
Retroflex consonants have several different definitions depending on whether the position of the tongue or the position on the roof of the mouth is given prominence. In general, they represent a group of articulations in which the tip of the tongue is curled upwards to some degree. In this way, retroflex articulations can occur in several different locations on the roof of the mouth including alveolar, post-alveolar, and palatal regions. If the underside of the tongue tip makes contact with the roof of the mouth, it is sub-apical though apical post-alveolar sounds are also described as retroflex. Typical examples of sub-apical retroflex stops are commonly found in Dravidian languages, and in some languages indigenous to the southwest United States the contrastive difference between dental and alveolar stops is a slight retroflexion of the alveolar stop. Acoustically, retroflexion tends to affect the higher formants.
Articulations taking place just behind the alveolar ridge, known as post-alveolar consonants, have been referred to using a number of different terms. Apical post-alveolar consonants are often called retroflex, while laminal articulations are sometimes called palato-alveolar; in the Australianist literature, these laminal stops are often described as 'palatal' though they are produced further forward than the palate region typically described as palatal. Because of individual anatomical variation, the precise articulation of palato-alveolar stops (and coronals in general) can vary widely within a speech community.
Dorsal consonants are those consonants made using the tongue body rather than the tip or blade and are typically produced at the palate, velum or uvula. Palatal consonants are made using the tongue body against the hard palate on the roof of the mouth. They are frequently contrasted with velar or uvular consonants, though it is rare for a language to contrast all three simultaneously, with Jaqaru as a possible example of a three-way contrast. Velar consonants are made using the tongue body against the velum. They are incredibly common cross-linguistically; almost all languages have a velar stop. Because both velars and vowels are made using the tongue body, they are highly affected by coarticulation with vowels and can be produced as far forward as the hard palate or as far back as the uvula. These variations are typically divided into front, central, and back velars in parallel with the vowel space. They can be hard to distinguish phonetically from palatal consonants, though are produced slightly behind the area of prototypical palatal consonants. Uvular consonants are made by the tongue body contacting or approaching the uvula. They are rare, occurring in an estimated 19 percent of languages, and large regions of the Americas and Africa have no languages with uvular consonants. In languages with uvular consonants, stops are most frequent followed by continuants (including nasals).
Pharyngeal and laryngeal
Consonants made by constrictions of the throat are pharyngeals, and those made by a constriction in the larynx are laryngeal. Laryngeals are made using the vocal folds as the larynx is too far down the throat to reach with the tongue. Pharyngeals however are close enough to the mouth that parts of the tongue can reach them.
Radical consonants either use the root of the tongue or the epiglottis during production and are produced very far back in the vocal tract. Pharyngeal consonants are made by retracting the root of the tongue far enough to almost touch the wall of the pharynx. Due to production difficulties, only fricatives and approximants can produced this way. Epiglottal consonants are made with the epiglottis and the back wall of the pharynx. Epiglottal stops have been recorded in Dahalo. Voiced epiglottal consonants are not deemed possible due to the cavity between the glottis and epiglottis being too small to permit voicing.
Glottal consonants are those produced using the vocal folds in the larynx. Because the vocal folds are the source of phonation and below the oro-nasal vocal tract, a number of glottal consonants are impossible such as a voiced glottal stop. Three glottal consonants are possible, a voiceless glottal stop and two glottal fricatives, and all are attested in natural languages. Glottal stops, produced by closing the vocal folds, are notably common in the world's languages. While many languages use them to demarcate phrase boundaries, some languages like Huatla Mazatec have them as contrastive phonemes. Additionally, glottal stops can be realized as laryngealization of the following vowel in this language. Glottal stops, especially between vowels, do usually not form a complete closure. True glottal stops normally occur only when they're geminated.
A top-down view of the larynx.
The larynx, commonly known as the "voice box", is a cartilaginous structure in the trachea responsible for phonation. The vocal folds (chords) are held together so that they vibrate, or held apart so that they do not. The positions of the vocal folds are achieved by movement of the arytenoid cartilages. The intrinsic laryngeal muscles are responsible for moving the arytenoid cartilages as well as modulating the tension of the vocal folds. If the vocal folds are not close or tense enough, they will either vibrate sporadically or not at all. If they vibrate sporadically it will result in either creaky or breathy voice, depending on the degree; if don't vibrate at all, the result will be voicelessness.
In addition to correctly positioning the vocal folds, there must also be air flowing across them or they will not vibrate. The difference in pressure across the glottis required for voicing is estimated at 1 – 2 cm H20 (98.0665 – 196.133 pascals). The pressure differential can fall below levels required for phonation either because of an increase in pressure above the glottis (superglottal pressure) or a decrease in pressure below the glottis (subglottal pressure). The subglottal pressure is maintained by the respiratory muscles. Supraglottal pressure, with no constrictions or articulations, is equal to about atmospheric pressure. However, because articulations—especially consonants—represent constrictions of the airflow, the pressure in the cavity behind those constrictions can increase resulting in a higher supraglottal pressure.
According to the lexical access model two different stages of cognition are employed; thus, this concept is known as the two-stage theory of lexical access. The first stage, lexical selection provides information about lexical items required to construct the functional level representation. These items are retrieved according to their specific semantic and syntactic properties, but phonological forms are not yet made available at this stage. The second stage, retrieval of wordforms, provides information required for building the positional level representation.
When producing speech, the articulators move through and contact particular locations in space resulting in changes to the acoustic signal. Some models of speech production take this as the basis for modeling articulation in a coordinate system that may be internal to the body (intrinsic) or external (extrinsic). Intrinsic coordinate systems model the movement of articulators as positions and angles of joints in the body. Intrinsic coordinate models of the jaw often use two to three degrees of freedom representing translation and rotation. These face issues with modeling the tongue which, unlike joints of the jaw and arms, is a muscular hydrostat—like an elephant trunk—which lacks joints. Because of the different physiological structures, movement paths of the jaw are relatively straight lines during speech and mastication, while movements of the tongue follow curves.
Straight-line movements have been used to argue articulations as planned in extrinsic rather than intrinsic space, though extrinsic coordinate systems also include acoustic coordinate spaces, not just physical coordinate spaces. Models that assume movements are planned in extrinsic space run into an inverse problem of explaining the muscle and joint locations which produce the observed path or acoustic signal. The arm, for example, has seven degrees of freedom and 22 muscles, so multiple different joint and muscle configurations can lead to the same final position. For models of planning in extrinsic acoustic space, the same one-to-many mapping problem applies as well, with no unique mapping from physical or acoustic targets to the muscle movements required to achieve them. Concerns about the inverse problem may be exaggerated, however, as speech is a highly learned skill using neurological structures which evolved for the purpose.
The equilibrium-point model proposes a resolution to the inverse problem by arguing that movement targets be represented as the position of the muscle pairs acting on a joint.[d] Importantly, muscles are modeled as springs, and the target is the equilibrium point for the modeled spring-mass system. By using springs, the equilibrium point model can easily account for compensation and response when movements are disrupted. They are considered a coordinate model because they assume that these muscle positions are represented as points in space, equilibrium points, where the spring-like action of the muscles converges.
Gestural approaches to speech production propose that articulations are represented as movement patterns rather than particular coordinates to hit. The minimal unit is a gesture that represents a group of "functionally equivalent articulatory movement patterns that are actively controlled with reference to a given speech-relevant goal (e.g., a bilabial closure)." These groups represent coordinative structures or "synergies" which view movements not as individual muscle movements but as task-dependent groupings of muscles which work together as a single unit. This reduces the degrees of freedom in articulation planning, a problem especially in intrinsic coordinate models, which allows for any movement that achieves the speech goal, rather than encoding the particular movements in the abstract representation. Coarticulation is well described by gestural models as the articulations at faster speech rates can be explained as composites of the independent gestures at slower speech rates.