By the number of languages they contain, Austronesian and Niger–Congo are the two largest language families in the world. They each contain roughly one-fifth of the world's languages. The geographical span of Austronesian was the largest of any language family before the spread of Indo-European in the colonial period. It ranged from Madagascar off the southeastern coast of Africa to Easter Island in the eastern Pacific. Hawaiian, Rapa Nui, Maori and Malagasy (spoken on Madagascar) are the geographic outliers.
According to Robert Blust (1999), Austronesian is divided into several primary branches, all but one of which are found exclusively in Taiwan. The Formosan languages of Taiwan are grouped into as many as nine first-order subgroups of Austronesian. All Austronesian languages spoken outside Taiwan (including its offshore Yami language) belong to the Malayo-Polynesian branch, sometimes called Extra-Formosan.
Most Austronesian languages lack a long history of written attestation. This makes the reconstruction of earlier stages, up to distant Proto-Austronesian, all the more remarkable. The oldest inscription in the Cham language, the Đông Yên Châu inscription, dated to the mid-6th century AD at the latest, is also the first attestation of any Austronesian language.
It is difficult to generalize about the languages of a family as diverse as Austronesian. Very broadly, one can divide the Austronesian languages into three groups: Philippine-type languages, Indonesian-type languages and post-Indonesian-type languages (Ross 2002):
The first group includes, besides the languages of the Philippines, the Austronesian languages of Taiwan, Sabah, North Sulawesi and Madagascar. It is primarily characterized by the retention of the original system of Philippine-type voice alternations, where typically three or four verb voices determine which semantic role the "subject"/"topic" expresses (it may express the actor, the patient, the location, the beneficiary, or various other circumstantial roles such as instrument and concomitant). The phenomenon has frequently been referred to as focus (not to be confused with the usual sense of that term in linguistics). Furthermore, the choice of voice is influenced by the definiteness of the participants. The word order has a strong tendency to be verb-initial.
In contrast, the more innovative Indonesian-type languages, which are particularly represented in Malaysia and western Indonesia, have reduced the voice system to a contrast between only two voices (actor voice and "undergoer" voice), but these are supplemented by applicative morphological devices (originally two: the more direct *-i and the more oblique *-an/-[a]kən), which serve to modify the semantic role of the "undergoer". They are also characterized by the presence of preposed clitic pronouns. Unlike the Philippine type, these languages mostly tend towards verb-second word orders. A number of languages, such as the Batak languages, Old Javanese, Balinese, Sasak and several Sulawesi languages, seem to represent an intermediate stage between these two types.
Finally, in some languages, which Ross calls "post-Indonesian", the original voice system has broken down completely and the voice-marking affixes no longer preserve their functions.