Commentary

Where Did the South Asians Come from?
By Nayyer Ali MD

South Asia is home to almost 2 billion people, making up about 25% of the Earth’s population. Are the inhabitants of the subcontinent descended from ancient populations that have been there since the end of the ice age, or do they represent subsequent migrations from outside South Asia? This is one of the most interesting questions in history, and until recently the answer was unclear. But thanks to dramatic advances in DNA sequencing, close analysis of the genomes of South Asians has yielded an answer.

The first clue that outsiders played a big role in South Asian history came from linguistics. In 1786 a British polymath, Sir William Jones, announced to the world that there was strong evidence that the north Indian and European languages were descended from a common root tongue. He was already learned in several European languages, including Latin and Greek, but had come to Bengal and taken up the study of Sanskrit. He found to his amazement that Sanskrit, Latin, and Greek had obvious shared words and verb structures, implying that all came from a single source. Languages evolve over time, and can change and splinter into two or more daughter languages. This is how Latin turned into French, Spanish, and Italian in the centuries after the fall of Rome. Most educated English speakers can easily read the text of the US Constitution (1787) but have some challenges with Shakespeare (1610) and find Chaucer (about 1390) almost impossible to comprehend. So it is to be expected that languages could be quite different today yet share a common source in the past.

Linguistic analysis has shown that there once existed a language called Proto-Indo-European (PIE) that was the source of most of the languages of Europe, Iran, and South Asia. PIE branched in Europe into Greek, Italic (Latin), Germanic (German, English, Swedish, etc.), Celtic (Irish, Welsh, etc.), Balto-Slavic (Russian, Polish, Czech, etc.), Anatolian (Hittite and other extinct tongues), and Armenian. Another branch gave rise to Proto-Indo-Iranian, with the Iranian branch becoming Persian, Kurdish, and Pashto, while the Indian branch gave rise to Sanskrit, which eventually turned into Hindi-Urdu, Bengali, Punjabi, Sinhala, Kashmiri, Gujarati, etc. South India has its own Dravidian language family that is unrelated to PIE.

The big questions were who the original speakers of PIE were and how their language spread so widely. Was it a cultural spread (like Arabic in North Africa and Latin in the Roman Empire), or a physical replacement of one population by another (like Turkish in Anatolia or English in North America)?

Thirty years ago the working theory was that PIE was the language of the first farmers from the Middle East, who spread their language to Europe and South Asia along with their ability to grow wheat. This would explain a lot, but the problem was time. There was just too much of it. Wheat farming began almost 10,000 years ago, and spread into Europe and South Asia about 8,000 years ago. If PIE were that old, the linguistic connections among its descendants would have been erased by the passage of time. PIE had to be a much younger language.

Genetic analysis of both Europeans and South Asians does show a significant intrusion of DNA from Middle Eastern farmers about 8,000 years ago, from Anatolia in the case of Europe and from Iran in the case of South Asia. These farmers were probably the seeds of the Harappan civilization in the Indus Valley, the ruins of which have been excavated at many sites in Pakistan. Until about 4,000 years ago the population of South Asia consisted of a well-mixed group that derived its ancestry from the hunter-gatherers who originally lived there and the farmers from Iran. By 3,000 years ago (1000 BC) the population had split into what are now called Ancestral North Indian (ANI) and Ancestral South Indian (ASI). The ANI were Sanskrit speakers, while the ASI were Dravidian speakers.

What happened in north India also happened in Europe, but several centuries earlier. A population that had consisted of a blend of ancient hunter-gatherers and Anatolian farmers now carried a totally different genetic signature, one that accounted for about 50% of the DNA of Europeans. The same signature made up about 40% of ANI DNA. We know where this genetic signature came from: the Yamnaya people of the Pontic-Caspian Steppe, the vast grasslands of modern Ukraine and southern Russia.

What gave the Yamnaya people the ability to spread their genes and their language across such a vast area, from Britain to Bengal, in about 10 centuries? They had a massive technological edge over all of their neighbors. It was these people who had domesticated the horse about 5,000 years ago, not just for food and milk but for riding and, even more importantly, with the wheel and wagon, for pulling loads. This package gave them a critical advantage. It allowed the Pontic-Caspian Steppe to produce a surplus of population that had the means to move elsewhere in large numbers, and the power to subdue those they encountered, with the horse and chariot as a weapon of war.

It seems incredible that this region could have produced enough people to spread over so vast a space. But we don’t fully realize how different the world of 3000 BC was from ours. Back then the total human population of the planet was less than 20 million people, with about 30% of them living in Egypt and building pyramids. If the Yamnaya exported only a few thousand people per year for 500 years or more, they would have easily accomplished this transformation. At, say, 3,000 emigrants a year, 500 years of outflow adds up to 1.5 million people, an enormous number in a world that small.

Does this invasion of horse-riders from the north, somewhere between 2000 and 1000 BC, correlate with South Asian history in other ways? The geneticist David Reich, in his book Who We Are and How We Got Here, writes that “In the oldest text of Hinduism, the Rig Veda, the warrior god Indra rides against his impure enemies, or dasa, in a horse-drawn chariot, destroys their fortresses, or pur, and secures land and water for his people the arya, or Aryans.” This certainly sounds like a mythologized version of actual events.

One of the salient features of Hinduism is the caste system, maintained by strong endogamy (marriage restricted to within the group). When DNA from members of various Hindu castes was analyzed, it showed the sort of bottlenecking that would be expected from strong endogamy. In fact, the endogamy is best dated as beginning about 1000 BC, showing that this feature of Hinduism has been strongly embedded in South Asia for millennia, and was not a social construct of British rule, as some have contended. When South Asians were analyzed by social class, higher-class groups, for example Brahmins, had a higher proportion of the ANI genetic signature. Over the last 3,000 years there has been some mixing of DNA, but enough distinction remains to make these remarkable genetic discoveries possible.

The peoples of South Asia and Europe have rather amazing historical similarities. Both lands contained ancestral hunter-gatherers who had lived there since the end of the ice age about 12,000 years ago. Roughly 8,000 years ago farmers came to both regions from the Middle East, bringing wheat and, to a large but not total extent, replacing the previous inhabitants. These new peoples built Mohenjo Daro and Stonehenge. Finally, an invasion of horse-riders from the Pontic-Caspian Steppe between 5,000 and 4,000 years ago brought a new language and enough people to supplant the farmers who were already there. In some regions of Europe, they almost wiped out the genetic trace of the prior inhabitants. In India they became the dominant genetic signature in the north, giving rise to the Ancestral North Indian, and the remaining population in the south became the Ancestral South Indian.