Hidden in Plain Sight

Zaven Ayvazyan — Mon, 04 May 2026 12:59:34 GMT

Foreword

Every language has three major roles:

Existential: it emerges as a vital survival practice – a medium for expressing, conveying, and storing knowledge and thought
Ontological: settling into specific patterns of wording and phrasing, it proceeds to shape the thought itself – the way people perceive and comprehend reality
Epistemic: over time, it becomes knowledge itself, its metamorphoses recording the history of people speaking it – a linguistic analogue of annual rings or geological strata.

As a medium, the Armenian language is in every way extraordinary, breaking frameworks and challenging theories – a true “white whale” of linguistics. As a profound ontological vessel that encoded the civilisational roots of the Armenian Highlands – layering organic environmental, communal, and philosophical worldviews into its very linguistic fabric – Armenian has played a uniquely formative role in identity. Being a product of purposeful linguistic engineering, however, it also became an unwitting conduit for ecclesiastically normalised concepts of reality to supplant and overlay the original ones, thus enabling the reconstructed framework of thought to distort the worldview. And with its lexical borrowings and structural idiosyncrasies reflecting events chroniclers tried to spin or silence, Armenian presents as a rare source authentic to the least attested times – a living witness of history.

Sadly, the interest it rightfully attracts is not always followed by the scientific integrity it deserves. Biased theorisations that manipulate language for partisan purposes, though not uncommon, generate false knowledge, which is more dangerous than plain ignorance. The issue extends beyond linguistics; fields related to Armenian studies are not free from co-optation in advancing politically motivated arguments – and the ‘pro-Armenian’ bias is, in fact, the most perilous one for the very cause it claims to support.

A nation – creative, successful, valuable to itself and to the world – cannot be sustainably built on faulty narratives. These do not convey a usable past; still less do they carry the viable vision of the future that nation-building requires. They are damaging physically, not metaphorically: nationalist delusions of grandeur create vulnerability gaps by overestimating strengths and opportunities while underestimating weaknesses and threats – and generally misperceiving both. True strength comes from clear-eyed self-knowledge.

What follows is not a linguistic or historical paper per se. The author is neither linguist nor historian, but approaches Armenian linguistic history as an inter-disciplinary systems coherence problem. When linguistic conclusions contradict historical attestation and both conflict with archaeological evidence, genetic data, or simply logical frameworks, the problem is methodological. This paper treats linguistic findings as outputs to be tested for internal consistency and cross-domain alignment. The resulting tensions – between claimed timelines and attested mechanisms, or substrate structure and corpus limitations, or standardisation narrative and textual evidence – demand resolution through systematic elimination of internally contradictory hypotheses.

The objective is practical: clearing ground for sustainable nation-building. Armenian linguistic historiography is but one arena to be cleared of agendas that actively harm strategic thinking. Circular reasoning props up untenable timelines, survivorship bias masquerades as evidence, and substrate gets dismissed as ‘hidden’ or attributed to phantom languages rather than the documented indigenous population. The paper that follows dismantles specific claims through specific evidence. Linguistic truth, like any, matters not as mere academic integrity but as a precondition for survival.

Introduction

The conventional narrative of the genesis and spread of the Armenian language paints a rather straightforward picture:

Diverged from the parent Proto-Indo-European (PIE) ca. 3000-2500 BCE – argued on the basis of linguistic analysis
Presumed to have spread in the Armenian Highlands over the course of the 2nd millennium BCE – based on early presence deduced from the PIE split dating, suggesting prolonged contact with the native Hurro-Urartian languages, the extent of resulting influence being contested
Purported to be widely present in the Kingdom of Urartu – mostly inferred from its later manifestations, in the absence of linguistic data on Urartian vernacular
Considered to be Strabo’s unnamed “single language” of Armenia by the 1st century CE – in the context of his references to the Artaxiad Kingdom consolidation, interpreted as a unification of previously divided ethnolinguistic commonality
First recorded with the script created in the 5th century by Mesrop Mashtots – assumed the millennia-old vernacular codified and standardised as Grabar, the Old Armenian, to deliver scriptures and liturgy in the mother tongue
Attested ca. the 12th century as a morphologically distinct Middle Armenian – alleged to have branched off Grabar by absorbing external agglutinative and analytical influences – from which the modern standardised language evolved.

Notably, this picture implies a circular self-explanation of a) the remarkable scope of differences between Armenian and other Indo-European languages in structure and lexicon and b) the time of PIE split and subsequent spread:

a) is explained by prolonged isolation with intense non-Indo-European language contacts, taken to suggest early emergence and spread
b), thus inferred, is in turn assumed to explain the divergence.

Some researchers have come to question the very premise of Armenian’s Indo-European origins; if that is removed, the whole chain of conclusions crumbles, making room for more parsimonious explanations without resorting to multilayered phonetic shifts and overstretched semantic relations. Still, the structural and lexical Indo-European elements, however limited in comparison, appear to constitute the core of Armenian, thus making its Indo-European origins not dismissible outright.

Juxtaposed with the attestable course of events in the Highlands, this conventional narrative leaves more questions open than it tries to answer, pertaining to both the spread and the nature of the Armenian language. Notably:

If Grabar was, indeed, a codification of an already millennia-old vernacular, then why did that vernacular so diverge from it just a few centuries later, especially in its structural elements?
If Armenian had spread over the entire Armenian Highlands by the 1st century CE, then what was the mechanism of that spread among the originally Hurro-Urartian-speaking population under the consecutive rule of Iranian-origin dynasts using Aramaic for administration and later Greek in economic and cultural contexts?
If Armenian was widely present in the Urartian period – with some going as far as to claim its dominating role, then why is the core of inherited PIE roots and structure so limited?

These questions do not appear unrelated. Taken together, they seem to collide with a single presumption beneath the whole timeline, each running against a specific aspect of it at a different point. It is the same presumption to which the methodological circularity noted earlier – where divergence justifies early timing, and early timing explains divergence – boils down: a postulate of primordial, omnipresent language continuity that needs to be unravelled, layer by layer.

The Problem of Agglutinative ‘Simplification’

Erosion that never happened

The medical treatise Ջերմանց Մխիթարութիւն (Relief of Fevers) by the 12th-century Cilician Armenian scholar Mkhitar Heratsi is broadly considered the earliest written attestation of Middle Armenian. It remarkably uses the language along with and instead of Grabar, with the explicit purpose of making the text accessible to the common audience – clearly indicating that a pure Grabar text would not be. That Middle Armenian was the common language in the 12th century while Grabar had only limited intelligibility is quite telling: the latter is considered the common language standardised in written form some eight centuries before, after a millennium or more of oral existence.

The differences between the two, specifically the structural ones, are prominent: Grabar is a highly synthetic, fusional language, whereas Middle Armenian is more analytical and, importantly, distinctively agglutinative. Explanations of this disparity generally follow two main lines of argumentation:

Common erosion of the literary, ecclesiastical language in the vernacular, observed in other Indo-European cases – usually referencing Latin, Church Slavonic, or Greek
External influence from agglutinative languages being dominant in contact zones, especially bilingual – with some pointing to the close resemblance between Armenian and Turkish in this aspect.

An analytical shift can be endogenous; beyond that, these are not independent scenarios but two aspects of a single one: the hypothesis of erosion into agglutination requires an external source as well as the transformation itself. And neither aligns with the known linguistic history of Armenian.

Latin, Church Slavonic, Greek, and some Semitic examples, like Syriac and Classical Arabic, had been broadly attested and used in written form1 before becoming ecclesiastical:

the language was first attested as common, mostly also as written
then was canonised as liturgical, retaining its common function and relation between the two
later got simplified through consecutive vernacular forms, while conserving the high register.

The trajectory of Armenian is nothing like that:

the language – common or not – spoken for an uncertain yet presumably long time, without attestation
first emerged in writing as straightaway liturgical, yet remaining unattested as common
about eight centuries later, re-emerged in writing as common – already different, with nothing in between to show how the former eroded into the latter.

Interestingly, some cited cases imply the imposition of a language on non-native speakers, which could explain intelligibility issues and the scope or pace of simplification, yet question the claim that Armenian was a prior common tongue.

Further yet, the thematic scope of extant Grabar texts, which is unusually limited, especially until the end of the 1st millennium, is noteworthy. It covers:

Religion – translations (scripture, exegesis, patristic writings, etc.), some original texts
Philosophy – mostly translations
Science – translations, some original treatises (Anania Shirakatsi)
Historiography and hagiography – with the line between the two often blurred (Agat’angeghos, Buzand, P’arpetsi, Khorenatsi, Eghishe, etc.)
Poetry – spiritual and theological works (Grigor Narekatsi, Grigor Magistros)
Law – canonical or canon-based (e.g., Canons of Aghven)
Administration – land grants to monasteries
Correspondence – theological debates and related clerical and political issues (e.g., Girk Tghtots), occasional private letters (Grigor Magistros).

On top of that, authors of the Grabar texts are almost exclusively clergymen (P’arpetsi, Khorenatsi, Eghishe, Narekatsi, to name a few) or church-affiliated (Agat’angeghos), Anania Shirakatsi and Grigor Magistros being the only notable exceptions.

Preservation bias would make a convenient explanation – in theory, had it not been so conspicuously selective: monasteries preserved land grants in Grabar, but none of what they were using the land for – corvée records, rent, stock inventory, trade contracts, etc. Moreover, if that selectiveness is explained by the discretion of clerical institutions, being the sole loci of archiving and replication, it speaks to the level of control positioning Grabar as the church language, almost exclusively. Had it functioned as the ordinary written medium of a broader population, one should expect to see some trace of routine administrative, legal, commercial, or private usage over several consecutive centuries – at least by the elite, if not by common folk.

Conversely, once written, Middle Armenian manifests in that exact way, producing a slew of texts on a broad range of practical topics, which raises the question of which language people had used daily before. Putting things in perspective: a long-time common language becomes written, standardised, and propagated through liturgy and church schools – yet centuries later, fluency in that common language apparently decreases to limited intelligibility. Granted, in a pre-printing era, fluency in the standard language might not have grown universal, but it should not have declined either. Parts of this story do not quite fit together; the discrepancies suggest several potential explanations, not necessarily mutually exclusive:

Armenian was not a common language by the 5th century; a significant portion of the population spoke other language(s)
Grabar was not the Armenian vernacular codified and standardised, but a language that diverged from it – standardisation was actually a modification
Post-5th century, the Armenian rite propagated to embrace diverse groups; these adopted Grabar as a single liturgical language, while speaking other language(s), originally.

Whichever explanation – or combination thereof – holds for Armenian in general, all of them suggest, on different grounds, that at least Grabar does not represent the common language of the country at any point – even at the time of its codification. And in any case, the absence of broader, mundane use cases, however explained, itself renders moot the claims of Grabar ‘eroding’ into Middle Armenian in the commoners’ use, because neither the erosion nor even such use can be demonstrated. This refutes explaining the divergence between Grabar and Middle Armenian by gradual attrition – all the more so, considering the nature of that purported attrition.

Many Indo-European languages have undergone analytical simplifications over time, yet the emergence of agglutinative morphology is not generally associated with PIE nor with natural language development. Agglutinative features in an Indo-European language suggest intense contact with a non-Indo-European source, e.g., Caucasian and Turkic for Ossetian (Alanic) or Uralic/Samoyedic for Tocharian. The case system is normally simplified through analytical shifts, English being a striking example of a nearly complete loss of it; simplification through agglutination is nowhere to be seen.

In terms of influences, no language present across the Armenian Highlands, especially to the extent of widespread bilingualism, in and around the period in question – Old Persian, Aramaic, Greek, Parthian, Middle Persian, Syriac, Arabic – shows comparable agglutinative features. Those employing agglutinative morphological structures – Kartvelian and Caucasian languages – have no historically attested contact regime of the spread, continuity, or sociolinguistic conditions required to restructure the language of a broader population within that time frame. Moreover, the purported structural influence, on top of lacking clear correspondences, has no parallels in the lexicon: systematic borrowings from these languages are negligible. All in all, references to Caucasian or Kartvelian influence provide not an explanatory mechanism but a placeholder for an otherwise inexplicable deep-rooted feature.

The Sprachbund invocations must be seen the same way. A Sprachbund suggests convergence through shared, observable features across multiple languages. Attributing a major structural development in a single language to diffuse regional influence in the absence of comparable patterns in neighbouring ones is another deus-ex-machina-type explanation.

Finally, Turkish references are simply anachronistic: Heratsi wrote in late 12th-century Cilicia; the Principality of Cilicia was founded in 1080 – within a decade of the Battle of Manzikert in 1071 – and populated by people escaping Seljuk expansion. With a mere century of occasional contact, no realistic scenario can suggest their language adopting agglutination from Turkish until much later, under the Ottoman Empire.

Absence of evidence is not evidence of absence. However, the absence of an agglutinative contact language is indisputable within the given time frame, which refutes explaining the agglutinative Middle Armenian by an external linguistic influence that Grabar had undergone. Thus, neither the fact nor the source of structural change can be demonstrated. Meanwhile, Middle Armenian is considered to descend from Grabar merely because the latter was attested in writing earlier, whereas the supposed overall dating of Armenian implies an even older vernacular. The evidence strongly suggests that the agglutinative spoken language predates Grabar; Middle Armenian takes after it, probably influenced but not likely mediated by Grabar.

In this regard, the problem of how the fusional language became agglutinative is false; the correct one is why Grabar – a language supposed to make scripture more accessible to people – is so different from the one they actually spoke.

Vernacular that was not

To be precise, presuming Armenian to be the common language by the 5th century is itself a retrojection, hinging on the very premise that Grabar translations were intended, indeed, to make scripture more accessible to the people. However, it is pure conjecture from later sources, unverifiable by contemporary evidence.

Firstly, no linguistic artefacts of the common language pre-5th century are available.

Secondly and more crucially, any notions of “Armenian” in foreign pre-conversion sources (and there are no domestic ones), quite counterintuitively, do not carry ethnolinguistic information per se. The notion of “Armenia” post-Urartu meant:

a Persian exonym (Armina) synonymous with Assyrian Urartu (Babylonian Uraštu) – i.e., another external name for Biainili
a geopolitical/administrative designation (Achaemenid satrapy, Orontid/Artaxiad/Arsacid2 kingdom)
a regional identity marker (from/of Armenia3).

None of these indicate the language the general population would have spoken or understood. Through the lens of the Armina-Uraštu synonymity, mentions of the “Armenian” language would mean “Urartian”, if anything. However, since the Persian exonym gained traction far beyond Persian, such mentions are merely placeholders for whatever language the source, through its own lens, perceives as the language of Armenia.

Thirdly, complicating matters even more, it is yet another retrojection to interpret the language referenced in the earliest available Armenian-language sources as Հայոց (Hayots) as the “common language of Armenia”: the mapping of the “land of Hayots language” onto 4th-5th-century Armenia is a habitual convention, with no direct evidence comparable to the trilingual Behistun inscription, which straightforwardly maps the Persian Armina, Babylonian Uraštu, and Elamite Harminuya onto each other.

Meanwhile, during the conversion, in its most intense phase, the Armenian Apostolic Church (AAC) did not seem concerned with intelligibility issues. According to Agat’angeghos4, St. Gregory required Tiridates III to establish special schools where noble and priestly offspring – specifically from the provinces most resistant to conversion – were to be mandatorily brought in for indoctrination. Yet no endeavour was then undertaken to facilitate such an important ideological project by making the scripture more accessible to those allegedly Armenian-speaking children. For a whole century, the AAC was performing liturgy predominantly in Syriac, as well as Greek.

Aramaic, of which Syriac is a dialect, had been present in Armenia since the second half of the 6th century BCE at the latest, being the administrative language of the Achaemenid Empire. The spread of Greek began in the late 4th century BCE, following the Graeco-Macedonian conquest of the Achaemenid Empire and culminating in the Hellenistic period under the Artaxiad rule. Being of Persian origin themselves5, the Artaxiads practiced bilingualism: border stelae of Artashes I carried Aramaic inscriptions, Tigran II minted coins with Greek ones; Artavazd II is reported to have written plays in Greek. The Parthian influence brought by the Arsacids with their Roman client king status maintained the balance between these two languages.

Interestingly, the AAC traces its tradition back to the apostles Thaddeus and Bartholomew, who introduced Christianity to 1st-century Armenia. Jews by origin, they would have preached in either Aramaic, spoken at the time in Palestine and across the Levant, or Greek, addressed to the broader audience within the Roman Empire. In other words, by the 5th century, both languages had been ecclesiastical in the country for about three centuries – on top of being in state use for 700-900 years. It should not come as a surprise, then, that the AAC was shepherding up to four generations of its flock, many being forced converts, in these languages; what is surprising is why the intelligibility of liturgy would change by the 5th century – or was it something else that changed?

In 387, the Treaty of Acilisene marked the first partition of Armenia between the Roman and Sassanid empires and the end of the semi-sovereign Kingdom of Greater Armenia. Some four-fifths of its territory, along with the capital and the seat of the AAC, came under Persian suzerainty; the westernmost part, including Sophene and Arzanene, was ceded to Rome. Arshak III, the last Roman client king of Greater Armenia, chose to relocate there. Initially, some part of the Christianised nobility decided to follow suit, but just two years later, after the king’s death, Rome abolished the monarchy in Roman-controlled Armenia, swiftly placing it under uniform Roman governance6.

In contrast, the Sassanids retained the traditional vassalage model, reinstating another Arsacid, Khosrov IV, to the Armenian throne. They did not intervene in the latter’s appointment of the Gregorid heir Sahak Part’ev as Catholicos, though it was made without the Shah’s prior consent, and even granted the AAC tax exemptions. The nakharars’ property rights were not violated; their patrimonies were kept intact. The combination of Roman push and Persian pull restored the status quo: by the 390s, the Christian nobles were back in Persarmenia. It meant, however, that the AAC’s flock was now split between the two empires.

The partition brought a respite in direct military clashes between geopolitical adversaries, but the unrelenting tension between them was now guiding their domestic policies towards Armenian subjects. The AAC had found itself right on the rift: the seat was in Persia, but Catholicoi were being ordained in the Roman Caesarea. Both parties were seeing this duality as a strategic vulnerability, with the Sassanids arguably having more grounds for concern, being more dependent on Armenian levies and moving to counter Christianity in Rome and consolidate their authority by promoting Zoroastrianism as a state religion7.

Syriac, the dialect of Aramaic that was still official in Iran, was adopted as the liturgical language of the autocephalous Church of the East – a “domestic” church of the Persian Empire; Greek was the language of the Constantinople Patriarchate. The AAC, with most of its flock in Persarmenia and Greek suppressed as Roman influence by the Sassanids, was predominantly using Syriac liturgy; in its western parishes, with the Romans treating Syriac the same, the Greek one had to be used. Given the political pressure from both empires, liturgies in those languages could have enabled the transition of the AAC’s flock under the respective ecclesiastical bodies exerting theological authority.

Within this historical context, and factoring in the lack of active interest in linguistic reform before the major shifts in Armenia’s geopolitical status, it stands to reason that the reform was more likely conditioned and warranted by the precarious position the AAC found itself in. Languages long-used and well-attested in the country – Middle Iranian (Parthian/Middle Persian), Aramaic/Syriac, and Greek – all became political vulnerabilities. And here, from obscurity, a language emerges: unlike any of these, yet with numerous borrowings; a slew of ecclesiastical translations and related texts (e.g., Greek Platonism and Neo-Platonism), yet no records of lay usage. The borrowings, Indo-Iranian, Semitic, and Greek, are themselves telling, dominating in thematic categories of8:

cosmology and religion – 50% (mostly Semitic and Greek)
economy and trade – 67% (mostly Greek)
social organisation and governance – 72% (Indo-Iranian and Greek)
military and warfare – 77% (overwhelmingly Indo-Iranian).

The scripture translators (Surb Targmanichner) and the Hellenising school, of Mashtots’s legacy, acted as a language institute shaping it. The Grabar translations of Greek texts are remarkably literal, with the language structure intentionally adapted for that purpose – so much so that philologists consider them a matrix for reconstructing lost Greek originals. Structural adaptation affected the case system, declension and tenses, as well as introduced specific moods, numbers, and certain lexemes (affixes, prepositions, words calqued from Greek). Moreover, the structure carries traces of the superimposed Semitic/Syriac case system in declensional prepositions, often merged as prefixes.

Thus, Grabar presents as a language:

used predominantly, if not exclusively, for the purpose of translating and forming a corpus of authentic ecclesiastical texts and clerical writings
lacking PIE-inherited vocabulary for the very topics it was meant to convey, compensating for it with extensive borrowing and calques from languages more suited for that
being altered structurally to match the languages of translated sources, to the level of morphological re-categorisation.

The lexical and morphological discrepancies are the most crucial here: if Syriac or Greek were indeed not quite intelligible to the general populace, so would Grabar be, structurally modelled after them and full of loanwords and calques requiring explanation anyway. Should the urge for popular accessibility have driven the language reform, one would expect trade-offs in favour of language intelligibility instead of translation authenticity. Yet this riddle of “unpopular popular” language exists only if taken out of the historical context; within, one can plausibly assume that the language shift was meant to delineate the AAC’s domain, putting the clergy in full control of theological content and ideological framing.

Not uncommon in political practice to this day, language reforms follow and solidify regime changes and consolidation or secessionist projects. The case of the AAC was all of these at once: a recently imposed ecclesiastical regime – still hereditary at the time – erecting a language barrier to secede from imperial hierarchs and consolidate its base. Official language replacement, dialect standardisation, alphabet change, linguistic purism, etc., are typical of reforms. Profound structural reworking to match a model foreign standard for applied purposes, though, is more reminiscent of developing a constructed or programming language with predetermined features. From that, one could suspect – which is not defensible but still worth mentioning – that the language engineers from the Hellenising School were not native to it. It seems psychologically more appropriate to “vivisect” a ‘plebeian’ vernacular, not the mother tongue. Whatever the case, their mindset appears deeply Greek or Syriac, rather than anything else.

Church Slavonic, another exception in the trajectory of liturgical languages, offers a control case – all the more instructive for superficially paralleling Grabar. Created in the 9th century by Cyril and Methodius from their native Thessaloniki dialect, it was immediately intelligible to its target Slavic populations, developing regional liturgical variants in contact with their dialects, while the underlying vernacular evolved separately into distinct languages. A typical Indo-European language structurally, it overwhelmingly retained native Slavic lexicon, with Greek borrowings confined to theological vocabulary.

Beyond the same ecclesiastical origins, the trajectory – authentic vernacular base, original Indo-European structure, immediate intelligibility, traceable regional development – contrasts sharply with Grabar’s profile: no prior attestation, deliberate structural adaptation, substantial borrowings in elite-use domains, and problematic intelligibility requiring a pivot to Middle Armenian centuries later. The comparison highlights how a genuine vernacular would actually manifest through liturgical canonisation – a pattern inapplicable to Grabar.

The Problem of “Single Language”

Strabo as Unreliable Witness

“According to a report, Armenia, though a small country in earlier times, was enlarged by Artaxias and Zariadris, who formerly were generals of Antiochus the Great,⁠ but later, after his defeat, reigned as kings (the former as king of Sophenê, Acisenê, Odomantis, and certain other countries, and the latter as king of the country round Artaxata), and jointly enlarged their kingdoms by cutting off for themselves parts of the surrounding nations, — I mean by cutting off Caspianê and Phaunitis and Basoropeda from the country of the Medes; and the country along the side of Mt Paryadres and Chorzenê and Gogarenê, which last is on the far side of the Cyrus River, from that of the Iberians; and Carenitis and Xerxenê, which border on Lesser Armenia or else parts of it, from that of the Chalybians and the Mosynoeci; and Acilisenê and the country round the Antitaurus from that of the Cataonians; and Taronitis from that of the Syrians; and therefore they all speak the same language, as we are told.”
Strabo. (1928). The Geography of Strabo (H. L. Jones, Trans.; Vol. 5, Book XI, Chapter XIV). Harvard University Press.

This passage from Strabo’s Geography is treated by conventional historiography as an attestation of Armenian being the common language of Greater Armenia. Yet nothing in the passage itself or in the broader historical context points to that. The argument usually goes “Strabo makes that statement in conclusion of his description of how Artashes I consolidated the previously fragmented parts of the country, the single language being a result of it” – dragging in the presumption that all those parts were Armenian-speaking before consolidation, which is pure speculation.

It needs to be stated that the validity of Strabo’s accounts regarding Armenia is itself overrated. Scholars agree that his text is a mix of limited first-hand experience with broader testimonies from other authors, military reports, travel notes, and hearsay, generally. The style varies accordingly; the citation above, enclosed in disclaiming references (“according to a report…”, “…as we are told”), visibly belongs to the second category. Further, Strabo positioned his work as neither historiography nor ethnography – it’s “Geography”, and even as such not quite precise; the River Tigris flowing through Lake Van (Thopitis, i.e., Tosp/Tushpa), yet “because of its swiftness [keeping] its current unmixed with the lake”, is a vivid case in point. As for historicity, Strabo clearly confuses the dominions of Artaxias and Zariadris.

That said, even Strabo’s own rendering of events depicts a conquest, not unification: “Armenia, though a small country in earlier times, was enlarged by Artaxias […] by cutting off […] parts of the surrounding nations”, i.e., Medes, Iberians, Chalybians, Mosynoeci, Cataonians, and Syrians. The wording is unequivocal; the concluding “therefore” logically hints at a state language – administrative or dynast’s mother tongue – rather than the common one.

Beyond that, the invasion of Artaxerxes, future Artashes I, was likely staged from Media Atropatene; Armenian levies, from without the Orontid realm, could have been involved, but the rallied troops must have been Seleucid-affiliated. His conquest was the subjugation of Seleucid-dependent yet formally sovereign Armenia to direct imperial rule at the behest of Antiochus III. He himself was of Persian origin, later claiming Orontid relation legitimising his kingship. The subsequent Greater Armenia expansion was an imperial project typical of the time and region. Nothing in his story speaks to ‘Armenian unification.’

While the Artaxiads’ mother tongue is not specified, only speculated to be Indo-Iranian, the languages used under them are well attested. Consistent with the Achaemenid/Orontid tradition, the administrative matters were conducted in Aramaic. Indicatively of Hellenisation, Greek permeated trade, coinage, and art. With the Persianate civilisational influence ever present, no Indo-Iranian language promulgation is recorded – on the contrary, Greek seems to have become the prestige language. However, while the latter might have been confined to the elite and merchant strata in urban centres, the Aramaic of border stelae addressed the widest possible audience, including provincial farmers.

The country Artaxerxes captured from Erwand IV was a legacy Achaemenid satrapy formed of the fallen Kingdom of Urartu/Biainili; with no population replacement in evidence, it would have been inhabited by the same indigenous Hurro-Urartian tribes, Armenian-speakers, and other groups – South Caucasian, North Mesopotamian, etc. – all superimposed by the post-imperial Iranian elite. Based on facts at hand, two things are indisputable:

there’s no realistic scenario for Armenia, especially the Artaxiad Greater Armenia with freshly annexed territories, to be homoglossos in any real sense, and
there must have been an administrative lingua franca, at least at the intercommunal level, if not a common one – Strabo’s very “single language”.

The most plausible candidate for lingua franca is the language of border stelae – Aramaic – unless newly excavated linguistic artefacts prove otherwise in the future.

As for Armenian, its purportedly ubiquitous presence across the Armenian Highlands cannot be supported by data. Most probably, it originated in the contact zone between the indigenous culture and the immigrating Steppe one – the North-East of the Highlands. The derivative fusion – the Metsamor-Lchashen culture associated today with Armenian – is firmly localised in the Southern Caucasus. The Hurro-Urartian-speaking core lay further South-West, centred on the Lake Van-Mount Ararat axis; the Lake Van (Nairi/Tushpa) heartland supported a higher population density, especially with the Urartian infrastructural projects.

To man these projects and pacify the conquered territories, the Urartian kings famously practised forced relocations, ostensibly dispersing some Armenian-speaking groups in the wider country. At the same time, these relocations, not amounting evidently to a mass replacement/displacement, were multi-directional: the frontier fortresses in the North-East would be reciprocally manned by people from the heartland or another frontier. At any rate, these population swaps alone, given the probable demographic disparity, would rather contribute to the promulgation of the majority language (or lingua franca of the time) than the opposite.

On a larger scale, the Armenian-speakers probably pushed back South-West into the Ararat Plain around the fall of Urartu, allied with the invading Medes. By the Orontid times, when the domestic historiography went into “radio silence” for over a millennium, the presence of Armenian would most plausibly have been gradient9 along the Lake Sevan-Lake Van vector, fading beyond Mount Ararat. From then to the 5^th century CE, there are literally no attested mechanisms for the wider spread of Armenian:

it was neither native for the core Highlands population (that was Hurrian/Urartian),
nor administrative (that was Aramaic),
nor cultural or trade (that was Greek),
nor dynastic (those were Indo-Iranian languages),
nor military command language (that was first Imperial Aramaic, then Parthian/Middle Persian).

Additionally, the spread of the language could be neither uncontested nor uniform across the Highlands: the southern and south-eastern outskirts experienced an increasing Indo-Iranian influence10; south-western parts (notably, Sophene) were similarly affected by Aramaic, and the western and north-western ones by Greek. This sums up what can be plausibly deduced from facts, including known historical regularities.

The rest is a wide-open field for speculation. For example, one could connect the Medes-Etiuni alliance with the Orontid capital relocating North-East – to the district known in later sources as Ostan (Vostan) Hayots that hosted also their successors’ capitals and the seat of AAC – and speculate that the Iranian dynasties relied more on their historical South Caucasian allies than the rebellious heartland population. “Ostan” also denoted a special noble corps of royal guards (ostan gund); thus, given the Steppe legacy of the proto-Armenian-speakers, one could further speculate that their warlike descendants had comprised the core of the middle-rank military – or, going far, of the social stratum forming it, azats.

This could have given Armenian some credible boost, but with all speculations holding, it would still be insufficient. Even when politically dominant, the language of the military is not guaranteed to take root: all Germanic conquerors shifted to Romance, Varangians dissolved into Russians, Seljuks and Mongols adopted Persian, Turkic Bulgars turned Slavs, and Normans, once francised, reverted to a Germanic language – a different, simplified one. And Armenian was neither politically dominant nor attested as a command-tier language until post-Arsacid times. Speculations aside, the fact of the matter is that no viable spread mechanism for Armenian is attested as emerging until the liturgy was translated into Grabar.

Grabar Texts as Adverse Evidence

Sadly, the early Grabar corpus, breaking the millennium-long domestic “radio silence”, does not testify to the linguistic landscape of pre-5^th-century Armenia – if anything, it obfuscates it behind a layer of orthogonal terminology. Notably lacking Armina derivatives, its key designators – “land of Hayots language,” Hayots ashkhar (world), Hayk’11 and related – are routinely read as endonyms cognate with “Armenia”, yet such reading rests on retrospective alignment rather than demonstrable equivalence. As mentioned earlier, in the absence of parallel external attestations, these terms cannot be securely mapped onto the known geographic or ethnolinguistic entities of the time.

What they do attest, however, is a distinctive common space structured by the language of the texts themselves. Authors consistently treat that space as single yet divided between competing empires. Nationalist tropes of a country-wide ethnolinguistic homogeneity are as anachronistic to late antique Armenia, as the nationalist agenda itself is to early medieval clerical writers. Without suspecting them of deliberately misrepresenting the routine reality to their contemporaries, a more reasonable explanation would be that they meant something different, similarly evident to the audience. In theory, it could have been a smaller, truly homogeneous entity within the country, like the same Ostan Hayots or its surroundings. The discourse of division, however, suggests a much larger span.

That said, none of the early Grabar texts is authentic to the 5^th-6^th century in physical terms: the earliest extant copies are dated circa the 10^th century onwards – and these are not exactly copies. Philological analysis revealed extensive editing in transmission, including:

erasure and rewriting in palimpsest (Agat’angeghos)
theological insertions and clarifications (Agat’angeghos, Eghishe, Buzand)
clerical or otherwise biased interpretative interpolations and amendments (Eghishe, Koriwn, P’arpetsi, Buzand)
legendary genealogy and ethnography additions, lacunae filling, chronology reworking (Khorenatsi, P’arpetsi)
linguistic/orthographic standardisation, glossary normalisation (almost all).

An interesting example of the latter is the appearance of Hayastan in copies of the 5^th-century texts. Now a standard endonym for “Armenia”, it was normalised by the late 1^st millennium, supplanting the Grabar standard Hayk’ in Middle Armenian. Crucially, it is a direct cognate of Ostan12 Hayots: from the original Iranian etymology, Ostan Hayots and Hayastan explicitly render “Hay province” and “Hay land/country”, respectively. The former is conventionally interpreted as the “domain/land of Armenian (Hayots) kings/crown”, citing derivatives like vostanik (“of royal court”, “courtier”, “noble”). The original meaning, however, in no uncertain terms indicates an administrative unit – out of all others – specified as “Hayots”. This warrants proper explanation, not glossing over.

It is worth noting that Greater Armenia’s designators were nahang – another Persian borrowing – for “province” (also called ashkhar), and gavar – cognate of the German “gau” – for “district.” Whoever designated that territory ostan was thinking in classical Persian administrative categories13, possibly back in the satrapy period. Around the time the Orontid satraps proclaimed themselves kings, their capital relocates here. The term could have fallen out of official use sometime under the Artaxiads – probably even under Artashes I himself, known for rearranging the administrative division through his border stelae. In the new division, Ostan Hayots, hosting the new Artaxiad capital, emerges as a gavar within Ayrarat nahang/ashkhar – the real crown lands – while apparently retaining what became its proper name. Later, the Arsacids build their new capital in the district, further upholding its metropolitan status.

To recap, a territory with a name stemming from the Persian administrative term came to host consecutive royal capitals. Later, it is redesignated using the new administrative terminology; the old term loses its respective meaning and becomes associated with the royal capitals. This is a typical case of reverse etymology, explaining the concept through a later meaning. Interpreting ostan as “crown land” or “royal domain” puts the cart before the horse: the meaning shifted from “province” to “court” precisely in consequence of the capitals located there.

Such toponymic scaling finds a distinct parallel in French history, where the name Île-de-France eventually expanded to define the entire country – retroactively at that. The divergence that the reverse etymology misses, however, lies in the lineage: the French kings emerged from the Germanic Frankish tribal confederation – hence the name. In contrast, the Armenian dynasts – Orontids to Arsacids – were ethnically Iranian, some Hellenised; while they might adopt an ‘Armenian’ identity via the Persian exonym for their realm, they were hardly Hays in the modern sense. In fact, they had more grounds to style their seat as Parsits (Persian) or Part’enyats (Parthian) than Hayots. Furthermore, by the time of the early Grabar corpus, these dynasts were gone for good, leaving Ostan Hayots to Catholicoi and Sassanid-appointed “margraves.” If none of these dynasts had reason to specify the capital district as Hayots, and none were even present to elevate it to a country designator, then how did the toponymical expansion truly happen?

In the light of stark disparities between Middle Armenian and Grabar, it is often purported that the latter took after a specific dialect peculiar to the metropolitan region, implying Ayrarat. Yet the largest of late-Arsacid provinces would hardly have been mono-dialectal or even monolingual across all twenty-two of its districts. One district, however, that acted as metropolitan, could - and the linguonym, Hayots, is in the very name of it. With peculiarities characteristic of Grabar itself – a deeply modified product of monumental work by scripture translators – the vernacular spoken in Ostan Hayots (though probably not limited to it) was likely not some obscure marginal form but a fairly generic dialect of Armenian. This connection remains conventionally ignored due to the posited country-wide monolingualism – a presumption no reputable scholars would support – which deems Armenian as confined to only a portion of it unthinkable.

This geographic specificity is reinforced by Koriwn – Mashtots’s biographer and primary source on Grabar creation – referencing the “Torgomean (Թորգոմեան) people” speaking the “Hayots language”. Torgom (Togarmah, grandson of Japhet, Noah’s son) was seen as a forefather of Caucasian peoples. Both Armenian and Georgian medieval chroniclers situated their peoples within this biblical genealogical framework; Movses Khorenatsi famously referred to the country as “House of Torgom.” That Koriwn distinguishes “Hayots language” as particular to one group within the broader Caucasian biblical designation suggests Hayots marked a geographically anchored community – consistent with Ostan Hayots as a specific province in the Southern Caucasus, rather than the entire Highland population.

Confessional frameworks are frequently labelled by the language of rite, irrespective of the ethnicities shepherded; Greek and Syriac are prime examples themselves. When the AAC employed Armenian, a language not “proprietary” to either Roman or Iranian ecclesiastical spheres, to forge its own liturgy, the linguonym would have naturally differentiated its flock – specifically from the Greek and Syriac Christian hierarchies the reform was meant to fend off in the first place. This flock was undeniably diverse – the conversion did not discriminate between the Arsacid subjects – encompassing, for example, the Iranian-origin royalty and nobility. A single rite, conducted in a distinct language and named after it, became their only true commonality, distinguishing them at the convergence of differing vassalage relations, political affiliations, religious adherence, and kinship ties14.

This statement is not a conjecture, but is corroborated by an adverse witness. Historically, the AAC itself was quite forthright in deeming Hay a denominational marker. Zoroastrian Armenians were labelled as such and associated with the Persians – or as heathens, apostates, and traitors. Those adhering to the then mainstream Christianity – Armenians and non-Armenians alike – were referred to as “Chalcedonians” (and anathematised). Later, Catholic converts turned “Franks”15 – reportedly reciprocating by referring to the Apostolic Armenians as Hays. And Islamised Armenians became simply “Turks.” Under Ottoman rule, notably, even the Armina-derived demonym came to designate a confession – Millet-i Ermeniyân, “Armenian people” – “people” as an Islamic religious category (millet). For almost fifteen centuries, Hay – and, by association, “Armenian” – marked a religious community, not an ethnos.

Through this lens, the phrasing of early Grabar texts becomes historically consistent: Hayots ashkhar emerges as a denominational community distancing itself from other confessions – including broader Christianity – and the “land of Hayots language” as a canonical territory. The neglect of “Armenia” as an external administrative designator is even more understandable against the backdrop of three ‘Armenias’ at the time – a Persian and two Roman16 ones – while the concept of Hayk’ traversed them all. This is crucial: the surviving corpus was not written as an impartial chronicler’s testimony, even when titled as “histories”, but as ecclesiastical edification. Clerical authors were projecting the intended vision; later editors updated it to reflect its eventual realisation.

Following the assertion of its canonical territory at the 484 Treaty of Nvarsak, the AAC enjoyed a period of largely unimpeded entrenchment lasting until the Arab conquest in the mid-7^th century, although not completely unchallenged. As the Sassanid authority declined, Greek influence in the western regions grew, and even heathen pockets reportedly survived in the southern regions as late as the Seljuk rule. Be that as it may, the AAC was recognised by the new overlords as representing the Armenian Apostolic people (millet) and granted tax exemptions. The earliest evidence of Grabar discourse independent of the AAC, albeit still not quite secular, dates to this very period: the heterodox movements of the Paulicians/Tondrakians, fiercely persecuted by the official church, also used it.

Geographically, though, the boundaries of Ostikanate Arminiya (Emirate of Armenia), which the Arabs had carved out by the mid-700s, encompassed not only the Greater Armenia territory but also Iberia (Georgia) and Caucasian Albania, all the way to Derbent. The Bagratid Kingdom that emerged with the waning Arab hold on the country is habitually referred to as Greater Armenia – being, however, far from the original Artaxiad or later Arsacid realms. Indeed, there were four other Armenian kingdoms; some of them recognised the Bagratid suzerainty only nominally, with one, the Kingdom of Vaspurakan, warring against the Bagratids, while also hosting the AAC seat for a time. The discrepancy between the political, ecclesiastical, and geographical designators became even more pronounced. Which one of them could “Armenia” be related to, except collectively in retrospect, when all were gone? That is, by the time the name of the former metropolitan district, normalised as Hayastan in texts since the 10^th-11^th century, had become synonymous with “Armenia” in common use.

As speculative as it may seem, this scenario actually revealed itself in a special case of Caucasian Albania. While Albanian history has more gaps than Armenian overall, this case, by chance, presents some evidence throughout the process: the pre-existing condition, the pivotal points, and the outcome. Caucasian Albania thus presents a “control group” within a regional continuum affected by the political-ecclesiastical transformation of the mid-1^st millennium CE.

The population was a tribal union rather than a single ethnos, consolidated by a political superstructure. They had, however – apparently, as a lingua franca – a now attested indigenous Caucasian language, Old Udi, related to Lezgian. The country was converted shortly after Armenia, through the AAC; the then king Urnair was similarly challenged by Sassanid pressure and followed suit after the Roman-backed Tiridates III. Initially, the scripture was translated into Old Udi, using the alphabet reportedly developed by the same Mashtots; incidentally, the source was an Armenian translation17.

The first pivot occurred in the aftermath of the Treaty of Nvarsak: Vachagan III Pious, an Arsacid king, brought the Church of Caucasian Albania under the ecclesiastical umbrella of the victorious AAC. Four years after the Nvarsak, at the Council of Aghuen, it was aligned with the Armenian rite, canonical framework and hierarchy. At the First Council of Dvin in 506, Armenian, Georgian, and Albanian churches stood united in their opposition to the Roman “Chalcedonians”, distancing themselves from both Rome and Iran.

A century later, the second pivot happened when the Georgians deemed it more expedient for their interests to side with Rome. This provoked a deep schism in the Southern Caucasus; some part of the Albanian Christians chose to follow the Georgians, excommunicated by the AAC, thus creating a “control group within control group”. The majority of the Albanian elite and flock retained the Armenian rite, now sharing the same ecclesiastical isolation with intermarriages banned.

Within that seclusion, Grabar, as the liturgical language of a larger and still influential denomination, gradually relegated Old Udi to a secondary status. That, coupled with intraconfessional marriages, gradually led to the erosion of local ethnolinguistic identity and its absorption into a broader Armenian one. Likewise, the country’s name was first scaled down to the part of it south of the Kura River18, once under Armenian control, and not long after, dropped completely.

The Catholicosate of Caucasian Albania, however, as an at times autonomous exarchate within the AAC, persevered until 1836 when the final pivot happened. Under the Russian Empire, eight years into Russian rule in Eastern Armenia, the AAC was integrated into the imperial governance system in a manner similar to the Ottoman millet. An Orthodox monarchy, not interested in multiplying fringe Christian entities, reduced the Catholicosate to the Karabakh diocese of AAC. The historical toponym was finally phased out.

Today’s Artsakh Armenians, the descendants of diverse Apostolic Albanians, not only do not identify as such but also believe Vachagan III Pious, who put the assimilation in motion, to be an Armenian king. Conversely, the descendants of Orthodox Albanians – modern-day Udis – though few in numbers, retained their distinctive identity and language, living to this day on ancestral lands north of the Kura River. These two “control groups”, juxtaposed, vividly demonstrate how the “single language” factually emerged, shaping reality along the way – retrospectively at that.

The Problem of ‘Hidden’ Substrate

Majority Language

Since Urartu, with its distinctive language, became an established – though still rejected by some – historical reality by the mid-20^th century, the Urartians have been reconciled with the conventional narrative of Armenian history as a narrow elite group that reigned over the majority Armenian-speaking population. The scarcity of Urartian inscriptions, disappearing post-Urartu, seemed consistent with that, matching the cases of the Hittites, Mitanni Indo-Aryans, and other vanished elites. The limited number of Urartian loanwords in Armenian – initially estimated at around 200, eventually revised downward by an order of magnitude – supported the same, it would seem.

A parallel line of Hurrian studies, lagging by three-four decades, culminated in the 1970s with a proof of relation between Hurrian and Urartian, thus making them a language family of their own. Hurrian, crucially, is not that easy to fit in the same way. Though original texts survived in limited numbers, it has been attested from Asia Minor across the Armenian Highlands to Northern Mesopotamia, traced through linguistic contacts with Hittite, Hatti, Ugaritic, and Akkadian. The Armenian urag, “adze”, is considered an Akkadian borrowing mediated by Hurrian. The Hurrian presence in the region had been recorded since the late 3^rd millennium BCE - long before Urartu and spanning to the end of it.

After the fall of Mitanni in the late 14^th century BCE, parts of its Hurrian population retreated north into the Highlands, a natural bulwark against Assyrian expansion. Assyrians left multiple records of campaigning in the “Land of Nairi”, though never managing to subjugate it and possibly catalysing the Urartian consolidation. The Hurrian heartland to the South-West of Lake Van had been breaking away from several empires in the course of the late 2^nd-early 1^st millennium BCE as a state of Alše/Alzi before its absorption into Urartu under king Menua – to become the Armenian province of Aghdznik (Arzanene) later. The post-Mitanni rump state of Shubartu/Shupria, south of Lake Van and Northern Mesopotamia, had been a buffer between Urartu and Assyria before falling to the Medes, who crushed them both.

In other words, the Hurrians inhabited the highland heartland south and west of Lake Van, not as some fringe tribe, but as a core population spread across empires and in-between. They surely were not a homogeneous population, but the language they spoke, possibly another lingua franca, is considered well represented across their area and beyond, overlapping and cross-pollinating with the neighbouring linguistic zones. The once-hypothesised association of certain entities, like Alše/Alzi and Shubartu/Shupria, with purportedly proto-Armenian groups referenced in Assyrian records as Arme/Urme, was based on the “Balkan (Mushki) migration” theory of Armenian ethnogenesis. With the latter effectively debunked by paleogenetics and the Kurgan/Yamnaya theory of Indo-European-language origins, the rest hinges on phonetic similarity, unsupported by the corresponding linguistic evidence, which is Hurrian.

Furthermore, the comparative linguistic analysis today holds that Urartian not only did not descend from Hurrian but might be as old, having developed from the same source and retaining archaic features dropped by the latter. This suggests that Urartian had already existed by the time of the first Hurrian attestations in the early 2^nd millennium BCE, with the two languages diverging around 3000-2500 BCE from a common proto-language dated even further back. As a linguistic isolate, the Hurro-Urartian family was indigenous to the Armenian Highlands – specifically the area around Lake Van, though not necessarily limited to that – long before the Hurrians expanded into Northern Mesopotamia and Asia Minor.

The claims of “majority Armenian-speaking population” of Urartu are therefore not only unfounded – they have never even been founded, only posited – but untenable against the evidence. Urartian might have been the dynastic mother tongue, indeed: the royal inscriptions switched to it around the reigns of Sarduri I19 and Ishpuini in the late 9^th century BCE, along with Biainili as the country name – replacing Assyrian (dialect of Akkadian) and Assyrian-coined Nairi, respectively. If Urartian was not the majority language – provided there was one – the next best candidate is Hurrian. Alternatively, the population could have spoken a range of related languages; associated with the Kura-Araxes culture, Hurro-Urartian proto-language – or variants of it – could have spread as far as the Southern Caucasus.

After Urartu – naturally under the new Persian overlords – royal inscriptions in Urartian disappear, though so do any other-language ones, until the Artaxiads. Yet the Hurro-Urartian-speaking heartland population did not. Probably having reverted to its pre-consolidation distributed tribal state, it was seemingly active, staging revolts against the Achaemenids – who dispatched an ‘Armenian’ general Dadarsis to quell one. Notably, after the defeat in the 331 BCE Battle of Gaugamela, the Orontid satraps abandoned the old Urartian capital Tushpa for the former Urartian outpost, Argishtikhinili fortress, located in the land previously known as Etiuni, Ostan Hayots to be.

As mentioned earlier, that land – and those parts of the Southern Caucasus in general – is now seen as the probable cradle of Armenian. Here, since the Trialeti-Vanadzor period starting in 2500 BCE, the seeping-in Steppe culture had been merging with the aboriginal Kura-Araxes one. The settling-down nomadic migrants were adopting sedentary cultural practices; some of those the aboriginals had developed were distinguished in the region, among the oldest cradles of civilisation. The Armenian Highlands had been prominent for irrigation, metallurgy, and viticulture and winemaking. Adapted to their environment, locals had mastered the ways of utilising the mountains and subsisting with risky agriculture.

Technology transfer is generally accompanied by terminology borrowings; a clear example of that is a layer of nautical terms in Russian, adopted wholesale from Dutch shipbuilders and seamen employed by Peter I the Great to create the Russian navy in the late 17^th century. That was just one industry; the Steppe settlers in the Highlands had to adopt most of the sedentary economy, environment - both natural and anthropogenic - and lifestyle. It is to be expected for their language to reflect that; this mechanism is well-known in linguistics.

One of the most striking peculiarities of Armenian is the extraordinarily high share of words of uncertain and undefined etymology – the majority of the corpus. A typical Indo-European language has 5-10%; Armenian has the entire etymological structure almost in reverse:

inherited PIE: <10%
established borrowings (Indo-Iranian, Greek, Semitic, etc.): 37-39%
- of those Hurro-Urartian: 0.2-2%
uncertain: 22-24%
undefined: ~30%.
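
As a quick arithmetic check, the ranges in the breakdown above can be tallied. The sketch below is only an illustration: reading “<10%” as the interval 0-10 and “~30%” as exactly 30 are assumptions made here, and the names SHARES and add_ranges are hypothetical helpers, not part of any cited dataset.

```python
# Shares quoted in the list above, as (low, high) percentage bounds.
# Assumptions for illustration: "<10%" is read as 0-10, "~30%" as exactly 30.
SHARES = {
    "inherited PIE": (0.0, 10.0),
    "established borrowings": (37.0, 39.0),
    "uncertain": (22.0, 24.0),
    "undefined": (30.0, 30.0),
}


def add_ranges(*ranges):
    """Add (low, high) intervals bound by bound."""
    low = sum(r[0] for r in ranges)
    high = sum(r[1] for r in ranges)
    return (low, high)


# Words with no established etymology: "uncertain" plus "undefined".
unknown = add_ranges(SHARES["uncertain"], SHARES["undefined"])

# All four top-level categories together; the interval should bracket 100.
total = add_ranges(*SHARES.values())

print(unknown)  # (52.0, 54.0)
print(total)    # (89.0, 103.0)
```

Under these assumptions, the combined unknown share comes out at 52-54%, and the overall total brackets 100%, so the quoted ranges are mutually consistent.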

Numbers fluctuate due to the ongoing debate; the key points of contention are PIE, Hurro-Urartian, and unknown etymologies. Essentially, it is a “tug of war” between the PIE proponents and others; the amount of effort put into stretching an Armenian word over a PIE root, however semantically remote and cognitively irrelevant20, is striking. Even more so, in the light of another anomaly – the negligible share of Hurro-Urartian loanwords, compared to borrowings from other language contacts, by the time Grabar emerged:

Indo-Iranian, the most intense contact for about a millennium – 14%,
Semitic, comparably protracted though less intense contact – around 6%,
Greek, the shortest and most limited contact – about 2%,
Hurro-Urartian, which must have been the closest contact longer than Indo-Iranian – the same 2%, and even that is being subjected to reduction.

All that – compared with the 52-54% unknown. Linguists are aware of this being indicative of substrate; Hrachya Ačaryan was the first to propose it, considering a Hurro-Urartian source, but he did not yet have the lexical corpus to compare against. However, the modern discourse revolves around a ‘hidden’ substrate or even another deus ex machina - an underlying ‘lost non-Indo-European language.’ If the substrate is obvious and the historical background largely established, then why is it hidden and the source lost?

Survivor’s Bias

Ačaryan’s corpus contains almost 11000 lemmas. The unearthed Urartian inscriptions contain some 500 words, of which around 250 have been translated, not all unambiguously. The 12500-strong Hurrian corpus is comparable to the Armenian one, but of them, only about 400-600 words are “understood” – and only 110 with “high confidence.” In other words, the referenceable Hurro-Urartian corpus is 800 words at best - and it is among them that nearly 6000 Armenian unknowns are sought.

It is a textbook case of survivor’s bias: drawing conclusions based on what survived. From that perspective, occasional matches of 6000-from-800 found, however limited, are the outcome of millennia-long linguistic contact with at least partial bilingualism. Yet those matches - rather, the vast absence thereof - are not in any way occasional: neither the reference corpus, nor the substrate is representative of the total lexicon.
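
The mismatch in scale can be made concrete with the figures quoted above. The following is a minimal sketch, not a linguistic model: it assumes, purely for illustration, the most generous case in which every referenceable Hurro-Urartian word matches a distinct Armenian unknown. The variable names are hypothetical.

```python
# Figures quoted in the text, treated as round numbers rather than exact counts.
ARMENIAN_LEMMAS = 11_000      # "almost 11000 lemmas" in Ačaryan's corpus
UNKNOWN_PERCENT = (52, 54)    # uncertain + undefined etymologies
REFERENCE_WORDS = 800         # referenceable Hurro-Urartian words "at best"

# Armenian lemmas of unknown etymology (integer arithmetic avoids float noise).
unknown_lemmas = tuple(ARMENIAN_LEMMAS * p // 100 for p in UNKNOWN_PERCENT)

# Illustrative assumption: each reference word explains one distinct unknown lemma.
# Even then, this is the largest share of unknowns that could be matched at all.
match_ceiling = tuple(REFERENCE_WORDS / n for n in unknown_lemmas)

print(unknown_lemmas)                        # (5720, 5940)
print([f"{c:.1%}" for c in match_ceiling])   # ['14.0%', '13.5%']
```

Even in that best case, roughly 86% of the unknowns would have nothing in the surviving reference corpus to be compared against.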

The Urartian royal inscriptions often qualify as formulaic, with limited repetitive vocabulary. Some invoke that to argue that Urartian was a dead, ceremonial language, not used in real life – thus promoting the thesis of a ‘majority Armenian-speaking population’ of Urartu. Were that criterion sufficient, any official language today implying a chancery style would be ‘dead’ as well. Which is true, in some sense: the general populace does not usually speak in solemn or bureaucratic styles limited to stately discourse. The flaw in this reasoning is the inverted logic: one should not expect substantial matches between a comprehensive living vocabulary and a limited formal one, let alone dismiss linguistic relations on that ground.

The Hurrian corpus, altogether, is substantially larger than Urartian, potentially amounting to a fully operational lexicon, thematically. The translated fraction, however, was sourced from a limited number of bilingual texts that happened to be mostly myths, rituals, and other temple writings; the Kikkuli horse-training manual in Hittite with Hurrian terminology inclusions is a notable exception. Similar to Urartian royal inscriptions focused on state building and warfare, Hurrian temple tablets are skewed towards cosmology, religion and related topics. And yet, the smaller portions of both corpora, related to economy, environment, and nature, provided the majority of Hurro-Urartian roots in Armenian.

The substrate structure mirrors these regularities, flipped. An AI-assisted estimate shows the highest concentration of substrate words in the following thematic domains:

Agriculture, including viticulture: 22-25%
Topography, hydrology and landscape: 21-23%
Wild flora and fauna: 18-20%
Built environment and architecture: 13-15%
Metallurgy and mining: 8-10%,

with topics like cosmology and religion, social organisation and governance, family and kinship, anatomy, abstract concepts, etc. each hovering around 5%. A more granular analysis of sample machine-readable entries from Ačaryan’s corpus^21 reveals that the domains where substrate words concentrate are also the most substrate-heavy:

Flora and fauna: 73%
Mining and metallurgy: 57%
Built environment and public infrastructure: 50%
Topography, hydrology and landscape: 50%
Crafts and tools: 44%
Agriculture: 44%.

These numbers do not represent the actual share of substrate words per domain across the entire lexicon^22. Still, they match the expected pattern – and so does, within the same sample, the original PIE-inherited vocabulary, which is substantial in the following domains:

Anatomy and basic bodily activities: 61%
Abstract concepts: 48%
Cosmology and religion: 44%
Agriculture: 31%
Family and kinship: 29%
Topography, hydrology and landscape: 27%.
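The lists above mix two different measures: the share of all substrate words that falls into a domain (concentration), and the share of a domain's words that are substrate (how substrate-heavy it is). The Python sketch below keeps the two apart on a toy sample. The counts, domain labels, and function names are hypothetical and chosen only to make the distinction visible; they are not the figures from the analysed sample.

```python
from collections import Counter

# Hypothetical toy sample of (domain, origin) pairs.
# These counts are invented for illustration; they are NOT the author's data.
sample = (
    [("flora_fauna", "substrate")] * 8 + [("flora_fauna", "pie")] * 2
    + [("agriculture", "substrate")] * 12 + [("agriculture", "pie")] * 8
    + [("anatomy", "substrate")] * 2 + [("anatomy", "pie")] * 8
)


def concentration(pairs, origin):
    """Share of all words of `origin` that fall in each domain (sums to 1)."""
    counts = Counter(domain for domain, o in pairs if o == origin)
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.items()}


def heaviness(pairs, origin):
    """Share of each domain's words that are of `origin`."""
    per_domain = Counter(domain for domain, _ in pairs)
    hits = Counter(domain for domain, o in pairs if o == origin)
    return {domain: hits[domain] / n for domain, n in per_domain.items()}


conc = concentration(sample, "substrate")
heavy = heaviness(sample, "substrate")
for domain in conc:
    print(f"{domain}: concentration {conc[domain]:.0%}, substrate share {heavy[domain]:.0%}")
```

In this toy sample agriculture holds the largest share of substrate words (12 of 22), while flora and fauna is the most substrate-heavy domain (8 of 10). The two rankings need not coincide, which is why it is informative when they do.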

Anatomy, abstract, and kinship terms comprise the basic lexicon of any language. Generic landscape terms develop early; some cosmology and religious concepts generally do too. This vocabulary is supposed to be inherited in large part, if not completely. The share of PIE kinship terms in the analysed sample is diluted by Indo-Iranian, which, apart from research bias, might reflect Iranian familial connections, if confirmed in larger samples.

The high share of PIE agricultural terms requires differentiation per branch: agriculture is not entirely a sedentary industry. Animal husbandry, in particular, shows a strong PIE presence, whereas farming, especially viticulture, is heavily dominated by the substrate and known Hurro-Urartian loans. Some cases warrant even finer differentiation: the word for “bee”, derived from “honey”, is PIE-inherited, but the word for “beehive” is an unknown; the same holds for “cow” and “cowshed”, respectively – which may indicate diverse cognition paths through a gathering, nomadic, or sedentary economy.

Combined, this distribution further shapes the pattern: substrate words not only cluster around the daily practices of settled life – which is to be expected – but also tend to make up the core of the respective vocabulary. This suggests not occasional borrowing but rather adoption in bulk, in the course of a cultural transition. The source that could provide such a substrate should be a language:

of non-Indo-European origin
present in the Armenian Highlands during the formation period (2^nd–1^st millennium BCE)
with a lexicon comparable in size – 6000 words or more
containing a well-developed vocabulary covering the highland economy and environment.

Living language families meeting all four requirements – Caucasian and Kartvelian – have been extensively tested against the Armenian substrate without conclusive results, despite having sizeable speaking communities, better preservation, more comparative data, and elevated scholarly attention. Indo-Iranian – apart from being Indo-European – has well-attested loanwords representing elite-layer borrowing, not deep substrate. As for the ‘lost non-Indo-European substrate’, its elimination from serious consideration is long overdue: this is a matter of what is probable evidentially, not possible theoretically. Positing such a source presumes several simultaneous conditions:

emergence of yet another linguistic isolate – a separate family, effectively
localisation precisely between Hurro-Urartian, Kartvelian, and Caucasian language zones – in the Trialeti-Vanadzor or Lchashen-Metsamor area
long enough presence to develop a sophisticated vocabulary matching the cultural practices
massive impact on Armenian
zero detectable influence on neighbouring languages
invisibility to ancient sources
complete extinction without any other attestation.

Any one of these is possible; all of them coinciding is highly improbable. A ‘lost non-Indo-European language’ requires an implausible conjunction of seven specific rare conditions; the Hurro-Urartian source, meeting all the prerequisites, lacks only a sufficiently large translated corpus. Linguistics may keep tracing one unknown to an even lesser-known one, but the Bayesian probability undoubtedly points to Hurro-Urartian.
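The conjunction argument can be made concrete with a minimal Python sketch. The probabilities below are hypothetical placeholders, set deliberately generously at even odds, and the conditions are treated as independent; both are assumptions made for illustration, not estimates drawn from any data.

```python
import math

# Hypothetical placeholder probabilities for the seven conditions listed above.
# They are deliberately generous and NOT derived from any data.
conditions = {
    "another isolate emerges": 0.5,
    "located between the three language zones": 0.5,
    "present long enough to build the vocabulary": 0.5,
    "massive impact on Armenian": 0.5,
    "no detectable influence on neighbours": 0.5,
    "invisible to ancient sources": 0.5,
    "extinct without other attestation": 0.5,
}


def joint_probability(probabilities):
    """Probability that all conditions hold, ASSUMING they are independent."""
    return math.prod(probabilities)


joint = joint_probability(conditions.values())
print(f"{len(conditions)} conditions, joint probability: {joint:.4f}")
```

Even at even odds for each condition, the conjunction comes out below one percent. Correlated conditions would change the figure, which is why independence is flagged as an assumption rather than a finding.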

Highland Creole

To put things further in perspective, it is worth reiterating that Armenian had long been a vernacular, never attested in elite use until the 5^th century CE. As previously shown, its vocabulary for elite-associated topics, e.g., religion, governance, warfare, and trade, is borrowed to a significant extent. Quite possibly, these loanwords were standardised along with the formation of Grabar, having previously been used either in the original linguistic discourses or as foreignisms in Armenian. If we exclude, then, the borrowings from languages that must have come into contact with Armenian via elite use from the 6^th century BCE onward, from Aramaic to Turkic, the remaining lexicon should, by and large, represent basic spoken Armenian before standardisation. That lexicon is quite telling:

roughly 17% inherited PIE versus 83% substrate, Hurro-Urartian, and occasional Akkadian loans (probably mediated by Hurrian)
centred on topics like abstract concepts, anatomy and basic activities, family and kinship, agriculture, metallurgy and crafts, built and natural environment.

It is a living language of the general populace – farmers, artisans – covering all of mundane life, but not so much religion, governance, warfare, or trade.

Furthermore, Hurro-Urartian languages are explicitly agglutinative. They do not use grammatical gender. Some Hurro-Urartian suffixes, notably -ni, are proposed to have been absorbed into Armenian, albeit with a certain functional shift. As an attested contact family with the spread, continuity, and sociolinguistic conditions necessary to effect structural changes, they are the primary candidate source of agglutinative features in Armenian. So, what does all the above tell us about Armenian?

We see a language that:

is a massive substrate operating within a modified Indo-European structure with non-Indo-European features
contains a limited Indo-European core vocabulary supporting basic communication
shows signs of bulk-adopting notions associated with the localised sedentary culture, covering diverse aspects of everyday life
has been in sustained and intense contact with an indigenous Hurro-Urartian language family
has borrowed the most from the Hurro-Urartian family in substrate-specific thematic domains, in which that family lacks a sufficiently referenceable corpus
developed structural features similar to both families – some deliberately modified – without directly replicating them
lacks grammatical gender, which is characteristic of the Hurro-Urartian family but atypical of the Indo-European one
is attributed to the Indo-European family – as a convention, rather than an unreserved classification – with its atypical features sparking debates on the matter, leading to proposals of re-classification as “Indo-Europeanised”.

This is a language that has gone far beyond hybridisation – more a synthesis than a hybrid – a creole avant la lettre.

Creolistics emerged through the study of modern-era creoles, developing a methodological framework tailored to the specifics of their evolution. Vis-à-vis that framework, Armenian fails crucial criteria: indeed, even the superstrate language cannot be attested, much less its pidgin phase. But then again, neither can any phase of Armenian’s evolution until the late 1^st millennium CE, which simply lies outside recorded history. The conventional creoles are a legacy of the colonisation era, which overlapped with the era of printing. In the age of parchment and scribes, or clay and chisel, writing was resource-intensive and far from ubiquitous; a vernacular would rarely have been recorded, even in passing, let alone a pidgin. Does the lack of printing justify assuming that creolisation did not occur back then, or is creolisation somehow an exclusively ‘colonial’ phenomenon?

Every cause is an effect of preceding causes; the process of linguogenesis may be primary for a creole to emerge, but, from a systems standpoint, it is still derivative of the prerequisites determining it. The historical circumstances of colonisation suggest a distinctive set of conditions needed to produce specifically a creole, shaping a socio-linguistic scenario of “balanced imbalance”:

native majority meeting alien minority
mutually unintelligible languages with no shared intermediate
power asymmetry, with aliens in a domineering position – but
no forced assimilation or population replacement
mutual dependence: natives forced to cooperate, aliens forced to rely on them for local expertise
need for daily communication
alien language lacking local concepts, with natives’ exposure to it limited thematically
prolonged contact.

Could this scenario have taken place somewhere before 1492? Based on evidence from the Trialeti-Vanadzor and succeeding Lchashen-Metsamor cultures, it basically aligns with what was happening in the Southern Caucasus^23 since 2500 BCE. An outlier within the Indo-European family, Armenian has already been proposed as creolised, yet without a demonstrable creolisation process. Stripped of the circumstantial limitations of creolistic methodology, inapplicable to a language as old as Armenian, the evidence points toward that type of formation: a new linguistic system emerging from sustained contact between Indo-European-speaking incomers and Hurro-Urartian-speaking indigenous populations in a multilingual environment. The outcome displays all the characteristic features of contact-driven restructuring, and the Indo-European core vocabulary, lacking grammatical gender, could well have functioned as a pidgin.

Armenian was not necessarily creolised by Hurrian or Urartian per se, which were not directly attested in the north-eastern Highlands until the Urartian expansion; creolisation must have begun earlier. But it could have been a dialect of these, or another proto-Hurro-Urartian derivative present in those parts, that served as a bridge to further synthesis. Since the full structure of the vernacular is not transparent in Grabar, it is also possible that creolisation went on as the Armenian liturgy embraced the population, revealing the final product only in Middle Armenian. Whatever the dynamics, the Indo-European/Hurro-Urartian creole is the most parsimonious conclusion from the stated facts.

This could also shed some light on certain aspects of language genesis and spread. Reconstruction from a stripped-down PIE core – reflecting both parents but replicating neither – would better explain the massive lexical and structural divergence than protracted development from a full PIE system. The relatively rapid proliferation of Armenian, despite the issues with Grabar intelligibility, would be less confusing if the language that liturgy helped promote was not Grabar but a popular creole much closer to the indigenous Hurro-Urartian. Those intelligibility issues would themselves become more explicable if Grabar is seen as wrapping familiar roots into modified fusional morphology, while adding an extra 50% or so of foreign words. The language chosen to become liturgical would appear as not a fringe vernacular, but a close lexical relative to the one spoken by many yet already having the structural foundation to be aligned with Greek – a linguistic adapter.

Similar processes have occurred later in the history of Armenian. Several regional Armenian dialects – notably the now-extinct Mush and Agulis (Zok) vernaculars – demonstrate a creoloid development. Rather than mere lexical hybridisation, an Armenian lexical core was structurally modified through intense and prolonged contact with Turkic and Indo-Iranian adstrates, creating a synthesised grammar or phonology that mirrors the functional logic of its neighbours – without replicating it. Granted, these examples do not represent the standard language – and neither did Armenian, before Grabar.

Conclusion: Looking Where Lost, Not Light

The most extraordinary discovery here is that none of the above is new. The substrate was suspected all along; its distribution is clearly consistent with established linguistic regularities, and translations of Hurro-Urartian texts have seen little progress in decades. Borrowings from later elite languages are long accounted for, and whatever PIE inheritance could have been securely established – without phonetic gymnastics – was mostly provided by Ačaryan a century ago, with increasingly dubious additions since. The historical context, though distorted by biased interpretations and retrojective overlays, is nonetheless known to academia. Even creolisation has been suspected. And yet, scholars regularly embark on another quest for PIE roots, as if the lack of referenceable Hurro-Urartian corpus somehow pointed in that direction.

The problem seems to be threefold. First, there is a political agenda interfering with the research, if not directly, then through underlying incentives. With ethnic states, as with religion, language is politically weaponised, both for alignment and differentiation. In the case of Armenian, promoting a continuous ethnicity narrative often bears an added motive of positioning Armenian as the source of the Indo-European family. A massive substrate that is non-Indo-European, let alone a creole, cannot serve as one. Ironically, since the Kurgan theory, the Hurro-Urartian family – on par with PIE, undeniably indigenous, authentic to the Armenian Highlands and reflecting their unique civilisational roots – could serve any primordialist narrative best. This is a damning testament to how damaging box-minded clerical-nationalist conservatism can be – even to its own agenda.

Second, scientific knowledge cannot be reliably grounded in loose methodology. Sound laws supported by bloated case sets of questionable validity (e.g., derivatives of the same root), remote semantic relations, and cognitively implausible mapping of concepts clutter the field, creating an illusion of equally acceptable alternative answers where none actually exist. There should be stricter methodological guardrails against that.

Third, the overall paradigm – arguably the biggest concern. Reshuffling the initial linguistic data will hardly add much value, unless new clues are unearthed. Without that, the reasonable choice would be abandoning the frequentist logic for Bayesian, thus relying on a broader set of inputs beyond linguistic, and switching to a proactive, rather than reactive, analytical approach.

Assuming that the substrate is Hurro-Urartian, unless proven otherwise – if and where proven – can align the effort in a more productive direction, out of this quagmire. Instead of going down the beaten PIE track once more, as archaeology has not found another bilingual artefact, one could try using Armenian as a key. Given the background burdened by problematic scientific integrity, one would do better to turn to large language models. With all their limitations, two things they indisputably excel at are semantic analysis and pattern recognition – which is exactly what this job entails.

Provided that Ačaryan’s corpus and other necessary sources are digitised as soon as possible, AI-assisted analysis could generate valid phonetic and semantic regularities, reverse-engineer probable substrate originals, and decode Hurro-Urartian scribal conventions for sounds Akkadian cuneiform could not capture – extracting signal from noise without the biases that have plagued the field. This approach would provide the searchlight to look for answers hidden in the dark.

It is not guaranteed, of course, for every substrate word, even reverse-engineered, to hit a match in the Hurrian corpus. As mentioned, the immediate contact for Armenian could be a related language or dialect confined to the North-East, whereas Hurrian had developed in close contact with other languages for over a millennium; loanwords from these might have replaced the originals. Urartian, as a more isolated language preserving the proto-language legacy, could have been a better match if a more representative corpus of its words came to light over time. Whatever the result, such an endeavour could, if nothing else, give a better idea of what the native language of the Armenian Highlands was.

Except for Church Slavonic, which emerged from a regional dialect that had no written form but was nevertheless attested by contemporaneous evidence and linguistic analysis as spoken in that region.

All of them being Iranian-origin dynasties, notably.

E.g., a general with a Persian name “Dadarsis”, referred to as “Armenian”, assigned by Darius I to quell a revolt in Armenia.

Assumed to be the earliest Christian-era source, which itself survived only in much later, altered copies.

The Artaxiad dynasty was founded by the Seleucid Persian general Artaxerxes, who declared himself king (known as Artashes I) a decade after conquering Orontid Armenia.

Except for the South-Western provinces – satrapies, which retained for a while the semi-autonomous status they had enjoyed in earlier instances of Roman control.

This was not so much a religious as a political and administrative move, aimed at unifying the empire’s legal framework, with priestly hierarchy acting as the juridical and judicial backbone.

Based on a sample analysis of 263 machine-readable entries from H. Ačaryan’s Armenian Etymological Dictionary.

The gradient here marks the trend, not an even linear distribution. In a mountainous country, pockets of a different language or fringe dialects could have existed in any part of it, regardless of the general linguistic landscape.

It is hypothesised that the “Kurd” ethnonym comes from the southernmost district of Greater Armenia – Corduene, which suggests the Iranisation of the indigenous Hurro-Urartian population.

Literally, the plural form of Hay.

Borrowed Middle Persian ostan – “province”; cognate with stan – “land/country”.

Modern-day Iran consists of 31 ostans.

St. Gregory, his grandson Sahak Part’ev, and Mihr-Narseh, the Grand Vizier of Iran during the Vardanants War, all came from the same Parthian House of Suren.

Same as the crusaders were called in the region, and still used every now and then colloquially.

Armenia Prima and Armenia Secunda; Romans later added two more.

As the linguistic analysis of the Sinai Palimpsest suggests.

As Hay-Aghuank’, notably, revealing the gradual “hijacking” of toponym (cf. Media Atropatene – Atropatene). Granted, that part had been annexed by the Greater Armenia from the Artaxiad conquests until the first partition of Armenia, but one of the two provinces established there carried a telling name Utik’ – “Udis.”

It remains uncertain if Sarduri’s father, Lutpiri, was a king or related to Arame, the first Urartian king; if not, Sarduri effectively started a new dynasty.

A telling case of cognitively irrelevant etymology is deriving yerkat’, “iron”, from either Armenian for “long” or PIE for “hard.”

Which is not available in fully digitised form to this day.

And not only there: Norman barons in Saxon England are another obvious example.