HI

... this is an expanding selection of pics and of some of my shorter pieces of writing ... and other bits and pieces ... in German and mainly English ... and other strange languages ... COME BACK AND CHECK IT OUT ... COMMENTS WELCOME

wolfgangsperlich@gmail.com


Monday, July 22, 2024

NO GRAMMAR, LEXICON OR ARTIFICIAL INTELLIGENCE WITHOUT SYNTAX

Both language folklore and some linguists share the assertion that words came some time before grammar, both in an evolutionary sense and in the sense of individual language acquisition. For the latter we know from native language acquisition that children first utter single words, repeat them, and eventually combine two words in what looks like a rudimentary grammar, as in 'daddy car'. In biolinguistic theory, however, this is not a case of rudimentary grammar but of very elementary grammar, the operation Chomsky names MERGE, i.e. combining two words to make a phrase. In other words, a child cannot combine two words to make a phrase without first having a grammar in place.
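
A minimal sketch in Python may help here (purely illustrative; the example words and the pair-forming representation are my own assumptions, not a claim about Chomsky's formalism): MERGE as a binary operation that can apply to its own output.

```python
# Minimal sketch: MERGE as a binary, recursive combining operation.
# The lexicon and the representation are invented for illustration only.

def merge(a, b):
    """Combine two syntactic objects into a new (unordered) object."""
    return frozenset([a, b])

# Single words, as in the one-word stage:
w1, w2 = "daddy", "car"

# The two-word stage already presupposes MERGE:
phrase = merge(w1, w2)             # roughly {'daddy', 'car'}

# Because MERGE applies to its own output, structures embed in structures:
larger = merge("wash", phrase)     # roughly {'wash', {'daddy', 'car'}}
print(phrase)
print(larger)
```

The point of the sketch is simply that even the two-word stage presupposes the combining operation, and that nothing further is needed for phrases to embed within phrases.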

 

Such a theory is supported by the observation that primates and other so-called intelligent animals can be trained to recognise and use single words but are unable to combine them to construct meaningful phrases, let alone use iteration to combine phrases to construct sentences – a facility that only humans have, namely via the language faculty in the human brain. 

 

In terms of evolutionary, biolinguistic speculation (or best educated guess/theory), the MERGE operation might have e-merged (sic) as a result of a brain mutation some 150,000 years ago. While single words might have preceded this event, there would have been an explosion of new words following the MERGE operation, i.e. a parallel development of syntax and lexicon. Once lexical-syntactic categorisation, e.g. into verbs and nouns, had been established, the door was open to fill the slots. The constraints of vocalisation limited lexical output, and as regional differences emerged, various techniques were employed to circumvent such limitations, e.g. polysemy and/or the use of tones. Equally, as regional differences led to language diversity, various lexical items became grammaticalized, higher-level grammatical rules evolved in different ways, and ongoing language change further complicated the history of language evolution. Once various languages were 'reduced' to writing systems, we arrive at the linguistic landscape we live in today.

 

Going back to child language acquisition, we know that there is a rapid development of complex syntax, so that by the age of 5 or so a fully developed syntax – say, English grammar rules – is in place. The extent of the lexicon depends on other factors, such as formal education and the highly specific terminology needed for certain professions, e.g. medicine, physics, engineering, literature (Shakespeare is said to have used over 20,000 different words in his published works). As such, a finite syntax together with a finite lexicon can generate an infinite output of sentences. An analogy is that of the mathematical axioms that establish the natural numbers: a finite rule set with an infinite output – noting that syntax is the precursor to that feat.
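
As a rough, hedged illustration of that analogy (the rule and the words below are invented for the example, not drawn from any corpus), a few lines of Python show how one recursive rule over a six-word lexicon yields sentences of unbounded length:

```python
# Toy illustration: a finite lexicon plus one recursive rule produce
# sentences of unbounded length, much as a successor rule produces
# the natural numbers.

def sentence(depth):
    """S -> 'the cat sleeps' | 'Mary thinks that' + S (applied `depth` times)."""
    s = "the cat sleeps"
    for _ in range(depth):
        s = "Mary thinks that " + s
    return s

for d in range(3):
    print(sentence(d))
# the cat sleeps
# Mary thinks that the cat sleeps
# Mary thinks that Mary thinks that the cat sleeps
# ... and so on without bound, from six words and two rules.
```

The design point is the recursion: the rule re-applies to its own output, so no list of outputs, however long, exhausts it.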

 

Since it is claimed by AI enthusiasts that generative AI via Large Language Models can or will operate on this human level, one can easily see a certain fallacy: while GAI/AGI can recombine words and phrases from an ever-increasing data set, and potentially generate new, i.e. never before used, expressions, the output remains as finite as the data set. Only human syntax is able to generate truly creative, novel expressions, independent of all previous language use. This is not to deny the many potential positive uses of GAI, e.g. searching through enormous data sets of existing medical knowledge to quickly arrive at a therapeutic solution. However, only the human interface can decide whether it will actually work, possibly adding a creative thought so innovative as to constitute a brand-new extension of the old model. The dangers lie in the false advertising of GAI as being able to solve all problems put to it, especially when those problems are reduced to rational concepts only. Given all the irrational components of human thought/language that are also part of the Large Language Model, we can only expect equally irrational outcomes, however much human censors at the programming level want to weed out the worst excesses – especially since, for some unknown reason, an avalanche of new irrationality is added every second of the day. The idea that rational thinkers will combine with GAI to evolve some sort of singularity, as promoted by the likes of Kurzweil, and replicate paradise on earth, is as romantic as it is unfortunately unrealistic. An instance of GAI hubris is that of another GAI guru, Aschenbrenner, who seems to advocate a new Manhattan Project to develop a GAI tool to defeat the CCP. Both are missing the point that the definition of 'intelligence', whether human or artificial, is based on generative syntax, and as such AI may never truly eventuate.

 

While there have been attempts to develop syntax-like computer programs to generate language, there has been little progress due to the complexities involved. In the meantime, the statistical/algorithmic methodologies developed by Mercer et al. have been very successful commercially, but as they are totally different from human syntax, we end up only with finite translation and recombination machines, cutting out any sort of creativity, any sort of genuine progress, and thus condemning us to an eternal status quo.
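
To make that point concrete with a deliberately crude sketch (a toy bigram model in Python, not the actual IBM-style systems Mercer worked on): such a model can only emit word-to-word transitions already attested in its data, so its output space is bounded by that data in a way a generative syntax is not.

```python
# Toy bigram 'language model': it can only recombine word transitions
# attested in its data set. The tiny corpus below is invented; real
# statistical MT/LM systems are vastly larger but share this property.

import random
from collections import defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count the observed word-to-word transitions.
transitions = defaultdict(list)
for w1, w2 in zip(corpus, corpus[1:]):
    transitions[w1].append(w2)

def generate(start="the", length=8):
    out = [start]
    for _ in range(length):
        options = transitions.get(out[-1])
        if not options:
            break
        out.append(random.choice(options))   # only attested continuations
    return " ".join(out)

print(generate())   # e.g. 'the dog sat on the rug . the cat'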

 

Since every major language on earth has been described in terms of its grammar, often based on traditional models that hark back to ancient Sanskrit, Greek and Latin, one has to assume that, by and large, the core syntax described does reflect the psychological realities of the languages so described. Often there is an exaggerated division between theoretical models – still largely focussed on the pros and cons of Chomskyan-style biolinguistics – giving rise to the so-called Linguistics Wars as promoted by the likes of Harris, culminating in a side show that has very little scientific merit. Claims that language arises from general cognitive principles are also popular, negating the necessity of specific syntactic principles. Even so, we always seem to end up with certain rules resembling syntax that determine what makes language use comprehensible and what does not. One has to be careful here to distinguish between an un-grammatical sentence construction that is thereby incomprehensible (e.g. 'Not mouse cat was today catch') and irrational nonsense that is nevertheless expressed in perfectly acceptable, i.e. grammatically correct, form (cf. Chomsky's famous sentence 'Colorless green ideas sleep furiously') – a distinction caricatured in the sketch further below. This echoes de Saussure's well-known distinction between langue (structure, syntax, language competence) and parole (speech, language use).

As such, syntacticians (grammarians) have no particular interest in the use of language other than analysing unusual, complex sentences (which they find or make up themselves) for their structural components, yielding a measure of linguistic competence. This always seems odd to those linguists and language folklorists who are wedded to the idea that language equals communication, i.e. that only the principles of communication determine the way we use language. That this is a patently one-sided approach can be seen in the commonsense observation that language is a tool for expressing thoughts, which may or may not be communicated to anyone other than oneself – the latter, BTW, points to one of the most interesting issues in syntax, namely reflexives. Hence I would refine Chomsky's dictum that 'anaphora are a window into the mind' to 'reflexives are the window into the mind'. Such thoughts, expressed in language, only seem to strengthen my assertion that language equals thought. QED (no references needed).
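
The grammatical-versus-meaningful distinction referred to above can be caricatured in a few lines of Python; the part-of-speech tags and the single template rule are hypothetical simplifications invented for this sketch, not a serious grammar:

```python
# Crude sketch of the competence point: a string can match a syntactic
# pattern (grammatical) while being semantic nonsense, or fail the
# pattern altogether (word salad). Tags and template are toy examples.

POS = {"colorless": "Adj", "green": "Adj", "ideas": "N",
       "sleep": "V", "furiously": "Adv"}

TEMPLATE = ["Adj", "Adj", "N", "V", "Adv"]   # toy rule: S -> Adj Adj N V Adv

def fits_template(sentence):
    tags = [POS.get(word, "?") for word in sentence.lower().split()]
    return tags == TEMPLATE

print(fits_template("Colorless green ideas sleep furiously"))  # True: grammatical nonsense
print(fits_template("Not mouse cat was today catch"))          # False: word salad
```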

 

 
