COMPUTATIONAL LINGUISTICS à la R. Mercer: GROSS AND CRUDE
Chomsky’s 1959 review of Skinner’s Verbal Behavior saved us
from ‘gross and crude’ behaviourism in linguistics, if not in psychology
itself. In fact, behaviourism in the world of business (marketing and
itself. In fact, behaviourism in the world of business (marketing and
advertising) still rules supreme, and if not checked will lead to neo-fascist
models of behaviourist manipulation, as Chomsky also warned.
It is my contention that this threat to human civilisation
has been further exacerbated by what one can call either an extension of
behaviourism or else a new development occasioned by computational linguistics.
Initially, popular science was enamoured of the idea that language can be
compared to a computer: the human brain as some sort of hardware that can be
programmed by some clever software. The software in question would have to be
something like Chomskyan parsing programs, embedded in Artificial
Intelligence, with the ability to acquire language the way children do. The
slow progress in this seemingly impossible task raised the ire of the business
community, which wanted results so that language could be commercialized – in
combination with military applications, of course.
The enfant terrible in this case, unlike a somewhat benign Skinner
before him, is one Robert Mercer, who not only subverted computational
linguistics but also made a fortune from it and now bankrolls the likes of
Trump and Bannon. The story is described somewhat diffidently in a Guardian
article subtitled ‘With links to Donald
Trump, Steve Bannon and Nigel Farage, the rightwing US computer scientist is at
the heart of a multimillion-dollar propaganda network’.
Mercer, a non-linguist, had the brilliant idea that voice
recognition and machine translation can be achieved by simple statistical
matching: when you say ‘hello’ on the phone to your insurance company about a
claim, the voice recognition program immediately constructs a digital waveform
and compares it to a stored model recorded from an average speaker; if there
is a match within an allowable range, the program accepts your ‘hello’ and
responds with a phrase that has a high statistical probability in the context
of an insurance claim, like ‘hello, we value your call, please state your
claim number’. Similarly, if I want to translate this phrase into German, the
program will check the database for previous translations of the phrase and
select the one with the highest statistical score, given some context
calculated by some clever algorithm. With the advent of ‘big data’, just about
everything that has ever been said or written can be stored in digital format
and statistically matched against anything you say or write.
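A minimal sketch of that matching logic might look as follows (in Python; the
phrase table, the counts and the German renderings are all invented for
illustration, and none of this is anyone’s actual system):

    from collections import Counter

    # Toy phrase table: counts of previously stored translations.
    # All phrases and counts here are invented for illustration.
    phrase_table = {
        "hello": Counter({"hallo": 950, "guten tag": 210}),
        "please state your claim number":
            Counter({"bitte nennen sie ihre schadensnummer": 87}),
    }

    def translate(phrase):
        """Return the stored translation seen most often, or the
        phrase unchanged if it has never been seen before."""
        candidates = phrase_table.get(phrase.lower())
        if not candidates:
            return phrase  # no match in the database: the system is stuck
        return candidates.most_common(1)[0][0]

    print(translate("Hello"))  # -> hallo

No grammar, no meaning, no rules – just a frequency-weighted lookup, and
anything absent from the table simply cannot be handled.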
The commercial application is fantastic: language is
automated, making call centres redundant (even the ones that employ cheap
labour in India or the Philippines). The military-industrial complex is
equally jubilant, what with secret services now being able to monitor and
analyse all voice and written traffic around the world. The Orwellian
nightmare of your TV watching you as much as you watch it has become a
reality. Leonard Cohen’s line that the rich will monitor the bedrooms of the
poor – for entertainment – has likewise come true. The Huxleyan dystopian
vision of Brave New World also
rings true: information overload as a sedative, pills that make you happy and
dissidents kept in human zoos. Orwellian newspeak and linguistic subversion
(‘all animals are equal, but some animals are more equal than others’) have
become the stuff of fake news and Breitbart rhetoric.
So why has no eminent linguist debunked Robert Mercer? Why
has no academic linguist commented on the ‘gross and crude’ travesty visited
upon human language by Mercer and his ilk? After all, he received quite a few
academic honours along the way. Why has no linguist pointed out that language
as a creative human faculty cannot be restricted to what is stored in a
database? Wasn’t it a Chomskyan dictum that language, with its finite set of
syntactic rules, can generate an infinite output of sentences? Isn’t that the
basic idea of language? People who seek to stifle this creativity are of
course troubled by its potential, namely to bring unlimited (infinite) freedom
of expression to the people of the world, including ideas that provide social
justice and a measure of economic well-being for all. Neo-fascists (alt-right)
like Erdogan, Trump, Farage, Le Pen, Wilders, Petry, Bannon, Mercer and a
million others call their opponents fascists in an Orwellian merry-go-round of
meaningless language, engaging in what Wilhelm Reich called the ‘mass
psychology of fascism’: emptying language of meaning and replacing complex
sentences with ever shorter slogans. The British author Ian McEwan quite
rightly noted that ‘Brexit’ reminded him of the Third Reich, in which the
voice of the people became a series of manipulated referendums.
Obviously, Mercer and Co. exploit ‘big data’ not only for
human voice recognition and machine translation but also for a new brand of
‘manufacturing consent’ (à la Herman & Chomsky) that forces language into a
statistical straitjacket, allowing only for an algorithmic paradigm that
supports the dominant discourse of the alt-right. The traditional vehicle for
such manipulation – the mainstream media – has until recently played the part
of benign collaborator of neo-liberal politics and capitalist economics but is
now branded by Trump and Co. as the enemy unless it toes the line and begins
to support, with great enthusiasm, the narcissistic leaders of the alt-right.
Bypassing the traditional media with bizarre social media forums like Twitter
and Facebook, the new media will dictate what can and cannot be said. Ever more
blatant verbal attacks on perceived domestic opposition will eventually give
rise to brutish violence, given many a historical precedent, e.g. the Nazi
propaganda machine.
Unfortunately, Mercer and Co. do understand the value of a
human-specific language; hence, in order to dehumanize large sections of the
population, one has to limit, if not destroy, language as the only faculty
that makes us human. Wars cannot be fought by being polite and considerate:
pathological aggression must be mirrored in narrowly prescribed language use –
as the handbooks of all armed forces around the world will tell you. The
categorical imperative of what one ‘should’ do is replaced by a simple ‘must’.
Computational linguistics as statistical modelling has
already reached new heights in English language testing, as for example in the
Pearson Test of English, which is totally computerized in all language modes,
i.e. speaking, listening, reading and writing. While the passive modes of
listening and reading have long been subject to education systems that control
and limit freedom of expression, it is now the active modes that have been
harnessed. The algorithms that check your essay writing will not allow
sentences that – while grammatically correct – find no match in the prescribed
database. If you write, à la Chomsky, that the United States is a terrorist
state, along with North Korea, Israel, Saudi Arabia and any other state you
care to mention, you will fail your English language test and will, in
addition, be referred to various secret service agencies that mine such data
for dissenting language. That all this is now possible without direct human
intervention says a lot about the success of computational linguistics,
devised and run by non-linguists like Mercer.
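A deliberately crude sketch of such match-based scoring (the reference
sentences below are invented; real systems use far larger corpora and richer
statistical features, but the matching principle is the same):

    # Crude match-based essay scoring: no match in the database,
    # no credit for the sentence. Reference sentences are invented.
    reference_corpus = {
        "globalisation brings many benefits",
        "the company values your call",
    }

    def score_essay(essay):
        """Return the fraction of sentences found verbatim in the corpus."""
        sentences = [s.strip().lower() for s in essay.split(".") if s.strip()]
        return sum(1 for s in sentences if s in reference_corpus) / len(sentences)

    essay = ("Globalisation brings many benefits. "
             "The United States is a terrorist state.")
    print(score_essay(essay))  # -> 0.5: the dissenting sentence finds no match

Grammaticality never enters into it; only prior attestation does.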
Naturally these systems are ‘gross and crude’ and are subject to all kinds of
hacking and cyber warfare – and are being disclosed by the occasional
whistle-blower like Snowden – simply because the underlying mechanisms of
language use are as ‘gross and crude’ as those of Skinner, if not more so.
Computational and corpus linguistics are therefore misnomers.
They reveal absolutely nothing about human language competence per se
but tell us everything about language use, like the very high statistical
probability that members of the Ku Klux Klan will use ‘race’ as a key concept
in their daily discourse. Statistics of this sort only confirm what we know
already. In a similar vein, Chomsky pointed out that linguistic fieldwork of the
descriptive sort will only confirm what we know intuitively about language. Why
then are we sliding into this pseudo-scientific morass that elevates
computational linguistics to the absolute heights of the human sciences?
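As a toy illustration of how little such ‘findings’ add (the sample text is
invented, and the counting below is the entire method):

    from collections import Counter
    import re

    # Invented toy sample of discourse; a real corpus study would use
    # millions of words, but the computation is the same counting exercise.
    sample = "race purity race nation race heritage race"

    tokens = re.findall(r"[a-z]+", sample.lower())
    freq = Counter(tokens)

    # The relative frequency of the keyword is the whole 'finding'.
    print(freq["race"] / len(tokens))  # -> 0.571..., confirming the obvious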
The LinguistList
used mainly to advertise jobs for linguists in universities; now there is a
preponderance of jobs advertised by a plethora of private companies that
specialize in computational linguistics. Sure, big money is to be made if you
crack the code and develop a program that will ghost-write perfect speeches
for Trump and Co. Obviously, one of the requirements will be to repeat and
repeat key sentences (slogans) so that the message will not be lost on those
millions whose attention span is less than a millisecond. Tragically, the
computerized speechwriter will produce dumb text that will be celebrated as
the height of literary rhetoric (witness Bush’s ‘axis of evil’, Obama’s ‘yes,
we can’ and
Trump’s ‘make America great again’). Human language will be reduced to passive
click-bait consumption. The neo-feudalist class of super-managers surrounded by
computer geeks will reap all the material benefits of the vulture economy and
laugh all the way to the club of billionaires.
Eventually, however, the irrepressible human faculty for
creative language will give rise to yet another French/Russian/Chinese/Cuban-style
revolution that will transform societies as never before – and, by the way,
reinstate biolinguistics at the top of the human sciences.