
wolfgangsperlich@gmail.com


Tuesday, March 21, 2017

COMPUTATIONAL LINGUISTICS à la R. Mercer: GROSS AND CRUDE




Chomsky’s 1959 review of Skinner’s Verbal Behavior saved us from ‘gross and crude’ behaviourism in linguistics, if not in psychology itself. In fact, behaviourism still reigns supreme in the world of business (marketing and advertising), and if not checked it will lead to neo-fascist models of behaviourist manipulation, as Chomsky also warned.

It is my contention that this threat to human civilisation has been further exacerbated by what one can call either an extension of behaviourism or else a new development occasioned by computational linguistics. Initially popular science was enamoured of the idea that language can be compared to the computer, with the human brain being some sort of hardware which can be programmed by some clever software. The software in question would have to be something like Chomskyan parsing programmes, embedded in Artificial Intelligence, with the ability to acquire language like children do. The slow progress on this seemingly impossible task raised the ire of the business community, which wanted results so that language could be commercialized – in combination with military applications of course.

The enfant terrible in this case, unlike a somewhat benign Skinner before him, is one Robert Mercer, who not only subverted computational linguistics but also made a fortune from it and now bankrolls the likes of Trump and Bannon. The story is described somewhat diffidently in a Guardian article subtitled ‘With links to Donald Trump, Steve Bannon and Nigel Farage, the rightwing US computer scientist is at the heart of a multimillion-dollar propaganda network’.

Mercer, a non-linguist, had the brilliant idea that voice recognition and machine translation can be achieved by simple statistical matching. When you say ‘hello’ on phoning your insurance company about a claim, the voice recognition program immediately constructs a digital waveform and compares it to a stored model recorded by an average speaker. If there is a match within an allowable range, the program accepts your ‘hello’ and responds with a phrase that has a high statistical value in the context of an insurance claim, like ‘hello, we value your call, please state your claim number’. Similarly, if I want to translate this phrase into German, the program will check the data bank for previous translations of this phrase and select the one with the highest statistical value, given some context that is calculated by some clever algorithm. With the advent of ‘big data’, just about everything that has ever been said and written can be stored in digital format and statistically matched to anything you say or write.
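To make the idea of statistical matching concrete, here is a minimal Python sketch of the translation step only. It is an illustration under loud assumptions, not a description of how any real system works: the toy data bank PHRASE_BANK, its counts and the function best_translation are all invented for this example, and the ‘context’ calculation is left out entirely.

```python
from collections import Counter

# Hypothetical toy "data bank": for each English phrase, how often each
# German rendering was seen in earlier translations (counts are invented).
PHRASE_BANK = {
    "we value your call": Counter({
        "Ihr Anruf ist uns wichtig": 7,
        "wir schätzen Ihren Anruf": 3,
    }),
    "hello": Counter({"hallo": 9, "guten Tag": 4}),
}


def best_translation(phrase, bank=PHRASE_BANK):
    """Return (translation, relative frequency) for the most frequent
    stored translation of `phrase`, or None if the phrase was never seen."""
    counts = bank.get(phrase.lower().strip())
    if not counts:
        return None  # nothing in the data bank matches this phrase
    translation, n = counts.most_common(1)[0]
    return translation, n / sum(counts.values())


print(best_translation("We value your call"))
print(best_translation("colourless green ideas sleep furiously"))
```

The second call returns None, since a lookup of this kind can only hand back what has already been stored: a sentence that has never been said or written before finds no match.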

The commercial application is fantastic: language is automated, making call centres redundant (even the ones that employ cheap labour in India or the Philippines). The military complex is equally jubilant, what with secret services now being able to monitor and analyse all voice and written traffic all around the world. The Orwellian nightmare of your TV watching you as much as you watch the TV has become a reality. Leonard Cohen’s line that the rich will monitor the bedrooms of the poor – for entertainment – has equally become true. The Huxleyan dystopian vision in Brave New World also rings true: information overload as a sedative, pills that make you happy and dissidents kept in human zoos. Orwellian newspeak and linguistic subversion (‘all animals are equal but some animals are more equal than others’) have become the stuff of fake news and Breitbart rhetoric.

So why has no eminent linguist debunked Robert Mercer? Why has no academic linguist commented on the ‘gross and crude’ travesty visited upon human language by Mercer and his ilk? After all, he received quite a few academic honours along the way. Why has no linguist pointed out that language as a creative human faculty cannot be restricted to what is stored in a database? Wasn’t it a Chomskyan dictum that language with its finite set of syntactic rules can create an infinite output of sentences? Isn’t that the basic idea of language? People who seek to stifle this creativity are of course troubled by its potential, namely to bring unlimited (infinite) freedom of expression to the people of the world, including ideas that provide social justice and a measure of economic well-being for all. Neo-fascists (alt-right) like Erdogan, Trump, Farage, Le Pen, Wilders, Petry, Bannon, Mercer and a million others call others fascists in an Orwellian merry-go-round of meaningless language, engaging in what Wilhelm Reich has called the ‘mass psychology of fascism’, emptying language of meaning, and substituting ever shorter slogans for complex sentences. The British author Ian McEwan quite rightly noted that ‘Brexit’ reminded him of the Third Reich, whereby the voice of the people becomes a series of manipulated referendums.

Obviously Mercer and Co. exploit ‘big data’ not only for human voice recognition and machine translation but also for a new brand of ‘manufacturing consent’ (à la Herman & Chomsky) that forces language into a statistical straitjacket, allowing only for an algorithmic paradigm that supports the dominant discourse of the alt-right. The traditional vehicle for such manipulation – the mainstream media – has until recently played the part of benign collaborator of neo-liberal politics and capitalist economics but is now branded by Trump and Co. as the enemy unless it toes the line and begins to support with great enthusiasm the narcissistic leaders of the alt-right. Once the traditional media have been bypassed via bizarre social media forums like Twitter and Facebook, the new media will dictate what can and cannot be said. Ever more blatant verbal attacks on perceived domestic opposition will eventually give rise to brutish violence, given many a historical precedent, e.g. the Nazi propaganda machine.

Unfortunately Mercer and Co. do understand the value of a human-specific language; hence, in order to de-humanize large sections of the population, one has to limit, if not destroy, language as the only faculty that makes us human. Wars cannot be fought by being polite and considerate: pathological aggression must be mirrored in narrowly prescribed language use – as the handbooks of all armed forces around the world will tell you. The categorical imperative of what one ‘should’ do is replaced by a simple ‘must’.

Computational linguistics as statistical modelling has already reached new heights in English language testing, as for example in the Pearson Test of English, which is totally computerized in all language modes, i.e. speaking, listening, reading and writing. While the passive modes of listening and reading have long been subject to education systems that control and limit freedom of expression, it is now the active modes that have been harnessed. The algorithms that check your essay writing will not allow sentences that – while grammatically correct – find no match in the prescribed database. If you write, à la Chomsky, that the United States is a terrorist state, along with North Korea, Israel, Saudi Arabia and any other state you care to mention, you will fail your English language test and in addition will be referred to various secret service agencies that mine such data for dissenting language. That all this is now possible without direct human intervention says a lot about the success of computational linguistics, devised and run by non-linguists like Mercer. Naturally these systems are ‘gross and crude’ and are subject to all kinds of hacking and cyber warfare – and are being disclosed by the occasional whistle-blower like Snowden – simply because the underlying mechanisms of language use are as ‘gross and crude’ as those of Skinner, if not more so. Computational and corpus linguistics are therefore misnomers. They reveal absolutely nothing about human language competence per se but tell us everything about language use, like the very high statistical probability that members of the Ku Klux Klan will use ‘race’ as a key concept in their daily discourse. Statistics of this sort only confirm what we know already. In a similar vein Chomsky pointed out that linguistic fieldwork of the descriptive sort will only confirm what we know intuitively about language.
Why then are we sliding into this pseudo-scientific morass that elevates computational linguistics to the absolute heights of the human sciences?

The LINGUIST List used to advertise mainly jobs for linguists in universities; now there is a preponderance of jobs advertised by a plethora of private companies that specialize in computational linguistics. Sure, big money is to be made if you crack the code and develop a program that will ghost-write perfect speeches for Trump and Co. Obviously one of the requirements will be to repeat and repeat key sentences (slogans) so that the message will not be lost on those millions whose attention span is less than a millisecond. Tragically the computerized speech writer will produce dumb text that will be celebrated as the height of literary rhetoric (witness Bush’s ‘axis of evil’, Obama’s ‘yes, we can’ and Trump’s ‘make America great again’). Human language will be reduced to passive click-bait consumption. The neo-feudalist class of super-managers surrounded by computer geeks will reap all the material benefits of the vulture economy and laugh all the way to the club of billionaires.

Eventually, however, the irrepressible human faculty for creative language will give rise to yet another French/Russian/Chinese/Cuban-style revolution that will transform societies as never before and, by the way, restore bio-linguistics to the top of the human sciences.