Tuesday, 10 February 2009

The social life of words

‘You shall know a word by the company it keeps.’

I recently stumbled across this lovely quote from J.R. Firth again and was reminded of how important its message is to the way we carry out language research here at Chambers. When working on a dictionary entry, it is crucial that we support our intuition by gathering evidence of the words with which the term being defined tends to keep company, or collocate. This process of uncovering a word’s preferred companions is made far easier by a corpus: a collection of electronically encoded texts of written or spoken language that serves as a representative sample of ‘real-world’ language.

Here at Chambers we have developed the Chambers Harrap International Corpus (CHIC): almost a billion words of modern, international English drawn from a wide range of sources, including newspapers, magazines, blogs, websites, published fiction and non-fiction, and transcribed speech. We use statistical analysis to identify significant collocations in the corpus, that is, word pairs with a strong association, as opposed to words that simply end up in each other’s company by chance. Collocations extracted from CHIC for the word ‘powerful’ include ‘powerful politician’, ‘powerful computer’ and ‘powerful antioxidant’. Each collocation corresponds to a subtly different aspect of the word’s meaning, which should be accounted for in its dictionary entry.
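The post doesn't say which statistic is used to decide that a pair is ‘significant’, so the sketch below assumes one common choice, pointwise mutual information (PMI), computed over adjacent word pairs. PMI compares how often a pair actually occurs with how often it would occur if the two words were independent; a positive score means the words keep company more than chance predicts. The function name `pmi_scores`, the `min_count` cutoff and the toy sentences are illustrative assumptions, not a description of how CHIC is processed.

```python
import math
from collections import Counter


def pmi_scores(sentences, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Illustrative sketch only: assumes PMI over adjacent pairs, which is not
    necessarily the statistic used for CHIC.
    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))); a positive score means the
    pair co-occurs more often than independence would predict.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        word_counts.update(tokens)
        # Pair each token with the token immediately after it.
        pair_counts.update(zip(tokens, tokens[1:]))

    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (x, y), count in pair_counts.items():
        if count < min_count:
            continue  # skip pairs too rare to judge reliably
        p_xy = count / n_pairs
        p_x = word_counts[x] / n_words
        p_y = word_counts[y] / n_words
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    # Invented toy sentences, not CHIC data.
    toy = [
        "a powerful computer ran the model",
        "the powerful computer was fast",
        "the old computer was slow",
        "the model was powerful",
    ]
    for pair, score in sorted(pmi_scores(toy).items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 2))
```

PMI tends to overrate rare pairs, which is why the sketch ignores pairs seen fewer than `min_count` times; association measures such as log-likelihood are often preferred for that reason on large corpora.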

This sort of analysis is very useful when trying to tease out differences in meaning between two near-synonyms, for example ‘wind’ and ‘breeze’. According to CHIC, adjectives strongly associated with ‘wind’ rather than ‘breeze’ include ‘strong’, ‘fierce’ and ‘damaging’, while those displaying a strong preference for ‘breeze’ include ‘gentle’, ‘balmy’ and ‘cool’. This pattern is replicated in the words’ verbal associations: while a ‘wind’ tends to ‘howl’, ‘gust’ and ‘whip’, a ‘breeze’ will ‘rustle’, ‘caress’ and ‘cool’.
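One simple way to put the ‘wind’/‘breeze’ contrast into numbers is to compare how often each adjective occurs with one noun versus the other. The sketch below assumes adjective-noun pairs have already been extracted from a corpus and scores each adjective by a smoothed log ratio of its relative frequency with each noun. This measure is chosen for simplicity and is not necessarily the one Chambers uses; the function name `preference_scores` is hypothetical, and the counts in the example are invented, not taken from CHIC.

```python
import math
from collections import Counter


def preference_scores(pairs, noun_a, noun_b, smoothing=0.5):
    """Score each adjective's preference for noun_a over noun_b.

    Illustrative sketch: `pairs` is assumed to be (adjective, noun) tuples
    already extracted from a corpus. The score is a smoothed log2 ratio of
    the adjective's relative frequency with noun_a versus noun_b:
    positive favours noun_a, negative favours noun_b.
    """
    pairs = list(pairs)  # allow generators, since we iterate twice
    with_a = Counter(adj for adj, noun in pairs if noun == noun_a)
    with_b = Counter(adj for adj, noun in pairs if noun == noun_b)
    vocab = set(with_a) | set(with_b)
    total_a = sum(with_a.values())
    total_b = sum(with_b.values())
    scores = {}
    for adj in vocab:
        # Add-k smoothing keeps the ratio finite for adjectives seen with only one noun.
        rate_a = (with_a[adj] + smoothing) / (total_a + smoothing * len(vocab))
        rate_b = (with_b[adj] + smoothing) / (total_b + smoothing * len(vocab))
        scores[adj] = math.log2(rate_a / rate_b)
    return scores


if __name__ == "__main__":
    # Invented counts for illustration only; not taken from CHIC.
    toy_pairs = (
        [("strong", "wind")] * 4 + [("fierce", "wind")] * 2 + [("cool", "wind")]
        + [("gentle", "breeze")] * 3 + [("cool", "breeze")] * 3 + [("balmy", "breeze")] * 2
    )
    scores = preference_scores(toy_pairs, "wind", "breeze")
    for adj in sorted(scores, key=scores.get, reverse=True):
        print(f"{adj:8s} {scores[adj]:+.2f}")
```

The same function works for verb associations (‘howl’ versus ‘rustle’) if the pairs are (verb, subject noun) tuples instead.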

It is also important to observe and record any change in the company a word keeps, as this may point to new meanings or senses. Take, for example, the verb ‘burn’. While the primary meaning of ‘to burn something’ is to damage or destroy it with fire or heat, over the last decade or so the verb has come to associate significantly with digital media terms such as ‘DVD’ and ‘CD’. An examination of these cases alerted us to the ‘making a copy of’ sense of ‘burn’, which was duly recorded in the dictionary. More recently we added a new sense for the noun ‘mash-up’ to Chambers Reference Online (the web edition of The Chambers Dictionary). Originally a ‘mash-up’ referred to an audio file created by merging the vocal track from one song with the instrumental or rhythm track of another, a practice made most famous by the Belgian duo 2ManyDJs. Now a ‘mash-up’ also describes any combination of pre-existing audio, video, text or graphics to create a new multimedia file; YouTube, for instance, contains many examples of users merging songs or dubbed dialogue with video clips. This new meaning of ‘mash-up’ is reflected in its more recent CHIC collocates, which include ‘video’, ‘YouTube’, ‘ad’ and ‘trailer’ (movie trailers are a particularly rich source of mash-up fodder).
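Noticing that a word has started moving with a new crowd can be roughly approximated by comparing its collocates in an older and a newer slice of a corpus. This sketch, with hypothetical function names and invented example sentences, counts the words within a small window of the target and reports those that are frequent in the newer slice but absent from the older one. It assumes simple whitespace tokenization; a real system would also filter out function words and use a significance measure rather than raw counts.

```python
from collections import Counter


def collocates(sentences, target, window=2):
    """Count words appearing within `window` tokens of `target` (sketch)."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Count every neighbour in the window except the target itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def emerging_collocates(old_sentences, new_sentences, target, min_count=2):
    """Collocates seen at least `min_count` times in the newer slice but never in the older one."""
    old = collocates(old_sentences, target)
    new = collocates(new_sentences, target)
    return sorted(w for w, c in new.items() if c >= min_count and old[w] == 0)


if __name__ == "__main__":
    # Invented sentences standing in for two time slices of a corpus.
    older = ["a mash-up of two songs", "this mash-up of vocal tracks"]
    newer = ["a youtube video mash-up", "the trailer mash-up on youtube", "another video mash-up"]
    print(emerging_collocates(older, newer, "mash-up"))
```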

Sometimes the association between certain words is so strong that, rather than pointing to a new sense of one of the words, it points to a multiword term or phrase that has established itself in the language and deserves its own dictionary entry. In a recent update to Chambers Reference Online we added two such multiword units: ‘semantic web’ and ‘augmented reality’. Our decision to include these phrases was supported by analysis of the words typically modified by ‘semantic’ and ‘augmented’ in CHIC.
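The kind of check described for ‘semantic’ and ‘augmented’ can be sketched as counting the words that immediately follow a modifier and seeing how concentrated that distribution is: if one noun accounts for a large share, the pair may be behaving as a fixed multiword unit. The function name `modified_words`, the use of the immediately following word as a stand-in for ‘the word being modified’, and the sample sentences are all assumptions for illustration.

```python
from collections import Counter


def modified_words(sentences, modifier):
    """List (word, count, share) for words directly following `modifier`.

    Sketch only: treats the next token as the modified word, which ignores
    intervening adjectives and other syntax a real parser would handle.
    """
    following = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        following.update(nxt for word, nxt in zip(tokens, tokens[1:]) if word == modifier)
    total = sum(following.values())
    # most_common() sorts by count, so a dominant partner comes first.
    return [(word, count, count / total) for word, count in following.most_common()]


if __name__ == "__main__":
    # Invented sample sentences, not CHIC data.
    sample = [
        "the semantic web is growing",
        "linked data and the semantic web",
        "a semantic analysis of text",
        "building the semantic web",
    ]
    for word, count, share in modified_words(sample, "semantic"):
        print(f"{word:10s} {count:3d} {share:.0%}")
```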

The corpus evidence shows that it’s impossible to overestimate the fickleness of the word as a social animal, but regularly updating our corpus helps us keep on top of the wealth of information that can be deduced from a word’s social proclivities. All assistance is appreciated, however, so if any of the words you observe in the wild appear to be behaving suspiciously or moving with a new crowd, we’d be very interested to hear about it.

Ruth O'Donovan
