"Trying to get a computer to work out what words mean - distinguish between 'rider' and 'horse', say, and work out how they relate to each other - is a long-standing problem in artificial intelligence research.
One of the difficulties has been working out how to represent knowledge in ways that allow computers to use it. But suddenly that is not a problem any more, thanks to the massive body of text that is available, ready indexed, on search engines like Google (which has more than 8 billion pages indexed).
The meaning of a word can usually be gleaned from the words used around it. (...) Vitanyi and Cilibrasi have developed a statistical indicator based on these hit counts that gives a measure of a logical distance separating a pair of words. They call this the normalised Google distance, or NGD. The lower the NGD, the more closely the words are related.
By repeating this process for lots of pairs of words, it is possible to build a map of their distances, indicating how closely related the meanings of the words are. From this a computer can infer meaning, says Vitanyi. 'This is automatic meaning extraction. It could well be the way to make a computer understand things and act semi-intelligently,' he says.
The technique has managed to distinguish between colours, numbers, different religions and Dutch painters based on the number of hits they return."
New Scientist: Google's search for meaning (28 Jan)
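The excerpt leaves out the formula behind NGD. Below is a minimal sketch, assuming the definition Cilibrasi and Vitanyi published: NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(x) is the number of pages containing x, f(x, y) is the number containing both words, and N is the total number of pages indexed. The function names `ngd` and `distance_map` are hypothetical, and the hit counts are invented for illustration. They are not real Google results. The second function follows the excerpt's "repeating this process for lots of pairs of words" step to build a map of pairwise distances.

```python
import math
from itertools import combinations


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalised Google distance from page hit counts (assumed published definition).

    fx, fy: pages containing each word; fxy: pages containing both;
    n: total number of pages indexed.
    """
    if fxy == 0:
        # Words never co-occur: treat them as infinitely far apart.
        return math.inf
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


def distance_map(hits: dict, joint_hits: dict, n: float) -> dict:
    """Compute NGD for every pair of words, keyed by (word_a, word_b) in sorted order."""
    result = {}
    for a, b in combinations(sorted(hits), 2):
        both = joint_hits.get(frozenset((a, b)), 0)
        result[(a, b)] = ngd(hits[a], hits[b], both, n)
    return result


# Hypothetical hit counts, for illustration only.
hits = {"horse": 1_000_000, "rider": 500_000, "spoon": 800_000}
joint_hits = {
    frozenset(("horse", "rider")): 100_000,
    frozenset(("horse", "spoon")): 2_000,
    frozenset(("rider", "spoon")): 1_000,
}
N = 8e9  # roughly the index size quoted in the article

distances = distance_map(hits, joint_hits, N)
for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(d, 3))
```

With these made-up counts, "horse" and "rider" come out closer than "horse" and "spoon", because the first pair co-occurs far more often relative to how common each word is alone.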