InOtherWords lexical database
While computing has been primarily concerned with the presentation of data, little has been developed
to date to make the actual information content generally available for electronic processing.
Extensive information about language itself has been relatively unavailable.
Uses of InOtherWords
Information content and linguistic data are important in a number of current fields.
CNS has developed an expert system for everyday English, the English that everyone knows. It is a linguistic tool
which will be a step toward solving technical problems like the following:
The computer does not in general know that coal, gas, and oil are fuels,
and that airlines operate airplanes that fly, take off, land,
carry passengers and use fuel. So the computer generally cannot help the user find all
articles in a database about air travel if the user specifies only the word airline in the query.
A human assistant would also pick out the articles that talk about aircraft, airports, jets, and other related concepts.
Most current PC applications store only an index of all the words in all the documents. It would be
helpful also to automatically cross-reference documents according to their aboutness, or to be able to
immediately access all other words in the relevant semantic domain.
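The retrieval problem described above can be sketched as query expansion over a semantic domain. This is only a minimal illustration: the RELATED table, the function names, and the sample documents are hypothetical, and nothing here reflects the actual InOtherWords data format.

```python
# Hypothetical toy semantic net; the real InOtherWords data is not shown in this text.
RELATED = {
    "airline": {"aircraft", "airplane", "airport", "jet", "passenger"},
    "fuel": {"coal", "gas", "oil"},
}

# Hypothetical sample documents, keyed by a short title.
DOCS = {
    "a": "The new jet entered service last year.",
    "b": "Coal prices rose sharply.",
    "c": "The airline cut two routes.",
}


def expand_query(term, related=RELATED):
    """Return the query term plus every word listed in its semantic domain."""
    return {term} | related.get(term, set())


def search(term, documents, related=RELATED):
    """Return the titles of documents that contain the term or a related word."""
    wanted = expand_query(term, related)
    hits = []
    for title, text in documents.items():
        # Crude tokenizer: split on whitespace, strip punctuation, lowercase.
        words = {w.strip(".,;:!?").lower() for w in text.split()}
        if words & wanted:
            hits.append(title)
    return sorted(hits)


if __name__ == "__main__":
    print(search("airline", DOCS, related={}))  # plain word index only
    print(search("airline", DOCS))              # index plus semantic domain
```

With an empty relation table the search behaves like a plain word index and finds only the document that contains the literal query word; with the table it also finds the document about the jet.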
An automatic translation system from Russian to English needs to know that, although both wide and
broad are usually translated as shiroko, in English only a certain set of adjectives, those which involve a boundary,
can be used with measurements: 5 feet wide, but not 5 feet broad; 6 pounds heavy, but not 6 pounds fat.
It needs to know that many English words ending in -er or -le involve repetitions, such as battle vs.
fight, batter vs. beat. It needs to know that although Russian boltat
can be translated as chat, chatter, gab, shoot the breeze, the first three
of these verbs in English typically have female subjects. It sounds odd to say "He was chattering away."
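One way a translation system might use such lexical features is to filter candidate translations by context. The sketch below covers only the measurement example given above; the TAKES_MEASUREMENT table and the function name are hypothetical, and the four entries are the only ones the text supports.

```python
# Feature values taken from the examples in the text; the table name is hypothetical.
TAKES_MEASUREMENT = {"wide": True, "broad": False, "heavy": True, "fat": False}


def pick_adjective(candidates, after_measurement):
    """Return the candidate adjectives that are acceptable in this context.

    Candidates not listed in the table are treated as not taking a measurement.
    The original order of the candidates is preserved.
    """
    if not after_measurement:
        return list(candidates)
    return [w for w in candidates if TAKES_MEASUREMENT.get(w, False)]


if __name__ == "__main__":
    # Both English adjectives are candidates for the same Russian source word.
    print(pick_adjective(["broad", "wide"], after_measurement=True))
    print(pick_adjective(["broad", "wide"], after_measurement=False))
```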
Speech Recognition and OCR
If these programs had an idea what the message was all about, and could guess which words in a text were
likely to appear, they would be able to sort out the ambiguities with far better speed and accuracy. If such a
program knew that an article was about writing, it could prioritize a reading of word
over work. If it knew by a syntactic parse, however, that the word had to be a verb modified
by hand, it might still prioritize work over word.
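The word/work example can be sketched as rescoring of ambiguous readings. Everything in this sketch is an assumption made for illustration: the tables, the bonus value, and the treatment of the syntactic constraint as a hard filter (the text says only that the program might still prioritize work). The toy part-of-speech table is also deliberately simplified.

```python
# Hypothetical toy tables; a real lexicon would be far larger and finer grained.
TOPIC_WORDS = {"writing": {"word", "sentence", "paragraph"}}
PARTS_OF_SPEECH = {"word": {"noun"}, "work": {"noun", "verb"}}


def pick_reading(scores, topic=None, required_pos=None, topic_bonus=0.2):
    """Choose among ambiguous readings of one recognized token.

    scores maps each candidate word to a recognizer confidence.
    Candidates that cannot have the required part of speech are dropped,
    and candidates belonging to the topic's word list get a bonus.
    Returns None if no candidate survives the filter.
    """
    best_word, best_score = None, float("-inf")
    for word, score in scores.items():
        if required_pos and required_pos not in PARTS_OF_SPEECH.get(word, set()):
            continue
        if topic and word in TOPIC_WORDS.get(topic, set()):
            score += topic_bonus
        if score > best_score:
            best_word, best_score = word, score
    return best_word


if __name__ == "__main__":
    ambiguous = {"word": 0.5, "work": 0.5}
    print(pick_reading(ambiguous, topic="writing"))
    print(pick_reading(ambiguous, topic="writing", required_pos="verb"))
```

With equal recognizer scores, the topic alone favors word; once the parse demands a verb, word is filtered out and work wins.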
For archiving and telecommunications, it's important to store as much data as possible in as little
space as possible, preferably without losing access speed. Compression is essentially the removal of
redundancy. If A can be predicted from B, one need not store A. Thus the trick to linguistic compression is
to have access to as many generalizations about language as possible.
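The principle that a predictable item need not be stored can be illustrated with a very small predictive coder. This is a sketch under stated assumptions, not a description of any CNS product: the PREDICTIONS table is hypothetical, and a real system would draw on many more generalizations than a single next-word guess.

```python
# Hypothetical table: for a given word, the word expected to follow it.
PREDICTIONS = {"take": "off", "shoot": "the", "the": "breeze"}


def encode(words, predictions=PREDICTIONS):
    """Replace every correctly predicted word with None (nothing to store)."""
    out = []
    prev = None
    for w in words:
        out.append(None if predictions.get(prev) == w else w)
        prev = w
    return out


def decode(tokens, predictions=PREDICTIONS):
    """Rebuild the word list by re-running the same predictions."""
    out = []
    prev = None
    for t in tokens:
        w = predictions[prev] if t is None else t
        out.append(w)
        prev = w
    return out


if __name__ == "__main__":
    words = "they shoot the breeze".split()
    packed = encode(words)
    print(packed)
    print(decode(packed) == words)
```

The encoder and decoder must share the same prediction table. The better the table predicts the text, the more positions collapse to the empty marker.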
Size and content
About 40,000 pages of linguistic information have been compiled and entered into the database up to now.
Up to 300 categories of information are available for over 100,000 words of the English language, each of
which is divided into an average of three to four senses. The complete set of relations and structures is a
network of millions of concepts and specifications. To do this, CNS has invented hundreds of proprietary
concepts and technologies.
For example, InOtherWords (IOW) knows that monitor is a person who monitors a situation; monitor is an object, as in a
video monitor; monitor is an object, as in a regulator; monitor is an object, as a general device for the
observation of events or situations; Monitor is the name of the famous gunboat in the American Civil War; and monitor is a
name for a group of reptiles. Each of these senses of monitor comes with a full entry of specifications
and relations. This includes, but is not limited to:
- Syntax: For example, you can say, "I like running", "I like to run", "I enjoy running", but not "I enjoy to run."
- Semantic Net: Includes Made of, Purpose, Result, Cause, Part of, Is, Shape, Texture, Color, Has,
Situation, Field, Synonyms, Antonyms, and numerous other relations.
- Semantic Constraints: For example, the word "leash" implies a dog; the verb "eat", unless used metaphorically,
must have an animal for a subject and some kind of food for the object; "planting" occurs
in earth, etc.
- Morphology: For example, under((e/valu)ate)
- Idioms, Cliches, Quotes, Cultural Literacy
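The sense entries and semantic constraints listed above might be represented along the following lines. This is a minimal sketch for illustration only: the Sense class, the IS_A and SUBJECT_OBJECT tables, and all function names are hypothetical and cover just a few of the examples from the text, not the actual IOW entry format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    word: str
    kind: str   # coarse category such as "person" or "object"
    gloss: str


# A few of the senses of "monitor" mentioned in the text.
SENSES = [
    Sense("monitor", "person", "someone who monitors a situation"),
    Sense("monitor", "object", "video monitor"),
    Sense("monitor", "object", "regulator"),
    Sense("monitor", "animal", "a kind of reptile"),
]

# Hypothetical "Is" relation: each word points to its broader category.
IS_A = {"dog": "animal", "person": "animal", "apple": "food", "rock": "mineral"}

# Hypothetical constraint table: verb -> (subject category, object category).
SUBJECT_OBJECT = {"eat": ("animal", "food")}


def senses_of(word, kind=None):
    """Return all senses of a word, optionally restricted to one category."""
    return [s for s in SENSES if s.word == word and (kind is None or s.kind == kind)]


def is_kind_of(word, category):
    """Follow IS_A links upward until the category is found or the chain ends."""
    seen = set()
    while word is not None and word not in seen:
        if word == category:
            return True
        seen.add(word)
        word = IS_A.get(word)
    return False


def literal_use_ok(verb, subject, obj):
    """Check a verb's literal subject/object constraint; unlisted verbs pass."""
    if verb not in SUBJECT_OBJECT:
        return True
    subject_cat, object_cat = SUBJECT_OBJECT[verb]
    return is_kind_of(subject, subject_cat) and is_kind_of(obj, object_cat)


if __name__ == "__main__":
    print(len(senses_of("monitor")), len(senses_of("monitor", kind="object")))
    print(literal_use_ok("eat", "dog", "apple"))
    print(literal_use_ok("eat", "rock", "apple"))
```

A failed check does not have to mean the sentence is wrong; as the text notes for "eat", it can instead signal a metaphorical use.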
CNS is planning to release a number of products for the OEM market that will be based on its
proprietary InOtherWords technology, and it does not rule out licensing the InOtherWords lexicon
itself on an OEM basis with certain restrictions.