Using WordNet as a lexical database in applications

What is WordNet?

WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), and each synset expresses a distinct meaning.

WordNet can be used in a variety of NLP applications, for purposes such as entity detection and recognition or finding semantic similarities.

NLTK provides a wonderful interface to WordNet, but it is useful only if you are coding in Python. To access the WordNet ontology in a programming-language-agnostic way, we can convert WordNet's knowledge into a database.

I’ll be sticking to the most important applications of WordNet, namely finding synsets, hypernyms and hyponyms. Please see the official WordNet site if you want to know more about the different relations in WordNet.

The database can be explored further and used for a variety of other purposes, such as POS tagging and context gathering.

Importing WordNet as a MySQL database

Download a .sql backup file from here.

Import it into MySQL using either the phpMyAdmin UI or the mysql command line:
mysql -u username -p database_name < /full-path-to-file/wordnet.sql

WordNet provides semantic as well as lexical relations; we will consider only the semantic ones here.

I’ll explain the schema a bit here:

  1. Words Table: This table contains each word along with a unique id (wordid).
  2. Synsets Table: This table contains the unique synset ids, together with each synset's part of speech and its meaning.
  3. Senses Table: This table contains the actual mapping from wordids to synsetids. One wordid may have multiple synsetids.

Below are some queries you can run to get the most important information.
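
To make the relationship between the three tables concrete, here is a minimal sketch that rebuilds them in an in-memory SQLite database with a few made-up rows. It is only a stand-in for the imported MySQL database: the column names `pos` and `definition`, the ids and the sample rows are assumptions chosen for illustration, not taken from the actual dump.

```python
import sqlite3

# In-memory stand-in for the imported MySQL database (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words   (wordid INTEGER PRIMARY KEY, lemma TEXT UNIQUE);
    -- `pos` and `definition` are assumed column names.
    CREATE TABLE synsets (synsetid INTEGER PRIMARY KEY, pos TEXT, definition TEXT);
    CREATE TABLE senses  (wordid INTEGER, synsetid INTEGER,
                          PRIMARY KEY (wordid, synsetid));
""")

# Made-up sample rows: "car" belongs to two synsets.
conn.executemany("INSERT INTO words VALUES (?, ?)",
                 [(1, "car"), (2, "automobile"), (3, "railcar")])
conn.executemany("INSERT INTO synsets VALUES (?, ?, ?)",
                 [(100, "n", "a motor vehicle with four wheels"),
                  (200, "n", "a wheeled vehicle adapted to the rails of a railroad")])
conn.executemany("INSERT INTO senses VALUES (?, ?)",
                 [(1, 100), (2, 100), (1, 200), (3, 200)])

# The senses table links one wordid ("car") to several synsetids.
rows = conn.execute(
    "SELECT s.synsetid, s.pos, s.definition "
    "FROM words w "
    "JOIN senses se ON se.wordid = w.wordid "
    "JOIN synsets s ON s.synsetid = se.synsetid "
    "WHERE w.lemma = ? ORDER BY s.synsetid", ("car",)).fetchall()

for synsetid, pos, definition in rows:
    print(synsetid, pos, definition)
```

The same join should work against the MySQL database from any language with a MySQL client, provided the column names match your copy of the dump.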

Check if a word exists in WordNet

SELECT * FROM `words` WHERE `lemma` = 'enter word here'

If this query returns a row, then the word exists in WordNet.
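
From application code, the existence check might be wrapped in a small helper like the one sketched below. The function name `word_exists` is hypothetical, and the example runs against an in-memory SQLite table with made-up rows rather than the real MySQL database.

```python
import sqlite3

def word_exists(conn, lemma):
    """Return True if `lemma` has a row in the words table."""
    # A parameterized query avoids pasting user input into the SQL string.
    row = conn.execute(
        "SELECT 1 FROM words WHERE lemma = ? LIMIT 1", (lemma,)).fetchone()
    return row is not None

# Tiny stand-in for the words table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (wordid INTEGER PRIMARY KEY, lemma TEXT)")
conn.executemany("INSERT INTO words VALUES (?, ?)",
                 [(1, "car"), (2, "automobile")])

print(word_exists(conn, "car"))
print(word_exists(conn, "blorptastic"))
```

Selecting `1` with `LIMIT 1` instead of `*` is a small design choice: only the presence of a row matters here, not its contents. Note that the `?` placeholder is SQLite's style; Python MySQL drivers typically use `%s` instead.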

Get word synonyms

First, get the wordid for the word you are looking for:

SELECT `wordid` FROM `words` WHERE `lemma` = 'this'

Then get the corresponding synsetids for the given wordid:

SELECT `synsetid` FROM `senses` WHERE `wordid` = "some number here"

This query might return multiple rows because one word can be a part of multiple synsets.

Then, for each synsetid found, find the words occurring in that synset using the following query. It returns wordids, which you can map back to lemmas through the `words` table:

SELECT DISTINCT `wordid` FROM `senses` WHERE `synsetid` = "some number here"
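
The three steps above can also be collapsed into a single join, which saves round trips to the database. The sketch below is one way to do that, not the only one: the helper name `synonyms` is hypothetical, and it runs against an in-memory SQLite copy of the `words` and `senses` tables filled with made-up rows.

```python
import sqlite3

def synonyms(conn, lemma):
    """Return lemmas that share at least one synset with `lemma`."""
    rows = conn.execute(
        """
        SELECT DISTINCT w2.lemma
        FROM words w1
        JOIN senses s1 ON s1.wordid = w1.wordid      -- word -> its synsets
        JOIN senses s2 ON s2.synsetid = s1.synsetid  -- synsets -> member words
        JOIN words w2 ON w2.wordid = s2.wordid       -- wordids -> lemmas
        WHERE w1.lemma = ? AND w2.wordid <> w1.wordid
        ORDER BY w2.lemma
        """, (lemma,)).fetchall()
    return [r[0] for r in rows]

# Tiny stand-in for the words and senses tables (made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words  (wordid INTEGER PRIMARY KEY, lemma TEXT);
    CREATE TABLE senses (wordid INTEGER, synsetid INTEGER);
""")
conn.executemany("INSERT INTO words VALUES (?, ?)",
                 [(1, "car"), (2, "automobile"), (3, "railcar"), (4, "bike")])
conn.executemany("INSERT INTO senses VALUES (?, ?)",
                 [(1, 100), (2, 100), (1, 200), (3, 200), (4, 300)])

print(synonyms(conn, "car"))
print(synonyms(conn, "bike"))
```

The `w2.wordid <> w1.wordid` condition drops the query word itself from the result, since every word trivially shares its own synsets.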

This is just the tip of the iceberg. There are similar queries for finding hypernyms, hyponyms etc. Feel free to explore the schema and extract other semantic and lexical information from WordNet. Please feel free to comment if you spot mistakes or issues in the code or the explanation.
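
As a rough idea of what a hypernym or hyponym lookup could look like, here is a sketch under strong assumptions. This post does not show the tables that store semantic relations, so `semlinks` (pairs of related synsets) and `linkdefs` (relation names), along with all their columns, are assumed names; check them against your own copy of the schema before relying on this. The helper `related` and all sample rows are made up for illustration.

```python
import sqlite3

def related(conn, lemma, link):
    """Return lemmas in synsets linked to `lemma`'s synsets by relation `link`."""
    rows = conn.execute(
        """
        SELECT DISTINCT w2.lemma
        FROM words w1
        JOIN senses s1   ON s1.wordid = w1.wordid
        JOIN semlinks sl ON sl.synset1id = s1.synsetid  -- assumed table/columns
        JOIN linkdefs ld ON ld.linkid = sl.linkid       -- assumed table/columns
        JOIN senses s2   ON s2.synsetid = sl.synset2id
        JOIN words w2    ON w2.wordid = s2.wordid
        WHERE w1.lemma = ? AND ld.link = ?
        ORDER BY w2.lemma
        """, (lemma, link)).fetchall()
    return [r[0] for r in rows]

# In-memory stand-in; relation tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words    (wordid INTEGER PRIMARY KEY, lemma TEXT);
    CREATE TABLE senses   (wordid INTEGER, synsetid INTEGER);
    CREATE TABLE linkdefs (linkid INTEGER PRIMARY KEY, link TEXT);
    CREATE TABLE semlinks (synset1id INTEGER, synset2id INTEGER, linkid INTEGER);
""")
conn.executemany("INSERT INTO words VALUES (?, ?)",
                 [(1, "car"), (2, "motor vehicle")])
conn.executemany("INSERT INTO senses VALUES (?, ?)", [(1, 100), (2, 200)])
conn.executemany("INSERT INTO linkdefs VALUES (?, ?)",
                 [(1, "hypernym"), (2, "hyponym")])
conn.executemany("INSERT INTO semlinks VALUES (?, ?, ?)",
                 [(100, 200, 1), (200, 100, 2)])

print(related(conn, "car", "hypernym"))
print(related(conn, "motor vehicle", "hyponym"))
```

This only follows one level of the relation; walking further up or down the hierarchy would mean repeating the lookup on each result.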

Happy coding! 😀
