The FrameNet corpus is a lexical database of English that is both human- and machine-readable, based on annotating examples of how words are used in actual texts. FrameNet is based on a theory of meaning called Frame Semantics, deriving from the work of Charles J. Fillmore and colleagues. The basic idea is straightforward: that the meanings of most words can best be understood on the basis of a semantic frame: a description of a type of event, relation, or entity and the participants in it. For example, the concept of cooking typically involves a person doing the cooking (Cook), the food that is to be cooked (Food), something to hold the food while cooking (Container) and a source of heat (Heating_instrument). In the FrameNet project, this is represented as a frame called Apply_heat, and the Cook, Food, Heating_instrument and Container are called frame elements (FEs). Words that evoke this frame, such as fry, bake, boil, and broil, are called lexical units (LUs) of the Apply_heat frame. The job of FrameNet is to define the frames and to annotate sentences to show how the FEs fit syntactically around the word that evokes the frame.
A Frame is a script-like conceptual structure that describes a particular type of situation, object, or event along with the participants and props that it requires. For example, the "Apply_heat" frame describes a common situation involving a Cook, some Food, and a Heating_instrument, and is evoked by words such as bake, blanch, boil, broil, brown, simmer, steam, etc.
We call the roles of a Frame "frame elements" (FEs) and the frame-evoking words "lexical units" (LUs).
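The relationship between a Frame, its FEs, and its LUs can be sketched with plain Python data classes. This is only an illustrative model, not the structure that NLTK returns: the class names Frame and LexicalUnit, their fields, and the add_lu() helper are all assumptions made for this example. The frame, FE, and word names come from the Apply_heat description above, with a ".v" suffix assumed for the verbs.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalUnit:
    """Hypothetical stand-in for an LU: a lemma.POS name tied to one frame."""
    name: str        # e.g. "fry.v"
    frame_name: str  # name of the frame this LU evokes


@dataclass
class Frame:
    """Hypothetical stand-in for a Frame with its FEs and LUs."""
    name: str
    frame_elements: list = field(default_factory=list)
    lexical_units: list = field(default_factory=list)

    def add_lu(self, lu_name):
        # Each LU is linked to one frame, so record the frame name on it.
        lu = LexicalUnit(lu_name, self.name)
        self.lexical_units.append(lu)
        return lu


apply_heat = Frame(
    "Apply_heat",
    frame_elements=["Cook", "Food", "Container", "Heating_instrument"],
)
for lu_name in ["fry.v", "bake.v", "boil.v", "broil.v"]:
    apply_heat.add_lu(lu_name)

print(apply_heat.name, len(apply_heat.frame_elements), len(apply_heat.lexical_units))
# prints: Apply_heat 4 4
```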
FrameNet includes relations between Frames. Several types of relations are defined, of which the most important are:
To get a list of all of the Frames in FrameNet, you can use the frames() function. If you supply a regular expression pattern to the frames() function, you will get a list of all Frames whose names match that pattern:
    >>> from pprint import pprint
    >>> from nltk.corpus import framenet as fn
    >>> len(fn.frames())
    1019
    >>> pprint(fn.frames(r'(?i)medical'))
    [<frame ID=256 name=Medical_specialties>, <frame ID=257 name=Medical_instruments>, ...]
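The role of the "(?i)" prefix in the pattern can be seen without loading the corpus. The sketch below filters a short list of frame names taken from this document using Python's re module. The helper filter_names() is hypothetical, and it assumes search-style matching (the pattern may occur anywhere in the name); the text above only says that names "match" the pattern, so treat that as an assumption rather than a description of how frames() is implemented.

```python
import re

# A small sample of frame names taken from the examples in this document.
frame_names = [
    "Apply_heat",
    "Medical_specialties",
    "Medical_instruments",
    "Sleep",
    "Expectation",
]


def filter_names(pattern, names):
    """Return the names in which `pattern` is found (search-style; an assumption)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


# "(?i)" turns on case-insensitive matching for the whole pattern.
case_insensitive = filter_names(r"(?i)medical", frame_names)
# Without it, lowercase "medical" does not match the capitalized names.
case_sensitive = filter_names(r"medical", frame_names)

print(case_insensitive)  # ['Medical_specialties', 'Medical_instruments']
print(case_sensitive)    # []
```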
To get the details of a particular Frame, you can use the frame() function passing in the frame number:
    >>> from pprint import pprint
    >>> from nltk.corpus import framenet as fn
    >>> f = fn.frame(256)
    >>> f.ID
    256
    >>> f.name
    'Medical_specialties'
    >>> f.definition # doctest: +ELLIPSIS
    "This frame includes words that name ..."
    >>> len(f.lexUnit)
    29
    >>> pprint(sorted([x for x in f.FE]))
    ['Affliction', 'Body_system', 'Specialty', 'Type']
    >>> pprint(f.frameRelations)
    [<Parent=Cure -- Using -> Child=Medical_specialties>]
The frame() function shown above returns a dict object containing detailed information about the Frame. See the documentation on the frame() function for the specifics.
You can also search for Frames by their Lexical Units (LUs). The frames_by_lemma() function returns a list of all frames that contain LUs in which the 'name' attribute of the LU matches the given regular expression. Note that LU names are composed of "lemma.POS", where the "lemma" part can be made up of either a single lexeme (e.g. 'run') or multiple lexemes (e.g. 'a little') (see below).
    >>> from nltk.corpus import framenet as fn
    >>> fn.frames_by_lemma(r'(?i)a little')
    [<frame ID=189 name=Quantity>, <frame ID=2001 name=Degree>]
A lexical unit (LU) is a pairing of a word with a meaning. For example, the "Apply_heat" Frame describes a common situation involving a Cook, some Food, and a Heating_instrument, and is _evoked_ by words such as bake, blanch, boil, broil, brown, simmer, steam, etc. These frame-evoking words are the LUs in the Apply_heat frame. Each sense of a polysemous word is a different LU.
We have used the word "word" in talking about LUs. The reality is actually rather complex. When we say that the word "bake" is polysemous, we mean that the lemma "bake.v" (which has the word-forms "bake", "bakes", "baked", and "baking") is linked to three different frames:
These constitute three different LUs, with different definitions.
Multiword expressions such as "given name" and hyphenated words like "shut-eye" can also be LUs. Idiomatic phrases such as "middle of nowhere" and "give the slip (to)" are also defined as LUs in the appropriate frames ("Isolated_places" and "Evading", respectively), and their internal structure is not analyzed.
FrameNet provides multiple annotated examples of each sense of a word (i.e. each LU). Moreover, the set of examples (approximately 20 per LU) illustrates all of the combinatorial possibilities of the lexical unit.
Each LU is linked to a Frame, and hence to the other words which evoke that Frame. This makes the FrameNet database similar to a thesaurus, grouping together semantically similar words.
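The thesaurus-like grouping can be sketched with two small dictionaries: one from LU name to frame name, and an inverted one from frame name to LU names. The data below is a hand-written sample based on words mentioned in this document (the POS suffixes are assumed), and the names lu_to_frame, frame_to_lus, and related_lus() are hypothetical. None of this is part of the NLTK API.

```python
from collections import defaultdict

# Hand-written (LU name -> frame name) sample based on this document.
lu_to_frame = {
    "fry.v": "Apply_heat",
    "boil.v": "Apply_heat",
    "broil.v": "Apply_heat",
    "asleep.a": "Sleep",
    "foresee.v": "Expectation",
}

# Invert the mapping so each frame lists the LUs that evoke it.
frame_to_lus = defaultdict(list)
for lu_name, frame_name in lu_to_frame.items():
    frame_to_lus[frame_name].append(lu_name)


def related_lus(lu_name):
    """Return the other LUs that evoke the same frame as `lu_name`."""
    frame_name = lu_to_frame[lu_name]
    return sorted(name for name in frame_to_lus[frame_name] if name != lu_name)


print(related_lus("fry.v"))     # ['boil.v', 'broil.v']
print(related_lus("asleep.a"))  # []
```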
In the simplest case, frame-evoking words are verbs such as "fried" in:
"Matilde fried the catfish in a heavy iron skillet."
Sometimes event nouns may evoke a Frame. For example, "reduction" evokes "Cause_change_of_scalar_position" in:
"...the reduction of debt levels to $665 million from $2.6 billion."
Adjectives may also evoke a Frame. For example, "asleep" may evoke the "Sleep" frame as in:
"They were asleep for hours."
Many common nouns, such as artifacts like "hat" or "tower", typically serve as dependents rather than clearly evoking their own frames.
Details for a specific lexical unit can be obtained using this class's lus() function, which takes an optional regular expression pattern that will be matched against the name of the lexical unit:
    >>> from pprint import pprint
    >>> from nltk.corpus import framenet as fn
    >>> len(fn.lus())
    11829
    >>> pprint(fn.lus(r'(?i)a little'))
    [<lu ID=14744 name=a little bit.adv>, <lu ID=14733 name=a little.n>, ...]
You can obtain detailed information on a particular LU by calling the lu() function and passing in an LU's 'ID' number:
    >>> from pprint import pprint
    >>> from nltk.corpus import framenet as fn
    >>> fn.lu(256).name
    'foresee.v'
    >>> fn.lu(256).definition
    'COD: be aware of beforehand; predict.'
    >>> fn.lu(256).frame.name
    'Expectation'
    >>> fn.lu(256).lexemes[0].name
    'foresee'
Note that LU names take the form of a dotted string (e.g. "run.v" or "a little.adv") in which a lemma precedes the "." and a part of speech (POS) follows the dot. The lemma may be composed of a single lexeme (e.g. "run") or of multiple lexemes (e.g. "a little"). The list of POSs used in the LUs is:
    v    - verb
    n    - noun
    a    - adjective
    adv  - adverb
    prep - preposition
    num  - numbers
    intj - interjection
    art  - article
    c    - conjunction
    scon - subordinating conjunction
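The "lemma.POS" naming scheme can be taken apart with ordinary string operations. The sketch below splits an LU name at its last dot and checks the POS against the list above. The helper split_lu_name() and the POS_LABELS table are hypothetical names for this example. Splitting the lemma on whitespace to get lexemes is a simplification and may not agree with the lexemes recorded in the corpus itself.

```python
# POS abbreviations and labels, copied from the list above.
POS_LABELS = {
    "v": "verb",
    "n": "noun",
    "a": "adjective",
    "adv": "adverb",
    "prep": "preposition",
    "num": "numbers",
    "intj": "interjection",
    "art": "article",
    "c": "conjunction",
    "scon": "subordinating conjunction",
}


def split_lu_name(lu_name):
    """Split an LU name like 'a little.adv' into (lemma, lexemes, pos)."""
    # Split at the LAST dot so that the POS is always the final component.
    lemma, sep, pos = lu_name.rpartition(".")
    if not sep or not lemma or pos not in POS_LABELS:
        raise ValueError(f"not a lemma.POS name: {lu_name!r}")
    # Simplification: treat whitespace-separated tokens as the lexemes.
    lexemes = lemma.split()
    return lemma, lexemes, pos


print(split_lu_name("run.v"))         # ('run', ['run'], 'v')
print(split_lu_name("a little.adv"))  # ('a little', ['a', 'little'], 'adv')
```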
For more details about the contents of the dict returned by the lu() function, see the documentation on the lu() function.
The FrameNet corpus contains a small set of annotated documents. A list of these documents can be obtained by calling the documents() function:
    >>> from pprint import pprint
    >>> from nltk.corpus import framenet as fn
    >>> docs = fn.documents()
    >>> len(docs)
    78
    >>> pprint(sorted(docs[0].keys()))
    ['ID', 'corpid', 'corpname', 'description', 'filename']
Detailed information about each sentence contained in each document can be obtained by calling the annotated_document() function and supplying the 'ID' number of the document. For details about the information provided for each document, see the documentation on the annotated_document() function.