Discourse Checking

1 Introduction

The NLTK discourse module makes it possible to test consistency and redundancy of simple discourses, using theorem-proving and model-building from nltk.inference.

The DiscourseTester constructor takes a list of sentences as a parameter.

>>> from nltk.inference.discourse import *
>>> dt = DiscourseTester(['a boxer walks', 'every boxer chases a girl'])

The DiscourseTester parses each sentence into a list of logical forms. Once we have created a DiscourseTester object, we can inspect various properties of the discourse. First, we might want to check which sentences are currently stored as the discourse.

>>> dt.sentences()
s0: a boxer walks
s1: every boxer chases a girl

As you can see, each sentence receives an identifier s_i. We might also want to check which grammar the DiscourseTester is using (by default, grammars/sem4.fcfg):

>>> dt.grammar()
% start S
# Grammar Rules
S[sem = <app(?subj,?vp)>] -> NP[num=?n,sem=?subj] VP[num=?n,sem=?vp]
NP[num=?n,sem=<app(?det,?nom)> ] -> Det[num=?n,sem=?det]  Nom[num=?n,sem=?nom]
NP[loc=?l,num=?n,sem=?np] -> PropN[loc=?l,num=?n,sem=?np]
...

A different grammar can be invoked by using the optional gramfile parameter when a DiscourseTester object is created.

2 Readings and Threads

Depending on the grammar used, we may find some sentences have more than one logical form. To check this, use the readings() method. Given a sentence identifier of the form s_i, each reading of that sentence is given an identifier s_i-r_j.

>>> dt.readings()

s0 readings:
------------------------------
s0-r0: exists x.(boxerdog(x) & walk(x))
s0-r1: exists x.(boxer(x) & walk(x))

s1 readings:
------------------------------
s1-r0: all x.(boxerdog(x) -> exists z1.(girl(z1) & chase(x,z1)))
s1-r1: all x.(boxer(x) -> exists z2.(girl(z2) & chase(x,z2)))

In this case, the only source of ambiguity lies in the word boxer, which receives two translations: boxer and boxerdog. The intention is that one of these corresponds to the person sense and one to the dog sense. In principle, we would also expect to see a quantifier scope ambiguity in s1. However, the simple grammar we are using, namely sem4.fcfg, doesn't support quantifier scope ambiguity.

We can also investigate the readings of a specific sentence:

>>> dt.readings('a boxer walks')
The sentence 'a boxer walks' has these readings:
    exists x.(boxerdog(x) & walk(x))
    exists x.(boxer(x) & walk(x))

Given that each sentence is two-way ambiguous, we potentially have four different discourse 'threads', taking all combinations of readings. To see these, specify the threaded=True parameter on the readings() method. Each thread is assigned an identifier of the form d_i. Following the identifier is a list of the readings that constitute that thread.

>>> dt.readings(threaded=True)
d0: ['s0-r0', 's1-r0']
d1: ['s0-r0', 's1-r1']
d2: ['s0-r1', 's1-r0']
d3: ['s0-r1', 's1-r1']

Of course, this simple-minded approach doesn't scale: a discourse with, say, three sentences, each of which has 3 readings, will generate 27 different threads. It is an interesting exercise to consider how to manage discourse ambiguity more efficiently.
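
The thread listing above amounts to a Cartesian product over the per-sentence reading lists. The following is a minimal sketch of that idea in plain Python, not the DiscourseTester implementation; the helper name enumerate_threads and the dictionary layout are assumptions made for illustration.

```python
from itertools import product

def enumerate_threads(readings):
    """Map thread ids (d0, d1, ...) to one reading id per sentence.

    `readings` maps a sentence id such as 's0' to its list of reading ids.
    Hypothetical helper for illustration; not part of NLTK.
    """
    sentence_ids = sorted(readings, key=lambda sid: int(sid[1:]))
    combos = product(*(readings[sid] for sid in sentence_ids))
    return {"d%d" % i: list(combo) for i, combo in enumerate(combos)}

# Two sentences with two readings each, as in the discourse above.
two_sentences = {"s0": ["s0-r0", "s0-r1"], "s1": ["s1-r0", "s1-r1"]}
threads = enumerate_threads(two_sentences)
for tid in sorted(threads, key=lambda t: int(t[1:])):
    print("%s: %s" % (tid, threads[tid]))

# Three sentences with three readings each give 3 ** 3 = 27 threads.
three_by_three = {
    "s%d" % i: ["s%d-r%d" % (i, j) for j in range(3)] for i in range(3)
}
print(len(enumerate_threads(three_by_three)))
```

The number of threads is the product of the number of readings per sentence, which is why the count grows exponentially with the length of the discourse.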

3 Checking Consistency

Now, we can check whether some or all of the discourse threads are consistent, using the models() method. With no parameter, this method will try to find a model for every discourse thread in the current discourse. However, we can also specify just one thread, say d1.

>>> dt.models('d1')
--------------------------------------------------------------------------------
Model for Discourse Thread d1
--------------------------------------------------------------------------------
% number = 1
% seconds = 0

% Interpretation of size 2

c1 = 0.

f1(0) = 0.
f1(1) = 0.

- boxer(0).
- boxer(1).

  boxerdog(0).
- boxerdog(1).

- girl(0).
- girl(1).

  walk(0).
- walk(1).

- chase(0,0).
- chase(0,1).
- chase(1,0).
- chase(1,1).

Consistent discourse: d1 ['s0-r0', 's1-r1']:
    s0-r0: exists x.(boxerdog(x) & walk(x))
    s1-r1: all x.(boxer(x) -> exists z8.(girl(z8) & chase(x,z8)))

There are various formats for rendering Mace4 models; here, we have used the 'cooked' format, which is intended to be human-readable. There are a number of points to note.

The entities in the domain are all treated as non-negative integers. In this case, there are only two entities, 0 and 1.
The - symbol indicates negation. So 0 is the only boxerdog and the only thing that walks. Nothing is a boxer or a girl, and nothing stands in the chase relation. Thus the universal sentence is vacuously true.
c1 is an introduced constant that denotes 0.
f1 is a Skolem function, but it plays no significant role in this model.
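
To see why this interpretation satisfies both readings in d1, the check can be replayed by hand. The sketch below hard-codes the model shown above as Python sets and evaluates the two formulas directly. It only illustrates the semantics; it is not how Mace4 or NLTK evaluate models, and the variable names are chosen for this example.

```python
# The size-2 interpretation found for thread d1, copied from the model above.
domain = [0, 1]
boxer = set()
boxerdog = {0}
girl = set()
walk = {0}
chase = set()  # set of (chaser, chased) pairs; empty in this model

# s0-r0: exists x.(boxerdog(x) & walk(x))
s0_r0 = any(x in boxerdog and x in walk for x in domain)

# s1-r1: all x.(boxer(x) -> exists z.(girl(z) & chase(x,z)))
# An implication holds whenever its antecedent is false, so an empty
# boxer set makes the universal statement vacuously true.
s1_r1 = all(
    (x not in boxer) or any(z in girl and (x, z) in chase for z in domain)
    for x in domain
)

print(s0_r0, s1_r1)
```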

We might now want to add another sentence to the discourse, and there is a method add_sentence() for doing just this.

>>> dt.add_sentence('John is a boxer')
>>> dt.sentences()
s0: a boxer walks
s1: every boxer chases a girl
s2: John is a boxer

We can now test all the properties as before; here, we just show a couple of them.

>>> dt.readings()

s0 readings:
------------------------------
s0-r0: exists x.(boxerdog(x) & walk(x))
s0-r1: exists x.(boxer(x) & walk(x))

s1 readings:
------------------------------
s1-r0: all x.(boxerdog(x) -> exists z9.(girl(z9) & chase(x,z9)))
s1-r1: all x.(boxer(x) -> exists z10.(girl(z10) & chase(x,z10)))

s2 readings:
------------------------------
s2-r0: boxerdog(John)
s2-r1: boxer(John)
>>> dt.readings(threaded=True)
d0: ['s0-r0', 's1-r0', 's2-r0']
d1: ['s0-r0', 's1-r0', 's2-r1']
d2: ['s0-r0', 's1-r1', 's2-r0']
d3: ['s0-r0', 's1-r1', 's2-r1']
d4: ['s0-r1', 's1-r0', 's2-r0']
d5: ['s0-r1', 's1-r0', 's2-r1']
d6: ['s0-r1', 's1-r1', 's2-r0']
d7: ['s0-r1', 's1-r1', 's2-r1']

If you are interested in a particular thread, the expand_threads() method will remind you of what readings it consists of:

>>> thread = dt.expand_threads('d6')
>>> for rid, reading in thread:
...     print('%s %s' % (rid, reading))
s0-r1 exists x.(boxer(x) & walk(x))
s1-r1 all x.(boxer(x) -> exists z12.(girl(z12) & chase(x,z12)))
s2-r0 boxerdog(John)

Suppose we have already defined a discourse, as follows:

>>> dt = DiscourseTester(['A student dances', 'Every student is a person'])

Now, when we add a new sentence, is it consistent with what we already have? The consistchk=True parameter of add_sentence() allows us to check:

>>> dt.add_sentence('No person dances', consistchk=True)
Inconsistent discourse d0 ['s0-r0', 's1-r0', 's2-r0']:
    s0-r0: exists x.(student(x) & dance(x))
    s1-r0: all x.(student(x) -> person(x))
    s2-r0: -exists x.(person(x) & dance(x))

>>> dt.readings()

s0 readings:
------------------------------
s0-r0: exists x.(student(x) & dance(x))

s1 readings:
------------------------------
s1-r0: all x.(student(x) -> person(x))

s2 readings:
------------------------------
s2-r0: -exists x.(person(x) & dance(x))
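
The idea behind the consistency check can be illustrated with a brute-force search for a small model. This is a rough sketch under stated assumptions, not what NLTK does internally: the names interpretations and find_model are hypothetical, the formulas are hand-translated into Python functions, and failing to find a model up to a fixed size is not in general a proof of inconsistency.

```python
from itertools import product

NAMES = ["student", "person", "dance"]

def interpretations(domain, names):
    """Yield every way of assigning a subset of `domain` to each predicate."""
    for bits in product([False, True], repeat=len(names) * len(domain)):
        flags = iter(bits)
        yield {name: {d for d in domain if next(flags)} for name in names}

def find_model(formulas, names=NAMES, max_size=3):
    """Return (size, interpretation) for the first model found, else None."""
    for size in range(1, max_size + 1):
        domain = list(range(size))
        for interp in interpretations(domain, names):
            if all(holds(domain, interp) for holds in formulas):
                return size, interp
    return None

# s0-r0: exists x.(student(x) & dance(x))
s0 = lambda D, I: any(x in I["student"] and x in I["dance"] for x in D)
# s1-r0: all x.(student(x) -> person(x))
s1 = lambda D, I: all(x not in I["student"] or x in I["person"] for x in D)
# s2-r0: -exists x.(person(x) & dance(x))
s2 = lambda D, I: not any(x in I["person"] and x in I["dance"] for x in D)

print(find_model([s0, s1]) is not None)  # the first two sentences have a model
print(find_model([s0, s1, s2]))          # no model up to size 3
```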

So let's retract the inconsistent sentence:

>>> dt.retract_sentence('No person dances', quiet=False)
Current sentences are
s0: A student dances
s1: Every student is a person

We can now verify that the result is consistent.

>>> dt.models()
--------------------------------------------------------------------------------
Model for Discourse Thread d0
--------------------------------------------------------------------------------
% number = 1
% seconds = 0

% Interpretation of size 2

c1 = 0.

  dance(0).
- dance(1).

  person(0).
- person(1).

  student(0).
- student(1).

Consistent discourse: d0 ['s0-r0', 's1-r0']:
    s0-r0: exists x.(student(x) & dance(x))
    s1-r0: all x.(student(x) -> person(x))

4 Checking Informativity

Let's assume that we are still trying to extend the discourse 'A student dances. Every student is a person.' We add a new sentence, but this time, we check whether it is informative with respect to what has gone before.

>>> dt.add_sentence('A person dances', informchk=True)
Sentence 'A person dances' under reading 'exists x.(person(x) & dance(x))':
Not informative relative to thread 'd0'

In fact, we are just checking whether the new sentence is entailed by the preceding discourse.

>>> dt.models()
--------------------------------------------------------------------------------
Model for Discourse Thread d0
--------------------------------------------------------------------------------
% number = 1
% seconds = 0

% Interpretation of size 2

c1 = 0.

c2 = 0.

  dance(0).
- dance(1).

  person(0).
- person(1).

  student(0).
- student(1).

Consistent discourse: d0 ['s0-r0', 's1-r0', 's2-r0']:
    s0-r0: exists x.(student(x) & dance(x))
    s1-r0: all x.(student(x) -> person(x))
    s2-r0: exists x.(person(x) & dance(x))
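
Since informativity is treated here as the absence of entailment, it can be pictured as a search for a countermodel: an interpretation in which the preceding discourse is true but the new sentence is false. The sketch below assumes hand-written Python translations of the formulas and hypothetical helper names (interpretations, entailed). It only searches domains up to a fixed size, so a True result is a bounded check rather than a proof.

```python
from itertools import product

NAMES = ("student", "person", "dance")

def interpretations(size):
    """Yield (domain, interpretation) pairs for every assignment of
    subsets of range(size) to the predicates in NAMES."""
    domain = list(range(size))
    subsets = [
        frozenset(d for d, keep in zip(domain, mask) if keep)
        for mask in product([False, True], repeat=size)
    ]
    for choice in product(subsets, repeat=len(NAMES)):
        yield domain, dict(zip(NAMES, choice))

def entailed(premises, conclusion, max_size=3):
    """Return False as soon as a countermodel is found, True otherwise."""
    for size in range(1, max_size + 1):
        for domain, interp in interpretations(size):
            if all(p(domain, interp) for p in premises) and not conclusion(domain, interp):
                return False
    return True

# The discourse so far.
premises = [
    lambda D, I: any(x in I["student"] and x in I["dance"] for x in D),
    lambda D, I: all(x not in I["student"] or x in I["person"] for x in D),
]
# 'A person dances': exists x.(person(x) & dance(x))
a_person_dances = lambda D, I: any(x in I["person"] and x in I["dance"] for x in D)
# A contrasting, made-up candidate: all x.(person(x) -> dance(x))
every_person_dances = lambda D, I: all(x not in I["person"] or x in I["dance"] for x in D)

print(entailed(premises, a_person_dances))      # entailed, so not informative
print(entailed(premises, every_person_dances))  # a countermodel exists, so informative
```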

5 Adding Background Knowledge

Let's build a new discourse, and look at the readings of the component sentences:

>>> dt = DiscourseTester(['Vincent is a boxer', 'Fido is a boxer', 'Vincent is married', 'Fido barks'])
>>> dt.readings()

s0 readings:
------------------------------
s0-r0: boxerdog(Vincent)
s0-r1: boxer(Vincent)

s1 readings:
------------------------------
s1-r0: boxerdog(Fido)
s1-r1: boxer(Fido)

s2 readings:
------------------------------
s2-r0: married(Vincent)

s3 readings:
------------------------------
s3-r0: bark(Fido)

This gives us four threads:

>>> dt.readings(threaded=True)
d0: ['s0-r0', 's1-r0', 's2-r0', 's3-r0']
d1: ['s0-r0', 's1-r1', 's2-r0', 's3-r0']
d2: ['s0-r1', 's1-r0', 's2-r0', 's3-r0']
d3: ['s0-r1', 's1-r1', 's2-r0', 's3-r0']

We can eliminate some of the readings, and hence some of the threads, by adding background information.

>>> import nltk.data
>>> bg = nltk.data.load('grammars/background1.fol')
>>> dt.add_background(bg)
>>> dt.background()
all x.(boxerdog(x) -> dog(x))
all x.(boxer(x) -> person(x))
all x.-(dog(x) & person(x))
all x.(married(x) <-> exists y.marry(x,y))
all x.(bark(x) -> dog(x))
all x.all y.(marry(x,y) -> (person(x) & person(y)))
-(Vincent = Mia)
-(Vincent = Fido)
-(Mia = Fido)

The background information allows us to reject three of the threads as inconsistent. To see what remains, use the filter=True parameter on readings().

>>> dt.readings(filter=True)
d2: ['s0-r1', 's1-r0', 's2-r0', 's3-r0']
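
To make the filtering step concrete, here is a simplified sketch of why only d2 survives. It is not the theorem-proving route that NLTK takes. It assumes that every reading is a ground atomic fact and that the relevant background axioms can be reduced by hand to 'predicate implies dog' or 'predicate implies person'; the table names READINGS, IMPLIES and THREADS are invented for this example.

```python
# Fact contributed by each reading, as a (predicate, individual) pair.
READINGS = {
    "s0-r0": ("boxerdog", "Vincent"),
    "s0-r1": ("boxer", "Vincent"),
    "s1-r0": ("boxerdog", "Fido"),
    "s1-r1": ("boxer", "Fido"),
    "s2-r0": ("married", "Vincent"),
    "s3-r0": ("bark", "Fido"),
}

# Unary consequences read off the background axioms. 'married -> person'
# is derived by hand from the two axioms that mention marry.
IMPLIES = {"boxerdog": "dog", "boxer": "person", "bark": "dog", "married": "person"}

THREADS = {
    "d0": ["s0-r0", "s1-r0", "s2-r0", "s3-r0"],
    "d1": ["s0-r0", "s1-r1", "s2-r0", "s3-r0"],
    "d2": ["s0-r1", "s1-r0", "s2-r0", "s3-r0"],
    "d3": ["s0-r1", "s1-r1", "s2-r0", "s3-r0"],
}

def consistent(thread):
    """Check a thread against all x.-(dog(x) & person(x))."""
    kinds = {}
    for rid in thread:
        predicate, individual = READINGS[rid]
        kinds.setdefault(individual, set()).add(IMPLIES[predicate])
    # Nobody may end up classified as both a dog and a person.
    return all(kind != {"dog", "person"} for kind in kinds.values())

surviving = [tid for tid in sorted(THREADS) if consistent(THREADS[tid])]
print(surviving)
```

In d0 and d1 Vincent would be both a dog (from boxerdog) and a person (from married), and in d3 Fido would be both a person (from boxer) and a dog (from bark), so those three threads are rejected.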

The models() method gives us more information about the surviving thread.

>>> dt.models()
--------------------------------------------------------------------------------
Model for Discourse Thread d0
--------------------------------------------------------------------------------
No model found!

--------------------------------------------------------------------------------
Model for Discourse Thread d1
--------------------------------------------------------------------------------
No model found!

--------------------------------------------------------------------------------
Model for Discourse Thread d2
--------------------------------------------------------------------------------
% number = 1
% seconds = 0

% Interpretation of size 3

Fido = 0.

Mia = 1.

Vincent = 2.

f1(0) = 0.
f1(1) = 0.
f1(2) = 2.

  bark(0).
- bark(1).
- bark(2).

- boxer(0).
- boxer(1).
  boxer(2).

  boxerdog(0).
- boxerdog(1).
- boxerdog(2).

  dog(0).
- dog(1).
- dog(2).

- married(0).
- married(1).
  married(2).

- person(0).
- person(1).
  person(2).

- marry(0,0).
- marry(0,1).
- marry(0,2).
- marry(1,0).
- marry(1,1).
- marry(1,2).
- marry(2,0).
- marry(2,1).
  marry(2,2).

--------------------------------------------------------------------------------
Model for Discourse Thread d3
--------------------------------------------------------------------------------
No model found!

Inconsistent discourse d0 ['s0-r0', 's1-r0', 's2-r0', 's3-r0']:
    s0-r0: boxerdog(Vincent)
    s1-r0: boxerdog(Fido)
    s2-r0: married(Vincent)
    s3-r0: bark(Fido)

Inconsistent discourse d1 ['s0-r0', 's1-r1', 's2-r0', 's3-r0']:
    s0-r0: boxerdog(Vincent)
    s1-r1: boxer(Fido)
    s2-r0: married(Vincent)
    s3-r0: bark(Fido)

Consistent discourse: d2 ['s0-r1', 's1-r0', 's2-r0', 's3-r0']:
    s0-r1: boxer(Vincent)
    s1-r0: boxerdog(Fido)
    s2-r0: married(Vincent)
    s3-r0: bark(Fido)

Inconsistent discourse d3 ['s0-r1', 's1-r1', 's2-r0', 's3-r0']:
    s0-r1: boxer(Vincent)
    s1-r1: boxer(Fido)
    s2-r0: married(Vincent)
    s3-r0: bark(Fido)

In order to play around with your own version of background knowledge, you might want to start off with a local copy of background1.fol:

>>> nltk.data.retrieve('grammars/background1.fol')
Retrieving 'grammars/background1.fol', saving to 'background1.fol'

After you have modified the file, the parse_fol() function will parse the strings in the file into expressions of nltk.logic.

>>> mybg = parse_fol(open('background1.fol').read())