1
2
3
4
5
6
7
8
9
10
11
12 """
13 Basic data classes for representing feature structures, and for
14 performing basic operations on those feature structures. A X{feature
15 structure} is a mapping from feature identifiers to feature values,
16 where each feature value is either a basic value (such as a string or
17 an integer), or a nested feature structure. There are two types of
18 feature structure, implemented by two subclasses of L{FeatStruct}:
19
20 - I{feature dictionaries}, implemented by L{FeatDict}, act like
21 Python dictionaries. Feature identifiers may be strings or
22 instances of the L{Feature} class.
23 - I{feature lists}, implemented by L{FeatList}, act like Python
24 lists. Feature identifiers are integers.
25
26 Feature structures are typically used to represent partial information
27 about objects. A feature identifier that is not mapped to a value
28 stands for a feature whose value is unknown (I{not} a feature without
29 a value). Two feature structures that represent (potentially
30 overlapping) information about the same object can be combined by
31 X{unification}. When two inconsistent feature structures are unified,
32 the unification fails and returns C{None}.
33
34 Features can be specified using X{feature paths}, or tuples of feature
35 identifiers that specify path through the nested feature structures to
36 a value. Feature structures may contain reentrant feature values. A
37 X{reentrant feature value} is a single feature value that can be
38 accessed via multiple feature paths. Unification preserves the
39 reentrance relations imposed by both of the unified feature
40 structures. In the feature structure resulting from unification, any
41 modifications to a reentrant feature value will be visible using any
42 of its feature paths.
43
44 Feature structure variables are encoded using the L{nltk.sem.Variable}
45 class. The variables' values are tracked using a X{bindings}
46 dictionary, which maps variables to their values. When two feature
47 structures are unified, a fresh bindings dictionary is created to
48 track their values; and before unification completes, all bound
49 variables are replaced by their values. Thus, the bindings
50 dictionaries are usually strictly internal to the unification process.
51 However, it is possible to track the bindings of variables if you
52 choose to, by supplying your own initial bindings dictionary to the
53 L{unify()} function.
54
55 When unbound variables are unified with one another, they become
56 X{aliased}. This is encoded by binding one variable to the other.
57
58 Lightweight Feature Structures
59 ==============================
60 Many of the functions defined by L{nltk.featstruct} can be applied
61 directly to simple Python dictionaries and lists, rather than to
62 full-fledged L{FeatDict} and L{FeatList} objects. In other words,
63 Python C{dicts} and C{lists} can be used as "light-weight" feature
64 structures.
65
66 >>> from nltk.featstruct import unify
67 >>> unify(dict(x=1, y=dict()), dict(a='a', y=dict(b='b')))
68 {'y': {'b': 'b'}, 'x': 1, 'a': 'a'}
69
70 However, you should keep in mind the following caveats:
71
72 - Python dictionaries & lists ignore reentrance when checking for
73 equality between values. But two FeatStructs with different
74 reentrances are considered nonequal, even if all their base
75 values are equal.
76
77 - FeatStructs can be easily frozen, allowing them to be used as
78 keys in hash tables. Python dictionaries and lists can not.
79
80 - FeatStructs display reentrance in their string representations;
81 Python dictionaries and lists do not.
82
83 - FeatStructs may *not* be mixed with Python dictionaries and lists
84 (e.g., when performing unification).
85
86 - FeatStructs provide a number of useful methods, such as L{walk()
87 <FeatStruct.walk>} and L{cyclic() <FeatStruct.cyclic>}, which are
88 not available for Python dicts & lists.
89
90 In general, if your feature structures will contain any reentrances,
91 or if you plan to use them as dictionary keys, it is strongly
92 recommended that you use full-fledged L{FeatStruct} objects.
93 """
94
95 import re, copy
96
97 from nltk.sem.logic import Variable, Expression, SubstituteBindingsI
98 from nltk.sem.logic import LogicParser, ParseException
99 import nltk.internals
100
101
102
103
104
106 """
107 A mapping from feature identifiers to feature values, where each
108 feature value is either a basic value (such as a string or an
109 integer), or a nested feature structure. There are two types of
110 feature structure:
111
112 - I{feature dictionaries}, implemented by L{FeatDict}, act like
113 Python dictionaries. Feature identifiers may be strings or
114 instances of the L{Feature} class.
115 - I{feature lists}, implemented by L{FeatList}, act like Python
116 lists. Feature identifiers are integers.
117
118 Feature structures may be indexed using either simple feature
119 identifiers or 'feature paths.' A X{feature path} is a sequence
120 of feature identifiers that stand for a corresponding sequence of
121 indexing operations. In particular, C{fstruct[(f1,f2,...,fn)]} is
122 equivalent to C{fstruct[f1][f2]...[fn]}.
123
124 Feature structures may contain reentrant feature structures. A
125 X{reentrant feature structure} is a single feature structure
126 object that can be accessed via multiple feature paths. Feature
127 structures may also be cyclic. A feature structure is X{cyclic}
128 if there is any feature path from the feature structure to itself.
129
130 Two feature structures are considered equal if they assign the
131 same values to all features, and have the same reentrances.
132
133 By default, feature structures are mutable. They may be made
134 immutable with the L{freeze()} function. Once they have been
135 frozen, they may be hashed, and thus used as dictionary keys.
136 """
137
138 _frozen = False
139 """@ivar: A flag indicating whether this feature structure is
140 frozen or not. Once this flag is set, it should never be
141 un-set; and no further modification should be made to this
142 feature structue."""
143
144
145
146
147
148 - def __new__(cls, features=None, **morefeatures):
149 """
150 Construct and return a new feature structure. If this
151 constructor is called directly, then the returned feature
152 structure will be an instance of either the L{FeatDict} class
153 or the L{FeatList} class.
154
155 @param features: The initial feature values for this feature
156 structure:
157 - FeatStruct(string) -> FeatStructParser().parse(string)
158 - FeatStruct(mapping) -> FeatDict(mapping)
159 - FeatStruct(sequence) -> FeatList(sequence)
160 - FeatStruct() -> FeatDict()
161 @param morefeatures: If C{features} is a mapping or C{None},
162 then C{morefeatures} provides additional features for the
163 C{FeatDict} constructor.
164 """
165
166
167
168 if cls is FeatStruct:
169 if features is None:
170 return FeatDict.__new__(FeatDict, **morefeatures)
171 elif _is_mapping(features):
172 return FeatDict.__new__(FeatDict, features, **morefeatures)
173 elif morefeatures:
174 raise TypeError('Keyword arguments may only be specified '
175 'if features is None or is a mapping.')
176 if isinstance(features, basestring):
177 if FeatStructParser._START_FDICT_RE.match(features):
178 return FeatDict.__new__(FeatDict, features, **morefeatures)
179 else:
180 return FeatList.__new__(FeatList, features, **morefeatures)
181 elif _is_sequence(features):
182 return FeatList.__new__(FeatList, features)
183 else:
184 raise TypeError('Expected string or mapping or sequence')
185
186
187 else:
188 return super(FeatStruct, cls).__new__(cls, features,
189 **morefeatures)
190
191
192
193
194
195
196
197
199 """Return an iterable of the feature identifiers used by this
200 FeatStruct."""
201 raise NotImplementedError()
202
204 """Return an iterable of the feature values directly defined
205 by this FeatStruct."""
206 raise NotImplementedError()
207
209 """Return an iterable of (fid,fval) pairs, where fid is a
210 feature identifier and fval is the corresponding feature
211 value, for all features defined by this FeatStruct."""
212 raise NotImplementedError()
213
214
215
216
217
219 """
220 @return: True if C{self} and C{other} assign the same value to
221 to every feature. In particular, return true if
222 C{self[M{p}]==other[M{p}]} for every feature path M{p} such
223 that C{self[M{p}]} or C{other[M{p}]} is a base value (i.e.,
224 not a nested feature structure).
225
226 @param check_reentrance: If true, then also return false if
227 there is any difference between the reentrances of C{self}
228 and C{other}.
229
230 @note: the L{== operator <__eq__>} is equivalent to
231 C{equal_values()} with C{check_reentrance=True}.
232 """
233 return self._equal(other, check_reentrance, set(), set(), set())
234
236 """
237 Return true if C{self} and C{other} are both feature
238 structures, assign the same values to all features, and
239 contain the same reentrances. I.e., return
240 C{self.equal_values(other, check_reentrance=True)}.
241
242 @see: L{equal_values()}
243 """
244 return self._equal(other, True, set(), set(), set())
245
247 """
248 Return true unless C{self} and C{other} are both feature
249 structures, assign the same values to all features, and
250 contain the same reentrances. I.e., return
251 C{not self.equal_values(other, check_reentrance=True)}.
252 """
253 return not self.__eq__(other)
254
256 """
257 If this feature structure is frozen, return its hash value;
258 otherwise, raise C{TypeError}.
259 """
260 if not self._frozen:
261 raise TypeError('FeatStructs must be frozen before they '
262 'can be hashed.')
263 try: return self.__hash
264 except AttributeError:
265 self.__hash = self._hash(set())
266 return self.__hash
267
268 - def _equal(self, other, check_reentrance, visited_self,
269 visited_other, visited_pairs):
270 """
271 @return: True iff self and other have equal values.
272 @param visited_self: A set containing the ids of all C{self}
273 feature structures we've already visited.
274 @param visited_other: A set containing the ids of all C{other}
275 feature structures we've already visited.
276 @param visited_pairs: A set containing C{(selfid, otherid)} pairs
277 for all pairs of feature structures we've already visited.
278 """
279
280 if self is other: return True
281
282
283 if self.__class__ != other.__class__: return False
284
285
286
287
288 if len(self) != len(other): return False
289 if set(self._keys()) != set(other._keys()): return False
290
291
292
293
294
295
296 if check_reentrance:
297 if id(self) in visited_self or id(other) in visited_other:
298 return (id(self), id(other)) in visited_pairs
299
300
301
302
303
304 else:
305 if (id(self), id(other)) in visited_pairs:
306 return True
307
308
309 visited_self.add(id(self))
310 visited_other.add(id(other))
311 visited_pairs.add( (id(self), id(other)) )
312
313
314
315 for (fname, self_fval) in self._items():
316 other_fval = other[fname]
317 if isinstance(self_fval, FeatStruct):
318 if not self_fval._equal(other_fval, check_reentrance,
319 visited_self, visited_other,
320 visited_pairs):
321 return False
322 else:
323 if self_fval != other_fval: return False
324
325
326 return True
327
328 - def _hash(self, visited):
329 """
330 @return: A hash value for this feature structure.
331 @require: C{self} must be frozen.
332 @param visited: A set containing the ids of all feature
333 structures we've already visited while hashing.
334 """
335 if id(self) in visited: return 1
336 visited.add(id(self))
337
338 hashval = 5831
339 for (fname, fval) in sorted(self._items()):
340 hashval *= 37
341 hashval += hash(fname)
342 hashval *= 37
343 if isinstance(fval, FeatStruct):
344 hashval += fval._hash(visited)
345 else:
346 hashval += hash(fval)
347
348 hashval = int(hashval & 0x7fffffff)
349 return hashval
350
351
352
353
354
355
356
357 _FROZEN_ERROR = "Frozen FeatStructs may not be modified."
358
360 """
361 Make this feature structure, and any feature structures it
362 contains, immutable. Note: this method does not attempt to
363 'freeze' any feature values that are not C{FeatStruct}s; it
364 is recommended that you use only immutable feature values.
365 """
366 if self._frozen: return
367 self._freeze(set())
368
370 """
371 @return: True if this feature structure is immutable. Feature
372 structures can be made immutable with the L{freeze()} method.
373 Immutable feature structures may not be made mutable again,
374 but new mutale copies can be produced with the L{copy()} method.
375 """
376 return self._frozen
377
379 """
380 Make this feature structure, and any feature structure it
381 contains, immutable.
382 @param visited: A set containing the ids of all feature
383 structures we've already visited while freezing.
384 """
385 if id(self) in visited: return
386 visited.add(id(self))
387 self._frozen = True
388 for (fname, fval) in sorted(self._items()):
389 if isinstance(fval, FeatStruct):
390 fval._freeze(visited)
391
392
393
394
395
396 - def copy(self, deep=True):
397 """
398 Return a new copy of C{self}. The new copy will not be
399 frozen.
400
401 @param deep: If true, create a deep copy; if false, create
402 a shallow copy.
403 """
404 if deep:
405 return copy.deepcopy(self)
406 else:
407 return self.__class__(self)
408
409
410
412 raise NotImplementedError()
413
414
415
416
417
419 """
420 @return: True if this feature structure contains itself.
421 """
422 return self._find_reentrances({})[id(self)]
423
425 """
426 @return: A list of all feature structures that can be reached
427 from C{self} by multiple feature paths.
428 @rtype: C{list} of L{FeatStruct}
429 """
430 reentrance_dict = self._find_reentrances({})
431 return [struct for (struct, reentrant) in reentrance_dict.items()
432 if reentrant]
433
435 """
436 Return an iterator that generates this feature structure, and
437 each feature structure it contains. Each feature structure will
438 be generated exactly once.
439 """
440 return self._walk(set())
441
442 - def _walk(self, visited):
443 """
444 Return an iterator that generates this feature structure, and
445 each feature structure it contains.
446 @param visited: A set containing the ids of all feature
447 structures we've already visited while freezing.
448 """
449 raise NotImplementedError()
450
451 - def _walk(self, visited):
452 if id(self) in visited: return
453 visited.add(id(self))
454 yield self
455 for fval in self._values():
456 if isinstance(fval, FeatStruct):
457 for elt in fval._walk(visited):
458 yield elt
459
460
461
462
482
483
484
485
486
490
494
496 """@see: L{nltk.featstruct.find_variables()}"""
497 return find_variables(self)
498
500 """@see: L{nltk.featstruct.rename_variables()}"""
501 return rename_variables(self, vars, used_vars, new_vars)
502
504 """
505 @rtype: L{FeatStruct}
506 @return: The feature structure that is obtained by deleting
507 all features whose values are L{Variable}s.
508 """
509 return remove_variables(self)
510
511
512
513
514
515 - def unify(self, other, bindings=None, trace=False,
516 fail=None, rename_vars=True):
518
520 """
521 @return: True if C{self} subsumes C{other}. I.e., return true
522 if unifying C{self} with C{other} would result in a feature
523 structure equal to C{other}.
524 """
525 return subsumes(self, other)
526
527
528
529
530
532 """
533 Display a single-line representation of this feature structure,
534 suitable for embedding in other representations.
535 """
536 return self._repr(self._find_reentrances({}), {})
537
538 - def _repr(self, reentrances, reentrance_ids):
539 """
540 @return: A string representation of this feature structure.
541 @param reentrances: A dictionary that maps from the C{id} of
542 each feature value in self, indicating whether that value
543 is reentrant or not.
544 @param reentrance_ids: A dictionary mapping from the C{id}s
545 of feature values to unique identifiers. This is modified
546 by C{repr}: the first time a reentrant feature value is
547 displayed, an identifier is added to reentrance_ids for
548 it.
549 """
550 raise NotImplementedError()
551
552
553 _FROZEN_ERROR = "Frozen FeatStructs may not be modified."
554 _FROZEN_NOTICE = "\n%sIf self is frozen, raise ValueError."
556 """
557 Given a method function, return a new method function that first
558 checks if C{self._frozen} is true; and if so, raises C{ValueError}
559 with an appropriate message. Otherwise, call the method and return
560 its result.
561 """
562 def wrapped(self, *args, **kwargs):
563 if self._frozen: raise ValueError(_FROZEN_ERROR)
564 else: return method(self, *args, **kwargs)
565 wrapped.__name__ = method.__name__
566 wrapped.__doc__ = (method.__doc__ or '') + (_FROZEN_NOTICE % indent)
567 return wrapped
568
569
570
571
572
573
575 """
576 A feature structure that acts like a Python dictionary. I.e., a
577 mapping from feature identifiers to feature values, where feature
578 identifiers can be strings or L{Feature}s; and feature values can
579 be either basic values (such as a string or an integer), or nested
580 feature structures. Feature identifiers for C{FeatDict}s are
581 sometimes called X{feature names}.
582
583 Two feature dicts are considered equal if they assign the same
584 values to all features, and have the same reentrances.
585
586 @see: L{FeatStruct} for information about feature paths, reentrance,
587 cyclic feature structures, mutability, freezing, and hashing.
588 """
589 - def __init__(self, features=None, **morefeatures):
590 """
591 Create a new feature dictionary, with the specified features.
592
593 @param features: The initial value for this feature
594 dictionary. If C{features} is a C{FeatStruct}, then its
595 features are copied (shallow copy). If C{features} is a
596 C{dict}, then a feature is created for each item, mapping its
597 key to its value. If C{features} is a string, then it is
598 parsed using L{FeatStructParser}. If C{features} is a list of
599 tuples C{name,val}, then a feature is created for each tuple.
600
601 @param morefeatures: Additional features for the new feature
602 dictionary. If a feature is listed under both C{features} and
603 C{morefeatures}, then the value from C{morefeatures} will be
604 used.
605 """
606 if isinstance(features, basestring):
607 FeatStructParser().parse(features, self)
608 self.update(**morefeatures)
609 else:
610
611 self.update(features, **morefeatures)
612
613
614
615
616 _INDEX_ERROR = "Expected feature name or path. Got %r."
617
619 """If the feature with the given name or path exists, return
620 its value; otherwise, raise C{KeyError}."""
621 if isinstance(name_or_path, (basestring, Feature)):
622 return dict.__getitem__(self, name_or_path)
623 elif isinstance(name_or_path, tuple):
624 try:
625 val = self
626 for fid in name_or_path:
627 if not isinstance(val, FeatStruct):
628 raise KeyError
629 val = val[fid]
630 return val
631 except (KeyError, IndexError):
632 raise KeyError(name_or_path)
633 else:
634 raise TypeError(self._INDEX_ERROR % name_or_path)
635
636 - def get(self, name_or_path, default=None):
637 """If the feature with the given name or path exists, return its
638 value; otherwise, return C{default}."""
639 try: return self[name_or_path]
640 except KeyError: return default
641
643 """Return true if a feature with the given name or path exists."""
644 try: self[name_or_path]; return True
645 except KeyError: return False
646
648 """Return true if a feature with the given name or path exists."""
649 return name_or_path in self
650
652 """If the feature with the given name or path exists, delete
653 its value; otherwise, raise C{KeyError}."""
654 if self._frozen: raise ValueError(_FROZEN_ERROR)
655 if isinstance(name_or_path, (basestring, Feature)):
656 return dict.__delitem__(self, name_or_path)
657 elif isinstance(name_or_path, tuple):
658 if len(name_or_path) == 0:
659 raise ValueError("The path () can not be set")
660 else:
661 parent = self[name_or_path[:-1]]
662 if not isinstance(parent, FeatStruct):
663 raise KeyError(name_or_path)
664 del parent[name_or_path[-1]]
665 else:
666 raise TypeError(self._INDEX_ERROR % name_or_path)
667
669 """Set the value for the feature with the given name or path
670 to C{value}. If C{name_or_path} is an invalid path, raise
671 C{KeyError}."""
672 if self._frozen: raise ValueError(_FROZEN_ERROR)
673 if isinstance(name_or_path, (basestring, Feature)):
674 return dict.__setitem__(self, name_or_path, value)
675 elif isinstance(name_or_path, tuple):
676 if len(name_or_path) == 0:
677 raise ValueError("The path () can not be set")
678 else:
679 parent = self[name_or_path[:-1]]
680 if not isinstance(parent, FeatStruct):
681 raise KeyError(name_or_path)
682 parent[name_or_path[-1]] = value
683 else:
684 raise TypeError(self._INDEX_ERROR % name_or_path)
685
686 clear = _check_frozen(dict.clear)
687 pop = _check_frozen(dict.pop)
688 popitem = _check_frozen(dict.popitem)
689 setdefault = _check_frozen(dict.setdefault)
690
691 - def update(self, features=None, **morefeatures):
692 if self._frozen: raise ValueError(_FROZEN_ERROR)
693 if features is None:
694 items = ()
695 elif hasattr(features, 'has_key'):
696 items = features.items()
697 elif hasattr(features, '__iter__'):
698 items = features
699 else:
700 raise ValueError('Expected mapping or list of tuples')
701
702 for key, val in items:
703 if not isinstance(key, (basestring, Feature)):
704 raise TypeError('Feature names must be strings')
705 self[key] = val
706 for key, val in morefeatures.items():
707 if not isinstance(key, (basestring, Feature)):
708 raise TypeError('Feature names must be strings')
709 self[key] = val
710
711
712
713
714
716 memo[id(self)] = selfcopy = self.__class__()
717 for (key, val) in self._items():
718 selfcopy[copy.deepcopy(key,memo)] = copy.deepcopy(val,memo)
719 return selfcopy
720
721
722
723
724
728
729
730
731
732
734 """
735 Display a multi-line representation of this feature dictionary
736 as an FVM (feature value matrix).
737 """
738 return '\n'.join(self._str(self._find_reentrances({}), {}))
739
740 - def _repr(self, reentrances, reentrance_ids):
741 segments = []
742 prefix = ''
743 suffix = ''
744
745
746
747 if reentrances[id(self)]:
748 assert not reentrance_ids.has_key(id(self))
749 reentrance_ids[id(self)] = `len(reentrance_ids)+1`
750
751
752
753 for (fname, fval) in sorted(self.items()):
754 display = getattr(fname, 'display', None)
755 if reentrance_ids.has_key(id(fval)):
756 segments.append('%s->(%s)' %
757 (fname, reentrance_ids[id(fval)]))
758 elif (display == 'prefix' and not prefix and
759 isinstance(fval, (Variable, basestring))):
760 prefix = '%s' % fval
761 elif display == 'slash' and not suffix:
762 if isinstance(fval, Variable):
763 suffix = '/%s' % fval.name
764 else:
765 suffix = '/%r' % fval
766 elif isinstance(fval, Variable):
767 segments.append('%s=%s' % (fname, fval.name))
768 elif fval is True:
769 segments.append('+%s' % fname)
770 elif fval is False:
771 segments.append('-%s' % fname)
772 elif isinstance(fval, Expression):
773 segments.append('%s=<%s>' % (fname, fval))
774 elif not isinstance(fval, FeatStruct):
775 segments.append('%s=%r' % (fname, fval))
776 else:
777 fval_repr = fval._repr(reentrances, reentrance_ids)
778 segments.append('%s=%s' % (fname, fval_repr))
779
780 if reentrances[id(self)]:
781 prefix = '(%s)%s' % (reentrance_ids[id(self)], prefix)
782 return '%s[%s]%s' % (prefix, ', '.join(segments), suffix)
783
784 - def _str(self, reentrances, reentrance_ids):
785 """
786 @return: A list of lines composing a string representation of
787 this feature dictionary.
788 @param reentrances: A dictionary that maps from the C{id} of
789 each feature value in self, indicating whether that value
790 is reentrant or not.
791 @param reentrance_ids: A dictionary mapping from the C{id}s
792 of feature values to unique identifiers. This is modified
793 by C{repr}: the first time a reentrant feature value is
794 displayed, an identifier is added to reentrance_ids for
795 it.
796 """
797
798
799 if reentrances[id(self)]:
800 assert not reentrance_ids.has_key(id(self))
801 reentrance_ids[id(self)] = `len(reentrance_ids)+1`
802
803
804 if len(self) == 0:
805 if reentrances[id(self)]:
806 return ['(%s) []' % reentrance_ids[id(self)]]
807 else:
808 return ['[]']
809
810
811 maxfnamelen = max(len(str(k)) for k in self.keys())
812
813 lines = []
814
815
816 for (fname, fval) in sorted(self.items()):
817 fname = str(fname).ljust(maxfnamelen)
818 if isinstance(fval, Variable):
819 lines.append('%s = %s' % (fname,fval.name))
820
821 elif isinstance(fval, Expression):
822 lines.append('%s = <%s>' % (fname, fval))
823
824 elif isinstance(fval, FeatList):
825 fval_repr = fval._repr(reentrances, reentrance_ids)
826 lines.append('%s = %r' % (fname, fval_repr))
827
828 elif not isinstance(fval, FeatDict):
829
830 lines.append('%s = %r' % (fname, fval))
831
832 elif reentrance_ids.has_key(id(fval)):
833
834
835 lines.append('%s -> (%s)' % (fname, reentrance_ids[id(fval)]))
836
837 else:
838
839
840 if lines and lines[-1] != '': lines.append('')
841
842
843 fval_lines = fval._str(reentrances, reentrance_ids)
844
845
846 fval_lines = [(' '*(maxfnamelen+3))+l for l in fval_lines]
847
848
849 nameline = (len(fval_lines)-1)/2
850 fval_lines[nameline] = (
851 fname+' ='+fval_lines[nameline][maxfnamelen+2:])
852
853
854 lines += fval_lines
855
856
857 lines.append('')
858
859
860 if lines[-1] == '': lines.pop()
861
862
863 maxlen = max(len(line) for line in lines)
864 lines = ['[ %s%s ]' % (line, ' '*(maxlen-len(line))) for line in lines]
865
866
867 if reentrances[id(self)]:
868 idstr = '(%s) ' % reentrance_ids[id(self)]
869 lines = [(' '*len(idstr))+l for l in lines]
870 idline = (len(lines)-1)/2
871 lines[idline] = idstr + lines[idline][len(idstr):]
872
873 return lines
874
875
876
877
878
879
881 """
882 A list of feature values, where each feature value is either a
883 basic value (such as a string or an integer), or a nested feature
884 structure.
885
886 Feature lists may contain reentrant feature values. A X{reentrant
887 feature value} is a single feature value that can be accessed via
888 multiple feature paths. Feature lists may also be cyclic.
889
890 Two feature lists are considered equal if they assign the same
891 values to all features, and have the same reentrances.
892
893 @see: L{FeatStruct} for information about feature paths, reentrance,
894 cyclic feature structures, mutability, freezing, and hashing.
895 """
897 """
898 Create a new feature list, with the specified features.
899
900 @param features: The initial list of features for this feature
901 list. If C{features} is a string, then it is paresd using
902 L{FeatStructParser}. Otherwise, it should be a sequence
903 of basic values and nested feature structures.
904 """
905 if isinstance(features, basestring):
906 FeatStructParser().parse(features, self)
907 else:
908 list.__init__(self, features)
909
910
911
912
913 _INDEX_ERROR = "Expected int or feature path. Got %r."
914
916 if isinstance(name_or_path, (int, long)):
917 return list.__getitem__(self, name_or_path)
918 elif isinstance(name_or_path, tuple):
919 try:
920 val = self
921 for fid in name_or_path:
922 if not isinstance(val, FeatStruct):
923 raise KeyError
924 val = val[fid]
925 return val
926 except (KeyError, IndexError):
927 raise KeyError(name_or_path)
928 else:
929 raise TypeError(self._INDEX_ERROR % name_or_path)
930
932 """If the feature with the given name or path exists, delete
933 its value; otherwise, raise C{KeyError}."""
934 if self._frozen: raise ValueError(_FROZEN_ERROR)
935 if isinstance(name_or_path, (int, long)):
936 return list.__delitem__(self, name_or_path)
937 elif isinstance(name_or_path, tuple):
938 if len(name_or_path) == 0:
939 raise ValueError("The path () can not be set")
940 else:
941 parent = self[name_or_path[:-1]]
942 if not isinstance(parent, FeatStruct):
943 raise KeyError(name_or_path)
944 del parent[name_or_path[-1]]
945 else:
946 raise TypeError(self._INDEX_ERROR % name_or_path)
947
949 """Set the value for the feature with the given name or path
950 to C{value}. If C{name_or_path} is an invalid path, raise
951 C{KeyError}."""
952 if self._frozen: raise ValueError(_FROZEN_ERROR)
953 if isinstance(name_or_path, (int, long)):
954 return list.__setitem__(self, name_or_path, value)
955 elif isinstance(name_or_path, tuple):
956 if len(name_or_path) == 0:
957 raise ValueError("The path () can not be set")
958 else:
959 parent = self[name_or_path[:-1]]
960 if not isinstance(parent, FeatStruct):
961 raise KeyError(name_or_path)
962 parent[name_or_path[-1]] = value
963 else:
964 raise TypeError(self._INDEX_ERROR % name_or_path)
965
966 __delslice__ = _check_frozen(list.__delslice__, ' ')
967 __setslice__ = _check_frozen(list.__setslice__, ' ')
968 __iadd__ = _check_frozen(list.__iadd__)
969 __imul__ = _check_frozen(list.__imul__)
970 append = _check_frozen(list.append)
971 extend = _check_frozen(list.extend)
972 insert = _check_frozen(list.insert)
973 pop = _check_frozen(list.pop)
974 remove = _check_frozen(list.remove)
975 reverse = _check_frozen(list.reverse)
976 sort = _check_frozen(list.sort)
977
978
979
980
981
983 memo[id(self)] = selfcopy = self.__class__()
984 selfcopy.extend([copy.deepcopy(fval,memo) for fval in self])
985 return selfcopy
986
987
988
989
990
991 - def _keys(self): return range(len(self))
993 - def _items(self): return enumerate(self)
994
995
996
997
998
999
1000 - def _repr(self, reentrances, reentrance_ids):
1001
1002
1003 if reentrances[id(self)]:
1004 assert not reentrance_ids.has_key(id(self))
1005 reentrance_ids[id(self)] = `len(reentrance_ids)+1`
1006 prefix = '(%s)' % reentrance_ids[id(self)]
1007 else:
1008 prefix = ''
1009
1010 segments = []
1011 for fval in self:
1012 if id(fval) in reentrance_ids:
1013 segments.append('->(%s)' % reentrance_ids[id(fval)])
1014 elif isinstance(fval, Variable):
1015 segments.append(fval.name)
1016 elif isinstance(fval, Expression):
1017 segments.append('%s' % fval)
1018 elif isinstance(fval, FeatStruct):
1019 segments.append(fval._repr(reentrances, reentrance_ids))
1020 else:
1021 segments.append('%r' % fval)
1022
1023 return '%s[%s]' % (prefix, ', '.join(segments))
1024
1025
1026
1027
1028
1030 """
1031 @return: The feature structure that is obtained by replacing each
1032 variable bound by C{bindings} with its binding. If a variable is
1033 aliased to a bound variable, then it will be replaced by that
1034 variable's value. If a variable is aliased to an unbound
1035 variable, then it will be replaced by that variable.
1036
1037 @type bindings: C{dict} with L{Variable} keys
1038 @param bindings: A dictionary mapping from variables to values.
1039 """
1040 if fs_class == 'default': fs_class = _default_fs_class(fstruct)
1041 fstruct = copy.deepcopy(fstruct)
1042 _substitute_bindings(fstruct, bindings, fs_class, set())
1043 return fstruct
1044
1060
1062 """
1063 @return: The feature structure that is obtained by replacing each
1064 feature structure value that is bound by C{bindings} with the
1065 variable that binds it. A feature structure value must be
1066 identical to a bound value (i.e., have equal id) to be replaced.
1067
1068 C{bindings} is modified to point to this new feature structure,
1069 rather than the original feature structure. Feature structure
1070 values in C{bindings} may be modified if they are contained in
1071 C{fstruct}.
1072 """
1073 if fs_class == 'default': fs_class = _default_fs_class(fstruct)
1074 (fstruct, new_bindings) = copy.deepcopy((fstruct, bindings))
1075 bindings.update(new_bindings)
1076 inv_bindings = dict((id(val),var) for (var,val) in bindings.items())
1077 _retract_bindings(fstruct, inv_bindings, fs_class, set())
1078 return fstruct
1079
1081
1082 if id(fstruct) in visited: return
1083 visited.add(id(fstruct))
1084
1085 if _is_mapping(fstruct): items = fstruct.items()
1086 elif _is_sequence(fstruct): items = enumerate(fstruct)
1087 else: raise ValueError('Expected mapping or sequence')
1088 for (fname, fval) in items:
1089 if isinstance(fval, fs_class):
1090 if id(fval) in inv_bindings:
1091 fstruct[fname] = inv_bindings[id(fval)]
1092 _retract_bindings(fval, inv_bindings, fs_class, visited)
1093
1094
1096 """
1097 @return: The set of variables used by this feature structure.
1098 @rtype: C{set} of L{Variable}
1099 """
1100 if fs_class == 'default': fs_class = _default_fs_class(fstruct)
1101 return _variables(fstruct, set(), fs_class, set())
1102
1103 -def _variables(fstruct, vars, fs_class, visited):
1104
1105 if id(fstruct) in visited: return
1106 visited.add(id(fstruct))
1107 if _is_mapping(fstruct): items = fstruct.items()
1108 elif _is_sequence(fstruct): items = enumerate(fstruct)
1109 else: raise ValueError('Expected mapping or sequence')
1110 for (fname, fval) in items:
1111 if isinstance(fval, Variable):
1112 vars.add(fval)
1113 elif isinstance(fval, fs_class):
1114 _variables(fval, vars, fs_class, visited)
1115 elif isinstance(fval, SubstituteBindingsI):
1116 vars.update(fval.variables())
1117 return vars
1118
1119 -def rename_variables(fstruct, vars=None, used_vars=(), new_vars=None,
1120 fs_class='default'):
1121 """
1122 @return: The feature structure that is obtained by replacing
1123 any of this feature structure's variables that are in C{vars}
1124 with new variables. The names for these new variables will be
1125 names that are not used by any variable in C{vars}, or in
1126 C{used_vars}, or in this feature structure.
1127
1128 @type vars: C{set}
1129 @param vars: The set of variables that should be renamed.
1130 If not specified, C{find_variables(fstruct)} is used; i.e., all
1131 variables will be given new names.
1132
1133 @type used_vars: C{set}
1134 @param used_vars: A set of variables whose names should not be
1135 used by the new variables.
1136
1137 @type new_vars: C{dict} from L{Variable} to L{Variable}
1138 @param new_vars: A dictionary that is used to hold the mapping
1139 from old variables to new variables. For each variable M{v}
1140 in this feature structure:
1141
1142 - If C{new_vars} maps M{v} to M{v'}, then M{v} will be
1143 replaced by M{v'}.
1144 - If C{new_vars} does not contain M{v}, but C{vars}
1145 does contain M{v}, then a new entry will be added to
1146 C{new_vars}, mapping M{v} to the new variable that is used
1147 to replace it.
1148
1149 To consistantly rename the variables in a set of feature
1150 structures, simply apply rename_variables to each one, using
1151 the same dictionary:
1152
1153 >>> new_vars = {} # Maps old vars to alpha-renamed vars
1154 >>> new_fstruct1 = fstruct1.rename_variables(new_vars=new_vars)
1155 >>> new_fstruct2 = fstruct2.rename_variables(new_vars=new_vars)
1156 >>> new_fstruct3 = fstruct3.rename_variables(new_vars=new_vars)
1157
1158 If new_vars is not specified, then an empty dictionary is used.
1159 """
1160 if fs_class == 'default': fs_class = _default_fs_class(fstruct)
1161
1162
1163 if new_vars is None: new_vars = {}
1164 if vars is None: vars = find_variables(fstruct, fs_class)
1165 else: vars = set(vars)
1166
1167
1168 used_vars = find_variables(fstruct, fs_class).union(used_vars)
1169
1170
1171 return _rename_variables(copy.deepcopy(fstruct), vars, used_vars,
1172 new_vars, fs_class, set())
1173
1175 if id(fstruct) in visited: return
1176 visited.add(id(fstruct))
1177 if _is_mapping(fstruct): items = fstruct.items()
1178 elif _is_sequence(fstruct): items = enumerate(fstruct)
1179 else: raise ValueError('Expected mapping or sequence')
1180 for (fname, fval) in items:
1181 if isinstance(fval, Variable):
1182
1183 if fval in new_vars:
1184 fstruct[fname] = new_vars[fval]
1185
1186 elif fval in vars:
1187 new_vars[fval] = _rename_variable(fval, used_vars)
1188 fstruct[fname] = new_vars[fval]
1189 used_vars.add(new_vars[fval])
1190 elif isinstance(fval, fs_class):
1191 _rename_variables(fval, vars, used_vars, new_vars,
1192 fs_class, visited)
1193 elif isinstance(fval, SubstituteBindingsI):
1194
1195 for var in fval.variables():
1196 if var in vars and var not in new_vars:
1197 new_vars[var] = _rename_variable(var, used_vars)
1198 used_vars.add(new_vars[var])
1199
1200 fstruct[fname] = fval.substitute_bindings(new_vars)
1201 return fstruct
1202
1208
1210 """
1211 @rtype: L{FeatStruct}
1212 @return: The feature structure that is obtained by deleting
1213 all features whose values are L{Variable}s.
1214 """
1215 if fs_class == 'default': fs_class = _default_fs_class(fstruct)
1216 return _remove_variables(copy.deepcopy(fstruct), fs_class, set())
1217
1219 if id(fstruct) in visited: return
1220 visited.add(id(fstruct))
1221 if _is_mapping(fstruct): items = fstruct.items()
1222 elif _is_sequence(fstruct): items = enumerate(fstruct)
1223 else: raise ValueError('Expected mapping or sequence')
1224 for (fname, fval) in items:
1225 if isinstance(fval, Variable):
1226 del fstruct[fname]
1227 elif isinstance(fval, fs_class):
1228 _remove_variables(fval, fs_class, visited)
1229 return fstruct
1230
1231
1232
1233
1234
1235
1237 - def __repr__(self): return 'nltk.featstruct.UnificationFailure'
1238 UnificationFailure = _UnificationFailure()
1239 """A unique value used to indicate unification failure. It can be
1240 returned by L{Feature.unify_base_values()} or by custom C{fail()}
1241 functions to indicate that unificaiton should fail."""
1242
1243
1244
1245
1246
1247
1248 -def unify(fstruct1, fstruct2, bindings=None, trace=False,
1249 fail=None, rename_vars=True, fs_class='default'):
1250 """
1251 Unify C{fstruct1} with C{fstruct2}, and return the resulting feature
1252 structure. This unified feature structure is the minimal
1253 feature structure that:
1254 - contains all feature value assignments from both C{fstruct1}
1255 and C{fstruct2}.
1256 - preserves all reentrance properties of C{fstruct1} and
1257 C{fstruct2}.
1258
1259 If no such feature structure exists (because C{fstruct1} and
1260 C{fstruct2} specify incompatible values for some feature), then
1261 unification fails, and C{unify} returns C{None}.
1262
1263 @type bindings: C{dict} with L{Variable} keys
1264 @param bindings: A set of variable bindings to be used and
1265 updated during unification.
1266
1267 Bound variables are replaced by their values. Aliased
1268 variables are replaced by their representative variable
1269 (if unbound) or the value of their representative variable
1270 (if bound). I.e., if variable C{I{v}} is in C{bindings},
1271 then C{I{v}} is replaced by C{bindings[I{v}]}. This will
1272 be repeated until the variable is replaced by an unbound
1273 variable or a non-variable value.
1274
1275 Unbound variables are bound when they are unified with
1276 values; and aliased when they are unified with variables.
1277 I.e., if variable C{I{v}} is not in C{bindings}, and is
1278 unified with a variable or value C{I{x}}, then
1279 C{bindings[I{v}]} is set to C{I{x}}.
1280
1281 If C{bindings} is unspecified, then all variables are
1282 assumed to be unbound. I.e., C{bindings} defaults to an
1283 empty C{dict}.
1284
1285 @type trace: C{bool}
1286 @param trace: If true, generate trace output.
1287
1288 @type rename_vars: C{bool}
1289 @param rename_vars: If true, then rename any variables in
1290 C{fstruct2} that are also used in C{fstruct1}. This prevents
1291 aliasing in cases where C{fstruct1} and C{fstruct2} use the
1292 same variable name. E.g.:
1293
1294 >>> FeatStruct('[a=?x]').unify(FeatStruct('[b=?x]'))
1295 [a=?x, b=?x2]
1296
1297 If you intend for a variables in C{fstruct1} and C{fstruct2} with
1298 the same name to be treated as a single variable, use
1299 C{rename_vars=False}.
1300 """
1301
1302
1303 if fs_class == 'default':
1304 fs_class = _default_fs_class(fstruct1)
1305 if _default_fs_class(fstruct2) != fs_class:
1306 raise ValueError("Mixing FeatStruct objects with Python "
1307 "dicts and lists is not supported.")
1308 assert isinstance(fstruct1, fs_class)
1309 assert isinstance(fstruct2, fs_class)
1310
1311
1312 user_bindings = (bindings is not None)
1313 if bindings is None: bindings = {}
1314
1315
1316
1317
1318
1319
1320 (fstruct1copy, fstruct2copy, bindings_copy) = (
1321 copy.deepcopy((fstruct1, fstruct2, bindings)))
1322
1323
1324 bindings.update(bindings_copy)
1325
1326 if rename_vars:
1327 vars1 = find_variables(fstruct1copy, fs_class)
1328 vars2 = find_variables(fstruct2copy, fs_class)
1329 _rename_variables(fstruct2copy, vars1, vars2, {}, fs_class, set())
1330
1331
1332 forward = {}
1333 if trace: _trace_unify_start((), fstruct1copy, fstruct2copy)
1334 try: result = _destructively_unify(fstruct1copy, fstruct2copy, bindings,
1335 forward, trace, fail, fs_class, ())
1336 except _UnificationFailureError: return None
1337
1338
1339
1340 if result is UnificationFailure:
1341 if fail is None: return None
1342 else: return fail(fstruct1copy, fstruct2copy, ())
1343
1344
1345
1346 result = _apply_forwards(result, forward, fs_class, set())
1347 if user_bindings: _apply_forwards_to_bindings(forward, bindings)
1348
1349
1350 _resolve_aliases(bindings)
1351 _substitute_bindings(result, bindings, fs_class, set())
1352
1353
1354 if trace: _trace_unify_succeed((), result)
1355 if trace: _trace_bindings((), bindings)
1356 return result
1357
1359 """An exception that is used by C{_destructively_unify} to abort
1360 unification when a failure is encountered."""
1361
1364 """
1365 Attempt to unify C{fstruct1} and C{fstruct2} by modifying them
1366 in-place. If the unification succeeds, then C{fstruct1} will
1367 contain the unified value, the value of C{fstruct2} is undefined,
1368 and forward[id(fstruct2)] is set to fstruct1. If the unification
1369 fails, then a _UnificationFailureError is raised, and the
1370 values of C{fstruct1} and C{fstruct2} are undefined.
1371
1372 @param bindings: A dictionary mapping variables to values.
1373 @param forward: A dictionary mapping feature structures ids
1374 to replacement structures. When two feature structures
1375 are merged, a mapping from one to the other will be added
1376 to the forward dictionary; and changes will be made only
1377 to the target of the forward dictionary.
1378 C{_destructively_unify} will always 'follow' any links
1379 in the forward dictionary for fstruct1 and fstruct2 before
1380 actually unifying them.
1381 @param trace: If true, generate trace output
1382 @param path: The feature path that led us to this unification
1383 step. Used for trace output.
1384 """
1385
1386
1387
1388 if fstruct1 is fstruct2:
1389 if trace: _trace_unify_identity(path, fstruct1)
1390 return fstruct1
1391
1392
1393
1394
1395
1396 forward[id(fstruct2)] = fstruct1
1397
1398
1399 if _is_mapping(fstruct1) and _is_mapping(fstruct2):
1400 for fname in fstruct1:
1401 if getattr(fname, 'default', None) is not None:
1402 fstruct2.setdefault(fname, fname.default)
1403 for fname in fstruct2:
1404 if getattr(fname, 'default', None) is not None:
1405 fstruct1.setdefault(fname, fname.default)
1406
1407
1408
1409
1410
1411
1412 for fname, fval2 in sorted(fstruct2.items()):
1413 if fname in fstruct1:
1414 fstruct1[fname] = _unify_feature_values(
1415 fname, fstruct1[fname], fval2, bindings,
1416 forward, trace, fail, fs_class, path+(fname,))
1417 else:
1418 fstruct1[fname] = fval2
1419
1420 return fstruct1
1421
1422
1423 elif _is_sequence(fstruct1) and _is_sequence(fstruct2):
1424
1425 if len(fstruct1) != len(fstruct2):
1426 return UnificationFailure
1427
1428
1429 for findex in range(len(fstruct1)):
1430 fstruct1[findex] = _unify_feature_values(
1431 findex, fstruct1[findex], fstruct2[findex], bindings,
1432 forward, trace, fail, fs_class, path+(findex,))
1433
1434 return fstruct1
1435
1436
1437
1438 elif ((_is_sequence(fstruct1) or _is_mapping(fstruct1)) and
1439 (_is_sequence(fstruct2) or _is_mapping(fstruct2))):
1440 return UnificationFailure
1441
1442
1443 raise TypeError('Expected mappings or sequences')
1444
1447 """
1448 Attempt to unify C{fval1} and and C{fval2}, and return the
1449 resulting unified value. The method of unification will depend on
1450 the types of C{fval1} and C{fval2}:
1451
1452 1. If they're both feature structures, then destructively
1453 unify them (see L{_destructively_unify()}.
1454 2. If they're both unbound variables, then alias one variable
1455 to the other (by setting bindings[v2]=v1).
1456 3. If one is an unbound variable, and the other is a value,
1457 then bind the unbound variable to the value.
1458 4. If one is a feature structure, and the other is a base value,
1459 then fail.
1460 5. If they're both base values, then unify them. By default,
1461 this will succeed if they are equal, and fail otherwise.
1462 """
1463 if trace: _trace_unify_start(fpath, fval1, fval2)
1464
1465
1466 while id(fval1) in forward: fval1 = forward[id(fval1)]
1467 while id(fval2) in forward: fval2 = forward[id(fval2)]
1468
1469
1470
1471
1472
1473 fvar1 = fvar2 = None
1474 while isinstance(fval1, Variable) and fval1 in bindings:
1475 fvar1 = fval1
1476 fval1 = bindings[fval1]
1477 while isinstance(fval2, Variable) and fval2 in bindings:
1478 fvar2 = fval2
1479 fval2 = bindings[fval2]
1480
1481
1482 if isinstance(fval1, fs_class) and isinstance(fval2, fs_class):
1483 result = _destructively_unify(fval1, fval2, bindings, forward,
1484 trace, fail, fs_class, fpath)
1485
1486
1487 elif (isinstance(fval1, Variable) and
1488 isinstance(fval2, Variable)):
1489 if fval1 != fval2: bindings[fval2] = fval1
1490 result = fval1
1491
1492
1493 elif isinstance(fval1, Variable):
1494 bindings[fval1] = fval2
1495 result = fval1
1496 elif isinstance(fval2, Variable):
1497 bindings[fval2] = fval1
1498 result = fval2
1499
1500
1501 elif isinstance(fval1, fs_class) or isinstance(fval2, fs_class):
1502 result = UnificationFailure
1503
1504
1505 else:
1506
1507 if isinstance(fname, Feature):
1508 result = fname.unify_base_values(fval1, fval2, bindings)
1509
1510 elif isinstance(fval1, CustomFeatureValue):
1511 result = fval1.unify(fval2)
1512
1513 if (isinstance(fval2, CustomFeatureValue) and
1514 result != fval2.unify(fval1)):
1515 raise AssertionError(
1516 'CustomFeatureValue objects %r and %r disagree '
1517 'about unification value: %r vs. %r' %
1518 (fval1, fval2, result, fval2.unify(fval1)))
1519 elif isinstance(fval2, CustomFeatureValue):
1520 result = fval2.unify(fval1)
1521
1522 else:
1523 if fval1 == fval2:
1524 result = fval1
1525 else:
1526 result = UnificationFailure
1527
1528
1529
1530
1531 if result is not UnificationFailure:
1532 if fvar1 is not None:
1533 bindings[fvar1] = result
1534 result = fvar1
1535 if fvar2 is not None:
1536 bindings[fvar2] = result
1537 result = fvar2
1538
1539
1540
1541 if result is UnificationFailure:
1542 if fail is not None: result = fail(fval1, fval2, fpath)
1543 if trace: _trace_unify_fail(fpath[:-1], result)
1544 if result is UnificationFailure:
1545 raise _UnificationFailureError
1546
1547
1548 if isinstance(result, fs_class):
1549 result = _apply_forwards(result, forward, fs_class, set())
1550
1551 if trace: _trace_unify_succeed(fpath, result)
1552 if trace and isinstance(result, fs_class):
1553 _trace_bindings(fpath, bindings)
1554
1555 return result
1556
1558 """
1559 Replace any feature structure that has a forward pointer with
1560 the target of its forward pointer (to preserve reentrancy).
1561 """
1562 for (var, value) in bindings.items():
1563 while id(value) in forward:
1564 value = forward[id(value)]
1565 bindings[var] = value
1566
1568 """
1569 Replace any feature structure that has a forward pointer with
1570 the target of its forward pointer (to preserve reentrancy).
1571 """
1572
1573 while id(fstruct) in forward: fstruct = forward[id(fstruct)]
1574
1575
1576 if id(fstruct) in visited: return
1577 visited.add(id(fstruct))
1578
1579 if _is_mapping(fstruct): items = fstruct.items()
1580 elif _is_sequence(fstruct): items = enumerate(fstruct)
1581 else: raise ValueError('Expected mapping or sequence')
1582 for fname, fval in items:
1583 if isinstance(fval, fs_class):
1584
1585 while id(fval) in forward:
1586 fval = forward[id(fval)]
1587 fstruct[fname] = fval
1588
1589 _apply_forwards(fval, forward, fs_class, visited)
1590
1591 return fstruct
1592
1594 """
1595 Replace any bound aliased vars with their binding; and replace
1596 any unbound aliased vars with their representative var.
1597 """
1598 for (var, value) in bindings.items():
1599 while isinstance(value, Variable) and value in bindings:
1600 value = bindings[var] = bindings[value]
1601
1603 if path == ():
1604 print '\nUnification trace:'
1605 else:
1606 fullname = '.'.join(str(n) for n in path)
1607 print ' '+'| '*(len(path)-1)+'|'
1608 print ' '+'| '*(len(path)-1)+'| Unify feature: %s' % fullname
1609 print ' '+'| '*len(path)+' / '+_trace_valrepr(fval1)
1610 print ' '+'| '*len(path)+'|\\ '+_trace_valrepr(fval2)
1612 print ' '+'| '*len(path)+'|'
1613 print ' '+'| '*len(path)+'| (identical objects)'
1614 print ' '+'| '*len(path)+'|'
1615 print ' '+'| '*len(path)+'+-->'+`fval1`
1617 if result is UnificationFailure: resume = ''
1618 else: resume = ' (nonfatal)'
1619 print ' '+'| '*len(path)+'| |'
1620 print ' '+'X '*len(path)+'X X <-- FAIL'+resume
1622
1623 print ' '+'| '*len(path)+'|'
1624 print ' '+'| '*len(path)+'+-->'+`fval1`
1634 if isinstance(val, Variable):
1635 return '%s' % val
1636 else:
1637 return '%r' % val
1638
1640 """
1641 @return: True if C{fstruct1} subsumes C{fstruct2}. I.e., return
1642 true if unifying C{fstruct1} with C{fstruct2} would result in a
1643 feature structure equal to C{fstruct2.}
1644 """
1645 return fstruct2 == unify(fstruct1, fstruct2)
1646
1648 """
1649 @return: A list of the feature paths of all features which are
1650 assigned incompatible values by C{fstruct1} and C{fstruct2}.
1651 @rtype: C{list} of C{tuple}
1652 """
1653 conflict_list = []
1654 def add_conflict(fval1, fval2, path):
1655 conflict_list.append(path)
1656 return fval1
1657 unify(fstruct1, fstruct2, fail=add_conflict, trace=trace)
1658 return conflict_list
1659
1660
1661
1662
1663
1665 return hasattr(v, 'has_key') and hasattr(v, 'items')
1666
1668 return (hasattr(v, '__iter__') and hasattr(v, '__len__') and
1669 not isinstance(v, basestring))
1670
1672 if isinstance(obj, FeatStruct): return FeatStruct
1673 if isinstance(obj, (dict, list)): return (dict, list)
1674 else:
1675 raise ValueError('To unify objects of type %s, you must specify '
1676 'fs_class explicitly.' % obj.__class__.__name__)
1677
1678
1679
1680
1682 """
1683 A mixin class for sequence clases that distributes variables() and
1684 substitute_bindings() over the object's elements.
1685 """
1690
1692 return self.__class__([self.subst(v, bindings) for v in self])
1693
1694 - def subst(self, v, bindings):
1699
1701 """
1702 A base feature value that is a tuple of other base feature values.
1703 FeatureValueTuple implements L{SubstituteBindingsI}, so it any
1704 variable substitutions will be propagated to the elements
1705 contained by the set. C{FeatureValueTuple}s are immutable.
1706 """
1708 if len(self) == 0: return '()'
1709 return '(%s)' % ', '.join('%s' % (b,) for b in self)
1710
1712 """
1713 A base feature value that is a set of other base feature values.
1714 FeatureValueSet implements L{SubstituteBindingsI}, so it any
1715 variable substitutions will be propagated to the elements
1716 contained by the set. C{FeatureValueSet}s are immutable.
1717 """
1719 if len(self) == 0: return '{/}'
1720
1721
1722 return '{%s}' % ', '.join(sorted('%s' % (b,) for b in self))
1723 __str__ = __repr__
1724
1726 """
1727 A base feature value that represents the union of two or more
1728 L{FeatureValueSet}s or L{Variable}s.
1729 """
1746
1748
1749
1750
1751 return '{%s}' % '+'.join(sorted('%s' % (b,) for b in self))
1752
1754 """
1755 A base feature value that represents the concatenation of two or
1756 more L{FeatureValueTuple}s or L{Variable}s.
1757 """
1774
1776
1777 return '(%s)' % '+'.join('%s' % (b,) for b in self)
1778
1780 """
1781 Helper function -- return a copy of list, with all elements of
1782 type C{cls} spliced in rather than appended in.
1783 """
1784 result = []
1785 for elt in lst:
1786 if isinstance(elt, cls): result.extend(elt)
1787 else: result.append(elt)
1788 return result
1789
1790
1791
1792
1793
1795 """
1796 A feature identifier that's specialized to put additional
1797 constraints, default values, etc.
1798 """
1799 - def __init__(self, name, default=None, display=None):
1800 assert display in (None, 'prefix', 'slash')
1801
1802 self._name = name
1803 """The name of this feature."""
1804
1805 self._default = default
1806 """Default value for this feature. Use None for unbound."""
1807
1808 self._display = display
1809 """Custom display location: can be prefix, or slash."""
1810
1811 if self._display == 'prefix':
1812 self._sortkey = (-1, self._name)
1813 elif self._display == 'slash':
1814 self._sortkey = (1, self._name)
1815 else:
1816 self._sortkey = (0, self._name)
1817
1818 name = property(lambda self: self._name)
1819 default = property(lambda self: self._default)
1820 display = property(lambda self: self._display)
1821
1823 return '*%s*' % self.name
1824
1826 if not isinstance(other, Feature): return -1
1827 if self._name == other._name: return 0
1828 return cmp(self._sortkey, other._sortkey)
1829
1831 return hash(self._name)
1832
1833
1834
1835
1836
1837 - def parse_value(self, s, position, reentrances, parser):
1839
1841 """
1842 If possible, return a single value.. If not, return
1843 the value L{UnificationFailure}.
1844 """
1845 if fval1 == fval2: return fval1
1846 else: return UnificationFailure
1847
1848
1850 - def parse_value(self, s, position, reentrances, parser):
1852
1854 RANGE_RE = re.compile('(-?\d+):(-?\d+)')
1855 - def parse_value(self, s, position, reentrances, parser):
1856 m = self.RANGE_RE.match(s, position)
1857 if not m: raise ValueError('range', position)
1858 return (int(m.group(1)), int(m.group(2))), m.end()
1859
1861 if fval1 is None: return fval2
1862 if fval2 is None: return fval1
1863 rng = max(fval1[0], fval2[0]), min(fval1[1], fval2[1])
1864 if rng[1] < rng[0]: return UnificationFailure
1865 return rng
1866
1867 SLASH = SlashFeature('slash', default=False, display='slash')
1868 TYPE = Feature('type', display='prefix')
1869
1870
1871
1872
1873
1875 """
1876 An abstract base class for base values that define a custom
1877 unification method. A C{CustomFeatureValue}'s custom unification
1878 method will be used during feature structure unification if:
1879
1880 - The C{CustomFeatureValue} is unified with another base value.
1881 - The C{CustomFeatureValue} is not the value of a customized
1882 L{Feature} (which defines its own unification method).
1883
1884 If two C{CustomFeatureValue} objects are unified with one another
1885 during feature structure unification, then the unified base values
1886 they return I{must} be equal; otherwise, an C{AssertionError} will
1887 be raised.
1888
1889 Subclasses must define L{unify()} and L{__cmp__()}. Subclasses
1890 may also wish to define L{__hash__()}.
1891 """
1892 - def unify(self, other):
1893 """
1894 If this base value unifies with C{other}, then return the
1895 unified value. Otherwise, return L{UnificationFailure}.
1896 """
1897 raise NotImplementedError('abstract base class')
1899 raise NotImplementedError('abstract base class')
1901 raise TypeError('%s objects or unhashable' % self.__class__.__name__)
1902
1903
1904
1905
1906
1910 self._features = dict((f.name,f) for f in features)
1911 self._fdict_class = fdict_class
1912 self._flist_class = flist_class
1913 self._prefix_feature = None
1914 self._slash_feature = None
1915 for feature in features:
1916 if feature.display == 'slash':
1917 if self._slash_feature:
1918 raise ValueError('Multiple features w/ display=slash')
1919 self._slash_feature = feature
1920 if feature.display == 'prefix':
1921 if self._prefix_feature:
1922 raise ValueError('Multiple features w/ display=prefix')
1923 self._prefix_feature = feature
1924 self._features_with_defaults = [feature for feature in features
1925 if feature.default is not None]
1926
1927 - def parse(self, s, fstruct=None):
1928 """
1929 Convert a string representation of a feature structure (as
1930 displayed by repr) into a C{FeatStruct}. This parse
1931 imposes the following restrictions on the string
1932 representation:
1933 - Feature names cannot contain any of the following:
1934 whitespace, parenthases, quote marks, equals signs,
1935 dashes, commas, and square brackets. Feature names may
1936 not begin with plus signs or minus signs.
1937 - Only the following basic feature value are supported:
1938 strings, integers, variables, C{None}, and unquoted
1939 alphanumeric strings.
1940 - For reentrant values, the first mention must specify
1941 a reentrance identifier and a value; and any subsequent
1942 mentions must use arrows (C{'->'}) to reference the
1943 reentrance identifier.
1944 """
1945 s = s.strip()
1946 value, position = self.partial_parse(s, 0, {}, fstruct)
1947 if position != len(s):
1948 self._error(s, 'end of string', position)
1949 return value
1950
1951 _START_FSTRUCT_RE = re.compile(r'\s*(?:\((\d+)\)\s*)?(\??[\w-]+)?(\[)')
1952 _END_FSTRUCT_RE = re.compile(r'\s*]\s*')
1953 _SLASH_RE = re.compile(r'/')
1954 _FEATURE_NAME_RE = re.compile(r'\s*([+-]?)([^\s\(\)<>"\'\-=\[\],]+)\s*')
1955 _REENTRANCE_RE = re.compile(r'\s*->\s*')
1956 _TARGET_RE = re.compile(r'\s*\((\d+)\)\s*')
1957 _ASSIGN_RE = re.compile(r'\s*=\s*')
1958 _COMMA_RE = re.compile(r'\s*,\s*')
1959 _BARE_PREFIX_RE = re.compile(r'\s*(?:\((\d+)\)\s*)?(\??[\w-]+\s*)()')
1960
1961 _START_FDICT_RE = re.compile(r'(%s)|(%s\s*(%s\s*(=|->)|[+-]%s|\]))' % (
1962 _BARE_PREFIX_RE.pattern, _START_FSTRUCT_RE.pattern,
1963 _FEATURE_NAME_RE.pattern, _FEATURE_NAME_RE.pattern))
1964
1965 - def partial_parse(self, s, position=0, reentrances=None, fstruct=None):
1966 """
1967 Helper function that parses a feature structure.
1968 @param s: The string to parse.
1969 @param position: The position in the string to start parsing.
1970 @param reentrances: A dictionary from reentrance ids to values.
1971 Defaults to an empty dictionary.
1972 @return: A tuple (val, pos) of the feature structure created
1973 by parsing and the position where the parsed feature
1974 structure ends.
1975 """
1976 if reentrances is None: reentrances = {}
1977 try:
1978 return self._partial_parse(s, position, reentrances, fstruct)
1979 except ValueError, e:
1980 if len(e.args) != 2: raise
1981 self._error(s, *e.args)
1982
1984
1985 if fstruct is None:
1986 if self._START_FDICT_RE.match(s, position):
1987 fstruct = self._fdict_class()
1988 else:
1989 fstruct = self._flist_class()
1990
1991
1992 match = self._START_FSTRUCT_RE.match(s, position)
1993 if not match:
1994 match = self._BARE_PREFIX_RE.match(s, position)
1995 if not match:
1996 raise ValueError('open bracket or identifier', position)
1997 position = match.end()
1998
1999
2000 if match.group(1):
2001 identifier = match.group(1)
2002 if identifier in reentrances:
2003 raise ValueError('new identifier', match.start(1))
2004 reentrances[identifier] = fstruct
2005
2006 if isinstance(fstruct, FeatDict):
2007 fstruct.clear()
2008 return self._partial_parse_featdict(s, position, match,
2009 reentrances, fstruct)
2010 else:
2011 del fstruct[:]
2012 return self._partial_parse_featlist(s, position, match,
2013 reentrances, fstruct)
2014
2017
2018 if match.group(2): raise ValueError('open bracket')
2019
2020 if not match.group(3): raise ValueError('open bracket')
2021
2022
2023 while position < len(s):
2024
2025 match = self._END_FSTRUCT_RE.match(s, position)
2026 if match is not None:
2027 return fstruct, match.end()
2028
2029
2030 match = self._REENTRANCE_RE.match(s, position)
2031 if match:
2032 position = match.end()
2033 match = _TARGET_RE.match(s, position)
2034 if not match: raise ValueError('identifier', position)
2035 target = match.group(1)
2036 if target not in reentrances:
2037 raise ValueError('bound identifier', position)
2038 position = match.end()
2039 fstruct.append(reentrances[target])
2040
2041
2042 else:
2043 value, position = (
2044 self._parse_value(0, s, position, reentrances))
2045 fstruct.append(value)
2046
2047
2048 if self._END_FSTRUCT_RE.match(s, position):
2049 continue
2050
2051
2052 match = self._COMMA_RE.match(s, position)
2053 if match is None: raise ValueError('comma', position)
2054 position = match.end()
2055
2056
2057 raise ValueError('close bracket', position)
2058
2061
2062 if match.group(2):
2063 if self._prefix_feature is None:
2064 raise ValueError('open bracket or identifier', match.start(2))
2065 prefixval = match.group(2).strip()
2066 if prefixval.startswith('?'):
2067 prefixval = Variable(prefixval)
2068 fstruct[self._prefix_feature] = prefixval
2069
2070
2071
2072 if not match.group(3):
2073 return self._finalize(s, match.end(), reentrances, fstruct)
2074
2075
2076
2077
2078
2079
2080
2081 while position < len(s):
2082
2083 name = value = None
2084
2085
2086 match = self._END_FSTRUCT_RE.match(s, position)
2087 if match is not None:
2088 return self._finalize(s, match.end(), reentrances, fstruct)
2089
2090
2091 match = self._FEATURE_NAME_RE.match(s, position)
2092 if match is None: raise ValueError('feature name', position)
2093 name = match.group(2)
2094 position = match.end()
2095
2096
2097 if name[0] == '*' and name[-1] == '*':
2098 name = self._features.get(name[1:-1])
2099 if name is None:
2100 raise ValueError('known special feature', match.start(2))
2101
2102
2103 if name in fstruct:
2104 raise ValueError('new name', match.start(2))
2105
2106
2107 if match.group(1) == '+': value = True
2108 if match.group(1) == '-': value = False
2109
2110
2111 if value is None:
2112 match = self._REENTRANCE_RE.match(s, position)
2113 if match is not None:
2114 position = match.end()
2115 match = self._TARGET_RE.match(s, position)
2116 if not match:
2117 raise ValueError('identifier', position)
2118 target = match.group(1)
2119 if target not in reentrances:
2120 raise ValueError('bound identifier', position)
2121 position = match.end()
2122 value = reentrances[target]
2123
2124
2125 if value is None:
2126 match = self._ASSIGN_RE.match(s, position)
2127 if match:
2128 position = match.end()
2129 value, position = (
2130 self._parse_value(name, s, position, reentrances))
2131
2132 else:
2133 raise ValueError('equals sign', position)
2134
2135
2136 fstruct[name] = value
2137
2138
2139 if self._END_FSTRUCT_RE.match(s, position):
2140 continue
2141
2142
2143 match = self._COMMA_RE.match(s, position)
2144 if match is None: raise ValueError('comma', position)
2145 position = match.end()
2146
2147
2148 raise ValueError('close bracket', position)
2149
2150 - def _finalize(self, s, pos, reentrances, fstruct):
2166
2172
2180
2181 - def _error(self, s, expected, position):
2182 lines = s.split('\n')
2183 while position > len(lines[0]):
2184 position -= len(lines.pop(0))+1
2185 estr = ('Error parsing feature structure\n ' +
2186 lines[0] + '\n ' + ' '*position + '^ ' +
2187 'Expected %s' % expected)
2188 raise ValueError, estr
2189
2190
2191
2192
2193
2194
2195
2196
2197
2198
2199
2200
2201
2202
2203
2204 VALUE_HANDLERS = [
2205 ('parse_fstruct_value', _START_FSTRUCT_RE),
2206 ('parse_var_value', re.compile(r'\?[a-zA-Z_][a-zA-Z0-9_]*')),
2207 ('parse_str_value', re.compile("[uU]?[rR]?(['\"])")),
2208 ('parse_int_value', re.compile(r'-?\d+')),
2209 ('parse_sym_value', re.compile(r'[a-zA-Z_][a-zA-Z0-9_]*')),
2210 ('parse_app_value', re.compile(r'<(app)\((\?[a-z][a-z]*)\s*,'
2211 r'\s*(\?[a-z][a-z]*)\)>')),
2212 ('parse_logic_value', re.compile(r'<([^>]*)>')),
2213 ('parse_set_value', re.compile(r'{')),
2214 ('parse_tuple_value', re.compile(r'\(')),
2215 ]
2216
2219
2222
2225
2226
2229
2230 _SYM_CONSTS = {'None':None, 'True':True, 'False':False}
2234
2238
2249
2253
2257
2258 - def _parse_seq_value(self, s, position, reentrances, match,
2259 close_paren, seq_class, plus_class):
2260 """
2261 Helper function used by parse_tuple_value and parse_set_value.
2262 """
2263 cp = re.escape(close_paren)
2264 position = match.end()
2265
2266 m = re.compile(r'\s*/?\s*%s' % cp).match(s, position)
2267 if m: return seq_class(), m.end()
2268
2269 values = []
2270 seen_plus = False
2271 while True:
2272
2273 m = re.compile(r'\s*%s' % cp).match(s, position)
2274 if m:
2275 if seen_plus: return plus_class(values), m.end()
2276 else: return seq_class(values), m.end()
2277
2278
2279 val, position = self.parse_value(s, position, reentrances)
2280 values.append(val)
2281
2282
2283 m = re.compile(r'\s*(,|\+|(?=%s))\s*' % cp).match(s, position)
2284 if m.group(1) == '+': seen_plus = True
2285 if not m: raise ValueError("',' or '+' or '%s'" % cp, position)
2286 position = m.end()
2287
2288
2289
2290
2291
2293
2294 fs1_lines = str(fs1).split('\n')
2295 fs2_lines = str(fs2).split('\n')
2296 if len(fs1_lines) > len(fs2_lines):
2297 blankline = '['+' '*(len(fs2_lines[0])-2)+']'
2298 fs2_lines += [blankline]*len(fs1_lines)
2299 else:
2300 blankline = '['+' '*(len(fs1_lines[0])-2)+']'
2301 fs1_lines += [blankline]*len(fs2_lines)
2302 for (fs1_line, fs2_line) in zip(fs1_lines, fs2_lines):
2303 print indent + fs1_line + ' ' + fs2_line
2304 print indent+'-'*len(fs1_lines[0])+' '+'-'*len(fs2_lines[0])
2305
2306 linelen = len(fs1_lines[0])*2+3
2307 print indent+'| |'.center(linelen)
2308 print indent+'+-----UNIFY-----+'.center(linelen)
2309 print indent+'|'.center(linelen)
2310 print indent+'V'.center(linelen)
2311
2312 bindings = {}
2313
2314 result = fs1.unify(fs2, bindings)
2315 if result is None:
2316 print indent+'(FAILED)'.center(linelen)
2317 else:
2318 print '\n'.join(indent+l.center(linelen)
2319 for l in str(result).split('\n'))
2320 if bindings and len(bindings.bound_variables()) > 0:
2321 print repr(bindings).center(linelen)
2322 return result
2323
2325 import random, sys
2326
2327 HELP = '''
2328 1-%d: Select the corresponding feature structure
2329 q: Quit
2330 t: Turn tracing on or off
2331 l: List all feature structures
2332 ?: Help
2333 '''
2334
2335 print '''
2336 This demo will repeatedly present you with a list of feature
2337 structures, and ask you to choose two for unification. Whenever a
2338 new feature structure is generated, it is added to the list of
2339 choices that you can pick from. However, since this can be a
2340 large number of feature structures, the demo will only print out a
2341 random subset for you to choose between at a given time. If you
2342 want to see the complete lists, type "l". For a list of valid
2343 commands, type "?".
2344 '''
2345 print 'Press "Enter" to continue...'
2346 sys.stdin.readline()
2347
2348 fstruct_strings = [
2349 '[agr=[number=sing, gender=masc]]',
2350 '[agr=[gender=masc, person=3rd]]',
2351 '[agr=[gender=fem, person=3rd]]',
2352 '[subj=[agr=(1)[]], agr->(1)]',
2353 '[obj=?x]', '[subj=?x]',
2354 '[/=None]', '[/=NP]',
2355 '[cat=NP]', '[cat=VP]', '[cat=PP]',
2356 '[subj=[agr=[gender=?y]], obj=[agr=[gender=?y]]]',
2357 '[gender=masc, agr=?C]',
2358 '[gender=?S, agr=[gender=?S,person=3rd]]'
2359 ]
2360
2361 all_fstructs = [(i, FeatStruct.parse(fstruct_strings[i]))
2362 for i in range(len(fstruct_strings))]
2363
2364 def list_fstructs(fstructs):
2365 for i, fstruct in fstructs:
2366 print
2367 lines = str(fstruct).split('\n')
2368 print '%3d: %s' % (i+1, lines[0])
2369 for line in lines[1:]: print ' '+line
2370 print
2371
2372
2373 while 1:
2374
2375 MAX_CHOICES = 5
2376 if len(all_fstructs) > MAX_CHOICES:
2377 fstructs = random.sample(all_fstructs, MAX_CHOICES)
2378 fstructs.sort()
2379 else:
2380 fstructs = all_fstructs
2381
2382 print '_'*75
2383
2384 print 'Choose two feature structures to unify:'
2385 list_fstructs(fstructs)
2386
2387 selected = [None,None]
2388 for (nth,i) in (('First',0), ('Second',1)):
2389 while selected[i] is None:
2390 print ('%s feature structure (1-%d,q,t,l,?): '
2391 % (nth, len(all_fstructs))),
2392 try:
2393 input = sys.stdin.readline().strip()
2394 if input in ('q', 'Q', 'x', 'X'): return
2395 if input in ('t', 'T'):
2396 trace = not trace
2397 print ' Trace = %s' % trace
2398 continue
2399 if input in ('h', 'H', '?'):
2400 print HELP % len(fstructs); continue
2401 if input in ('l', 'L'):
2402 list_fstructs(all_fstructs); continue
2403 num = int(input)-1
2404 selected[i] = all_fstructs[num][1]
2405 print
2406 except:
2407 print 'Bad sentence number'
2408 continue
2409
2410 if trace:
2411 result = selected[0].unify(selected[1], trace=1)
2412 else:
2413 result = display_unification(selected[0], selected[1])
2414 if result is not None:
2415 for i, fstruct in all_fstructs:
2416 if `result` == `fstruct`: break
2417 else:
2418 all_fstructs.append((len(all_fstructs), result))
2419
2420 print '\nType "Enter" to continue unifying; or "q" to quit.'
2421 input = sys.stdin.readline().strip()
2422 if input in ('q', 'Q', 'x', 'X'): return
2423
2424 -def demo(trace=False):
2425 """
2426 Just for testing
2427 """
2428
2429
2430
2431 fstruct_strings = [
2432 '[agr=[number=sing, gender=masc]]',
2433 '[agr=[gender=masc, person=3]]',
2434 '[agr=[gender=fem, person=3]]',
2435 '[subj=[agr=(1)[]], agr->(1)]',
2436 '[obj=?x]', '[subj=?x]',
2437 '[/=None]', '[/=NP]',
2438 '[cat=NP]', '[cat=VP]', '[cat=PP]',
2439 '[subj=[agr=[gender=?y]], obj=[agr=[gender=?y]]]',
2440 '[gender=masc, agr=?C]',
2441 '[gender=?S, agr=[gender=?S,person=3]]'
2442 ]
2443 all_fstructs = [FeatStruct(fss) for fss in fstruct_strings]
2444
2445
2446
2447
2448
2449
2450
2451 for fs1 in all_fstructs:
2452 for fs2 in all_fstructs:
2453 print "\n*******************\nfs1 is:\n%s\n\nfs2 is:\n%s\n\nresult is:\n%s" % (fs1, fs2, unify(fs1, fs2))
2454
2455
2456 if __name__ == '__main__':
2457 demo()
2458