Package nltk :: Module featstruct
[hide private]
[frames] | no frames]

Source Code for Module nltk.featstruct

   1  # Natural Language Toolkit: Feature Structures 
   2  # 
   3  # Copyright (C) 2001-2008 NLTK Project 
   4  # Author: Edward Loper <[email protected]>, 
   5  #         Rob Speer, 
   6  #         Steven Bird <[email protected]> 
   7  # URL: <http://nltk.sourceforge.net> 
   8  # For license information, see LICENSE.TXT 
   9  # 
  10  # $Id$ 
  11   
  12  """ 
  13  Basic data classes for representing feature structures, and for 
  14  performing basic operations on those feature structures.  A X{feature 
  15  structure} is a mapping from feature identifiers to feature values, 
  16  where each feature value is either a basic value (such as a string or 
  17  an integer), or a nested feature structure.  There are two types of 
  18  feature structure, implemented by two subclasses of L{FeatStruct}: 
  19   
  20      - I{feature dictionaries}, implemented by L{FeatDict}, act like 
  21        Python dictionaries.  Feature identifiers may be strings or 
  22        instances of the L{Feature} class. 
  23      - I{feature lists}, implemented by L{FeatList}, act like Python 
  24        lists.  Feature identifiers are integers. 
  25         
  26  Feature structures are typically used to represent partial information 
  27  about objects.  A feature identifier that is not mapped to a value 
  28  stands for a feature whose value is unknown (I{not} a feature without 
  29  a value).  Two feature structures that represent (potentially 
  30  overlapping) information about the same object can be combined by 
  31  X{unification}.  When two inconsistent feature structures are unified, 
  32  the unification fails and returns C{None}. 
  33   
  34  Features can be specified using X{feature paths}, or tuples of feature 
  35  identifiers that specify path through the nested feature structures to 
  36  a value.  Feature structures may contain reentrant feature values.  A 
  37  X{reentrant feature value} is a single feature value that can be 
  38  accessed via multiple feature paths.  Unification preserves the 
  39  reentrance relations imposed by both of the unified feature 
  40  structures.  In the feature structure resulting from unification, any 
  41  modifications to a reentrant feature value will be visible using any 
  42  of its feature paths. 
  43   
  44  Feature structure variables are encoded using the L{nltk.sem.Variable} 
  45  class.  The variables' values are tracked using a X{bindings} 
  46  dictionary, which maps variables to their values.  When two feature 
  47  structures are unified, a fresh bindings dictionary is created to 
  48  track their values; and before unification completes, all bound 
  49  variables are replaced by their values.  Thus, the bindings 
  50  dictionaries are usually strictly internal to the unification process. 
  51  However, it is possible to track the bindings of variables if you 
  52  choose to, by supplying your own initial bindings dictionary to the 
  53  L{unify()} function. 
  54   
  55  When unbound variables are unified with one another, they become 
  56  X{aliased}.  This is encoded by binding one variable to the other. 
  57   
  58  Lightweight Feature Structures 
  59  ============================== 
  60  Many of the functions defined by L{nltk.featstruct} can be applied 
  61  directly to simple Python dictionaries and lists, rather than to 
  62  full-fledged L{FeatDict} and L{FeatList} objects.  In other words, 
  63  Python C{dicts} and C{lists} can be used as "light-weight" feature 
  64  structures. 
  65   
  66      >>> from nltk.featstruct import unify 
  67      >>> unify(dict(x=1, y=dict()), dict(a='a', y=dict(b='b'))) 
  68      {'y': {'b': 'b'}, 'x': 1, 'a': 'a'} 
  69   
  70  However, you should keep in mind the following caveats: 
  71   
  72    - Python dictionaries & lists ignore reentrance when checking for 
  73      equality between values.  But two FeatStructs with different 
  74      reentrances are considered nonequal, even if all their base 
  75      values are equal. 
  76     
  77    - FeatStructs can be easily frozen, allowing them to be used as 
  78      keys in hash tables.  Python dictionaries and lists can not. 
  79     
  80    - FeatStructs display reentrance in their string representations; 
  81      Python dictionaries and lists do not. 
  82     
  83    - FeatStructs may *not* be mixed with Python dictionaries and lists 
  84      (e.g., when performing unification). 
  85     
  86    - FeatStructs provide a number of useful methods, such as L{walk() 
  87      <FeatStruct.walk>} and L{cyclic() <FeatStruct.cyclic>}, which are 
  88      not available for Python dicts & lists. 
  89   
  90  In general, if your feature structures will contain any reentrances, 
  91  or if you plan to use them as dictionary keys, it is strongly 
  92  recommended that you use full-fledged L{FeatStruct} objects. 
  93  """ 
  94   
  95  import re, copy 
  96   
  97  from nltk.sem.logic import Variable, Expression, SubstituteBindingsI 
  98  from nltk.sem.logic import LogicParser, ParseException 
  99  import nltk.internals 
 100   
 101  ###################################################################### 
 102  # Feature Structure 
 103  ###################################################################### 
 104   
105 -class FeatStruct(SubstituteBindingsI):
106 """ 107 A mapping from feature identifiers to feature values, where each 108 feature value is either a basic value (such as a string or an 109 integer), or a nested feature structure. There are two types of 110 feature structure: 111 112 - I{feature dictionaries}, implemented by L{FeatDict}, act like 113 Python dictionaries. Feature identifiers may be strings or 114 instances of the L{Feature} class. 115 - I{feature lists}, implemented by L{FeatList}, act like Python 116 lists. Feature identifiers are integers. 117 118 Feature structures may be indexed using either simple feature 119 identifiers or 'feature paths.' A X{feature path} is a sequence 120 of feature identifiers that stand for a corresponding sequence of 121 indexing operations. In particular, C{fstruct[(f1,f2,...,fn)]} is 122 equivalent to C{fstruct[f1][f2]...[fn]}. 123 124 Feature structures may contain reentrant feature structures. A 125 X{reentrant feature structure} is a single feature structure 126 object that can be accessed via multiple feature paths. Feature 127 structures may also be cyclic. A feature structure is X{cyclic} 128 if there is any feature path from the feature structure to itself. 129 130 Two feature structures are considered equal if they assign the 131 same values to all features, and have the same reentrances. 132 133 By default, feature structures are mutable. They may be made 134 immutable with the L{freeze()} function. Once they have been 135 frozen, they may be hashed, and thus used as dictionary keys. 136 """ 137 138 _frozen = False 139 """@ivar: A flag indicating whether this feature structure is 140 frozen or not. Once this flag is set, it should never be 141 un-set; and no further modification should be made to this 142 feature structue.""" 143 144 ##//////////////////////////////////////////////////////////// 145 #{ Constructor 146 ##//////////////////////////////////////////////////////////// 147
148 - def __new__(cls, features=None, **morefeatures):
149 """ 150 Construct and return a new feature structure. If this 151 constructor is called directly, then the returned feature 152 structure will be an instance of either the L{FeatDict} class 153 or the L{FeatList} class. 154 155 @param features: The initial feature values for this feature 156 structure: 157 - FeatStruct(string) -> FeatStructParser().parse(string) 158 - FeatStruct(mapping) -> FeatDict(mapping) 159 - FeatStruct(sequence) -> FeatList(sequence) 160 - FeatStruct() -> FeatDict() 161 @param morefeatures: If C{features} is a mapping or C{None}, 162 then C{morefeatures} provides additional features for the 163 C{FeatDict} constructor. 164 """ 165 # If the FeatStruct constructor is called directly, then decide 166 # whether to create a FeatDict or a FeatList, based on the 167 # contents of the `features` argument. 168 if cls is FeatStruct: 169 if features is None: 170 return FeatDict.__new__(FeatDict, **morefeatures) 171 elif _is_mapping(features): 172 return FeatDict.__new__(FeatDict, features, **morefeatures) 173 elif morefeatures: 174 raise TypeError('Keyword arguments may only be specified ' 175 'if features is None or is a mapping.') 176 if isinstance(features, basestring): 177 if FeatStructParser._START_FDICT_RE.match(features): 178 return FeatDict.__new__(FeatDict, features, **morefeatures) 179 else: 180 return FeatList.__new__(FeatList, features, **morefeatures) 181 elif _is_sequence(features): 182 return FeatList.__new__(FeatList, features) 183 else: 184 raise TypeError('Expected string or mapping or sequence') 185 186 # Otherwise, construct the object as normal. 187 else: 188 return super(FeatStruct, cls).__new__(cls, features, 189 **morefeatures)
190 191 ##//////////////////////////////////////////////////////////// 192 #{ Uniform Accessor Methods 193 ##//////////////////////////////////////////////////////////// 194 # These helper functions allow the methods defined by FeatStruct 195 # to treat all feature structures as mappings, even if they're 196 # really lists. (Lists are treated as mappings from ints to vals) 197
198 - def _keys(self):
199 """Return an iterable of the feature identifiers used by this 200 FeatStruct.""" 201 raise NotImplementedError() # Implemented by subclasses.
202
203 - def _values(self):
204 """Return an iterable of the feature values directly defined 205 by this FeatStruct.""" 206 raise NotImplementedError() # Implemented by subclasses.
207
208 - def _items(self):
209 """Return an iterable of (fid,fval) pairs, where fid is a 210 feature identifier and fval is the corresponding feature 211 value, for all features defined by this FeatStruct.""" 212 raise NotImplementedError() # Implemented by subclasses.
213 214 ##//////////////////////////////////////////////////////////// 215 #{ Equality & Hashing 216 ##//////////////////////////////////////////////////////////// 217
218 - def equal_values(self, other, check_reentrance=False):
219 """ 220 @return: True if C{self} and C{other} assign the same value to 221 to every feature. In particular, return true if 222 C{self[M{p}]==other[M{p}]} for every feature path M{p} such 223 that C{self[M{p}]} or C{other[M{p}]} is a base value (i.e., 224 not a nested feature structure). 225 226 @param check_reentrance: If true, then also return false if 227 there is any difference between the reentrances of C{self} 228 and C{other}. 229 230 @note: the L{== operator <__eq__>} is equivalent to 231 C{equal_values()} with C{check_reentrance=True}. 232 """ 233 return self._equal(other, check_reentrance, set(), set(), set())
234
235 - def __eq__(self, other):
236 """ 237 Return true if C{self} and C{other} are both feature 238 structures, assign the same values to all features, and 239 contain the same reentrances. I.e., return 240 C{self.equal_values(other, check_reentrance=True)}. 241 242 @see: L{equal_values()} 243 """ 244 return self._equal(other, True, set(), set(), set())
245
246 - def __ne__(self, other):
247 """ 248 Return true unless C{self} and C{other} are both feature 249 structures, assign the same values to all features, and 250 contain the same reentrances. I.e., return 251 C{not self.equal_values(other, check_reentrance=True)}. 252 """ 253 return not self.__eq__(other)
254
255 - def __hash__(self):
256 """ 257 If this feature structure is frozen, return its hash value; 258 otherwise, raise C{TypeError}. 259 """ 260 if not self._frozen: 261 raise TypeError('FeatStructs must be frozen before they ' 262 'can be hashed.') 263 try: return self.__hash 264 except AttributeError: 265 self.__hash = self._hash(set()) 266 return self.__hash
267
268 - def _equal(self, other, check_reentrance, visited_self, 269 visited_other, visited_pairs):
270 """ 271 @return: True iff self and other have equal values. 272 @param visited_self: A set containing the ids of all C{self} 273 feature structures we've already visited. 274 @param visited_other: A set containing the ids of all C{other} 275 feature structures we've already visited. 276 @param visited_pairs: A set containing C{(selfid, otherid)} pairs 277 for all pairs of feature structures we've already visited. 278 """ 279 # If we're the same object, then we're equal. 280 if self is other: return True 281 282 # If we have different classes, we're definitely not equal. 283 if self.__class__ != other.__class__: return False 284 285 # If we define different features, we're definitely not equal. 286 # (Perform len test first because it's faster -- we should 287 # do profiling to see if this actually helps) 288 if len(self) != len(other): return False 289 if set(self._keys()) != set(other._keys()): return False 290 291 # If we're checking reentrance, then any time we revisit a 292 # structure, make sure that it was paired with the same 293 # feature structure that it is now. Note: if check_reentrance, 294 # then visited_pairs will never contain two pairs whose first 295 # values are equal, or two pairs whose second values are equal. 296 if check_reentrance: 297 if id(self) in visited_self or id(other) in visited_other: 298 return (id(self), id(other)) in visited_pairs 299 300 # If we're not checking reentrance, then we still need to deal 301 # with cycles. If we encounter the same (self, other) pair a 302 # second time, then we won't learn anything more by examining 303 # their children a second time, so just return true. 304 else: 305 if (id(self), id(other)) in visited_pairs: 306 return True 307 308 # Keep track of which nodes we've visited. 309 visited_self.add(id(self)) 310 visited_other.add(id(other)) 311 visited_pairs.add( (id(self), id(other)) ) 312 313 # Now we have to check all values. If any of them don't match, 314 # then return false. 315 for (fname, self_fval) in self._items(): 316 other_fval = other[fname] 317 if isinstance(self_fval, FeatStruct): 318 if not self_fval._equal(other_fval, check_reentrance, 319 visited_self, visited_other, 320 visited_pairs): 321 return False 322 else: 323 if self_fval != other_fval: return False 324 325 # Everything matched up; return true. 326 return True
327
328 - def _hash(self, visited):
329 """ 330 @return: A hash value for this feature structure. 331 @require: C{self} must be frozen. 332 @param visited: A set containing the ids of all feature 333 structures we've already visited while hashing. 334 """ 335 if id(self) in visited: return 1 336 visited.add(id(self)) 337 338 hashval = 5831 339 for (fname, fval) in sorted(self._items()): 340 hashval *= 37 341 hashval += hash(fname) 342 hashval *= 37 343 if isinstance(fval, FeatStruct): 344 hashval += fval._hash(visited) 345 else: 346 hashval += hash(fval) 347 # Convert to a 32 bit int. 348 hashval = int(hashval & 0x7fffffff) 349 return hashval
350 351 ##//////////////////////////////////////////////////////////// 352 #{ Freezing 353 ##//////////////////////////////////////////////////////////// 354 355 #: Error message used by mutating methods when called on a frozen 356 #: feature structure. 357 _FROZEN_ERROR = "Frozen FeatStructs may not be modified." 358
359 - def freeze(self):
360 """ 361 Make this feature structure, and any feature structures it 362 contains, immutable. Note: this method does not attempt to 363 'freeze' any feature values that are not C{FeatStruct}s; it 364 is recommended that you use only immutable feature values. 365 """ 366 if self._frozen: return 367 self._freeze(set())
368
369 - def frozen(self):
370 """ 371 @return: True if this feature structure is immutable. Feature 372 structures can be made immutable with the L{freeze()} method. 373 Immutable feature structures may not be made mutable again, 374 but new mutale copies can be produced with the L{copy()} method. 375 """ 376 return self._frozen
377
378 - def _freeze(self, visited):
379 """ 380 Make this feature structure, and any feature structure it 381 contains, immutable. 382 @param visited: A set containing the ids of all feature 383 structures we've already visited while freezing. 384 """ 385 if id(self) in visited: return 386 visited.add(id(self)) 387 self._frozen = True 388 for (fname, fval) in sorted(self._items()): 389 if isinstance(fval, FeatStruct): 390 fval._freeze(visited)
391 392 ##//////////////////////////////////////////////////////////// 393 #{ Copying 394 ##//////////////////////////////////////////////////////////// 395
396 - def copy(self, deep=True):
397 """ 398 Return a new copy of C{self}. The new copy will not be 399 frozen. 400 401 @param deep: If true, create a deep copy; if false, create 402 a shallow copy. 403 """ 404 if deep: 405 return copy.deepcopy(self) 406 else: 407 return self.__class__(self)
408 409 # Subclasses should define __deepcopy__ to ensure that the new 410 # copy will not be frozen.
411 - def __deepcopy__(self, memo):
412 raise NotImplementedError() # Implemented by subclasses.
413 414 ##//////////////////////////////////////////////////////////// 415 #{ Structural Information 416 ##//////////////////////////////////////////////////////////// 417
418 - def cyclic(self):
419 """ 420 @return: True if this feature structure contains itself. 421 """ 422 return self._find_reentrances({})[id(self)]
423
424 - def reentrances(self):
425 """ 426 @return: A list of all feature structures that can be reached 427 from C{self} by multiple feature paths. 428 @rtype: C{list} of L{FeatStruct} 429 """ 430 reentrance_dict = self._find_reentrances({}) 431 return [struct for (struct, reentrant) in reentrance_dict.items() 432 if reentrant]
433
434 - def walk(self):
435 """ 436 Return an iterator that generates this feature structure, and 437 each feature structure it contains. Each feature structure will 438 be generated exactly once. 439 """ 440 return self._walk(set())
441
442 - def _walk(self, visited):
443 """ 444 Return an iterator that generates this feature structure, and 445 each feature structure it contains. 446 @param visited: A set containing the ids of all feature 447 structures we've already visited while freezing. 448 """ 449 raise NotImplementedError() # Implemented by subclasses.
450
451 - def _walk(self, visited):
452 if id(self) in visited: return 453 visited.add(id(self)) 454 yield self 455 for fval in self._values(): 456 if isinstance(fval, FeatStruct): 457 for elt in fval._walk(visited): 458 yield elt
459 460 # Walk through the feature tree. The first time we see a feature 461 # value, map it to False (not reentrant). If we see a feature 462 # value more than once, then map it to True (reentrant).
463 - def _find_reentrances(self, reentrances):
464 """ 465 Return a dictionary that maps from the C{id} of each feature 466 structure contained in C{self} (including C{self}) to a 467 boolean value, indicating whether it is reentrant or not. 468 """ 469 if reentrances.has_key(id(self)): 470 # We've seen it more than once. 471 reentrances[id(self)] = True 472 else: 473 # This is the first time we've seen it. 474 reentrances[id(self)] = False 475 476 # Recurse to contained feature structures. 477 for fval in self._values(): 478 if isinstance(fval, FeatStruct): 479 fval._find_reentrances(reentrances) 480 481 return reentrances
482 483 ##//////////////////////////////////////////////////////////// 484 #{ Variables & Bindings 485 ##//////////////////////////////////////////////////////////// 486
487 - def substitute_bindings(self, bindings):
488 """@see: L{nltk.featstruct.substitute_bindings()}""" 489 return substitute_bindings(self, bindings)
490
491 - def retract_bindings(self, bindings):
492 """@see: L{nltk.featstruct.retract_bindings()}""" 493 return retract_bindings(self, bindings)
494
495 - def variables(self):
496 """@see: L{nltk.featstruct.find_variables()}""" 497 return find_variables(self)
498
499 - def rename_variables(self, vars=None, used_vars=(), new_vars=None):
500 """@see: L{nltk.featstruct.rename_variables()}""" 501 return rename_variables(self, vars, used_vars, new_vars)
502
503 - def remove_variables(self):
504 """ 505 @rtype: L{FeatStruct} 506 @return: The feature structure that is obtained by deleting 507 all features whose values are L{Variable}s. 508 """ 509 return remove_variables(self)
510 511 ##//////////////////////////////////////////////////////////// 512 #{ Unification 513 ##//////////////////////////////////////////////////////////// 514
515 - def unify(self, other, bindings=None, trace=False, 516 fail=None, rename_vars=True):
517 return unify(self, other, bindings, trace, fail, rename_vars)
518
519 - def subsumes(self, other):
520 """ 521 @return: True if C{self} subsumes C{other}. I.e., return true 522 if unifying C{self} with C{other} would result in a feature 523 structure equal to C{other}. 524 """ 525 return subsumes(self, other)
526 527 ##//////////////////////////////////////////////////////////// 528 #{ String Representations 529 ##//////////////////////////////////////////////////////////// 530
531 - def __repr__(self):
532 """ 533 Display a single-line representation of this feature structure, 534 suitable for embedding in other representations. 535 """ 536 return self._repr(self._find_reentrances({}), {})
537
538 - def _repr(self, reentrances, reentrance_ids):
539 """ 540 @return: A string representation of this feature structure. 541 @param reentrances: A dictionary that maps from the C{id} of 542 each feature value in self, indicating whether that value 543 is reentrant or not. 544 @param reentrance_ids: A dictionary mapping from the C{id}s 545 of feature values to unique identifiers. This is modified 546 by C{repr}: the first time a reentrant feature value is 547 displayed, an identifier is added to reentrance_ids for 548 it. 549 """ 550 raise NotImplementedError()
551 552 # Mutation: disable if frozen. 553 _FROZEN_ERROR = "Frozen FeatStructs may not be modified." 554 _FROZEN_NOTICE = "\n%sIf self is frozen, raise ValueError."
555 -def _check_frozen(method, indent=''):
556 """ 557 Given a method function, return a new method function that first 558 checks if C{self._frozen} is true; and if so, raises C{ValueError} 559 with an appropriate message. Otherwise, call the method and return 560 its result. 561 """ 562 def wrapped(self, *args, **kwargs): 563 if self._frozen: raise ValueError(_FROZEN_ERROR) 564 else: return method(self, *args, **kwargs)
565 wrapped.__name__ = method.__name__ 566 wrapped.__doc__ = (method.__doc__ or '') + (_FROZEN_NOTICE % indent) 567 return wrapped 568 569 570 ###################################################################### 571 # Feature Dictionary 572 ###################################################################### 573
574 -class FeatDict(FeatStruct, dict):
575 """ 576 A feature structure that acts like a Python dictionary. I.e., a 577 mapping from feature identifiers to feature values, where feature 578 identifiers can be strings or L{Feature}s; and feature values can 579 be either basic values (such as a string or an integer), or nested 580 feature structures. Feature identifiers for C{FeatDict}s are 581 sometimes called X{feature names}. 582 583 Two feature dicts are considered equal if they assign the same 584 values to all features, and have the same reentrances. 585 586 @see: L{FeatStruct} for information about feature paths, reentrance, 587 cyclic feature structures, mutability, freezing, and hashing. 588 """
589 - def __init__(self, features=None, **morefeatures):
590 """ 591 Create a new feature dictionary, with the specified features. 592 593 @param features: The initial value for this feature 594 dictionary. If C{features} is a C{FeatStruct}, then its 595 features are copied (shallow copy). If C{features} is a 596 C{dict}, then a feature is created for each item, mapping its 597 key to its value. If C{features} is a string, then it is 598 parsed using L{FeatStructParser}. If C{features} is a list of 599 tuples C{name,val}, then a feature is created for each tuple. 600 601 @param morefeatures: Additional features for the new feature 602 dictionary. If a feature is listed under both C{features} and 603 C{morefeatures}, then the value from C{morefeatures} will be 604 used. 605 """ 606 if isinstance(features, basestring): 607 FeatStructParser().parse(features, self) 608 self.update(**morefeatures) 609 else: 610 # update() checks the types of features. 611 self.update(features, **morefeatures)
612 613 #//////////////////////////////////////////////////////////// 614 #{ Dict methods 615 #//////////////////////////////////////////////////////////// 616 _INDEX_ERROR = "Expected feature name or path. Got %r." 617
618 - def __getitem__(self, name_or_path):
619 """If the feature with the given name or path exists, return 620 its value; otherwise, raise C{KeyError}.""" 621 if isinstance(name_or_path, (basestring, Feature)): 622 return dict.__getitem__(self, name_or_path) 623 elif isinstance(name_or_path, tuple): 624 try: 625 val = self 626 for fid in name_or_path: 627 if not isinstance(val, FeatStruct): 628 raise KeyError # path contains base value 629 val = val[fid] 630 return val 631 except (KeyError, IndexError): 632 raise KeyError(name_or_path) 633 else: 634 raise TypeError(self._INDEX_ERROR % name_or_path)
635
636 - def get(self, name_or_path, default=None):
637 """If the feature with the given name or path exists, return its 638 value; otherwise, return C{default}.""" 639 try: return self[name_or_path] 640 except KeyError: return default
641
642 - def __contains__(self, name_or_path):
643 """Return true if a feature with the given name or path exists.""" 644 try: self[name_or_path]; return True 645 except KeyError: return False
646
647 - def has_key(self, name_or_path):
648 """Return true if a feature with the given name or path exists.""" 649 return name_or_path in self
650
651 - def __delitem__(self, name_or_path):
652 """If the feature with the given name or path exists, delete 653 its value; otherwise, raise C{KeyError}.""" 654 if self._frozen: raise ValueError(_FROZEN_ERROR) 655 if isinstance(name_or_path, (basestring, Feature)): 656 return dict.__delitem__(self, name_or_path) 657 elif isinstance(name_or_path, tuple): 658 if len(name_or_path) == 0: 659 raise ValueError("The path () can not be set") 660 else: 661 parent = self[name_or_path[:-1]] 662 if not isinstance(parent, FeatStruct): 663 raise KeyError(name_or_path) # path contains base value 664 del parent[name_or_path[-1]] 665 else: 666 raise TypeError(self._INDEX_ERROR % name_or_path)
667
668 - def __setitem__(self, name_or_path, value):
669 """Set the value for the feature with the given name or path 670 to C{value}. If C{name_or_path} is an invalid path, raise 671 C{KeyError}.""" 672 if self._frozen: raise ValueError(_FROZEN_ERROR) 673 if isinstance(name_or_path, (basestring, Feature)): 674 return dict.__setitem__(self, name_or_path, value) 675 elif isinstance(name_or_path, tuple): 676 if len(name_or_path) == 0: 677 raise ValueError("The path () can not be set") 678 else: 679 parent = self[name_or_path[:-1]] 680 if not isinstance(parent, FeatStruct): 681 raise KeyError(name_or_path) # path contains base value 682 parent[name_or_path[-1]] = value 683 else: 684 raise TypeError(self._INDEX_ERROR % name_or_path)
685 686 clear = _check_frozen(dict.clear) 687 pop = _check_frozen(dict.pop) 688 popitem = _check_frozen(dict.popitem) 689 setdefault = _check_frozen(dict.setdefault) 690
691 - def update(self, features=None, **morefeatures):
692 if self._frozen: raise ValueError(_FROZEN_ERROR) 693 if features is None: 694 items = () 695 elif hasattr(features, 'has_key'): 696 items = features.items() 697 elif hasattr(features, '__iter__'): 698 items = features 699 else: 700 raise ValueError('Expected mapping or list of tuples') 701 702 for key, val in items: 703 if not isinstance(key, (basestring, Feature)): 704 raise TypeError('Feature names must be strings') 705 self[key] = val 706 for key, val in morefeatures.items(): 707 if not isinstance(key, (basestring, Feature)): 708 raise TypeError('Feature names must be strings') 709 self[key] = val
710 711 ##//////////////////////////////////////////////////////////// 712 #{ Copying 713 ##//////////////////////////////////////////////////////////// 714
715 - def __deepcopy__(self, memo):
716 memo[id(self)] = selfcopy = self.__class__() 717 for (key, val) in self._items(): 718 selfcopy[copy.deepcopy(key,memo)] = copy.deepcopy(val,memo) 719 return selfcopy
720 721 ##//////////////////////////////////////////////////////////// 722 #{ Uniform Accessor Methods 723 ##//////////////////////////////////////////////////////////// 724
725 - def _keys(self): return self.keys()
726 - def _values(self): return self.values()
727 - def _items(self): return self.items()
728 729 ##//////////////////////////////////////////////////////////// 730 #{ String Representations 731 ##//////////////////////////////////////////////////////////// 732
733 - def __str__(self):
734 """ 735 Display a multi-line representation of this feature dictionary 736 as an FVM (feature value matrix). 737 """ 738 return '\n'.join(self._str(self._find_reentrances({}), {}))
739
740 - def _repr(self, reentrances, reentrance_ids):
741 segments = [] 742 prefix = '' 743 suffix = '' 744 745 # If this is the first time we've seen a reentrant structure, 746 # then assign it a unique identifier. 747 if reentrances[id(self)]: 748 assert not reentrance_ids.has_key(id(self)) 749 reentrance_ids[id(self)] = `len(reentrance_ids)+1` 750 751 # sorting note: keys are unique strings, so we'll never fall 752 # through to comparing values. 753 for (fname, fval) in sorted(self.items()): 754 display = getattr(fname, 'display', None) 755 if reentrance_ids.has_key(id(fval)): 756 segments.append('%s->(%s)' % 757 (fname, reentrance_ids[id(fval)])) 758 elif (display == 'prefix' and not prefix and 759 isinstance(fval, (Variable, basestring))): 760 prefix = '%s' % fval 761 elif display == 'slash' and not suffix: 762 if isinstance(fval, Variable): 763 suffix = '/%s' % fval.name 764 else: 765 suffix = '/%r' % fval 766 elif isinstance(fval, Variable): 767 segments.append('%s=%s' % (fname, fval.name)) 768 elif fval is True: 769 segments.append('+%s' % fname) 770 elif fval is False: 771 segments.append('-%s' % fname) 772 elif isinstance(fval, Expression): 773 segments.append('%s=<%s>' % (fname, fval)) 774 elif not isinstance(fval, FeatStruct): 775 segments.append('%s=%r' % (fname, fval)) 776 else: 777 fval_repr = fval._repr(reentrances, reentrance_ids) 778 segments.append('%s=%s' % (fname, fval_repr)) 779 # If it's reentrant, then add on an identifier tag. 780 if reentrances[id(self)]: 781 prefix = '(%s)%s' % (reentrance_ids[id(self)], prefix) 782 return '%s[%s]%s' % (prefix, ', '.join(segments), suffix)
783
784 - def _str(self, reentrances, reentrance_ids):
785 """ 786 @return: A list of lines composing a string representation of 787 this feature dictionary. 788 @param reentrances: A dictionary that maps from the C{id} of 789 each feature value in self, indicating whether that value 790 is reentrant or not. 791 @param reentrance_ids: A dictionary mapping from the C{id}s 792 of feature values to unique identifiers. This is modified 793 by C{repr}: the first time a reentrant feature value is 794 displayed, an identifier is added to reentrance_ids for 795 it. 796 """ 797 # If this is the first time we've seen a reentrant structure, 798 # then tack on an id string. 799 if reentrances[id(self)]: 800 assert not reentrance_ids.has_key(id(self)) 801 reentrance_ids[id(self)] = `len(reentrance_ids)+1` 802 803 # Special case: empty feature dict. 804 if len(self) == 0: 805 if reentrances[id(self)]: 806 return ['(%s) []' % reentrance_ids[id(self)]] 807 else: 808 return ['[]'] 809 810 # What's the longest feature name? Use this to align names. 811 maxfnamelen = max(len(str(k)) for k in self.keys()) 812 813 lines = [] 814 # sorting note: keys are unique strings, so we'll never fall 815 # through to comparing values. 816 for (fname, fval) in sorted(self.items()): 817 fname = str(fname).ljust(maxfnamelen) 818 if isinstance(fval, Variable): 819 lines.append('%s = %s' % (fname,fval.name)) 820 821 elif isinstance(fval, Expression): 822 lines.append('%s = <%s>' % (fname, fval)) 823 824 elif isinstance(fval, FeatList): 825 fval_repr = fval._repr(reentrances, reentrance_ids) 826 lines.append('%s = %r' % (fname, fval_repr)) 827 828 elif not isinstance(fval, FeatDict): 829 # It's not a nested feature structure -- just print it. 830 lines.append('%s = %r' % (fname, fval)) 831 832 elif reentrance_ids.has_key(id(fval)): 833 # It's a feature structure we've seen before -- print 834 # the reentrance id. 835 lines.append('%s -> (%s)' % (fname, reentrance_ids[id(fval)])) 836 837 else: 838 # It's a new feature structure. Separate it from 839 # other values by a blank line. 840 if lines and lines[-1] != '': lines.append('') 841 842 # Recursively print the feature's value (fval). 843 fval_lines = fval._str(reentrances, reentrance_ids) 844 845 # Indent each line to make room for fname. 846 fval_lines = [(' '*(maxfnamelen+3))+l for l in fval_lines] 847 848 # Pick which line we'll display fname on, & splice it in. 849 nameline = (len(fval_lines)-1)/2 850 fval_lines[nameline] = ( 851 fname+' ='+fval_lines[nameline][maxfnamelen+2:]) 852 853 # Add the feature structure to the output. 854 lines += fval_lines 855 856 # Separate FeatStructs by a blank line. 857 lines.append('') 858 859 # Get rid of any excess blank lines. 860 if lines[-1] == '': lines.pop() 861 862 # Add brackets around everything. 863 maxlen = max(len(line) for line in lines) 864 lines = ['[ %s%s ]' % (line, ' '*(maxlen-len(line))) for line in lines] 865 866 # If it's reentrant, then add on an identifier tag. 867 if reentrances[id(self)]: 868 idstr = '(%s) ' % reentrance_ids[id(self)] 869 lines = [(' '*len(idstr))+l for l in lines] 870 idline = (len(lines)-1)/2 871 lines[idline] = idstr + lines[idline][len(idstr):] 872 873 return lines
874 875 876 ###################################################################### 877 # Feature List 878 ###################################################################### 879
880 -class FeatList(FeatStruct, list):
881 """ 882 A list of feature values, where each feature value is either a 883 basic value (such as a string or an integer), or a nested feature 884 structure. 885 886 Feature lists may contain reentrant feature values. A X{reentrant 887 feature value} is a single feature value that can be accessed via 888 multiple feature paths. Feature lists may also be cyclic. 889 890 Two feature lists are considered equal if they assign the same 891 values to all features, and have the same reentrances. 892 893 @see: L{FeatStruct} for information about feature paths, reentrance, 894 cyclic feature structures, mutability, freezing, and hashing. 895 """
896 - def __init__(self, features=()):
897 """ 898 Create a new feature list, with the specified features. 899 900 @param features: The initial list of features for this feature 901 list. If C{features} is a string, then it is paresd using 902 L{FeatStructParser}. Otherwise, it should be a sequence 903 of basic values and nested feature structures. 904 """ 905 if isinstance(features, basestring): 906 FeatStructParser().parse(features, self) 907 else: 908 list.__init__(self, features)
909 910 #//////////////////////////////////////////////////////////// 911 #{ List methods 912 #//////////////////////////////////////////////////////////// 913 _INDEX_ERROR = "Expected int or feature path. Got %r." 914
915 - def __getitem__(self, name_or_path):
916 if isinstance(name_or_path, (int, long)): 917 return list.__getitem__(self, name_or_path) 918 elif isinstance(name_or_path, tuple): 919 try: 920 val = self 921 for fid in name_or_path: 922 if not isinstance(val, FeatStruct): 923 raise KeyError # path contains base value 924 val = val[fid] 925 return val 926 except (KeyError, IndexError): 927 raise KeyError(name_or_path) 928 else: 929 raise TypeError(self._INDEX_ERROR % name_or_path)
930
931 - def __delitem__(self, name_or_path):
932 """If the feature with the given name or path exists, delete 933 its value; otherwise, raise C{KeyError}.""" 934 if self._frozen: raise ValueError(_FROZEN_ERROR) 935 if isinstance(name_or_path, (int, long)): 936 return list.__delitem__(self, name_or_path) 937 elif isinstance(name_or_path, tuple): 938 if len(name_or_path) == 0: 939 raise ValueError("The path () can not be set") 940 else: 941 parent = self[name_or_path[:-1]] 942 if not isinstance(parent, FeatStruct): 943 raise KeyError(name_or_path) # path contains base value 944 del parent[name_or_path[-1]] 945 else: 946 raise TypeError(self._INDEX_ERROR % name_or_path)
947
948 - def __setitem__(self, name_or_path, value):
949 """Set the value for the feature with the given name or path 950 to C{value}. If C{name_or_path} is an invalid path, raise 951 C{KeyError}.""" 952 if self._frozen: raise ValueError(_FROZEN_ERROR) 953 if isinstance(name_or_path, (int, long)): 954 return list.__setitem__(self, name_or_path, value) 955 elif isinstance(name_or_path, tuple): 956 if len(name_or_path) == 0: 957 raise ValueError("The path () can not be set") 958 else: 959 parent = self[name_or_path[:-1]] 960 if not isinstance(parent, FeatStruct): 961 raise KeyError(name_or_path) # path contains base value 962 parent[name_or_path[-1]] = value 963 else: 964 raise TypeError(self._INDEX_ERROR % name_or_path)
965 966 __delslice__ = _check_frozen(list.__delslice__, ' ') 967 __setslice__ = _check_frozen(list.__setslice__, ' ') 968 __iadd__ = _check_frozen(list.__iadd__) 969 __imul__ = _check_frozen(list.__imul__) 970 append = _check_frozen(list.append) 971 extend = _check_frozen(list.extend) 972 insert = _check_frozen(list.insert) 973 pop = _check_frozen(list.pop) 974 remove = _check_frozen(list.remove) 975 reverse = _check_frozen(list.reverse) 976 sort = _check_frozen(list.sort) 977 978 ##//////////////////////////////////////////////////////////// 979 #{ Copying 980 ##//////////////////////////////////////////////////////////// 981
982 - def __deepcopy__(self, memo):
983 memo[id(self)] = selfcopy = self.__class__() 984 selfcopy.extend([copy.deepcopy(fval,memo) for fval in self]) 985 return selfcopy
986 987 ##//////////////////////////////////////////////////////////// 988 #{ Uniform Accessor Methods 989 ##//////////////////////////////////////////////////////////// 990
991 - def _keys(self): return range(len(self))
992 - def _values(self): return self
993 - def _items(self): return enumerate(self)
994 995 ##//////////////////////////////////////////////////////////// 996 #{ String Representations 997 ##//////////////////////////////////////////////////////////// 998 999 # Special handling for: reentrances, variables, expressions.
1000 - def _repr(self, reentrances, reentrance_ids):
1001 # If this is the first time we've seen a reentrant structure, 1002 # then assign it a unique identifier. 1003 if reentrances[id(self)]: 1004 assert not reentrance_ids.has_key(id(self)) 1005 reentrance_ids[id(self)] = `len(reentrance_ids)+1` 1006 prefix = '(%s)' % reentrance_ids[id(self)] 1007 else: 1008 prefix = '' 1009 1010 segments = [] 1011 for fval in self: 1012 if id(fval) in reentrance_ids: 1013 segments.append('->(%s)' % reentrance_ids[id(fval)]) 1014 elif isinstance(fval, Variable): 1015 segments.append(fval.name) 1016 elif isinstance(fval, Expression): 1017 segments.append('%s' % fval) 1018 elif isinstance(fval, FeatStruct): 1019 segments.append(fval._repr(reentrances, reentrance_ids)) 1020 else: 1021 segments.append('%r' % fval) 1022 1023 return '%s[%s]' % (prefix, ', '.join(segments))
1024 1025 ###################################################################### 1026 # Variables & Bindings 1027 ###################################################################### 1028
1029 -def substitute_bindings(fstruct, bindings, fs_class='default'):
1030 """ 1031 @return: The feature structure that is obtained by replacing each 1032 variable bound by C{bindings} with its binding. If a variable is 1033 aliased to a bound variable, then it will be replaced by that 1034 variable's value. If a variable is aliased to an unbound 1035 variable, then it will be replaced by that variable. 1036 1037 @type bindings: C{dict} with L{Variable} keys 1038 @param bindings: A dictionary mapping from variables to values. 1039 """ 1040 if fs_class == 'default': fs_class = _default_fs_class(fstruct) 1041 fstruct = copy.deepcopy(fstruct) 1042 _substitute_bindings(fstruct, bindings, fs_class, set()) 1043 return fstruct
1044
1045 -def _substitute_bindings(fstruct, bindings, fs_class, visited):
1046 # Visit each node only once: 1047 if id(fstruct) in visited: return 1048 visited.add(id(fstruct)) 1049 1050 if _is_mapping(fstruct): items = fstruct.items() 1051 elif _is_sequence(fstruct): items = enumerate(fstruct) 1052 else: raise ValueError('Expected mapping or sequence') 1053 for (fname, fval) in items: 1054 while (isinstance(fval, Variable) and fval in bindings): 1055 fval = fstruct[fname] = bindings[fval] 1056 if isinstance(fval, fs_class): 1057 _substitute_bindings(fval, bindings, fs_class, visited) 1058 elif isinstance(fval, SubstituteBindingsI): 1059 fstruct[fname] = fval.substitute_bindings(bindings)
1060
1061 -def retract_bindings(fstruct, bindings, fs_class='default'):
1062 """ 1063 @return: The feature structure that is obtained by replacing each 1064 feature structure value that is bound by C{bindings} with the 1065 variable that binds it. A feature structure value must be 1066 identical to a bound value (i.e., have equal id) to be replaced. 1067 1068 C{bindings} is modified to point to this new feature structure, 1069 rather than the original feature structure. Feature structure 1070 values in C{bindings} may be modified if they are contained in 1071 C{fstruct}. 1072 """ 1073 if fs_class == 'default': fs_class = _default_fs_class(fstruct) 1074 (fstruct, new_bindings) = copy.deepcopy((fstruct, bindings)) 1075 bindings.update(new_bindings) 1076 inv_bindings = dict((id(val),var) for (var,val) in bindings.items()) 1077 _retract_bindings(fstruct, inv_bindings, fs_class, set()) 1078 return fstruct
1079
1080 -def _retract_bindings(fstruct, inv_bindings, fs_class, visited):
1081 # Visit each node only once: 1082 if id(fstruct) in visited: return 1083 visited.add(id(fstruct)) 1084 1085 if _is_mapping(fstruct): items = fstruct.items() 1086 elif _is_sequence(fstruct): items = enumerate(fstruct) 1087 else: raise ValueError('Expected mapping or sequence') 1088 for (fname, fval) in items: 1089 if isinstance(fval, fs_class): 1090 if id(fval) in inv_bindings: 1091 fstruct[fname] = inv_bindings[id(fval)] 1092 _retract_bindings(fval, inv_bindings, fs_class, visited)
1093 1094
1095 -def find_variables(fstruct, fs_class='default'):
1096 """ 1097 @return: The set of variables used by this feature structure. 1098 @rtype: C{set} of L{Variable} 1099 """ 1100 if fs_class == 'default': fs_class = _default_fs_class(fstruct) 1101 return _variables(fstruct, set(), fs_class, set())
1102
1103 -def _variables(fstruct, vars, fs_class, visited):
1104 # Visit each node only once: 1105 if id(fstruct) in visited: return 1106 visited.add(id(fstruct)) 1107 if _is_mapping(fstruct): items = fstruct.items() 1108 elif _is_sequence(fstruct): items = enumerate(fstruct) 1109 else: raise ValueError('Expected mapping or sequence') 1110 for (fname, fval) in items: 1111 if isinstance(fval, Variable): 1112 vars.add(fval) 1113 elif isinstance(fval, fs_class): 1114 _variables(fval, vars, fs_class, visited) 1115 elif isinstance(fval, SubstituteBindingsI): 1116 vars.update(fval.variables()) 1117 return vars
1118
1119 -def rename_variables(fstruct, vars=None, used_vars=(), new_vars=None, 1120 fs_class='default'):
1121 """ 1122 @return: The feature structure that is obtained by replacing 1123 any of this feature structure's variables that are in C{vars} 1124 with new variables. The names for these new variables will be 1125 names that are not used by any variable in C{vars}, or in 1126 C{used_vars}, or in this feature structure. 1127 1128 @type vars: C{set} 1129 @param vars: The set of variables that should be renamed. 1130 If not specified, C{find_variables(fstruct)} is used; i.e., all 1131 variables will be given new names. 1132 1133 @type used_vars: C{set} 1134 @param used_vars: A set of variables whose names should not be 1135 used by the new variables. 1136 1137 @type new_vars: C{dict} from L{Variable} to L{Variable} 1138 @param new_vars: A dictionary that is used to hold the mapping 1139 from old variables to new variables. For each variable M{v} 1140 in this feature structure: 1141 1142 - If C{new_vars} maps M{v} to M{v'}, then M{v} will be 1143 replaced by M{v'}. 1144 - If C{new_vars} does not contain M{v}, but C{vars} 1145 does contain M{v}, then a new entry will be added to 1146 C{new_vars}, mapping M{v} to the new variable that is used 1147 to replace it. 1148 1149 To consistantly rename the variables in a set of feature 1150 structures, simply apply rename_variables to each one, using 1151 the same dictionary: 1152 1153 >>> new_vars = {} # Maps old vars to alpha-renamed vars 1154 >>> new_fstruct1 = fstruct1.rename_variables(new_vars=new_vars) 1155 >>> new_fstruct2 = fstruct2.rename_variables(new_vars=new_vars) 1156 >>> new_fstruct3 = fstruct3.rename_variables(new_vars=new_vars) 1157 1158 If new_vars is not specified, then an empty dictionary is used. 1159 """ 1160 if fs_class == 'default': fs_class = _default_fs_class(fstruct) 1161 1162 # Default values: 1163 if new_vars is None: new_vars = {} 1164 if vars is None: vars = find_variables(fstruct, fs_class) 1165 else: vars = set(vars) 1166 1167 # Add our own variables to used_vars. 1168 used_vars = find_variables(fstruct, fs_class).union(used_vars) 1169 1170 # Copy ourselves, and rename variables in the copy. 1171 return _rename_variables(copy.deepcopy(fstruct), vars, used_vars, 1172 new_vars, fs_class, set())
1173
1174 -def _rename_variables(fstruct, vars, used_vars, new_vars, fs_class, visited):
1175 if id(fstruct) in visited: return 1176 visited.add(id(fstruct)) 1177 if _is_mapping(fstruct): items = fstruct.items() 1178 elif _is_sequence(fstruct): items = enumerate(fstruct) 1179 else: raise ValueError('Expected mapping or sequence') 1180 for (fname, fval) in items: 1181 if isinstance(fval, Variable): 1182 # If it's in new_vars, then rebind it. 1183 if fval in new_vars: 1184 fstruct[fname] = new_vars[fval] 1185 # If it's in vars, pick a new name for it. 1186 elif fval in vars: 1187 new_vars[fval] = _rename_variable(fval, used_vars) 1188 fstruct[fname] = new_vars[fval] 1189 used_vars.add(new_vars[fval]) 1190 elif isinstance(fval, fs_class): 1191 _rename_variables(fval, vars, used_vars, new_vars, 1192 fs_class, visited) 1193 elif isinstance(fval, SubstituteBindingsI): 1194 # Pick new names for any variables in `vars` 1195 for var in fval.variables(): 1196 if var in vars and var not in new_vars: 1197 new_vars[var] = _rename_variable(var, used_vars) 1198 used_vars.add(new_vars[var]) 1199 # Replace all variables in `new_vars`. 1200 fstruct[fname] = fval.substitute_bindings(new_vars) 1201 return fstruct
1202
1203 -def _rename_variable(var, used_vars):
1204 name, n = re.sub('\d+$', '', var.name), 2 1205 if not name: name = '?' 1206 while Variable('%s%s' % (name, n)) in used_vars: n += 1 1207 return Variable('%s%s' % (name, n))
1208
1209 -def remove_variables(fstruct, fs_class='default'):
1210 """ 1211 @rtype: L{FeatStruct} 1212 @return: The feature structure that is obtained by deleting 1213 all features whose values are L{Variable}s. 1214 """ 1215 if fs_class == 'default': fs_class = _default_fs_class(fstruct) 1216 return _remove_variables(copy.deepcopy(fstruct), fs_class, set())
1217
1218 -def _remove_variables(fstruct, fs_class, visited):
1219 if id(fstruct) in visited: return 1220 visited.add(id(fstruct)) 1221 if _is_mapping(fstruct): items = fstruct.items() 1222 elif _is_sequence(fstruct): items = enumerate(fstruct) 1223 else: raise ValueError('Expected mapping or sequence') 1224 for (fname, fval) in items: 1225 if isinstance(fval, Variable): 1226 del fstruct[fname] 1227 elif isinstance(fval, fs_class): 1228 _remove_variables(fval, fs_class, visited) 1229 return fstruct
1230 1231 1232 ###################################################################### 1233 # Unification 1234 ###################################################################### 1235
1236 -class _UnificationFailure(object):
1237 - def __repr__(self): return 'nltk.featstruct.UnificationFailure'
1238 UnificationFailure = _UnificationFailure() 1239 """A unique value used to indicate unification failure. It can be 1240 returned by L{Feature.unify_base_values()} or by custom C{fail()} 1241 functions to indicate that unificaiton should fail.""" 1242 1243 # The basic unification algorithm: 1244 # 1. Make copies of self and other (preserving reentrance) 1245 # 2. Destructively unify self and other 1246 # 3. Apply forward pointers, to preserve reentrance. 1247 # 4. Replace bound variables with their values.
1248 -def unify(fstruct1, fstruct2, bindings=None, trace=False, 1249 fail=None, rename_vars=True, fs_class='default'):
1250 """ 1251 Unify C{fstruct1} with C{fstruct2}, and return the resulting feature 1252 structure. This unified feature structure is the minimal 1253 feature structure that: 1254 - contains all feature value assignments from both C{fstruct1} 1255 and C{fstruct2}. 1256 - preserves all reentrance properties of C{fstruct1} and 1257 C{fstruct2}. 1258 1259 If no such feature structure exists (because C{fstruct1} and 1260 C{fstruct2} specify incompatible values for some feature), then 1261 unification fails, and C{unify} returns C{None}. 1262 1263 @type bindings: C{dict} with L{Variable} keys 1264 @param bindings: A set of variable bindings to be used and 1265 updated during unification. 1266 1267 Bound variables are replaced by their values. Aliased 1268 variables are replaced by their representative variable 1269 (if unbound) or the value of their representative variable 1270 (if bound). I.e., if variable C{I{v}} is in C{bindings}, 1271 then C{I{v}} is replaced by C{bindings[I{v}]}. This will 1272 be repeated until the variable is replaced by an unbound 1273 variable or a non-variable value. 1274 1275 Unbound variables are bound when they are unified with 1276 values; and aliased when they are unified with variables. 1277 I.e., if variable C{I{v}} is not in C{bindings}, and is 1278 unified with a variable or value C{I{x}}, then 1279 C{bindings[I{v}]} is set to C{I{x}}. 1280 1281 If C{bindings} is unspecified, then all variables are 1282 assumed to be unbound. I.e., C{bindings} defaults to an 1283 empty C{dict}. 1284 1285 @type trace: C{bool} 1286 @param trace: If true, generate trace output. 1287 1288 @type rename_vars: C{bool} 1289 @param rename_vars: If true, then rename any variables in 1290 C{fstruct2} that are also used in C{fstruct1}. This prevents 1291 aliasing in cases where C{fstruct1} and C{fstruct2} use the 1292 same variable name. E.g.: 1293 1294 >>> FeatStruct('[a=?x]').unify(FeatStruct('[b=?x]')) 1295 [a=?x, b=?x2] 1296 1297 If you intend for a variables in C{fstruct1} and C{fstruct2} with 1298 the same name to be treated as a single variable, use 1299 C{rename_vars=False}. 1300 """ 1301 # Decide which class(es) will be treated as feature structures, 1302 # for the purposes of unification. 1303 if fs_class == 'default': 1304 fs_class = _default_fs_class(fstruct1) 1305 if _default_fs_class(fstruct2) != fs_class: 1306 raise ValueError("Mixing FeatStruct objects with Python " 1307 "dicts and lists is not supported.") 1308 assert isinstance(fstruct1, fs_class) 1309 assert isinstance(fstruct2, fs_class) 1310 1311 # If bindings are unspecified, use an empty set of bindings. 1312 user_bindings = (bindings is not None) 1313 if bindings is None: bindings = {} 1314 1315 # Make copies of fstruct1 and fstruct2 (since the unification 1316 # algorithm is destructive). Do it all at once, to preserve 1317 # reentrance links between fstruct1 and fstruct2. Copy bindings 1318 # as well, in case there are any bound vars that contain parts 1319 # of fstruct1 or fstruct2. 1320 (fstruct1copy, fstruct2copy, bindings_copy) = ( 1321 copy.deepcopy((fstruct1, fstruct2, bindings))) 1322 1323 # Copy the bindings back to the original bindings dict. 1324 bindings.update(bindings_copy) 1325 1326 if rename_vars: 1327 vars1 = find_variables(fstruct1copy, fs_class) 1328 vars2 = find_variables(fstruct2copy, fs_class) 1329 _rename_variables(fstruct2copy, vars1, vars2, {}, fs_class, set()) 1330 1331 # Do the actual unification. If it fails, return None. 1332 forward = {} 1333 if trace: _trace_unify_start((), fstruct1copy, fstruct2copy) 1334 try: result = _destructively_unify(fstruct1copy, fstruct2copy, bindings, 1335 forward, trace, fail, fs_class, ()) 1336 except _UnificationFailureError: return None 1337 1338 # _destructively_unify might return UnificationFailure, e.g. if we 1339 # tried to unify a mapping with a sequence. 1340 if result is UnificationFailure: 1341 if fail is None: return None 1342 else: return fail(fstruct1copy, fstruct2copy, ()) 1343 1344 # Replace any feature structure that has a forward pointer 1345 # with the target of its forward pointer. 1346 result = _apply_forwards(result, forward, fs_class, set()) 1347 if user_bindings: _apply_forwards_to_bindings(forward, bindings) 1348 1349 # Replace bound vars with values. 1350 _resolve_aliases(bindings) 1351 _substitute_bindings(result, bindings, fs_class, set()) 1352 1353 # Return the result. 1354 if trace: _trace_unify_succeed((), result) 1355 if trace: _trace_bindings((), bindings) 1356 return result
1357
1358 -class _UnificationFailureError(Exception):
1359 """An exception that is used by C{_destructively_unify} to abort 1360 unification when a failure is encountered."""
1361
1362 -def _destructively_unify(fstruct1, fstruct2, bindings, forward, 1363 trace, fail, fs_class, path):
1364 """ 1365 Attempt to unify C{fstruct1} and C{fstruct2} by modifying them 1366 in-place. If the unification succeeds, then C{fstruct1} will 1367 contain the unified value, the value of C{fstruct2} is undefined, 1368 and forward[id(fstruct2)] is set to fstruct1. If the unification 1369 fails, then a _UnificationFailureError is raised, and the 1370 values of C{fstruct1} and C{fstruct2} are undefined. 1371 1372 @param bindings: A dictionary mapping variables to values. 1373 @param forward: A dictionary mapping feature structures ids 1374 to replacement structures. When two feature structures 1375 are merged, a mapping from one to the other will be added 1376 to the forward dictionary; and changes will be made only 1377 to the target of the forward dictionary. 1378 C{_destructively_unify} will always 'follow' any links 1379 in the forward dictionary for fstruct1 and fstruct2 before 1380 actually unifying them. 1381 @param trace: If true, generate trace output 1382 @param path: The feature path that led us to this unification 1383 step. Used for trace output. 1384 """ 1385 # If fstruct1 is already identical to fstruct2, we're done. 1386 # Note: this, together with the forward pointers, ensures 1387 # that unification will terminate even for cyclic structures. 1388 if fstruct1 is fstruct2: 1389 if trace: _trace_unify_identity(path, fstruct1) 1390 return fstruct1 1391 1392 # Set fstruct2's forward pointer to point to fstruct1; this makes 1393 # fstruct1 the canonical copy for fstruct2. Note that we need to 1394 # do this before we recurse into any child structures, in case 1395 # they're cyclic. 1396 forward[id(fstruct2)] = fstruct1 1397 1398 # Unifying two mappings: 1399 if _is_mapping(fstruct1) and _is_mapping(fstruct2): 1400 for fname in fstruct1: 1401 if getattr(fname, 'default', None) is not None: 1402 fstruct2.setdefault(fname, fname.default) 1403 for fname in fstruct2: 1404 if getattr(fname, 'default', None) is not None: 1405 fstruct1.setdefault(fname, fname.default) 1406 1407 # Unify any values that are defined in both fstruct1 and 1408 # fstruct2. Copy any values that are defined in fstruct2 but 1409 # not in fstruct1 to fstruct1. Note: sorting fstruct2's 1410 # features isn't actually necessary; but we do it to give 1411 # deterministic behavior, e.g. for tracing. 1412 for fname, fval2 in sorted(fstruct2.items()): 1413 if fname in fstruct1: 1414 fstruct1[fname] = _unify_feature_values( 1415 fname, fstruct1[fname], fval2, bindings, 1416 forward, trace, fail, fs_class, path+(fname,)) 1417 else: 1418 fstruct1[fname] = fval2 1419 1420 return fstruct1 # Contains the unified value. 1421 1422 # Unifying two sequences: 1423 elif _is_sequence(fstruct1) and _is_sequence(fstruct2): 1424 # If the lengths don't match, fail. 1425 if len(fstruct1) != len(fstruct2): 1426 return UnificationFailure 1427 1428 # Unify corresponding values in fstruct1 and fstruct2. 1429 for findex in range(len(fstruct1)): 1430 fstruct1[findex] = _unify_feature_values( 1431 findex, fstruct1[findex], fstruct2[findex], bindings, 1432 forward, trace, fail, fs_class, path+(findex,)) 1433 1434 return fstruct1 # Contains the unified value. 1435 1436 # Unifying sequence & mapping: fail. The failure function 1437 # doesn't get a chance to recover in this case. 1438 elif ((_is_sequence(fstruct1) or _is_mapping(fstruct1)) and 1439 (_is_sequence(fstruct2) or _is_mapping(fstruct2))): 1440 return UnificationFailure 1441 1442 # Unifying anything else: not allowed! 1443 raise TypeError('Expected mappings or sequences')
1444
1445 -def _unify_feature_values(fname, fval1, fval2, bindings, forward, 1446 trace, fail, fs_class, fpath):
1447 """ 1448 Attempt to unify C{fval1} and and C{fval2}, and return the 1449 resulting unified value. The method of unification will depend on 1450 the types of C{fval1} and C{fval2}: 1451 1452 1. If they're both feature structures, then destructively 1453 unify them (see L{_destructively_unify()}. 1454 2. If they're both unbound variables, then alias one variable 1455 to the other (by setting bindings[v2]=v1). 1456 3. If one is an unbound variable, and the other is a value, 1457 then bind the unbound variable to the value. 1458 4. If one is a feature structure, and the other is a base value, 1459 then fail. 1460 5. If they're both base values, then unify them. By default, 1461 this will succeed if they are equal, and fail otherwise. 1462 """ 1463 if trace: _trace_unify_start(fpath, fval1, fval2) 1464 1465 # Look up the "canonical" copy of fval1 and fval2 1466 while id(fval1) in forward: fval1 = forward[id(fval1)] 1467 while id(fval2) in forward: fval2 = forward[id(fval2)] 1468 1469 # If fval1 or fval2 is a bound variable, then 1470 # replace it by the variable's bound value. This 1471 # includes aliased variables, which are encoded as 1472 # variables bound to other variables. 1473 fvar1 = fvar2 = None 1474 while isinstance(fval1, Variable) and fval1 in bindings: 1475 fvar1 = fval1 1476 fval1 = bindings[fval1] 1477 while isinstance(fval2, Variable) and fval2 in bindings: 1478 fvar2 = fval2 1479 fval2 = bindings[fval2] 1480 1481 # Case 1: Two feature structures (recursive case) 1482 if isinstance(fval1, fs_class) and isinstance(fval2, fs_class): 1483 result = _destructively_unify(fval1, fval2, bindings, forward, 1484 trace, fail, fs_class, fpath) 1485 1486 # Case 2: Two unbound variables (create alias) 1487 elif (isinstance(fval1, Variable) and 1488 isinstance(fval2, Variable)): 1489 if fval1 != fval2: bindings[fval2] = fval1 1490 result = fval1 1491 1492 # Case 3: An unbound variable and a value (bind) 1493 elif isinstance(fval1, Variable): 1494 bindings[fval1] = fval2 1495 result = fval1 1496 elif isinstance(fval2, Variable): 1497 bindings[fval2] = fval1 1498 result = fval2 1499 1500 # Case 4: A feature structure & a base value (fail) 1501 elif isinstance(fval1, fs_class) or isinstance(fval2, fs_class): 1502 result = UnificationFailure 1503 1504 # Case 5: Two base values 1505 else: 1506 # Case 5a: Feature defines a custom unification method for base values 1507 if isinstance(fname, Feature): 1508 result = fname.unify_base_values(fval1, fval2, bindings) 1509 # Case 5b: Feature value defines custom unification method 1510 elif isinstance(fval1, CustomFeatureValue): 1511 result = fval1.unify(fval2) 1512 # Sanity check: unify value should be symmetric 1513 if (isinstance(fval2, CustomFeatureValue) and 1514 result != fval2.unify(fval1)): 1515 raise AssertionError( 1516 'CustomFeatureValue objects %r and %r disagree ' 1517 'about unification value: %r vs. %r' % 1518 (fval1, fval2, result, fval2.unify(fval1))) 1519 elif isinstance(fval2, CustomFeatureValue): 1520 result = fval2.unify(fval1) 1521 # Case 5c: Simple values -- check if they're equal. 1522 else: 1523 if fval1 == fval2: 1524 result = fval1 1525 else: 1526 result = UnificationFailure 1527 1528 # If either value was a bound variable, then update the 1529 # bindings. (This is really only necessary if fname is a 1530 # Feature or if either value is a CustomFeatureValue.) 1531 if result is not UnificationFailure: 1532 if fvar1 is not None: 1533 bindings[fvar1] = result 1534 result = fvar1 1535 if fvar2 is not None: 1536 bindings[fvar2] = result 1537 result = fvar2 1538 1539 # If we unification failed, call the failure function; it 1540 # might decide to continue anyway. 1541 if result is UnificationFailure: 1542 if fail is not None: result = fail(fval1, fval2, fpath) 1543 if trace: _trace_unify_fail(fpath[:-1], result) 1544 if result is UnificationFailure: 1545 raise _UnificationFailureError 1546 1547 # Normalize the result. 1548 if isinstance(result, fs_class): 1549 result = _apply_forwards(result, forward, fs_class, set()) 1550 1551 if trace: _trace_unify_succeed(fpath, result) 1552 if trace and isinstance(result, fs_class): 1553 _trace_bindings(fpath, bindings) 1554 1555 return result
1556
1557 -def _apply_forwards_to_bindings(forward, bindings):
1558 """ 1559 Replace any feature structure that has a forward pointer with 1560 the target of its forward pointer (to preserve reentrancy). 1561 """ 1562 for (var, value) in bindings.items(): 1563 while id(value) in forward: 1564 value = forward[id(value)] 1565 bindings[var] = value
1566
1567 -def _apply_forwards(fstruct, forward, fs_class, visited):
1568 """ 1569 Replace any feature structure that has a forward pointer with 1570 the target of its forward pointer (to preserve reentrancy). 1571 """ 1572 # Follow our own forwards pointers (if any) 1573 while id(fstruct) in forward: fstruct = forward[id(fstruct)] 1574 1575 # Visit each node only once: 1576 if id(fstruct) in visited: return 1577 visited.add(id(fstruct)) 1578 1579 if _is_mapping(fstruct): items = fstruct.items() 1580 elif _is_sequence(fstruct): items = enumerate(fstruct) 1581 else: raise ValueError('Expected mapping or sequence') 1582 for fname, fval in items: 1583 if isinstance(fval, fs_class): 1584 # Replace w/ forwarded value. 1585 while id(fval) in forward: 1586 fval = forward[id(fval)] 1587 fstruct[fname] = fval 1588 # Recurse to child. 1589 _apply_forwards(fval, forward, fs_class, visited) 1590 1591 return fstruct
1592
1593 -def _resolve_aliases(bindings):
1594 """ 1595 Replace any bound aliased vars with their binding; and replace 1596 any unbound aliased vars with their representative var. 1597 """ 1598 for (var, value) in bindings.items(): 1599 while isinstance(value, Variable) and value in bindings: 1600 value = bindings[var] = bindings[value]
1601
1602 -def _trace_unify_start(path, fval1, fval2):
1603 if path == (): 1604 print '\nUnification trace:' 1605 else: 1606 fullname = '.'.join(str(n) for n in path) 1607 print ' '+'| '*(len(path)-1)+'|' 1608 print ' '+'| '*(len(path)-1)+'| Unify feature: %s' % fullname 1609 print ' '+'| '*len(path)+' / '+_trace_valrepr(fval1) 1610 print ' '+'| '*len(path)+'|\\ '+_trace_valrepr(fval2)
1611 -def _trace_unify_identity(path, fval1):
1612 print ' '+'| '*len(path)+'|' 1613 print ' '+'| '*len(path)+'| (identical objects)' 1614 print ' '+'| '*len(path)+'|' 1615 print ' '+'| '*len(path)+'+-->'+`fval1`
1616 -def _trace_unify_fail(path, result):
1617 if result is UnificationFailure: resume = '' 1618 else: resume = ' (nonfatal)' 1619 print ' '+'| '*len(path)+'| |' 1620 print ' '+'X '*len(path)+'X X <-- FAIL'+resume
1621 -def _trace_unify_succeed(path, fval1):
1622 # Print the result. 1623 print ' '+'| '*len(path)+'|' 1624 print ' '+'| '*len(path)+'+-->'+`fval1`
1625 -def _trace_bindings(path, bindings):
1626 # Print the bindings (if any). 1627 if len(bindings) > 0: 1628 binditems = sorted(bindings.items(), key=lambda v:v[0].name) 1629 bindstr = '{%s}' % ', '.join( 1630 '%s: %s' % (var, _trace_valrepr(val)) 1631 for (var, val) in binditems) 1632 print ' '+'| '*len(path)+' Bindings: '+bindstr
1633 -def _trace_valrepr(val):
1634 if isinstance(val, Variable): 1635 return '%s' % val 1636 else: 1637 return '%r' % val
1638
1639 -def subsumes(fstruct1, fstruct2):
1640 """ 1641 @return: True if C{fstruct1} subsumes C{fstruct2}. I.e., return 1642 true if unifying C{fstruct1} with C{fstruct2} would result in a 1643 feature structure equal to C{fstruct2.} 1644 """ 1645 return fstruct2 == unify(fstruct1, fstruct2)
1646
1647 -def conflicts(fstruct1, fstruct2, trace=0):
1648 """ 1649 @return: A list of the feature paths of all features which are 1650 assigned incompatible values by C{fstruct1} and C{fstruct2}. 1651 @rtype: C{list} of C{tuple} 1652 """ 1653 conflict_list = [] 1654 def add_conflict(fval1, fval2, path): 1655 conflict_list.append(path) 1656 return fval1
1657 unify(fstruct1, fstruct2, fail=add_conflict, trace=trace) 1658 return conflict_list 1659 1660 ###################################################################### 1661 # Helper Functions 1662 ###################################################################### 1663
1664 -def _is_mapping(v):
1665 return hasattr(v, 'has_key') and hasattr(v, 'items')
1666
1667 -def _is_sequence(v):
1668 return (hasattr(v, '__iter__') and hasattr(v, '__len__') and 1669 not isinstance(v, basestring))
1670
1671 -def _default_fs_class(obj):
1672 if isinstance(obj, FeatStruct): return FeatStruct 1673 if isinstance(obj, (dict, list)): return (dict, list) 1674 else: 1675 raise ValueError('To unify objects of type %s, you must specify ' 1676 'fs_class explicitly.' % obj.__class__.__name__)
1677 ###################################################################### 1678 # FeatureValueSet & FeatureValueTuple 1679 ###################################################################### 1680
1681 -class SubstituteBindingsSequence(SubstituteBindingsI):
1682 """ 1683 A mixin class for sequence clases that distributes variables() and 1684 substitute_bindings() over the object's elements. 1685 """
1686 - def variables(self):
1687 return ([elt for elt in self if isinstance(elt, Variable)] + 1688 sum([list(elt.variables()) for elt in self 1689 if isinstance(elt, SubstituteBindingsI)], []))
1690
1691 - def substitute_bindings(self, bindings):
1692 return self.__class__([self.subst(v, bindings) for v in self])
1693
1694 - def subst(self, v, bindings):
1695 if isinstance(v, SubstituteBindingsI): 1696 return v.substitute_bindings(bindings) 1697 else: 1698 return bindings.get(v, v)
1699
1700 -class FeatureValueTuple(SubstituteBindingsSequence, tuple):
1701 """ 1702 A base feature value that is a tuple of other base feature values. 1703 FeatureValueTuple implements L{SubstituteBindingsI}, so it any 1704 variable substitutions will be propagated to the elements 1705 contained by the set. C{FeatureValueTuple}s are immutable. 1706 """
1707 - def __repr__(self): # [xx] really use %s here?
1708 if len(self) == 0: return '()' 1709 return '(%s)' % ', '.join('%s' % (b,) for b in self)
1710
1711 -class FeatureValueSet(SubstituteBindingsSequence, frozenset):
1712 """ 1713 A base feature value that is a set of other base feature values. 1714 FeatureValueSet implements L{SubstituteBindingsI}, so it any 1715 variable substitutions will be propagated to the elements 1716 contained by the set. C{FeatureValueSet}s are immutable. 1717 """
1718 - def __repr__(self): # [xx] really use %s here?
1719 if len(self) == 0: return '{/}' # distinguish from dict. 1720 # n.b., we sort the string reprs of our elements, to ensure 1721 # that our own repr is deterministic. 1722 return '{%s}' % ', '.join(sorted('%s' % (b,) for b in self))
1723 __str__ = __repr__ 1724
1725 -class FeatureValueUnion(SubstituteBindingsSequence, frozenset):
1726 """ 1727 A base feature value that represents the union of two or more 1728 L{FeatureValueSet}s or L{Variable}s. 1729 """
1730 - def __new__(cls, values):
1731 # If values contains FeatureValueUnions, then collapse them. 1732 values = _flatten(values, FeatureValueUnion) 1733 1734 # If the resulting list contains no variables, then 1735 # use a simple FeatureValueSet instead. 1736 if sum(isinstance(v, Variable) for v in values) == 0: 1737 values = _flatten(values, FeatureValueSet) 1738 return FeatureValueSet(values) 1739 1740 # If we contain a single variable, return that variable. 1741 if len(values) == 1: 1742 return list(values)[0] 1743 1744 # Otherwise, build the FeatureValueUnion. 1745 return frozenset.__new__(cls, values)
1746
1747 - def __repr__(self):
1748 # n.b., we sort the string reprs of our elements, to ensure 1749 # that our own repr is deterministic. also, note that len(self) 1750 # is guaranteed to be 2 or more. 1751 return '{%s}' % '+'.join(sorted('%s' % (b,) for b in self))
1752
1753 -class FeatureValueConcat(SubstituteBindingsSequence, tuple):
1754 """ 1755 A base feature value that represents the concatenation of two or 1756 more L{FeatureValueTuple}s or L{Variable}s. 1757 """
1758 - def __new__(cls, values):
1759 # If values contains FeatureValueConcats, then collapse them. 1760 values = _flatten(values, FeatureValueConcat) 1761 1762 # If the resulting list contains no variables, then 1763 # use a simple FeatureValueTuple instead. 1764 if sum(isinstance(v, Variable) for v in values) == 0: 1765 values = _flatten(values, FeatureValueTuple) 1766 return FeatureValueTuple(values) 1767 1768 # If we contain a single variable, return that variable. 1769 if len(values) == 1: 1770 return list(values)[0] 1771 1772 # Otherwise, build the FeatureValueConcat. 1773 return tuple.__new__(cls, values)
1774
1775 - def __repr__(self):
1776 # n.b.: len(self) is guaranteed to be 2 or more. 1777 return '(%s)' % '+'.join('%s' % (b,) for b in self)
1778
1779 -def _flatten(lst, cls):
1780 """ 1781 Helper function -- return a copy of list, with all elements of 1782 type C{cls} spliced in rather than appended in. 1783 """ 1784 result = [] 1785 for elt in lst: 1786 if isinstance(elt, cls): result.extend(elt) 1787 else: result.append(elt) 1788 return result
1789 1790 ###################################################################### 1791 # Specialized Features 1792 ###################################################################### 1793
1794 -class Feature(object):
1795 """ 1796 A feature identifier that's specialized to put additional 1797 constraints, default values, etc. 1798 """
1799 - def __init__(self, name, default=None, display=None):
1800 assert display in (None, 'prefix', 'slash') 1801 1802 self._name = name # [xx] rename to .identifier? 1803 """The name of this feature.""" 1804 1805 self._default = default # [xx] not implemented yet. 1806 """Default value for this feature. Use None for unbound.""" 1807 1808 self._display = display 1809 """Custom display location: can be prefix, or slash.""" 1810 1811 if self._display == 'prefix': 1812 self._sortkey = (-1, self._name) 1813 elif self._display == 'slash': 1814 self._sortkey = (1, self._name) 1815 else: 1816 self._sortkey = (0, self._name)
1817 1818 name = property(lambda self: self._name) 1819 default = property(lambda self: self._default) 1820 display = property(lambda self: self._display) 1821
1822 - def __repr__(self):
1823 return '*%s*' % self.name
1824
1825 - def __cmp__(self, other):
1826 if not isinstance(other, Feature): return -1 1827 if self._name == other._name: return 0 1828 return cmp(self._sortkey, other._sortkey)
1829
1830 - def __hash__(self):
1831 return hash(self._name)
1832 1833 #//////////////////////////////////////////////////////////// 1834 # These can be overridden by subclasses: 1835 #//////////////////////////////////////////////////////////// 1836
1837 - def parse_value(self, s, position, reentrances, parser):
1838 return parser.parse_value(s, position, reentrances)
1839
1840 - def unify_base_values(self, fval1, fval2, bindings):
1841 """ 1842 If possible, return a single value.. If not, return 1843 the value L{UnificationFailure}. 1844 """ 1845 if fval1 == fval2: return fval1 1846 else: return UnificationFailure
1847 1848
1849 -class SlashFeature(Feature):
1850 - def parse_value(self, s, position, reentrances, parser):
1851 return parser.partial_parse(s, position, reentrances)
1852
1853 -class RangeFeature(Feature):
1854 RANGE_RE = re.compile('(-?\d+):(-?\d+)')
1855 - def parse_value(self, s, position, reentrances, parser):
1856 m = self.RANGE_RE.match(s, position) 1857 if not m: raise ValueError('range', position) 1858 return (int(m.group(1)), int(m.group(2))), m.end()
1859
1860 - def unify_base_values(self, fval1, fval2, bindings):
1861 if fval1 is None: return fval2 1862 if fval2 is None: return fval1 1863 rng = max(fval1[0], fval2[0]), min(fval1[1], fval2[1]) 1864 if rng[1] < rng[0]: return UnificationFailure 1865 return rng
1866 1867 SLASH = SlashFeature('slash', default=False, display='slash') 1868 TYPE = Feature('type', display='prefix') 1869 1870 ###################################################################### 1871 # Specialized Feature Values 1872 ###################################################################### 1873
1874 -class CustomFeatureValue(object):
1875 """ 1876 An abstract base class for base values that define a custom 1877 unification method. A C{CustomFeatureValue}'s custom unification 1878 method will be used during feature structure unification if: 1879 1880 - The C{CustomFeatureValue} is unified with another base value. 1881 - The C{CustomFeatureValue} is not the value of a customized 1882 L{Feature} (which defines its own unification method). 1883 1884 If two C{CustomFeatureValue} objects are unified with one another 1885 during feature structure unification, then the unified base values 1886 they return I{must} be equal; otherwise, an C{AssertionError} will 1887 be raised. 1888 1889 Subclasses must define L{unify()} and L{__cmp__()}. Subclasses 1890 may also wish to define L{__hash__()}. 1891 """
1892 - def unify(self, other):
1893 """ 1894 If this base value unifies with C{other}, then return the 1895 unified value. Otherwise, return L{UnificationFailure}. 1896 """ 1897 raise NotImplementedError('abstract base class')
1898 - def __cmp__(self, other):
1899 raise NotImplementedError('abstract base class')
1900 - def __hash__(self):
1901 raise TypeError('%s objects or unhashable' % self.__class__.__name__)
1902 1903 ###################################################################### 1904 # Feature Structure Parser 1905 ###################################################################### 1906
1907 -class FeatStructParser(object):
1908 - def __init__(self, features=(SLASH, TYPE), fdict_class=FeatStruct, 1909 flist_class=FeatList):
1910 self._features = dict((f.name,f) for f in features) 1911 self._fdict_class = fdict_class 1912 self._flist_class = flist_class 1913 self._prefix_feature = None 1914 self._slash_feature = None 1915 for feature in features: 1916 if feature.display == 'slash': 1917 if self._slash_feature: 1918 raise ValueError('Multiple features w/ display=slash') 1919 self._slash_feature = feature 1920 if feature.display == 'prefix': 1921 if self._prefix_feature: 1922 raise ValueError('Multiple features w/ display=prefix') 1923 self._prefix_feature = feature 1924 self._features_with_defaults = [feature for feature in features 1925 if feature.default is not None]
1926
1927 - def parse(self, s, fstruct=None):
1928 """ 1929 Convert a string representation of a feature structure (as 1930 displayed by repr) into a C{FeatStruct}. This parse 1931 imposes the following restrictions on the string 1932 representation: 1933 - Feature names cannot contain any of the following: 1934 whitespace, parenthases, quote marks, equals signs, 1935 dashes, commas, and square brackets. Feature names may 1936 not begin with plus signs or minus signs. 1937 - Only the following basic feature value are supported: 1938 strings, integers, variables, C{None}, and unquoted 1939 alphanumeric strings. 1940 - For reentrant values, the first mention must specify 1941 a reentrance identifier and a value; and any subsequent 1942 mentions must use arrows (C{'->'}) to reference the 1943 reentrance identifier. 1944 """ 1945 s = s.strip() 1946 value, position = self.partial_parse(s, 0, {}, fstruct) 1947 if position != len(s): 1948 self._error(s, 'end of string', position) 1949 return value
1950 1951 _START_FSTRUCT_RE = re.compile(r'\s*(?:\((\d+)\)\s*)?(\??[\w-]+)?(\[)') 1952 _END_FSTRUCT_RE = re.compile(r'\s*]\s*') 1953 _SLASH_RE = re.compile(r'/') 1954 _FEATURE_NAME_RE = re.compile(r'\s*([+-]?)([^\s\(\)<>"\'\-=\[\],]+)\s*') 1955 _REENTRANCE_RE = re.compile(r'\s*->\s*') 1956 _TARGET_RE = re.compile(r'\s*\((\d+)\)\s*') 1957 _ASSIGN_RE = re.compile(r'\s*=\s*') 1958 _COMMA_RE = re.compile(r'\s*,\s*') 1959 _BARE_PREFIX_RE = re.compile(r'\s*(?:\((\d+)\)\s*)?(\??[\w-]+\s*)()') 1960 # This one is used to distinguish fdicts from flists: 1961 _START_FDICT_RE = re.compile(r'(%s)|(%s\s*(%s\s*(=|->)|[+-]%s|\]))' % ( 1962 _BARE_PREFIX_RE.pattern, _START_FSTRUCT_RE.pattern, 1963 _FEATURE_NAME_RE.pattern, _FEATURE_NAME_RE.pattern)) 1964
1965 - def partial_parse(self, s, position=0, reentrances=None, fstruct=None):
1966 """ 1967 Helper function that parses a feature structure. 1968 @param s: The string to parse. 1969 @param position: The position in the string to start parsing. 1970 @param reentrances: A dictionary from reentrance ids to values. 1971 Defaults to an empty dictionary. 1972 @return: A tuple (val, pos) of the feature structure created 1973 by parsing and the position where the parsed feature 1974 structure ends. 1975 """ 1976 if reentrances is None: reentrances = {} 1977 try: 1978 return self._partial_parse(s, position, reentrances, fstruct) 1979 except ValueError, e: 1980 if len(e.args) != 2: raise 1981 self._error(s, *e.args)
1982
1983 - def _partial_parse(self, s, position, reentrances, fstruct=None):
1984 # Create the new feature structure 1985 if fstruct is None: 1986 if self._START_FDICT_RE.match(s, position): 1987 fstruct = self._fdict_class() 1988 else: 1989 fstruct = self._flist_class() 1990 1991 # Read up to the open bracket. 1992 match = self._START_FSTRUCT_RE.match(s, position) 1993 if not match: 1994 match = self._BARE_PREFIX_RE.match(s, position) 1995 if not match: 1996 raise ValueError('open bracket or identifier', position) 1997 position = match.end() 1998 1999 # If there as an identifier, record it. 2000 if match.group(1): 2001 identifier = match.group(1) 2002 if identifier in reentrances: 2003 raise ValueError('new identifier', match.start(1)) 2004 reentrances[identifier] = fstruct 2005 2006 if isinstance(fstruct, FeatDict): 2007 fstruct.clear() 2008 return self._partial_parse_featdict(s, position, match, 2009 reentrances, fstruct) 2010 else: 2011 del fstruct[:] 2012 return self._partial_parse_featlist(s, position, match, 2013 reentrances, fstruct)
2014
2015 - def _partial_parse_featlist(self, s, position, match, 2016 reentrances, fstruct):
2017 # Prefix features are not allowed: 2018 if match.group(2): raise ValueError('open bracket') 2019 # Bare prefixes are not allowed: 2020 if not match.group(3): raise ValueError('open bracket') 2021 2022 # Build a list of the features defined by the structure. 2023 while position < len(s): 2024 # Check for the close bracket. 2025 match = self._END_FSTRUCT_RE.match(s, position) 2026 if match is not None: 2027 return fstruct, match.end() 2028 2029 # Reentances have the form "-> (target)" 2030 match = self._REENTRANCE_RE.match(s, position) 2031 if match: 2032 position = match.end() 2033 match = _TARGET_RE.match(s, position) 2034 if not match: raise ValueError('identifier', position) 2035 target = match.group(1) 2036 if target not in reentrances: 2037 raise ValueError('bound identifier', position) 2038 position = match.end() 2039 fstruct.append(reentrances[target]) 2040 2041 # Anything else is a value. 2042 else: 2043 value, position = ( 2044 self._parse_value(0, s, position, reentrances)) 2045 fstruct.append(value) 2046 2047 # If there's a close bracket, handle it at the top of the loop. 2048 if self._END_FSTRUCT_RE.match(s, position): 2049 continue 2050 2051 # Otherwise, there should be a comma 2052 match = self._COMMA_RE.match(s, position) 2053 if match is None: raise ValueError('comma', position) 2054 position = match.end() 2055 2056 # We never saw a close bracket. 2057 raise ValueError('close bracket', position)
2058
2059 - def _partial_parse_featdict(self, s, position, match, 2060 reentrances, fstruct):
2061 # If there was a prefix feature, record it. 2062 if match.group(2): 2063 if self._prefix_feature is None: 2064 raise ValueError('open bracket or identifier', match.start(2)) 2065 prefixval = match.group(2).strip() 2066 if prefixval.startswith('?'): 2067 prefixval = Variable(prefixval) 2068 fstruct[self._prefix_feature] = prefixval 2069 2070 # If group 3 is empty, then we just have a bare prefix, so 2071 # we're done. 2072 if not match.group(3): 2073 return self._finalize(s, match.end(), reentrances, fstruct) 2074 2075 # Build a list of the features defined by the structure. 2076 # Each feature has one of the three following forms: 2077 # name = value 2078 # name -> (target) 2079 # +name 2080 # -name 2081 while position < len(s): 2082 # Use these variables to hold info about each feature: 2083 name = value = None 2084 2085 # Check for the close bracket. 2086 match = self._END_FSTRUCT_RE.match(s, position) 2087 if match is not None: 2088 return self._finalize(s, match.end(), reentrances, fstruct) 2089 2090 # Get the feature name's name 2091 match = self._FEATURE_NAME_RE.match(s, position) 2092 if match is None: raise ValueError('feature name', position) 2093 name = match.group(2) 2094 position = match.end() 2095 2096 # Check if it's a special feature. 2097 if name[0] == '*' and name[-1] == '*': 2098 name = self._features.get(name[1:-1]) 2099 if name is None: 2100 raise ValueError('known special feature', match.start(2)) 2101 2102 # Check if this feature has a value already. 2103 if name in fstruct: 2104 raise ValueError('new name', match.start(2)) 2105 2106 # Boolean value ("+name" or "-name") 2107 if match.group(1) == '+': value = True 2108 if match.group(1) == '-': value = False 2109 2110 # Reentrance link ("-> (target)") 2111 if value is None: 2112 match = self._REENTRANCE_RE.match(s, position) 2113 if match is not None: 2114 position = match.end() 2115 match = self._TARGET_RE.match(s, position) 2116 if not match: 2117 raise ValueError('identifier', position) 2118 target = match.group(1) 2119 if target not in reentrances: 2120 raise ValueError('bound identifier', position) 2121 position = match.end() 2122 value = reentrances[target] 2123 2124 # Assignment ("= value"). 2125 if value is None: 2126 match = self._ASSIGN_RE.match(s, position) 2127 if match: 2128 position = match.end() 2129 value, position = ( 2130 self._parse_value(name, s, position, reentrances)) 2131 # None of the above: error. 2132 else: 2133 raise ValueError('equals sign', position) 2134 2135 # Store the value. 2136 fstruct[name] = value 2137 2138 # If there's a close bracket, handle it at the top of the loop. 2139 if self._END_FSTRUCT_RE.match(s, position): 2140 continue 2141 2142 # Otherwise, there should be a comma 2143 match = self._COMMA_RE.match(s, position) 2144 if match is None: raise ValueError('comma', position) 2145 position = match.end() 2146 2147 # We never saw a close bracket. 2148 raise ValueError('close bracket', position)
2149
2150 - def _finalize(self, s, pos, reentrances, fstruct):
2151 """ 2152 Called when we see the close brace -- checks for a slash feature, 2153 and adds in default values. 2154 """ 2155 # Add the slash feature (if any) 2156 match = self._SLASH_RE.match(s, pos) 2157 if match: 2158 name = self._slash_feature 2159 v, pos = self._parse_value(name, s, match.end(), reentrances) 2160 fstruct[name] = v 2161 ## Add any default features. -- handle in unficiation instead? 2162 #for feature in self._features_with_defaults: 2163 # fstruct.setdefault(feature, feature.default) 2164 # Return the value. 2165 return fstruct, pos
2166
2167 - def _parse_value(self, name, s, position, reentrances):
2168 if isinstance(name, Feature): 2169 return name.parse_value(s, position, reentrances, self) 2170 else: 2171 return self.parse_value(s, position, reentrances)
2172
2173 - def parse_value(self, s, position, reentrances):
2174 for (handler, regexp) in self.VALUE_HANDLERS: 2175 match = regexp.match(s, position) 2176 if match: 2177 handler_func = getattr(self, handler) 2178 return handler_func(s, position, reentrances, match) 2179 raise ValueError('value', position)
2180
2181 - def _error(self, s, expected, position):
2182 lines = s.split('\n') 2183 while position > len(lines[0]): 2184 position -= len(lines.pop(0))+1 # +1 for the newline. 2185 estr = ('Error parsing feature structure\n ' + 2186 lines[0] + '\n ' + ' '*position + '^ ' + 2187 'Expected %s' % expected) 2188 raise ValueError, estr
2189 2190 #//////////////////////////////////////////////////////////// 2191 #{ Value Parsers 2192 #//////////////////////////////////////////////////////////// 2193 2194 #: A table indicating how feature values should be parsed. Each 2195 #: entry in the table is a pair (handler, regexp). The first entry 2196 #: with a matching regexp will have its handler called. Handlers 2197 #: should have the following signature:: 2198 #: 2199 #: def handler(s, position, reentrances, match): ... 2200 #: 2201 #: and should return a tuple (value, position), where position is 2202 #: the string position where the value ended. (n.b.: order is 2203 #: important here!) 2204 VALUE_HANDLERS = [ 2205 ('parse_fstruct_value', _START_FSTRUCT_RE), 2206 ('parse_var_value', re.compile(r'\?[a-zA-Z_][a-zA-Z0-9_]*')), 2207 ('parse_str_value', re.compile("[uU]?[rR]?(['\"])")), 2208 ('parse_int_value', re.compile(r'-?\d+')), 2209 ('parse_sym_value', re.compile(r'[a-zA-Z_][a-zA-Z0-9_]*')), 2210 ('parse_app_value', re.compile(r'<(app)\((\?[a-z][a-z]*)\s*,' 2211 r'\s*(\?[a-z][a-z]*)\)>')), 2212 ('parse_logic_value', re.compile(r'<([^>]*)>')), 2213 ('parse_set_value', re.compile(r'{')), 2214 ('parse_tuple_value', re.compile(r'\(')), 2215 ] 2216
2217 - def parse_fstruct_value(self, s, position, reentrances, match):
2218 return self.partial_parse(s, position, reentrances)
2219
2220 - def parse_str_value(self, s, position, reentrances, match):
2221 return nltk.internals.parse_str(s, position)
2222
2223 - def parse_int_value(self, s, position, reentrances, match):
2224 return int(match.group()), match.end()
2225 2226 # Note: the '?' is included in the variable name.
2227 - def parse_var_value(self, s, position, reentrances, match):
2228 return Variable(match.group()), match.end()
2229 2230 _SYM_CONSTS = {'None':None, 'True':True, 'False':False}
2231 - def parse_sym_value(self, s, position, reentrances, match):
2232 val, end = match.group(), match.end() 2233 return self._SYM_CONSTS.get(val, val), end
2234
2235 - def parse_app_value(self, s, position, reentrances, match):
2236 """Mainly included for backwards compat.""" 2237 return LogicParser().parse('%s(%s)' % match.group(2,3)), match.end()
2238
2239 - def parse_logic_value(self, s, position, reentrances, match):
2240 parser = LogicParser() 2241 try: 2242 try: 2243 expr = parser.parse(match.group(1)) 2244 except ParseException: 2245 raise ValueError() 2246 return expr, match.end() 2247 except ValueError: 2248 raise ValueError('logic expression', match.start(1))
2249
2250 - def parse_tuple_value(self, s, position, reentrances, match):
2251 return self._parse_seq_value(s, position, reentrances, match, ')', 2252 FeatureValueTuple, FeatureValueConcat)
2253
2254 - def parse_set_value(self, s, position, reentrances, match):
2255 return self._parse_seq_value(s, position, reentrances, match, '}', 2256 FeatureValueSet, FeatureValueUnion)
2257
2258 - def _parse_seq_value(self, s, position, reentrances, match, 2259 close_paren, seq_class, plus_class):
2260 """ 2261 Helper function used by parse_tuple_value and parse_set_value. 2262 """ 2263 cp = re.escape(close_paren) 2264 position = match.end() 2265 # Special syntax fo empty tuples: 2266 m = re.compile(r'\s*/?\s*%s' % cp).match(s, position) 2267 if m: return seq_class(), m.end() 2268 # Read values: 2269 values = [] 2270 seen_plus = False 2271 while True: 2272 # Close paren: return value. 2273 m = re.compile(r'\s*%s' % cp).match(s, position) 2274 if m: 2275 if seen_plus: return plus_class(values), m.end() 2276 else: return seq_class(values), m.end() 2277 2278 # Read the next value. 2279 val, position = self.parse_value(s, position, reentrances) 2280 values.append(val) 2281 2282 # Comma or looking at close paren 2283 m = re.compile(r'\s*(,|\+|(?=%s))\s*' % cp).match(s, position) 2284 if m.group(1) == '+': seen_plus = True 2285 if not m: raise ValueError("',' or '+' or '%s'" % cp, position) 2286 position = m.end()
2287 2288 ###################################################################### 2289 #{ Demo 2290 ###################################################################### 2291
2292 -def display_unification(fs1, fs2, indent=' '):
2293 # Print the two input feature structures, side by side. 2294 fs1_lines = str(fs1).split('\n') 2295 fs2_lines = str(fs2).split('\n') 2296 if len(fs1_lines) > len(fs2_lines): 2297 blankline = '['+' '*(len(fs2_lines[0])-2)+']' 2298 fs2_lines += [blankline]*len(fs1_lines) 2299 else: 2300 blankline = '['+' '*(len(fs1_lines[0])-2)+']' 2301 fs1_lines += [blankline]*len(fs2_lines) 2302 for (fs1_line, fs2_line) in zip(fs1_lines, fs2_lines): 2303 print indent + fs1_line + ' ' + fs2_line 2304 print indent+'-'*len(fs1_lines[0])+' '+'-'*len(fs2_lines[0]) 2305 2306 linelen = len(fs1_lines[0])*2+3 2307 print indent+'| |'.center(linelen) 2308 print indent+'+-----UNIFY-----+'.center(linelen) 2309 print indent+'|'.center(linelen) 2310 print indent+'V'.center(linelen) 2311 2312 bindings = {} 2313 2314 result = fs1.unify(fs2, bindings) 2315 if result is None: 2316 print indent+'(FAILED)'.center(linelen) 2317 else: 2318 print '\n'.join(indent+l.center(linelen) 2319 for l in str(result).split('\n')) 2320 if bindings and len(bindings.bound_variables()) > 0: 2321 print repr(bindings).center(linelen) 2322 return result
2323
2324 -def interactivedemo(trace=False):
2325 import random, sys 2326 2327 HELP = ''' 2328 1-%d: Select the corresponding feature structure 2329 q: Quit 2330 t: Turn tracing on or off 2331 l: List all feature structures 2332 ?: Help 2333 ''' 2334 2335 print ''' 2336 This demo will repeatedly present you with a list of feature 2337 structures, and ask you to choose two for unification. Whenever a 2338 new feature structure is generated, it is added to the list of 2339 choices that you can pick from. However, since this can be a 2340 large number of feature structures, the demo will only print out a 2341 random subset for you to choose between at a given time. If you 2342 want to see the complete lists, type "l". For a list of valid 2343 commands, type "?". 2344 ''' 2345 print 'Press "Enter" to continue...' 2346 sys.stdin.readline() 2347 2348 fstruct_strings = [ 2349 '[agr=[number=sing, gender=masc]]', 2350 '[agr=[gender=masc, person=3rd]]', 2351 '[agr=[gender=fem, person=3rd]]', 2352 '[subj=[agr=(1)[]], agr->(1)]', 2353 '[obj=?x]', '[subj=?x]', 2354 '[/=None]', '[/=NP]', 2355 '[cat=NP]', '[cat=VP]', '[cat=PP]', 2356 '[subj=[agr=[gender=?y]], obj=[agr=[gender=?y]]]', 2357 '[gender=masc, agr=?C]', 2358 '[gender=?S, agr=[gender=?S,person=3rd]]' 2359 ] 2360 2361 all_fstructs = [(i, FeatStruct.parse(fstruct_strings[i])) 2362 for i in range(len(fstruct_strings))] 2363 2364 def list_fstructs(fstructs): 2365 for i, fstruct in fstructs: 2366 print 2367 lines = str(fstruct).split('\n') 2368 print '%3d: %s' % (i+1, lines[0]) 2369 for line in lines[1:]: print ' '+line 2370 print
2371 2372 2373 while 1: 2374 # Pick 5 feature structures at random from the master list. 2375 MAX_CHOICES = 5 2376 if len(all_fstructs) > MAX_CHOICES: 2377 fstructs = random.sample(all_fstructs, MAX_CHOICES) 2378 fstructs.sort() 2379 else: 2380 fstructs = all_fstructs 2381 2382 print '_'*75 2383 2384 print 'Choose two feature structures to unify:' 2385 list_fstructs(fstructs) 2386 2387 selected = [None,None] 2388 for (nth,i) in (('First',0), ('Second',1)): 2389 while selected[i] is None: 2390 print ('%s feature structure (1-%d,q,t,l,?): ' 2391 % (nth, len(all_fstructs))), 2392 try: 2393 input = sys.stdin.readline().strip() 2394 if input in ('q', 'Q', 'x', 'X'): return 2395 if input in ('t', 'T'): 2396 trace = not trace 2397 print ' Trace = %s' % trace 2398 continue 2399 if input in ('h', 'H', '?'): 2400 print HELP % len(fstructs); continue 2401 if input in ('l', 'L'): 2402 list_fstructs(all_fstructs); continue 2403 num = int(input)-1 2404 selected[i] = all_fstructs[num][1] 2405 print 2406 except: 2407 print 'Bad sentence number' 2408 continue 2409 2410 if trace: 2411 result = selected[0].unify(selected[1], trace=1) 2412 else: 2413 result = display_unification(selected[0], selected[1]) 2414 if result is not None: 2415 for i, fstruct in all_fstructs: 2416 if `result` == `fstruct`: break 2417 else: 2418 all_fstructs.append((len(all_fstructs), result)) 2419 2420 print '\nType "Enter" to continue unifying; or "q" to quit.' 2421 input = sys.stdin.readline().strip() 2422 if input in ('q', 'Q', 'x', 'X'): return 2423
2424 -def demo(trace=False):
2425 """ 2426 Just for testing 2427 """ 2428 #import random 2429 2430 # parser breaks with values like '3rd' 2431 fstruct_strings = [ 2432 '[agr=[number=sing, gender=masc]]', 2433 '[agr=[gender=masc, person=3]]', 2434 '[agr=[gender=fem, person=3]]', 2435 '[subj=[agr=(1)[]], agr->(1)]', 2436 '[obj=?x]', '[subj=?x]', 2437 '[/=None]', '[/=NP]', 2438 '[cat=NP]', '[cat=VP]', '[cat=PP]', 2439 '[subj=[agr=[gender=?y]], obj=[agr=[gender=?y]]]', 2440 '[gender=masc, agr=?C]', 2441 '[gender=?S, agr=[gender=?S,person=3]]' 2442 ] 2443 all_fstructs = [FeatStruct(fss) for fss in fstruct_strings] 2444 #MAX_CHOICES = 5 2445 #if len(all_fstructs) > MAX_CHOICES: 2446 #fstructs = random.sample(all_fstructs, MAX_CHOICES) 2447 #fstructs.sort() 2448 #else: 2449 #fstructs = all_fstructs 2450 2451 for fs1 in all_fstructs: 2452 for fs2 in all_fstructs: 2453 print "\n*******************\nfs1 is:\n%s\n\nfs2 is:\n%s\n\nresult is:\n%s" % (fs1, fs2, unify(fs1, fs2))
2454 2455 2456 if __name__ == '__main__': 2457 demo() 2458