|
||||||||||
PREV PACKAGE NEXT PACKAGE | FRAMES NO FRAMES |
See:
Description
Interface Summary | |
---|---|
Array | General interface to arrays of arbitrary dimension. |
AttributePredicate | A predicate that (only) matches a ATTRIBUTE_VALUE. |
CharSeq | A sequence where each element is a character. |
Consumable | An object that can send its contents to a Consumer. |
Consumer | A Consumer is something that will accept data (output), and do something with it. |
ElementPredicate | A predicate that (only) matches a ELEMENT_VALUE. |
ItemPredicate | A predicate (or type) on an item in a sequence. |
NodePredicate | A predicate that (only) matches only "nodes" in the XML sense. |
PositionConsumer | An object that can be "fed" a TreePosition, and will do something with it. |
Sequence | A Sequence is an ordered list of elements. |
XConsumer | A Consumer extended with XML-specific methods. |
Class Summary | |
---|---|
AbstractFormat | |
AbstractSequence | An AbstractSequence is used to implement Sequences, and almost all classes that extend AbstractSequence will implement Sequence. |
BitVector | Simple adjustable-length vector of boolean values. |
CharBuffer | Editable character sequence using a a buffer-gap implementstion and self-adjusting position. |
ConsumerWriter | A Writer that wraps (filters) a Consumer. |
Convert | A class that encapsulates primitive<->Object conversions. |
EofClass | |
ExtPosition | A SeqPosition for sequences that need more than a Pos int for a position. |
ExtSequence | Abstract helper class for Sequences that use an ExtPosition. |
F32Vector | Simple adjustable-length vector whose elements are 32-bit floats. |
F64Vector | Simple adjustable-length vector whose elements are 64-bit floats. |
FilterConsumer | A Consumer that wraps some other Consumer. |
FString | Simple adjustable-length vector whose elements are 32-bit floats. |
FVector | Simple adjustable-length vector whose elements are Object references. |
GapVector | An array with a gap in the middle, allowing efficient insert and delete. |
GeneralArray | A class to handle general multi-dimensional arrays. |
GeneralArray1 | |
LList | Semi-abstract class for traditions Lisp-style lists. |
Pair | A "pair" object, as used in Lisp-like languages. |
PairWithPosition | A Pair with the file name and position it was read from. |
PositionManager | |
PrintConsumer | A Consumer that extends a PrintWriter. |
S16Vector | Simple adjustable-length vector of signed 16-bit integers (shorts). |
S32Vector | Simple adjustable-length vector of signed 32-bit integers (ints). |
S64Vector | Simple adjustable-length vector of signed 64-bit integers (longs). |
S8Vector | Simple adjustable-length vector of signed 8-bit integers (bytes). |
SeqPosition | A position in a sequence (list). |
SimpleVector | A SimpleVector implement as a simple array plus a current size. |
StableVector | Implements a stable sequence with sticky positions. |
Strings | Various static utility methods for general strings (CharSeqs). |
SubCharSeq | |
SubSequence | A sequence consisting of a sub-range of the elements of a base sequence. |
TreeList | A compact representation of a tree, that is a nested list structure. |
TreePosition | A position that can also go down and up in a tree. |
U16Vector | Simple adjustable-length vector of unsigned 16-bit integers (shorts). |
U32Vector | Simple adjustable-length vector of unsigned 32-bit integers (ints). |
U64Vector | Simple adjustable-length vector of unsigned 64-bit integers (longs). |
U8Vector | Simple adjustable-length vector of unsigned 8-bit integers (bytes). |
UnescapedData | Used for text that is supposed to be written out verbatim. |
VoidConsumer | A Consumer that does nothing. |
Contains utility classes and interfaces for sequences (lists), arrays, and trees.
Features:
gnu.lists
; instead you implement suitable methods
in the sequence class itself. This approach may seem strange, but it has
important performance advantages, especially in applications
where you want to go up and down nested sequences (i.e. in trees, such
as XML elements), or remember sets of positions (such as XML node-sets).TreeList
class can compactly represent
a nested hetrogenous tree. One intended usage is as a very efficient
representation for XML data.Consumer
interface is an abstraction of "data sinks" -
objects that can receive data and do something with them. It is
similar to SAX's DocumentHandler but at a more abstract (and sometimes
more efficient) level.
A position points between two elements in its "containing" sequence, or it points before the first element (if any), or after the last. Sometimes, we loosely say that the position refers to or points to an element of the sequence; in that case we mean the position is immediately before the element.
An iterator is an object that has a current position,
but that can be moved so it gets a new position.
What needs to be emphasized is that all
SeqPosition
class to implement positions and iteration.
In other "collections frameworks" each sequence class has its corresponding
iterator class, but in this framework you instead add the methods
that handle iteration to the sequence class itself, extending
the AbstractSequence
class.
A consequence of this is that if you have an unused
SeqPosition
object, you can
use it to iterate over any Sequence
you choose.
This potentially saves memory allocation, which gains you the most
when iterating over a nested sequence structure, like a tree of XML data.
We do this by requiring that any position be representable using a pair
of an int
and an Object
.
In other words, the state of an iterator is
restricted to be an (int
, Object
) pair.
Of course that is all you could need, since if you need more
state you can always create a helper object,
and store the extra state there. The assumption we make though is that for
most Sequence
s, an (int
, Object
)
would be enough, without creating any new objects (beyond,
sometimes, the SeqPosition
itself).
The int
part of a position-state is normally
named ipos
, and the Object
part of a
position-state is named xpos
.
Neither of these values have any meaning in themselves, they only have
meaning as interpreted by the specific Sequence
.
For example, ipos
might be the offset of the position in
the sequence, but it might also some "magic cookie".
If a Sequence
supports insertions and deletions,
and you want positions to be automatically adjusted, then a position
has to be a magic cookie rather than an offset.
(A typical implementation is for such a position to be an index
into a table of positions managed by the Sequence
itself.)
So a complete position is actually a (int
, Object
,
AbstractSequence
) triple, where the AbstractSequence
component is the object that interprets the (int
,
Object
) pair.
Normally, the AbstractSequence
part of a position triple is the
Sequence
"containing" the position, but it can be any
AbstractSequence
that implements the various methods that provide the semantics
for the (int
, Object
) pair.
The PositionContainer
interface is a more abstract version of SeqPosition
, in
that it can represents more than one position.
For example, instead of an array of separately allocated
SeqPosition
objects, you might use some data structure
that implements PositionContainer
,
which is abstractly a list of (int
, Object
,
Sequence
)-triples.
You typically use an iteration framework it process the elements
of a sequence in such a way that the data consumer is active
and in control, while the sequence itself (the data producer) is a
passive object. The Consumer
works the other way round: The data producer is active and in control,
while the data consumer is passive, consuming elements when requested
by the data producer. The Consumer
is a more abstract
variant of SAX's
ContentHandler
interface for processing "events";
however Consumer
is intended for general value sequences,
and is less tied to XML.
int iter = 0; // standard start position for (;;) { iter = list.nextPos(iter); if (iter == 0) break; Object value = list.getPosPrevious(iter); ... use value ...; }
The position encoding for sequences implemented using an array is simple.
Assume the next index (as returned by nextIndex
) is i.
If the position is a before/backwards position, then position value
is (
i<<1)
.
If the position is an after/forwards position, then position value
is ((
i<<1))|1
.
But what each each item in the buffer has variable size?
One example is a UTF-8 string. Another example is a buffer of text
where each "item" is a line.
Then we have the choice: Should the position value encode the
index (making nextIndex
and get
cheap), or
should it encode the offset into the buffer (making sequential access cheap)?
Since sequential access is preferred, we do the latter.
Thus for a before/backwards position, if the buffer offset for
item i is pi, then the position value
is (
pi << 1)
.
However, there is a complication when moving forwards using
a ListIterator
since using set
or remove
affects the previous item. It may be much more expensive to
calculate the buffer position of the previous item than the next item.
(Given pi it may be cheap to
calculate pi+1 but expensive to
calculate pi-1.)
Therefore, we suggest using
(((
pi-1+1)<<1))|1
,
where p-1 is defined as -1.
The addition of 1 allows us to handle the case of i=0 without
conflicting with the use of -1 to mean end-of-sequence.
Plus it makes the previous case of a simple array a special case.
|
||||||||||
PREV PACKAGE NEXT PACKAGE | FRAMES NO FRAMES |