java.lang.Object
  org.apache.nutch.analysis.lang.NGramProfile
This class represents an ngram profile. An ngram profile is a set of the most frequently used character sequences in a text or set of texts. This class can be used to run an ngram analysis over submitted text and then build new NGramProfiles. A profile can be serialized to a text file, or initialized from an ngram profile file (an ngp file).
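To make the idea of "sequences of chars" concrete, here is a minimal standalone sketch of counting the ngrams of one word for every length between a minimum and a maximum. It does not use or reproduce the NGramProfile implementation: the class name `NGramCountSketch` and the method `countNGrams` are hypothetical, and any word-boundary handling the real class may apply is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; NOT the Nutch NGramProfile implementation.
class NGramCountSketch {

    // Counts every character sequence of length minlen..maxlen found in word.
    // Hypothetical helper: the name and signature are assumptions.
    static Map<String, Integer> countNGrams(String word, int minlen, int maxlen) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int len = minlen; len <= maxlen; len++) {
            // Slide a window of size len across the word.
            for (int start = 0; start + len <= word.length(); start++) {
                String gram = word.substring(start, start + len);
                counts.merge(gram, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {b=1, a=3, n=2, ba=1, an=2, na=2}
        System.out.println(countNGrams("banana", 1, 2));
    }
}
```

The two length bounds play the same role as the minlen and maxlen constructor arguments documented on this page.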
Constructor Summary
NGramProfile(String name, int minlen, int maxlen)
    Construct a new ngram profile.
Method Summary
void add(StringBuffer word)
    Add ngrams from a single word to this profile.
void add(Token t)
    Add ngrams from a token to this profile.
void analyze(StringBuffer text)
    Analyze a piece of text.
static NGramProfile create(String name, InputStream is, String encoding)
    Create a new ngram profile from an input stream.
String getName()
    Returns the profile name.
float getSimilarity(NGramProfile another)
    Calculate a score for how well two NGramProfiles match each other. The similarity calculation is experimental.
List getSorted()
    Return a sorted list of ngrams.
void load(InputStream is)
    Loads an ngram profile from an InputStream.
static void main(String[] args)
    Main method used for command-line processing.
protected void normalize()
    Normalize the profile (calculates the ngram frequencies).
void save(OutputStream os)
    Writes the NGramProfile content to an OutputStream.
String toString()
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Constructor Detail
public NGramProfile(String name, int minlen, int maxlen)
Parameters:
  name - is the name of the profile
  minlen - is the min length of ngram sequences
  maxlen - is the max length of ngram sequences
Method Detail
public String getName()
public void add(Token t)
Parameters:
  t - is the Token to be added
public void add(StringBuffer word)
Parameters:
  word - is the word to add
public void analyze(StringBuffer text)
Parameters:
  text - is the text to be analyzed
protected void normalize()
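The method summary says only that normalize() calculates the ngram frequencies. As a rough illustration of what such a step could look like, the sketch below divides each raw count by the total count. This is an assumption, not the documented behavior: the class `NormalizeSketch` and its `normalize` method are hypothetical, and the real class may normalize differently (for example, per ngram length).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; NOT the Nutch NGramProfile implementation.
class NormalizeSketch {

    // Turns raw counts into relative frequencies that sum to 1.0.
    // Assumption: frequency = count / total of all counts.
    static Map<String, Float> normalize(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Map<String, Float> freqs = new LinkedHashMap<>();
        if (total == 0) {
            return freqs; // nothing counted, nothing to normalize
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            freqs.put(e.getKey(), e.getValue() / (float) total);
        }
        return freqs;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("a", 3);
        counts.put("b", 1);
        // prints {a=0.75, b=0.25}
        System.out.println(normalize(counts));
    }
}
```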
public List getSorted()
public String toString()
public float getSimilarity(NGramProfile another)
Parameters:
  another - is the ngram profile to compare against
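This page does not state the formula behind getSimilarity, nor whether a higher or lower score means a closer match. Purely to illustrate how two frequency profiles can be compared, the sketch below sums the absolute frequency differences over the union of ngrams, so 0 means identical profiles. The class `ProfileDistanceSketch` and the method `distance` are hypothetical, and this measure is a stand-in chosen for simplicity, not the calculation the class performs.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; NOT the Nutch getSimilarity calculation.
class ProfileDistanceSketch {

    // Sum of absolute frequency differences over all ngrams in either profile.
    // An ngram missing from one profile is treated as frequency 0 there.
    static float distance(Map<String, Float> a, Map<String, Float> b) {
        Set<String> keys = new HashSet<>(a.keySet());
        keys.addAll(b.keySet());
        float sum = 0f;
        for (String k : keys) {
            sum += Math.abs(a.getOrDefault(k, 0f) - b.getOrDefault(k, 0f));
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Float> p1 = new HashMap<>();
        p1.put("x", 0.5f);
        p1.put("y", 0.5f);
        Map<String, Float> p2 = new HashMap<>();
        p2.put("x", 0.5f);
        p2.put("z", 0.5f);
        System.out.println(distance(p1, p1)); // prints 0.0
        System.out.println(distance(p1, p2)); // prints 1.0
    }
}
```

With normalized frequencies as input, this particular measure ranges from 0 (same profile) to 2 (no ngrams in common).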
public void load(InputStream is) throws IOException
Parameters:
  is - is the InputStream to read
Throws:
  IOException
public static NGramProfile create(String name, InputStream is, String encoding) throws UnsupportedEncodingException
Parameters:
  name - is the name of the profile.
  is - is the stream to read.
  encoding - is the encoding of the stream.
Throws:
  UnsupportedEncodingException
public void save(OutputStream os) throws IOException
Parameters:
  os - is the stream to output to.
Throws:
  IOException - if something goes wrong on the output stream.
public static void main(String[] args)
Usage: NGramProfile [-create profilename filename encoding] [-similarity file1 file2] [-score profile-name filename encoding]
Parameters:
  args - arguments.