java.lang.Object
  org.apache.nutch.analysis.lang.NGramProfile
This class represents an ngram profile. An ngram profile is a set of the most frequently used character sequences in a text or set of texts. This class can be used to run an ngram analysis over submitted text and then build new NGramProfiles. A profile can be serialized into a text file, or initialized from an ngram profile file (ngp files).
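To make the idea of an ngram profile concrete, the following is a minimal, self-contained sketch that counts every character sequence between a minimum and maximum length and lists them by frequency. It does not use or reproduce the NGramProfile implementation: the class name `NGramSketch` and the methods `countNGrams` and `sortedNGrams` are hypothetical, and the real class may tokenize, pad, or truncate differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the Nutch NGramProfile code.
class NGramSketch {

    /** Counts every character sequence of length minLen..maxLen in the text. */
    static Map<String, Integer> countNGrams(String text, int minLen, int maxLen) {
        Map<String, Integer> counts = new HashMap<>();
        for (int n = minLen; n <= maxLen; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                counts.merge(text.substring(i, i + n), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the ngrams ordered by descending count, ties broken alphabetically. */
    static List<String> sortedNGrams(Map<String, Integer> counts) {
        List<String> ngrams = new ArrayList<>(counts.keySet());
        ngrams.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return ngrams;
    }

    public static void main(String[] args) {
        // Assumed sample input; minLen and maxLen mirror the constructor's minlen/maxlen.
        Map<String, Integer> counts = countNGrams("banana", 1, 2);
        for (String ngram : sortedNGrams(counts)) {
            System.out.println(ngram + "\t" + counts.get(ngram));
        }
    }
}
```

The two length bounds play the same role as the `minlen` and `maxlen` constructor arguments documented below: they limit which sequence lengths enter the profile.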
| Constructor Summary | |
NGramProfile(String name,
int minlen,
int maxlen)
Construct a new ngram profile |
|
| Method Summary | |
void |
add(StringBuffer word)
Add ngrams from a single word to this profile |
void |
add(Token t)
Add ngrams from a token to this profile |
void |
analyze(StringBuffer text)
Analyze a piece of text. |
static NGramProfile |
create(String name,
InputStream is,
String encoding)
Create a new ngram profile from an input stream. |
String |
getName()
Returns the profile name. |
float |
getSimilarity(NGramProfile another)
Calculate a score for how well two NGramProfiles match each other. The similarity calculation is experimental. |
List |
getSorted()
Return a sorted list of ngrams. |
void |
load(InputStream is)
Loads an ngram profile from an InputStream. |
static void |
main(String[] args)
Main method used for command-line processing. |
protected void |
normalize()
Normalize the profile (calculates the ngram frequencies) |
void |
save(OutputStream os)
Writes NGramProfile content into OutputStream. |
String |
toString()
|
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
public NGramProfile(String name,
int minlen,
int maxlen)
name - is the name of the profile
minlen - is the min length of ngram sequences
maxlen - is the max length of ngram sequences
| Method Detail |
public String getName()
public void add(Token t)
t - is the Token to be added
public void add(StringBuffer word)
word - is the word to add
public void analyze(StringBuffer text)
text - is the text to be analyzed
protected void normalize()
public List getSorted()
public String toString()
public float getSimilarity(NGramProfile another)
another - is the ngram profile to compare against
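The source does not state the formula behind getSimilarity, nor whether a higher or lower score means a closer match. Purely as an illustration of how two profiles could be compared, the sketch below normalizes raw counts to relative frequencies and sums the absolute differences. The class `SimilaritySketch` and its methods are hypothetical, and the distance measure is an assumption, not the Nutch calculation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; the actual getSimilarity formula is not documented here.
class SimilaritySketch {

    /** Converts raw ngram counts into relative frequencies that sum to 1. */
    static Map<String, Double> normalize(Map<String, Integer> counts) {
        double total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Map<String, Double> freqs = new HashMap<>();
        if (total == 0) {
            return freqs; // empty profile: nothing to normalize
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            freqs.put(e.getKey(), e.getValue() / total);
        }
        return freqs;
    }

    /**
     * Assumed measure: sum of absolute frequency differences over all ngrams
     * seen in either profile. 0 means identical, 2 means no shared ngrams.
     */
    static double distance(Map<String, Double> a, Map<String, Double> b) {
        Set<String> all = new HashSet<>(a.keySet());
        all.addAll(b.keySet());
        double sum = 0;
        for (String ngram : all) {
            sum += Math.abs(a.getOrDefault(ngram, 0.0) - b.getOrDefault(ngram, 0.0));
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new HashMap<>();
        first.put("th", 4);
        first.put("he", 4);
        Map<String, Integer> second = new HashMap<>();
        second.put("th", 2);
        second.put("qu", 2);
        System.out.println(distance(normalize(first), normalize(second)));
    }
}
```

Normalizing before comparing keeps the result independent of how much text each profile was built from, which is presumably why the class exposes a normalize() step.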
public void load(InputStream is)
throws IOException
is - is the InputStream to read
IOException
public static NGramProfile create(String name,
InputStream is,
String encoding)
throws UnsupportedEncodingException
name - is the name of the profile.
is - is the stream to read.
encoding - is the encoding of the stream.
UnsupportedEncodingException
public void save(OutputStream os)
throws IOException
os - is the stream to output to.
IOException - if something goes wrong on the output stream.
public static void main(String[] args)
NGramProfile [-create profilename filename encoding]
[-similarity file1 file2]
[-score profile-name filename encoding]
args - arguments.