org.apache.nutch.util
Class ScoreStats
java.lang.Object
org.apache.nutch.util.ScoreStats
- public class ScoreStats
- extends Object
When we generate a fetchlist, we need to choose a "cutoff"
score, such that any page scoring above that cutoff is included
in the fetchlist and any page scoring below is not. (It is too
hard to do the obvious thing, which is to sort the list of all
pages by score and pick the top K.)
We need a good way to choose that cutoff. ScoreStats is used
during LinkAnalysis to track the distribution of scores that
we compute. We divide the score space into 2000 buckets.
The first 1000 are equally spaced counters for the range 0..1.0
(non-inclusive). The second 1000 are logarithmically spaced
between 1 and Float.MAX_VALUE.
If the score is < 1, then choose a bucket by (score * 1000),
truncated to an integer, and increment the resulting slot.
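The rule for scores below 1 can be sketched as follows. This is a minimal illustration, not the ScoreStats source: the class name LinearBucketSketch and the helper linearIndex are hypothetical, and the slot is assumed to be the integer part of score * 1000, which is what 1000 equal-width buckets over the range 0..1.0 imply.

```java
class LinearBucketSketch {
    // Number of equal-width buckets covering the range [0, 1).
    static final int LINEAR_BUCKETS = 1000;

    // Hypothetical helper: maps a score in [0, 1) to a slot in 0..999.
    static int linearIndex(float score) {
        if (score < 0f || score >= 1f) {
            throw new IllegalArgumentException("score must be in [0, 1): " + score);
        }
        // Each bucket is 1/1000 wide, so the slot is the integer part of score * 1000.
        int index = (int) (score * LINEAR_BUCKETS);
        // Guard against float rounding pushing a value just below 1 up to slot 1000.
        return Math.min(index, LINEAR_BUCKETS - 1);
    }

    public static void main(String[] args) {
        long[] buckets = new long[LINEAR_BUCKETS];
        float[] scores = {0.0f, 0.25f, 0.9999f};
        for (float s : scores) {
            buckets[linearIndex(s)]++; // increment the chosen slot
        }
        System.out.println(linearIndex(0.25f)); // prints 250
    }
}
```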
If the score is >= 1, then take the base-10 log, and take the
integer floor. This should be an int no greater than 9. This
is the hundreds-place digit of the index (a '1' occupies
the thousands place). Next, find where the score appears in
the range between floor(log(score)) and ceiling(log(score)).
The percentage of the distance between these two values is
reflected in the final two digits of the index.
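The index arithmetic for the logarithmic half can be sketched as below. The class LogBucketSketch and the helper logIndex are hypothetical names, not part of the Nutch API. Two assumptions are made: the "percentage of the distance" is measured in log space (the fractional part of the base-10 log), and any index that would exceed 1999 is clamped to the last bucket. The description above does not say what happens in that case.

```java
class LogBucketSketch {
    static final int LINEAR_BUCKETS = 1000; // slots 0..999 hold scores below 1
    static final int TOTAL_BUCKETS = 2000;  // slots 1000..1999 hold scores of 1 and above

    // Hypothetical helper: maps a score >= 1 to a slot in 1000..1999.
    static int logIndex(float score) {
        if (score < 1f) {
            throw new IllegalArgumentException("score must be >= 1: " + score);
        }
        double log = Math.log10(score);
        // Integer floor of the log: the hundreds-place digit of the index.
        int exponent = (int) Math.floor(log);
        // Position between floor(log) and ceiling(log) as a percentage, 0..99
        // (assumption: measured in log space): the final two digits of the index.
        int percent = Math.min((int) Math.floor((log - exponent) * 100.0), 99);
        int index = LINEAR_BUCKETS + exponent * 100 + percent;
        // Assumption: clamp scores of 10^10 and above into the last bucket.
        return Math.min(index, TOTAL_BUCKETS - 1);
    }

    public static void main(String[] args) {
        System.out.println(logIndex(1f));  // prints 1000
        System.out.println(logIndex(10f)); // prints 1100
        System.out.println(logIndex(20f)); // prints 1130
    }
}
```

Under these assumptions the index reduces to 1000 + floor(100 * log10(score)), so each power of ten spans exactly 100 buckets.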
- Author:
- Mike Cafarella
Method Summary
void addScore(float score)
Increment the counter in the right place.
void emitDistribution(PrintStream pout)
Print out the distribution, with greater specificity
for percentiles 90th - 100th.
static void main(String[] argv)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
ScoreStats
public ScoreStats()
addScore
public void addScore(float score)
- Increment the counter in the right place. We keep
2000 different buckets. Half of them cover scores < 1, and
half cover scores >= 1.
Dies when it tries to fill bucket "1132".
emitDistribution
public void emitDistribution(PrintStream pout)
- Print out the distribution, with greater specificity
for percentiles 90th - 100th.
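A bucketed distribution like this is what lets a cutoff be chosen without sorting every page. The sketch below is one way that could be done, under stated assumptions. It is not the logic of emitDistribution or of the fetchlist generator. The class CutoffSketch, the helper cutoffBucket, and the sample counts are hypothetical. It walks the buckets from the highest score downward and stops once at least K scores have been seen.

```java
class CutoffSketch {
    // Hypothetical helper: returns the lowest bucket index such that the buckets
    // at or above it together hold at least k scores. Returns 0 when the whole
    // distribution holds fewer than k scores.
    static int cutoffBucket(long[] buckets, long k) {
        long seen = 0;
        for (int i = buckets.length - 1; i >= 0; i--) {
            seen += buckets[i];
            if (seen >= k) {
                return i;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        long[] buckets = new long[2000];
        buckets[250] = 5;  // five scores near 0.25
        buckets[999] = 3;  // three scores just below 1
        buckets[1130] = 2; // two scores in one of the logarithmic buckets
        System.out.println(cutoffBucket(buckets, 4)); // prints 999
    }
}
```

Because whole buckets are taken at a time, the result is approximate: the chosen cutoff admits at least K scores and may admit more (five in the example above, for K = 4).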
main
public static void main(String[] argv)
throws IOException
- Throws:
IOException
Copyright © 2006 The Apache Software Foundation