java.lang.Object
  org.apache.nutch.db.DistributedWebDBReader
All Implemented Interfaces: IWebDBReader
DistributedWebDBReader implements all the read-only operations on our web database; the write operations are in WebDBWriter.
Constructor Summary
DistributedWebDBReader(NutchFileSystem nfs, File root)
    Open a web db reader for the named directory.
Method Summary
void close()
    Shut down the reader.
Link[] getLinks(MD5Hash md5)
    Get all the links from the content with the given MD5 hash.
Link[] getLinks(UTF8 url)
    Get all the hyperlinks that point to the given URL.
Page getPage(String url)
    Get the Page with the given URL from the pagedb.
Page[] getPages(MD5Hash md5)
    Get all the Pages whose content has the given MD5 hash.
Enumeration links()
    Return all the links, ordered by target URL.
static void main(String[] argv)
    Provides some handy command-line utilities for looking through the contents of the webdb.
long numLinks()
    Return the number of links in our db.
int numMachines()
    Return the number of sections (machines) in this distributed db.
long numPages()
    Return the number of pages in our db.
boolean pageExists(MD5Hash md5)
    Test whether content with the given MD5 hash is in the database, without returning the matching Page(s).
Enumeration pages()
    Iterate through all the Pages, sorted by URL.
Enumeration pagesByMD5()
    Iterate through all the Pages, sorted by MD5.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail
public DistributedWebDBReader(NutchFileSystem nfs, File root) throws IOException, FileNotFoundException
Method Detail
public void close() throws IOException
    Specified by: close in interface IWebDBReader
    Throws: IOException
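Because close() shuts the reader down and can itself throw IOException, a caller would normally put it in a finally block so it runs even when a query fails. The sketch below relies on nothing beyond the method names on this page: SketchReader is a hypothetical stand-in for DistributedWebDBReader (the real constructor needs a NutchFileSystem and a root File), and the page count it returns is a placeholder. try/finally is used rather than try-with-resources because nothing on this page indicates that the class implements java.lang.AutoCloseable.

```java
import java.io.IOException;

// Hypothetical stand-in for DistributedWebDBReader: only the calls needed
// to show the use-then-close lifecycle are modeled.
class SketchReader {
    private boolean open = true;

    long numPages() throws IOException {
        if (!open) throw new IOException("reader is closed");
        return 42; // placeholder count, not real data
    }

    void close() throws IOException {
        open = false; // a real reader would release its files here
    }

    boolean isOpen() { return open; }
}

class ReaderLifecycle {
    // Uses an already-open reader and guarantees close() runs
    // even if the query throws.
    static long countPages(SketchReader reader) throws IOException {
        try {
            return reader.numPages();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        SketchReader reader = new SketchReader();
        System.out.println(countPages(reader) + " pages, open=" + reader.isOpen());
    }
}
```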
public int numMachines()
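This page does not say how pages and links are assigned to the numMachines() sections. A common scheme for partitioned stores, assumed here purely for illustration, is to hash each key modulo the section count, so a lookup such as getPage(url) only needs to consult one section. sectionFor is a hypothetical helper, not part of the Nutch API.

```java
class SectionRouting {
    // Hypothetical: map a key (e.g. a URL) to one of numSections sections.
    // Math.floorMod keeps the result non-negative even when hashCode() is negative.
    static int sectionFor(String key, int numSections) {
        if (numSections <= 0) {
            throw new IllegalArgumentException("numSections must be positive");
        }
        return Math.floorMod(key.hashCode(), numSections);
    }

    public static void main(String[] args) {
        int numSections = 4; // stands in for numMachines()
        String[] urls = {"http://example.com/", "http://example.org/a", "http://example.net/b"};
        for (String url : urls) {
            // The same URL always routes to the same section.
            System.out.println(url + " -> section " + sectionFor(url, numSections));
        }
    }
}
```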
public long numPages()
    Specified by: numPages in interface IWebDBReader
public long numLinks()
    Specified by: numLinks in interface IWebDBReader
public Page getPage(String url) throws IOException
    Specified by: getPage in interface IWebDBReader
    Throws: IOException
public Page[] getPages(MD5Hash md5) throws IOException
    Specified by: getPages in interface IWebDBReader
    Throws: IOException
public boolean pageExists(MD5Hash md5) throws IOException
    Specified by: pageExists in interface IWebDBReader
    Throws: IOException
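getPages(MD5Hash) and pageExists(MD5Hash) look pages up by an MD5 hash of their content, so several URLs serving identical bytes (a mirror, for example) share one hash, which is presumably why getPages returns an array. The sketch below computes such a hash with the standard java.security.MessageDigest API and groups URLs by it. The ContentHashIndex class, its TreeMap store, and the hex String used in place of Nutch's MD5Hash are hypothetical simplifications, not how the pagedb is actually stored.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ContentHashIndex {
    // Hex-encoded MD5 of the given bytes (a stand-in for Nutch's MD5Hash).
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // Hypothetical in-memory pagedb: content hash -> URLs whose content has that hash.
    private final Map<String, List<String>> byHash = new TreeMap<>();

    void addPage(String url, String content) {
        String hash = md5Hex(content.getBytes(StandardCharsets.UTF_8));
        byHash.computeIfAbsent(hash, k -> new ArrayList<>()).add(url);
    }

    // Analogue of getPages(md5): every URL sharing the hash (empty if none).
    List<String> urlsFor(String hash) {
        return byHash.getOrDefault(hash, new ArrayList<>());
    }

    // Analogue of pageExists(md5): answers without building a result list.
    boolean contains(String hash) {
        return byHash.containsKey(hash);
    }

    public static void main(String[] args) {
        ContentHashIndex db = new ContentHashIndex();
        db.addPage("http://example.com/", "hello");
        db.addPage("http://mirror.example.com/", "hello");
        db.addPage("http://example.com/other", "world");
        String h = md5Hex("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(h + " -> " + db.urlsFor(h));
    }
}
```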
public Enumeration pages() throws IOException
    Specified by: pages in interface IWebDBReader
    Throws: IOException
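pages() returns one URL-sorted Enumeration even though the db is split into sections. This page does not document how the sections are combined; a standard way to do it, assuming each section's records are already sorted, is a k-way merge in which a priority queue holds the current head of every section and repeatedly emits the smallest. The sketch is not Nutch's code: Strings stand in for Page records, and the comparator picks the sort key, so an MD5 comparator would give the pagesByMD5() order instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class SectionMerge {
    // One heap entry: the current head of a section plus the rest of that section.
    private static final class Head<T> {
        final T value;
        final Iterator<T> rest;
        Head(T value, Iterator<T> rest) { this.value = value; this.rest = rest; }
    }

    // Merges already-sorted per-section iterators into one sorted list.
    static <T> List<T> merge(List<Iterator<T>> sections, Comparator<? super T> order) {
        PriorityQueue<Head<T>> heap =
            new PriorityQueue<>((a, b) -> order.compare(a.value, b.value));
        for (Iterator<T> it : sections) {
            if (it.hasNext()) heap.add(new Head<>(it.next(), it)); // skip empty sections
        }
        List<T> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head<T> smallest = heap.poll();
            out.add(smallest.value);
            // Refill from the section that just supplied the smallest element.
            if (smallest.rest.hasNext()) {
                heap.add(new Head<>(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Iterator<String>> sections = Arrays.asList(
            Arrays.asList("http://a.com/", "http://d.com/").iterator(),
            Arrays.asList("http://b.com/", "http://c.com/").iterator());
        System.out.println(merge(sections, Comparator.<String>naturalOrder()));
    }
}
```

The heap never holds more than one record per section; a real reader would hand records out lazily through its Enumeration instead of collecting them into a list as this sketch does.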
public Enumeration pagesByMD5() throws IOException
    Specified by: pagesByMD5 in interface IWebDBReader
    Throws: IOException
public Link[] getLinks(UTF8 url) throws IOException
    Specified by: getLinks in interface IWebDBReader
    Throws: IOException
public Link[] getLinks(MD5Hash md5) throws IOException
    Specified by: getLinks in interface IWebDBReader
    Throws: IOException
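The two getLinks overloads read the same link data from opposite ends: getLinks(UTF8) finds the links that point to a URL, while getLinks(MD5Hash), going by its summary, finds the links that come from content with a given hash. A minimal in-memory way to support both lookups is to index every link twice. SketchLink and its fromMd5/toUrl fields are hypothetical stand-ins for Nutch's Link class, and the hash values in main are placeholder labels.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LinkIndex {
    // Hypothetical link record: hash of the source page's content and the target URL.
    static final class SketchLink {
        final String fromMd5;
        final String toUrl;
        SketchLink(String fromMd5, String toUrl) { this.fromMd5 = fromMd5; this.toUrl = toUrl; }
        @Override public String toString() { return fromMd5 + " -> " + toUrl; }
    }

    private final Map<String, List<SketchLink>> byTarget = new HashMap<>();
    private final Map<String, List<SketchLink>> bySource = new HashMap<>();

    // Every link is recorded in both indexes.
    void add(SketchLink link) {
        byTarget.computeIfAbsent(link.toUrl, k -> new ArrayList<>()).add(link);
        bySource.computeIfAbsent(link.fromMd5, k -> new ArrayList<>()).add(link);
    }

    // Analogue of getLinks(UTF8 url): links pointing TO the URL.
    List<SketchLink> linksTo(String url) {
        return byTarget.getOrDefault(url, Collections.emptyList());
    }

    // Analogue of getLinks(MD5Hash md5): links FROM content with that hash.
    List<SketchLink> linksFrom(String md5) {
        return bySource.getOrDefault(md5, Collections.emptyList());
    }

    public static void main(String[] args) {
        LinkIndex idx = new LinkIndex();
        // "md5-A" and "md5-B" are placeholder labels, not real MD5 values.
        idx.add(new SketchLink("md5-A", "http://example.com/x"));
        idx.add(new SketchLink("md5-B", "http://example.com/x"));
        idx.add(new SketchLink("md5-A", "http://example.com/y"));
        System.out.println(idx.linksTo("http://example.com/x"));
        System.out.println(idx.linksFrom("md5-A"));
    }
}
```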
public Enumeration links() throws IOException
    Specified by: links in interface IWebDBReader
    Throws: IOException
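links(), pages() and pagesByMD5() return a raw java.util.Enumeration, the iteration interface that predates Iterator, so callers loop with hasMoreElements() and nextElement() and cast each element. The helper below (a hypothetical name, not part of Nutch) does that once with Class.cast, which fails fast with ClassCastException on an unexpected element type; Collections.enumeration over Strings stands in for the reader's enumeration of Link objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

class EnumerationDrain {
    // Copies every element of an untyped Enumeration into a typed list,
    // casting each element to the requested class.
    static <T> List<T> drain(Enumeration<?> e, Class<T> type) {
        List<T> out = new ArrayList<>();
        while (e.hasMoreElements()) {
            out.add(type.cast(e.nextElement()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for reader.links(): Strings instead of Link objects.
        Enumeration<String> links = Collections.enumeration(
            Arrays.asList("http://a.com/", "http://b.com/"));
        for (String target : drain(links, String.class)) {
            System.out.println(target);
        }
    }
}
```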
public static void main(String[] argv) throws FileNotFoundException, IOException
    Throws: FileNotFoundException, IOException