org.apache.nutch.parse
Class OutlinkExtractor
java.lang.Object
org.apache.nutch.parse.OutlinkExtractor
- public class OutlinkExtractor
- extends Object
Extractor that finds Outlinks (URLs) in plain text using regular expressions.
- Since:
- 0.7
- Version:
- 1.0
- Author:
- Stephan Strittmatter - http://www.sybit.de
- See Also:
- Comparison of different regexp-Implementations, Overview about Java Regexp APIs
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
OutlinkExtractor
public OutlinkExtractor()
getOutlinks
public static Outlink[] getOutlinks(String plainText)
- Extracts Outlinks from the given plain text.
- Parameters:
plainText - the plain text from which URLs should be extracted.
- Returns:
- Array of Outlinks found in plainText
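The general technique, scanning plain text for URLs with a regular expression, can be sketched in standalone Java. This is only an illustration under stated assumptions: the class name PlainTextUrlSketch, the method extractUrls, and the pattern itself are hypothetical and are not the expression or code that OutlinkExtractor uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of Nutch.
public class PlainTextUrlSketch {

    // Assumed pattern: an http, https or ftp scheme, "://", then a run of
    // characters that are not whitespace or common closing delimiters.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "\\b(?:https?|ftp)://[^\\s<>\"')\\]]+", Pattern.CASE_INSENSITIVE);

    /** Returns every URL-like match in plainText, in order of appearance. */
    public static List<String> extractUrls(String plainText) {
        List<String> urls = new ArrayList<>();
        if (plainText == null) {
            return urls; // nothing to scan
        }
        Matcher matcher = URL_PATTERN.matcher(plainText);
        while (matcher.find()) {
            // Drop sentence punctuation that trails the URL, e.g. "…/a."
            String url = matcher.group().replaceAll("[.,;:!?]+$", "");
            urls.add(url);
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "See http://www.sybit.de and (https://nutch.apache.org/), thanks.";
        System.out.println(extractUrls(text));
    }
}
```

A deliberately loose pattern such as this one may accept strings that are not valid URLs; a caller would typically validate each match (for example with java.net.URI) before using it.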
getOutlinks
public static Outlink[] getOutlinks(String plainText,
String anchor)
- Extracts Outlinks from the given plain text and adds the anchor to the extracted Outlinks.
- Parameters:
plainText - the plain text from which URLs should be extracted.
anchor - the anchor of the URL
- Returns:
- Array of Outlinks found in plainText
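The two-argument form can be pictured as the same scan with one anchor string attached to every match. The standalone sketch below assumes exactly that reading of the description above; the class AnchoredLinkSketch, its nested Link type, and the simplified pattern are hypothetical stand-ins and do not reproduce the Nutch Outlink class or its extraction logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of Nutch.
public class AnchoredLinkSketch {

    /** Hypothetical stand-in for an outlink: a target URL plus anchor text. */
    public static final class Link {
        public final String toUrl;
        public final String anchor;

        public Link(String toUrl, String anchor) {
            this.toUrl = toUrl;
            this.anchor = anchor;
        }
    }

    // Assumed, simplified pattern: http or https scheme followed by non-whitespace.
    private static final Pattern URL_PATTERN = Pattern.compile("\\bhttps?://\\S+");

    /** Pairs every URL-like match in plainText with the same anchor string. */
    public static Link[] extractLinks(String plainText, String anchor) {
        List<Link> links = new ArrayList<>();
        if (plainText != null) {
            Matcher matcher = URL_PATTERN.matcher(plainText);
            while (matcher.find()) {
                links.add(new Link(matcher.group(), anchor));
            }
        }
        return links.toArray(new Link[0]);
    }

    public static void main(String[] args) {
        Link[] links = extractLinks("a http://a.example/x b https://b.example/y", "docs");
        for (Link link : links) {
            System.out.println(link.toUrl + " [" + link.anchor + "]");
        }
    }
}
```

Returning an empty array rather than null when nothing matches lets callers iterate over the result without a null check; whether OutlinkExtractor behaves the same way is not stated in this page.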
Copyright © 2006 The Apache Software Foundation