org.apache.nutch.parse
Class OutlinkExtractor
java.lang.Object
org.apache.nutch.parse.OutlinkExtractor
- public class OutlinkExtractor
- extends Object
Extractor to extract Outlinks / URLs from plain text using regular expressions.
- Since:
- 0.7
- Version:
- 1.0
- Author:
- Stephan Strittmatter - http://www.sybit.de
- See Also:
- Comparison of different regexp-Implementations, Overview about Java Regexp APIs
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
OutlinkExtractor
public OutlinkExtractor()
getOutlinks
public static Outlink[] getOutlinks(String plainText)
- Extracts Outlinks from the given plain text.
- Parameters:
plainText - the plain text from which URLs should be extracted.
- Returns:
- Array of Outlinks found in plainText.
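To illustrate the general idea of pulling URLs out of plain text with a regular expression, here is a minimal sketch. It is not the Nutch implementation: the class name PlainTextUrlSketch, the method extractUrls and the deliberately simple pattern are all assumptions made for this example, and the sketch returns plain strings rather than Outlink objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch for illustration only; not the code used by OutlinkExtractor.
final class PlainTextUrlSketch {

    // Simplifying assumption: a URL is an http, https or ftp scheme, "://",
    // and then every character up to the next whitespace, '<', '>' or '"'.
    private static final Pattern URL_PATTERN =
            Pattern.compile("(?:https?|ftp)://[^\\s<>\"]+", Pattern.CASE_INSENSITIVE);

    // Returns every substring of plainText that matches URL_PATTERN, in order.
    static List<String> extractUrls(String plainText) {
        List<String> urls = new ArrayList<>();
        if (plainText == null) {
            return urls; // nothing to scan
        }
        Matcher matcher = URL_PATTERN.matcher(plainText);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }
}
```

A pattern this simple keeps trailing punctuation (for example a full stop directly after a URL), so a production-quality extractor would need a stricter expression or a clean-up step.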
getOutlinks
public static Outlink[] getOutlinks(String plainText, String anchor)
- Extracts Outlinks from the given plain text and adds the anchor to the extracted Outlinks.
- Parameters:
plainText - the plain text from which URLs should be extracted.
anchor - the anchor of the URL.
- Returns:
- Array of Outlinks found in plainText.
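The two-argument form attaches the same anchor to every extracted link. The following sketch shows one way that pairing could look. It is an assumption-laden illustration rather than the Nutch code: AnchoredUrlSketch, its nested Link class (a stand-in for Outlink), the method extractLinks and the simplified pattern are all hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch for illustration only; not the code used by OutlinkExtractor.
final class AnchoredUrlSketch {

    // Hypothetical stand-in for Outlink: a target URL plus the anchor given by the caller.
    static final class Link {
        final String toUrl;
        final String anchor;

        Link(String toUrl, String anchor) {
            this.toUrl = toUrl;
            this.anchor = anchor;
        }
    }

    // Simplifying assumption: same loose URL pattern as a basic extractor would use.
    private static final Pattern URL_PATTERN =
            Pattern.compile("(?:https?|ftp)://[^\\s<>\"]+", Pattern.CASE_INSENSITIVE);

    // Returns one Link per URL found in plainText, each carrying the supplied anchor.
    static Link[] extractLinks(String plainText, String anchor) {
        List<Link> links = new ArrayList<>();
        if (plainText != null) {
            Matcher matcher = URL_PATTERN.matcher(plainText);
            while (matcher.find()) {
                links.add(new Link(matcher.group(), anchor));
            }
        }
        return links.toArray(new Link[0]);
    }
}
```

Returning an empty array instead of null when nothing matches lets callers iterate over the result without a null check; whether the Nutch method behaves the same way is not stated in this page.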
Copyright © 2006 The Apache Software Foundation