Crawler
class Crawler implements Countable, IteratorAggregate
Crawler eases navigation of a list of \DOMElement objects.
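Methods
__construct(mixed $node = null, string $currentUri = null, string $baseHref = null)
Constructor.
clear()
Removes all the nodes.
add(DOMNodeList|DOMNode|array|string|null $node)
Adds a node to the current list of nodes.
addContent(string $content, null|string $type = null)
Adds HTML/XML content.
addHtmlContent(string $content, string $charset = 'UTF-8')
Adds HTML content to the list of nodes.
addXmlContent(string $content, string $charset = 'UTF-8')
Adds XML content to the list of nodes.
addDocument(DOMDocument $dom)
Adds a \DOMDocument to the list of nodes.
addNodeList(DOMNodeList $nodes)
Adds a \DOMNodeList to the list of nodes.
addNodes(array $nodes)
Adds an array of \DOMNode instances to the list of nodes.
addNode(DOMNode $node)
Adds a \DOMNode instance to the list of nodes.
eq(int $position)
Returns a node given its position in the node list.
each(Closure $closure)
Calls an anonymous function on each node of the list.
slice(int $offset, int $length = null)
Slices the list of nodes by $offset and $length.
reduce(Closure $closure)
Reduces the list of nodes by calling an anonymous function.
first()
Returns the first node of the current selection.
last()
Returns the last node of the current selection.
siblings()
Returns the sibling nodes of the current selection.
nextAll()
Returns the next sibling nodes of the current selection.
previousAll()
Returns the previous sibling nodes of the current selection.
parents()
Returns the parent nodes of the current selection.
children()
Returns the child nodes of the current selection.
attr(string $attribute)
Returns the attribute value of the first node of the list.
nodeName()
Returns the node name of the first node of the list.
text()
Returns the node value of the first node of the list.
html()
Returns the first node of the list as HTML.
extract(array $attributes)
Extracts information from the list of nodes.
filterXPath(string $xpath)
Filters the list of nodes with an XPath expression.
filter(string $selector)
Filters the list of nodes with a CSS selector.
selectLink(string $value)
Selects links by name or alt value for clickable images.
selectButton(string $value)
Selects a button by name or alt value for images.
link(string $method = 'get')
Returns a Link object for the first node in the list.
links()
Returns an array of Link objects for the nodes in the list.
form(array $values = null, string $method = null)
Returns a Form object for the first node in the list.
setDefaultNamespacePrefix(string $prefix)
Overloads a default namespace prefix to be used with XPath and CSS expressions.
registerNamespace(string $prefix, string $namespace)
No description
xpathLiteral(string $s)
Converts string for XPath expressions.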
No description
No description
No description
Details
at line 67
__construct(mixed $node = null, string $currentUri = null, string $baseHref = null)
Constructor.
at line 78
clear()
Removes all the nodes.
at line 94
add(DOMNodeList|DOMNode|array|string|null $node)
Adds a node to the current list of nodes.
This method uses the appropriate specialized add*() method based on the type of the argument.
at line 119
addContent(string $content, null|string $type = null)
Adds HTML/XML content.
If the charset is not set via the content type, it is assumed to be ISO-8859-1, which is the default charset defined by the HTTP 1.1 specification.
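As a rough illustration of the charset rule above, the following sketch passes the charset through the content type so the ISO-8859-1 fallback does not apply. It assumes the component is installed with Composer and that vendor/autoload.php sits next to the script; the sample markup is made up for the example.

```php
<?php
// Assumption: symfony/dom-crawler is installed via Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// "\xC3\xA9" is the UTF-8 byte sequence for the letter e with an acute accent.
$html = "<html><body><p>caf\xC3\xA9</p></body></html>";

$crawler = new Crawler();
// Declare the charset in the content type instead of relying on the default.
$crawler->addContent($html, 'text/html; charset=UTF-8');

$text = $crawler->filterXPath('//p')->text();
echo $text, PHP_EOL;
```

If the second argument were omitted, the content would be handled according to the fallback described above, which may garble multi-byte characters.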
at line 169
addHtmlContent(string $content, string $charset = 'UTF-8')
Adds HTML content to the list of nodes.
The libxml errors are disabled when the content is parsed.
If you want to get parsing errors, be sure to enable internal errors via libxml_use_internal_errors(true) and then get the errors via libxml_get_errors(). Be sure to clear errors with libxml_clear_errors() afterward.
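The error-handling sequence described above can be sketched as follows. To keep the example runnable without the component, it calls DOMDocument::loadHTML() directly as a stand-in for the parsing step; wrapping a call to addHtmlContent() in the same three libxml calls is the assumption being illustrated. The malformed markup is invented for the example, and which errors libxml reports for it depends on the libxml version.

```php
<?php
// Enable internal error collection and remember the previous setting.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
// Stand-in for the parsing step; the markup is deliberately malformed.
$dom->loadHTML('<html><body><p>Unclosed <b>markup</p></body></html>');

// Collect whatever libxml reported (an array of LibXMLError objects).
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Clear the error buffer and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```

Restoring the previous setting keeps the change local to this parsing step, so other code that relies on the default libxml behavior is not affected.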
at line 224
addXmlContent(string $content, string $charset = 'UTF-8')
Adds XML content to the list of nodes.
The libxml errors are disabled when the content is parsed.
If you want to get parsing errors, be sure to enable internal errors via libxml_use_internal_errors(true) and then get the errors via libxml_get_errors(). Be sure to clear errors with libxml_clear_errors() afterward.
at line 254
addDocument(DOMDocument $dom)
Adds a \DOMDocument to the list of nodes.
at line 266
addNodeList(DOMNodeList $nodes)
Adds a \DOMNodeList to the list of nodes.
at line 280
addNodes(array $nodes)
Adds an array of \DOMNode instances to the list of nodes.
at line 292
addNode(DOMNode $node)
Adds a \DOMNode instance to the list of nodes.
at line 325
Crawler
eq(int $position)
Returns a node given its position in the node list.
at line 350
array
each(Closure $closure)
Calls an anonymous function on each node of the list.
The anonymous function receives the node, wrapped in a Crawler instance, and its position in the list as arguments.
Example:
$crawler->filter('h1')->each(function ($node, $i) {
    return $node->text();
});
at line 368
Crawler
slice(int $offset, int $length = null)
Slices the list of nodes by $offset and $length.
at line 382
Crawler
reduce(Closure $closure)
Reduces the list of nodes by calling an anonymous function.
To remove a node from the list, the anonymous function must return false.
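A minimal sketch of the rule above: the closure returns false for every second node, so only the nodes at even positions survive. It assumes the component is installed with Composer (vendor/autoload.php next to the script); the list markup is made up, and only the position argument is used so the example does not depend on how the node is passed.

```php
<?php
// Assumption: symfony/dom-crawler is installed via Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body><ul><li>a</li><li>b</li><li>c</li></ul></body></html>');

$kept = $crawler->filterXPath('//li')->reduce(function ($node, $i) {
    // Returning false drops the node; any other return value keeps it.
    return $i % 2 === 0;
});

// Collect the text of the remaining nodes.
$texts = $kept->each(function ($node, $i) {
    return $node->text();
});

echo implode(', ', $texts), PHP_EOL;
```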
at line 399
Crawler
first()
Returns the first node of the current selection.
at line 409
Crawler
last()
Returns the last node of the current selection.
at line 421
Crawler
siblings()
Returns the sibling nodes of the current selection.
at line 437
Crawler
nextAll()
Returns the next sibling nodes of the current selection.
at line 453
Crawler
previousAll()
Returns the previous sibling nodes of the current selection.
at line 469
Crawler
parents()
Returns the parent nodes of the current selection.
at line 494
Crawler
children()
Returns the child nodes of the current selection.
at line 514
string|null
attr(string $attribute)
Returns the attribute value of the first node of the list.
at line 532
string
nodeName()
Returns the node name of the first node of the list.
at line 548
string
text()
Returns the node value of the first node of the list.
at line 564
string
html()
Returns the first node of the list as HTML.
at line 591
array
extract(array $attributes)
Extracts information from the list of nodes.
You can extract attributes and/or the node value (_text).
Example:
$crawler->filter('h1 a')->extract(array('_text', 'href'));
at line 625
Crawler
filterXPath(string $xpath)
Filters the list of nodes with an XPath expression.
The XPath expression is evaluated in the context of the crawler, which is treated as a fake parent of the elements inside it. This means that a child selector such as "div" or "./div" matches only the div elements in the current crawler, not their children.
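The context rule above can be illustrated with two nested div elements. This is a sketch under assumptions: the component is installed with Composer (vendor/autoload.php next to the script), and the markup and the ids "outer" and "inner" are invented for the example.

```php
<?php
// Assumption: symfony/dom-crawler is installed via Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler(
    '<html><body><div id="outer"><div id="inner"></div></div></body></html>'
);

// Select only the div that is a direct child of body.
$outer = $crawler->filterXPath('//body/div');

// "div" is evaluated against the elements held by $outer itself,
// so it selects the outer div again rather than its child.
$same = $outer->filterXPath('div');

// To reach the nested div, name the child step explicitly.
$inner = $outer->filterXPath('div/div');

echo $same->attr('id'), ' ', $inner->attr('id'), PHP_EOL;
```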
at line 648
Crawler
filter(string $selector)
Filters the list of nodes with a CSS selector.
This method only works if you have installed the CssSelector Symfony Component.
at line 667
Crawler
selectLink(string $value)
Selects links by name or alt value for clickable images.
at line 682
Crawler
selectButton(string $value)
Selects a button by name or alt value for images.
at line 701
Link
link(string $method = 'get')
Returns a Link object for the first node in the list.
at line 717
Link[]
links()
Returns an array of Link objects for the nodes in the list.
at line 737
Form
form(array $values = null, string $method = null)
Returns a Form object for the first node in the list.
at line 757
setDefaultNamespacePrefix(string $prefix)
Overloads a default namespace prefix to be used with XPath and CSS expressions.
at line 766
registerNamespace(string $prefix, string $namespace)
at line 792
static string
xpathLiteral(string $s)
Converts string for XPath expressions.
Escaped characters are double quotes (") and apostrophes (').
Examples:
echo Crawler::xpathLiteral('foo " bar');
// prints 'foo " bar'
echo Crawler::xpathLiteral("foo ' bar");
// prints "foo ' bar"
echo Crawler::xpathLiteral('a\'b"c');
// prints concat('a', "'", 'b"c')
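One place where this helper is likely useful is when a string of unknown content has to be embedded in an XPath expression. The sketch below assumes the component is installed with Composer (vendor/autoload.php next to the script); the markup and the $needle value are invented for the example.

```php
<?php
// Assumption: symfony/dom-crawler is installed via Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler(
    '<html><body><a href="/a">it\'s here</a><a href="/b">elsewhere</a></body></html>'
);

// A search string containing an apostrophe, which would break a
// hand-written single-quoted XPath string literal.
$needle = "it's";

// Let xpathLiteral() choose the quoting for the embedded string.
$xpath = '//a[contains(., ' . Crawler::xpathLiteral($needle) . ')]';
$links = $crawler->filterXPath($xpath);

echo $links->attr('href'), PHP_EOL;
```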