% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/selectors.R
\name{html_element}
\alias{html_element}
\alias{html_elements}
\title{Select elements from an HTML document}
\usage{
html_element(x, css, xpath)

html_elements(x, css, xpath)
}
\arguments{
\item{x}{Either a document, a node set or a single node.}

\item{css, xpath}{Elements to select. Supply one of \code{css} or \code{xpath}
depending on whether you want to use a CSS selector or an XPath 1.0
expression.}
}
\value{
\code{html_element()} returns a node set the same length as the input, using a
"missing" node wherever an input has no match. \code{html_elements()} flattens
all matches into a single node set, so there's no direct way to map the
output back to the input.
}
\description{
\code{html_element()} and \code{html_elements()} find HTML elements using CSS selectors
or XPath expressions. CSS selectors are particularly useful in conjunction
with \url{https://selectorgadget.com/}, which makes it very easy to discover the
selector you need.
}
\section{CSS selector support}{


CSS selectors are translated to XPath selectors by the \pkg{selectr}
package, which is a port of the Python \pkg{cssselect} library,
\url{https://pythonhosted.org/cssselect/}.

\pkg{selectr} implements most CSS3 selectors, as described in
\url{https://www.w3.org/TR/2011/REC-css3-selectors-20110929/}, with the
exceptions and extensions listed below:
\itemize{
\item Pseudo-classes that require interactivity are ignored:
\verb{:hover}, \verb{:active}, \verb{:focus}, \verb{:target}, \verb{:visited}.
\item The following pseudo-classes don't work with the wildcard element, \verb{*}:
\verb{*:first-of-type}, \verb{*:last-of-type}, \verb{*:nth-of-type},
\verb{*:nth-last-of-type}, \verb{*:only-of-type}.
\item The non-standard \verb{:contains(text)} pseudo-class is supported.
\item The non-standard \verb{!=} attribute operator is supported:
\verb{[foo!=bar]} is equivalent to \verb{:not([foo=bar])}.
\item \verb{:not()} accepts a sequence of simple selectors, not just a single
simple selector.
}
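
To check how a particular CSS selector is translated, you can call
\code{selectr::css_to_xpath()} directly. This is a minimal sketch: rvest may
call the translator with different arguments, so treat the printed XPath
string as illustrative rather than the exact expression rvest evaluates.

\preformatted{
# Print the XPath 1.0 expression that selectr generates for a CSS selector
selectr::css_to_xpath("p.important")
}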
}

\examples{
html <- minimal_html("
  <h1>This is a heading</h1>
  <p id='first'>This is a paragraph</p>
  <p class='important'>This is an important paragraph</p>
")

html |> html_element("h1")
html |> html_elements("p")
html |> html_elements(".important")
html |> html_elements("#first")
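
# Roughly equivalent selections written as XPath 1.0 expressions. Supply
# `xpath` by name, because `css` is the second positional argument.
html |> html_elements(xpath = "//p")
html |> html_elements(xpath = "//p[@id = 'first']")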

# html_element() vs html_elements() --------------------------------------
html <- minimal_html("
  <ul>
    <li><b>C-3PO</b> is a <i>droid</i> that weighs <span class='weight'>167 kg</span></li>
    <li><b>R2-D2</b> is a <i>droid</i> that weighs <span class='weight'>96 kg</span></li>
    <li><b>Yoda</b> weighs <span class='weight'>66 kg</span></li>
    <li><b>R4-P17</b> is a <i>droid</i></li>
  </ul>
")
li <- html |> html_elements("li")

# When applied to a node set, html_elements() returns all matching elements
# beneath any of the inputs, flattening results into a new node set.
li |> html_elements("i")

# When applied to a node set, html_element() always returns a vector the
# same length as the input, using a "missing" element where needed.
li |> html_element("i")
# and html_text2() and html_attr() will return NA for the missing elements
li |> html_element("i") |> html_text2()
li |> html_element("span") |> html_attr("class")
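
# selectr extensions: :contains() and != ----------------------------------
html <- minimal_html("
  <p class='note'>Apples are red</p>
  <p class='warning'>Bananas are yellow</p>
")
# :contains() matches elements whose text includes the given string
html |> html_elements("p:contains('Bananas')")
# [class!='warning'] is equivalent to :not([class='warning'])
html |> html_elements("p[class!='warning']")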
}
