XPath (XML Path Language) is a powerful query language for selecting nodes from XML and HTML documents. It offers more advanced capabilities than CSS selectors, including text node extraction, parent selection, and complex conditional logic.

How XPath Works

XPath uses path expressions to navigate through the hierarchical structure of XML/HTML documents. You can select elements, attributes, text nodes, and more using a syntax similar to file system paths.

Key Benefits:
  • Extract text nodes directly (without HTML tags)
  • Navigate to parent elements
  • Use advanced conditional logic
  • Perfect for RSS/XML feeds
  • Support for regex matching

XPath Versions in changedetection.io

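changedetection.io supports two XPath implementations, selected by prefix: the default engine described here, and an XPath 1.0 engine selected with the xpath1: prefix.
Prefix: xpath: (or no prefix for // syntax)
Engine: elementpath library
Features: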
  • XPath 2.0 and 3.0 support
  • Better namespace handling
  • Automatic default namespace support for RSS/Atom feeds
  • Modern expression syntax
Example:
xpath://div[@class='price']/text()
//title/text()

Basic XPath Syntax

Selecting Elements

//div
Selects all <div> elements anywhere in the document.
/html/body/div
Selects <div> elements that are direct children of <body>.

Selecting by Attributes

//div[@class='price']
Selects divs with class="price".
//a[@href]
Selects all <a> elements that have an href attribute.

Extracting Text

//h1/text()
Extracts the text nodes that are direct children of <h1> elements. Text inside nested tags is not included; use //text() for that.
//div[@id='content']//text()
Extracts all text within the content div.

Extracting Attributes

//img/@src
Extracts the src attribute from all images.
//meta[@property='og:price:amount']/@content
Extracts Open Graph price metadata.
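The basic steps above can be tried outside changedetection.io with a short script. The following is a minimal sketch using Python's standard xml.etree.ElementTree, which is only a stand-in: it implements a limited subset of XPath (no text() or @attr steps, and paths must be relative), so text and attributes are read through .text and .get(). The page fragment is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """<html>
  <body>
    <h1>Store</h1>
    <div class="price">$10</div>
    <div class="note">Free shipping</div>
    <a href="/cart">Cart</a>
    <a>No link</a>
    <img src="/logo.png"/>
  </body>
</html>"""

root = ET.fromstring(PAGE)  # root is the <html> element

# //div -> every <div> at any depth
all_divs = root.findall(".//div")

# /html/body/div -> <div> elements that are direct children of <body>
body_divs = root.findall("./body/div")

# //div[@class='price']/text() -> select the element, then read .text
price_text = [d.text for d in root.findall(".//div[@class='price']")]

# //a[@href] -> only anchors that carry an href attribute
linked = root.findall(".//a[@href]")

# //img/@src -> select the element, then read the attribute
sources = [img.get("src") for img in root.findall(".//img")]

print(len(all_divs), len(body_divs), price_text, len(linked), sources)
```

In an engine with full XPath support, the expressions in the comments can be used as written.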

Practical Examples

Monitor Product Price

<div class="product">
  <h2>Gaming Laptop</h2>
  <span class="price" data-value="1299.99">$1,299.99</span>
</div>
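For the markup above, expressions such as //span[@class='price']/text() (the displayed price) and //span[@class='price']/@data-value (the machine-readable value) are plausible choices; the second one is an assumption based on the sample, not a documented recommendation. A minimal sketch of what they would select, using Python's standard xml.etree.ElementTree as a stand-in with a limited XPath subset:

```python
import xml.etree.ElementTree as ET

HTML = """<div class="product">
  <h2>Gaming Laptop</h2>
  <span class="price" data-value="1299.99">$1,299.99</span>
</div>"""

root = ET.fromstring(HTML)  # root is the outer <div class="product">

# //span[@class='price'] -> the price element
span = root.find(".//span[@class='price']")

# .../text() -> the price as displayed on the page
shown = span.text

# .../@data-value -> the numeric value, easier to compare between checks
numeric = float(span.get("data-value"))

# //div[@class='product']/h2/text() -> product name (root is that div here)
name = root.findtext("./h2")

print(name, shown, numeric)
```

Watching the attribute value rather than the formatted text avoids false changes when only the currency formatting differs.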

Extract Multiple Fields

//div[@class='product']/h2/text()
//span[@class='price']/text()
//div[@class='stock']/text()
Each XPath expression on a new line extracts different fields.

Monitor RSS Feed Items

Automatic default namespace handling:
//item/title/text()
//item/description/text()
Works directly with RSS feeds; no manual namespace handling is needed.

Advanced Techniques

Conditional Selection

//div[contains(text(), 'In Stock')]
Selects divs containing “In Stock” text.
//h2[contains(@class, 'product')]/text()
Selects h2 elements where class contains “product”.
//div[@class='product' and @data-available='true']
Selects divs matching BOTH conditions.
//span[@class='sale' or @class='discount']
Selects spans matching EITHER condition.
//div[contains(@class, 'item') and not(contains(@class, 'hidden'))]
Selects item divs whose class does NOT include “hidden”.
//li[1]
Selects each <li> that is the first <li> child of its parent.
//li[last()]
Selects each <li> that is the last <li> child of its parent.
//li[position() > 2]
Selects every <li> after the second one within its parent.
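Positional predicates count within each parent, so //li[1] can return more than one node when the page has several lists. A minimal sketch with a hypothetical two-list document, using Python's standard xml.etree.ElementTree as a stand-in (it supports [n] and [last()] but not position() comparisons, so the last case is emulated with slicing):

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two separate lists.
DOC = """<body>
  <ul><li>a1</li><li>a2</li><li>a3</li></ul>
  <ul><li>b1</li><li>b2</li></ul>
</body>"""

root = ET.fromstring(DOC)

# //li[1] -> the first <li> of EACH list
first = [li.text for li in root.findall(".//li[1]")]

# //li[last()] -> the last <li> of EACH list
last = [li.text for li in root.findall(".//li[last()]")]

# //li[position() > 2] -> emulated: skip the first two <li> of each list
after_second = [
    li.text
    for ul in root.findall(".//ul")
    for li in ul.findall("li")[2:]
]

print(first, last, after_second)
```

To get a single node for the whole document, wrap the path in parentheses in a full XPath engine, for example (//li)[1].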

Parent and Sibling Navigation

//span[@class='price']/parent::div
Selects the parent div of the price span.
//h2[@class='title']/following-sibling::p[1]
Selects the first <p> sibling that follows the title.
//span[@class='label']/preceding-sibling::input
Selects sibling <input> elements that precede the label.
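The parent and sibling axes can be illustrated with a short script. This is a sketch under stated assumptions: it uses Python's standard xml.etree.ElementTree, which supports the ".." parent step but has no sibling axes, so following-sibling is emulated by a hypothetical helper, following_siblings. The document fragment is also made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment used only for illustration.
DOC = """<body>
  <div id="offer"><span class="price">$5</span></div>
  <section>
    <h2 class="title">News</h2>
    <p>First</p>
    <p>Second</p>
  </section>
</body>"""

root = ET.fromstring(DOC)

# //span[@class='price']/parent::div
# ".." steps up to the parent but does not check its tag, so filter afterwards.
parents = [p for p in root.findall(".//span[@class='price']/..") if p.tag == "div"]
parent_ids = [p.get("id") for p in parents]


def following_siblings(parent, node, tag):
    """Return the siblings of `node` with the given tag that come after it."""
    children = list(parent)
    start = children.index(node) + 1
    return [c for c in children[start:] if c.tag == tag]


# //h2[@class='title']/following-sibling::p[1]
section = root.find(".//h2[@class='title']/..")
heading = section.find("h2[@class='title']")
first_p = following_siblings(section, heading, "p")[0].text

print(parent_ids, first_p)
```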

Regular Expression Matching

changedetection.io supports EXSLT regex functions:
//div[re:match(text(), 'Price: \$\d+\.\d{2}')]
Matches divs with text matching the price pattern.
//a[re:test(@href, 'product/\d+')]
Selects links where href matches the pattern.

Working with Namespaces

RSS/Atom Feeds

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>My Feed</title>
    <item>
      <title>First Item</title>
      <description>Item description</description>
    </item>
  </channel>
</rss>
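For the feed above, //item/title/text() and //item/description/text() select the item fields and skip the channel-level <title>. A minimal sketch of that selection, using Python's standard xml.etree.ElementTree as a stand-in with a limited XPath subset (text is read via .text):

```python
import xml.etree.ElementTree as ET

# The sample feed from above, passed as bytes so the encoding declaration is honored.
FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>My Feed</title>
    <item>
      <title>First Item</title>
      <description>Item description</description>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(FEED)

# //item/title/text() -> item titles only; the channel title is not under <item>
titles = [t.text for t in root.findall(".//item/title")]

# //item/description/text()
descriptions = [d.text for d in root.findall(".//item/description")]

# //title/text() would also match the channel title
all_titles = [t.text for t in root.findall(".//title")]

print(titles, descriptions, all_titles)
```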

Handling CDATA Sections

Some RSS feeds wrap content in CDATA:
<description><![CDATA[<p>HTML content here</p>]]></description>
changedetection.io automatically processes CDATA sections. The XPath //description/text() will extract the content.
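As a rough illustration of why text() reaches CDATA content, the sketch below parses the element above with Python's standard xml.etree.ElementTree. It only shows how a typical XML parser exposes CDATA as ordinary character data; any further processing changedetection.io applies is not shown here.

```python
import xml.etree.ElementTree as ET

SNIPPET = "<description><![CDATA[<p>HTML content here</p>]]></description>"

description = ET.fromstring(SNIPPET)

# The CDATA wrapper is gone; the markup inside it arrives as plain text,
# not as child elements.
content = description.text
child_count = len(description)

print(content, child_count)
```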

Combining Include and Remove Filters

//article[@class='main']
This extracts the main article while removing ads, sidebars, and scripts.
Remove filters must use the xpath: or xpath1: prefix explicitly.

Testing XPath Expressions

Using Browser DevTools

  1. Open DevTools Console (F12)
  2. Use $x() to test XPath:
    $x('//div[@class="price"]/text()')
    
  3. Verify the results

Online XPath Testers

Online testers typically use XPath 1.0. Use the xpath1: prefix in changedetection.io for compatibility.

Common Patterns

Pattern: Extract Plain Text

//div[@id='content']//text()
Use case: Get all text without HTML tags.

Pattern: Monitor Table Data

//table[@class='pricing']//tr[2]/td[3]/text()
Use case: Extract specific table cell (row 2, column 3).
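Note that a header row counts as a row: tr[2] is the second <tr> under its parent, which is the first data row when the table starts with a header. A minimal sketch with a hypothetical pricing table, using Python's standard xml.etree.ElementTree as a stand-in with a limited XPath subset:

```python
import xml.etree.ElementTree as ET

# Hypothetical pricing table used only for illustration.
DOC = """<body>
  <table class="pricing">
    <tr><th>Plan</th><th>Users</th><th>Price</th></tr>
    <tr><td>Basic</td><td>5</td><td>$9</td></tr>
    <tr><td>Pro</td><td>50</td><td>$49</td></tr>
  </table>
</body>"""

root = ET.fromstring(DOC)

# //table[@class='pricing']//tr[2]/td[3]/text()
# tr[2] is the "Basic" row because the header row is tr[1].
cells = root.findall(".//table[@class='pricing']//tr[2]/td[3]")
cell_text = [c.text for c in cells]

print(cell_text)
```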

Pattern: Get Meta Description

//meta[@name='description']/@content
Use case: Extract page meta description.

Pattern: Track Stock Status

//div[@class='availability' and contains(text(), 'In Stock')]
Use case: Check if “In Stock” appears.

Pattern: Get Download Links

//a[@class='download']/@href
Use case: Get download link URLs.

Common Pitfalls

Pitfall #1: Forgetting text()
//div[@class='price']
Returns the entire element with HTML tags.
Better:
//div[@class='price']/text()
Extracts just the text content.
Pitfall #2: Case Sensitivity
XPath is case-sensitive!
//Div[@Class='Price']  # Wrong
//div[@class='price']  # Correct
Pitfall #3: Namespace Issues with RSS
If your XPath returns nothing from an RSS feed:
Problem: Using XPath 1.0 without local-name()
xpath1://item/title/text()  # May fail
Solution 1: Use default XPath (2.0/3.0)
//item/title/text()  # Automatic namespace handling
Solution 2: Use local-name() with XPath 1.0
xpath1://*[local-name()='item']/*[local-name()='title']/text()
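The namespace pitfall can be reproduced with a feed that declares a default namespace, as Atom feeds do. This is a sketch under stated assumptions: it uses Python's standard xml.etree.ElementTree as a stand-in, the Atom sample is hypothetical, and local_name is an illustrative helper that mimics the XPath local-name() function.

```python
import xml.etree.ElementTree as ET

# Hypothetical Atom feed; every element lives in the default namespace.
ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>My Feed</title>
  <entry><title>First Item</title></entry>
</feed>"""

root = ET.fromstring(ATOM)

# Unqualified names match nothing because the elements are namespaced.
plain = root.findall(".//entry/title")

# Option 1: bind a prefix to the namespace and qualify every step.
ns = {"atom": "http://www.w3.org/2005/Atom"}
qualified = [t.text for t in root.findall(".//atom:entry/atom:title", ns)]


def local_name(element):
    """Strip the '{namespace}' part that ElementTree puts in front of tag names."""
    return element.tag.rsplit("}", 1)[-1]


# Option 2: compare local names only, like
# //*[local-name()='entry']/*[local-name()='title']
by_local = [
    child.text
    for element in root.iter()
    if local_name(element) == "entry"
    for child in element
    if local_name(child) == "title"
]

print(plain, qualified, by_local)
```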

When to Use XPath

Good for:
  • Monitoring RSS/Atom feeds
  • Extracting text without HTML tags
  • Complex conditional filtering
  • Navigating to parent elements
  • XML documents
  • When you need regex matching
  • Extracting specific attributes
Not ideal for:
  • JSON APIs (use JSON filtering instead)
  • When CSS selectors are sufficient (simpler syntax)
  • When you want visual selector support

XPath vs CSS Selectors

Feature | XPath | CSS Selectors
Text extraction | //div/text() | Not possible
Parent selection | //span/parent::div | Not possible
Attribute extraction | //@href | Not directly
Conditional logic | [contains(@class, 'x')] | Limited
Visual selector | ❌ Not available | ✅ Available
RSS/XML feeds | ✅ Excellent | ❌ Not suitable
Learning curve | Steeper | Easier

Real-World Examples

//div[@itemprop='aggregateRating']//span[@itemprop='ratingValue']/text()
Extracts structured rating data from product pages.
//item/title/text()
//item/pubDate/text()
Monitors RSS feed items for new headlines and dates.
//script[@type='application/ld+json']/text()
Extracts JSON-LD structured data, which can then be filtered with JSON filters.
//table[@class='data']//tr[position() > 1]/td[2]/text()
Extracts second column from all data rows (skipping header).
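The JSON-LD example above yields a JSON string, which is why a JSON filter is the natural next step. A minimal sketch of that two-stage idea in plain Python: the page and its JSON-LD payload are hypothetical, xml.etree.ElementTree stands in for the XPath step, and json.loads stands in for the JSON filter.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical page with an embedded JSON-LD block.
PAGE = """<html>
  <head>
    <script type="application/ld+json">{"@type": "Product", "name": "Gaming Laptop", "offers": {"price": "1299.99"}}</script>
  </head>
  <body/>
</html>"""

root = ET.fromstring(PAGE)

# Stage 1: //script[@type='application/ld+json']/text()
script = root.find(".//script[@type='application/ld+json']")
raw_json = script.text

# Stage 2: pick a field out of the JSON, as a JSON filter would
data = json.loads(raw_json)
price = data["offers"]["price"]

print(data["name"], price)
```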

Debugging Tips

Returns Empty Results

  1. Check if you’re using the right XPath version
  2. For RSS/XML, try switching between xpath: and xpath1:
  3. Use local-name() for namespaced elements
  4. Verify element exists (check browser’s element inspector)
  5. Test in browser console: $x('your-xpath-here')

Returns Unexpected Content

  1. Add /text() to extract only text content
  2. Use [1] or [last()] to get specific positions
  3. Add more specific conditions with [@attribute='value']
  4. Check if you need // (any level) vs / (direct child)