The getPageSource() method in Selenium WebDriver is a powerful tool used to retrieve the entire HTML source code of the current page loaded in the browser session. This method is essential for debugging, content verification, and handling scenarios where direct element manipulation might be tricky.
What Does getPageSource() Do?
- Returns: The complete HTML source of the currently loaded page as a String.
- Purpose: Enables verification of page contents, debugging test flows, and even searching for hidden or dynamic elements/text not exposed through standard locators.
- Limitations: The output represents the DOM structure as Selenium interprets it at the time of the method call. Dynamically generated content (e.g., via JavaScript) that hasn’t yet loaded may not be present unless appropriate waits are implemented.
Syntax
String pageSource = driver.getPageSource();
Here, driver is your active WebDriver instance.
Typical Usage Scenarios
- Text Verification: Detect if a given string or value exists anywhere in the page’s source, including hidden text or dynamically loaded content.
- Debugging: Print or save the HTML source during test failures or unexpected behaviors, making it easier to troubleshoot.
- Automation Condition Checks: Automatically decide to proceed or halt a test step based on the presence or absence of certain content in the source code.
- Comparison: Check before-and-after states of the page for UI or DOM changes.
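As a rough illustration of the comparison scenario, the sketch below lists the lines that appear in an "after" snapshot but not in a "before" snapshot. The class and method names (SourceDiff, addedLines) are hypothetical, and the two strings stand in for values returned by driver.getPageSource() before and after some action. It is a naive line-based set difference, not a real diff algorithm, so it works best when the source is split across many lines.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class SourceDiff {

    // Returns the trimmed, non-blank lines present in `after` but not in `before`.
    public static List<String> addedLines(String before, String after) {
        Set<String> seen = toLineSet(before);
        return toLineSet(after).stream()
                .filter(line -> !seen.contains(line))
                .collect(Collectors.toList());
    }

    // Splits the source into trimmed, non-blank lines, keeping first-seen order.
    private static Set<String> toLineSet(String source) {
        return source.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        // Hypothetical snapshots; in a test these would come from driver.getPageSource().
        String before = "<ul>\n  <li>One</li>\n</ul>";
        String after = "<ul>\n  <li>One</li>\n  <li>Two</li>\n</ul>";
        System.out.println(addedLines(before, after)); // prints [<li>Two</li>]
    }
}
```

Swapping the arguments gives the lines that were removed instead of added.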
Complete Code Example
Here’s a simple program that loads a website, fetches its HTML source, prints it, and checks if a certain keyword exists:
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class GetPageSourceExample {
    public static void main(String[] args) {
        // Set path to your chromedriver executable
        System.setProperty("webdriver.chrome.driver", "/path/to/chromedriver");
        WebDriver driver = new ChromeDriver();

        // Open a webpage
        driver.get("https://www.selenium.dev");

        // Get the complete source of the current page
        String pageSource = driver.getPageSource();

        // Print page source to console (or write to file)
        System.out.println(pageSource);

        // Search for specific text in the page source
        if (pageSource.contains("Selenium")) {
            System.out.println("Text found: Test Passed!");
        } else {
            System.out.println("Text not found: Test Failed!");
        }

        // Close the browser
        driver.quit();
    }
}
This workflow is commonly used to assert the presence of crucial text, meta tags, or other hidden content within a web page.
Important Notes
- Not Always Up-to-Date: The returned source may not reflect modifications made by JavaScript after the page has loaded, depending on browser and WebDriver implementation. Always ensure dynamic elements have fully loaded before calling this method.
- Not the Raw Server Response: The formatting of the returned string may differ from the server’s original HTML, as it is a representation of the DOM state as seen by the browser.
- Alternatives for Visible Text: If you only need the visible text, fetching the text from the <body> element using getText() may be more suitable.
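The advice above about letting dynamic content load before reading the source can be sketched as a small polling loop. In a real Selenium test, WebDriverWait is the usual tool for this. The version below is a dependency-free stand-in so that it runs on its own, and the names SourcePoller and pollUntil are hypothetical. It re-fetches the source from any Supplier<String> until a condition holds or a timeout passes.

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class SourcePoller {

    // Polls `source` until `ready` accepts the returned string or `timeout` elapses.
    // Returns the last string fetched, whether or not it satisfied the condition.
    public static String pollUntil(Supplier<String> source,
                                   Predicate<String> ready,
                                   Duration timeout,
                                   Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        String latest = source.get();
        while (!ready.test(latest) && deadline - System.nanoTime() > 0) {
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop polling early.
                Thread.currentThread().interrupt();
                break;
            }
            latest = source.get();
        }
        return latest;
    }

    public static void main(String[] args) {
        // Simulated page whose content only appears on the third fetch.
        int[] calls = {0};
        Supplier<String> fakePage = () ->
                ++calls[0] < 3 ? "<div id=\"app\"></div>" : "<div id=\"app\">Loaded</div>";

        String html = pollUntil(fakePage, s -> s.contains("Loaded"),
                Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println(html.contains("Loaded")); // prints true
    }
}
```

With a live session, the supplier could be the method reference driver::getPageSource. Because the method returns the last snapshot even on timeout, the caller should still check the condition before relying on the result.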
Best Practices
- Wait for Essential Content: For pages with asynchronous content, use explicit waits to ensure essential elements are loaded before retrieving the page source.
- Efficient Debugging: Capture and store getPageSource() output during test failures to aid in root cause analysis.
- Text Search: Use .contains() or regex to efficiently check for the existence of critical content.
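The debugging and text-search practices above might look like the following sketch. The helper names (SourceCapture, save, findTitle), the output directory, and the sample HTML are all assumptions made for illustration. The pageSource argument stands in for the string returned by driver.getPageSource(). The regex is deliberately simple and is not a robust way to parse HTML in general.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SourceCapture {

    // Matches the contents of the first <title> element, ignoring case and line breaks.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Writes the page source to <dir>/<testName>.html and returns the file path.
    public static Path save(String pageSource, Path dir, String testName) {
        try {
            Files.createDirectories(dir);
            Path file = dir.resolve(testName + ".html");
            return Files.write(file, pageSource.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the trimmed text of the first <title> element, if the source has one.
    public static Optional<String> findTitle(String pageSource) {
        Matcher m = TITLE.matcher(pageSource);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical source; in a test this would be driver.getPageSource().
        String html = "<html><head><title> Demo Page </title></head><body></body></html>";

        Path dir = Path.of(System.getProperty("java.io.tmpdir"), "page-sources");
        Path saved = save(html, dir, "login-test");
        System.out.println("Saved to " + saved);

        System.out.println(findTitle(html).orElse("(no title)")); // prints Demo Page
    }
}
```

In a test framework, save would typically be called from a failure hook or a catch block so that the snapshot reflects the page state at the moment the test failed.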
The getPageSource() method in Selenium WebDriver is invaluable for complete-page validation, uncovering hidden problems, and building robust automated tests where direct element targeting is insufficient.