
Selenium WebDriver Guide with Examples

Azma Banu
Learn Selenium WebDriver setup, commands, locators, waits, best practices, and cross-browser testing with real examples.

Web applications are expected to run seamlessly across different browsers and operating systems. Even a minor functionality issue can lead to poor user experience and business loss. This is where automation testing plays a crucial role.

Selenium WebDriver is one of the most widely adopted tools for browser automation, used by QA engineers and developers to test modern web applications at scale.

What is Selenium WebDriver?

Selenium WebDriver is a browser automation framework that allows interaction with web applications just like a real user would. It provides APIs that enable automation of tasks such as clicking elements, filling forms, navigating pages, and validating responses across browsers like Chrome, Firefox, Safari, and Edge. Unlike its predecessor Selenium RC, WebDriver directly communicates with browsers without a proxy server, making it faster and more efficient.

Key Features of Selenium WebDriver

Some of the most important features that make Selenium WebDriver popular are:

  • Support for multiple programming languages such as Java, Python, C#, Ruby, and JavaScript.
  • Compatibility with all major browsers.
  • Ability to handle dynamic web elements and modern web technologies like AJAX.
  • Simple APIs for writing and maintaining test scripts.
  • Extensibility through integration with frameworks like TestNG, JUnit, and NUnit.

How Selenium WebDriver Works

Selenium WebDriver works by sending commands to the browser through a driver executable. For instance, ChromeDriver controls Chrome, GeckoDriver controls Firefox, and so on. In Selenium 4, this communication uses the W3C WebDriver protocol; Selenium 3 and earlier used the legacy JSON Wire Protocol.

The flow is:

  1. The test script issues a command through the WebDriver API.
  2. The client library translates the command into an HTTP request and sends it to the browser driver.
  3. The browser driver interacts with the browser and performs the action.
  4. The driver returns the result to the script as an HTTP response.
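
To make step 2 concrete, here is a minimal sketch of the HTTP requests a client library might build for three common commands. The endpoint paths follow the W3C WebDriver specification; the class name WireSketch, the session id, and the element id are hypothetical, and the JSON is assembled by hand without escaping, so this is an illustration rather than how the Selenium client is actually implemented.

```java
/** Illustrative only: maps a few WebDriver commands to W3C-style HTTP requests. */
class WireSketch {

    /** A single HTTP request from the client library to the browser driver. */
    static final class Request {
        final String method;
        final String path;
        final String body;

        Request(String method, String path, String body) {
            this.method = method;
            this.path = path;
            this.body = body;
        }

        @Override
        public String toString() {
            return method + " " + path + " " + body;
        }
    }

    /** driver.get(url) corresponds to the "Navigate To" command. */
    static Request navigateTo(String sessionId, String url) {
        return new Request("POST", "/session/" + sessionId + "/url",
                "{\"url\": \"" + url + "\"}");
    }

    /** findElement(By.cssSelector(...)) corresponds to the "Find Element" command. */
    static Request findByCss(String sessionId, String selector) {
        return new Request("POST", "/session/" + sessionId + "/element",
                "{\"using\": \"css selector\", \"value\": \"" + selector + "\"}");
    }

    /** element.click() corresponds to the "Element Click" command. */
    static Request click(String sessionId, String elementId) {
        return new Request("POST",
                "/session/" + sessionId + "/element/" + elementId + "/click", "{}");
    }

    public static void main(String[] args) {
        String session = "abc123"; // hypothetical session id returned by POST /session
        System.out.println(navigateTo(session, "https://example.com"));
        System.out.println(findByCss(session, "#loginBtn"));
        System.out.println(click(session, "element-1")); // hypothetical element id
    }
}
```

The browser driver answers each request with a JSON response, which the client library converts back into a return value or an exception in the test script.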

Architecture of Selenium WebDriver

The architecture consists of four layers:

  • Selenium Client Libraries: Available for multiple languages.
  • W3C WebDriver Protocol (JSON Wire Protocol in Selenium 3 and earlier): Defines communication between client libraries and drivers.
  • Browser Drivers: ChromeDriver, GeckoDriver, etc., that translate commands.
  • Browsers: The actual browsers (Chrome, Firefox, Edge, Safari) where automation happens.

Setting Up Selenium WebDriver

Follow these instructions to set up Selenium WebDriver:

Prerequisites

  • JDK (for Java users) or respective runtime for other languages.
  • A build tool like Maven or Gradle (Java) or pip (Python).
  • Browser-specific driver executables (Selenium 4.6 and later can download these automatically through Selenium Manager).

Downloading WebDriver for Different Browsers

  • Chrome → ChromeDriver
  • Firefox → GeckoDriver
  • Edge → EdgeDriver
  • Safari → SafariDriver (pre-installed on macOS)

Configuring WebDriver in Your Project

In Java (using Maven), add the dependency:

<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.12.0</version>
</dependency>

For Python:

pip install selenium

Selenium WebDriver Commands

Here are some important Selenium WebDriver commands:

Browser Commands

  • driver.get("URL") – Opens the given URL.
  • driver.close() – Closes the current browser window.
  • driver.quit() – Exits the driver and closes all windows.

Navigation Commands

  • driver.navigate().to("URL") – Navigate to a URL.
  • driver.navigate().back() – Go back to the previous page.
  • driver.navigate().forward() – Move forward.
  • driver.navigate().refresh() – Refresh page.

WebElement Commands

  • click() – Clicks on an element.
  • sendKeys() – Enters text into a field.
  • getText() – Extracts inner text of an element.
  • isDisplayed() – Checks if an element is visible.

Locating Elements in Selenium WebDriver

Locators tell WebDriver which element on the page to act on. Each strategy below is exposed through the By class in Java.

By ID

driver.findElement(By.id("username"));

By Name

driver.findElement(By.name("password"));

By Class Name

driver.findElement(By.className("btn-primary"));

By Tag Name

driver.findElement(By.tagName("input"));

By Link Text and Partial Link Text

driver.findElement(By.linkText("Sign Up"));
driver.findElement(By.partialLinkText("Sign"));

By CSS Selector

driver.findElement(By.cssSelector("div.container > input[type='text']"));

By XPath

driver.findElement(By.xpath("//button[@id='submit']"));

Handling Web Elements with Selenium WebDriver

Here are some ways to handle web elements with Selenium WebDriver:

Handling Text Boxes

driver.findElement(By.id("username")).sendKeys("testUser");

Handling Buttons and Click Events

driver.findElement(By.id("loginBtn")).click();

Handling Dropdowns and Checkboxes

Select dropdown = new Select(driver.findElement(By.id("country")));
dropdown.selectByVisibleText("India");

Handling Alerts and Popups

Alert alert = driver.switchTo().alert();
alert.accept();

Handling Frames and iFrames

driver.switchTo().frame("frameName");

Handling Multiple Browser Windows

// Switches to each open window in turn; the driver ends up on the last handle.
for (String winHandle : driver.getWindowHandles()) {
    driver.switchTo().window(winHandle);
}
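
In practice, a test often needs the one window that was just opened rather than whichever handle happens to come last. A minimal sketch of that selection logic, assuming the original handle was saved beforehand with driver.getWindowHandle(); the helper class WindowHandles is hypothetical and works on plain strings so it can be shown without a running browser.

```java
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper: picks the window handle that is not the original one. */
class WindowHandles {

    /**
     * Returns the first handle that differs from the original window's handle,
     * or an empty Optional if no other window is open.
     */
    static Optional<String> findNewWindow(String originalHandle, Set<String> allHandles) {
        return allHandles.stream()
                .filter(handle -> !handle.equals(originalHandle))
                .findFirst();
    }
}

// Possible usage with Selenium (not executed here):
// String original = driver.getWindowHandle();
// ... perform the click that opens a new window or tab ...
// WindowHandles.findNewWindow(original, driver.getWindowHandles())
//         .ifPresent(handle -> driver.switchTo().window(handle));
```

Keeping the original handle also makes it easy to return to the first window later with driver.switchTo().window(original).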

Synchronization in Selenium WebDriver

Selenium WebDriver offers three ways to wait until the page reaches the state a test expects:

Implicit Waits

// Selenium 4 takes a java.time.Duration; the (long, TimeUnit) overload is deprecated.
driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(10));

Explicit Waits

WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("loginBtn")));

Fluent Waits

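Wait<WebDriver> wait = new FluentWait<>(driver)
    .withTimeout(Duration.ofSeconds(30))
    .pollingEvery(Duration.ofSeconds(5))
    .ignoring(NoSuchElementException.class);
// The wait only starts polling once until(...) is called.
WebElement element = wait.until(d -> d.findElement(By.id("loginBtn")));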
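
Explicit and fluent waits both rest on the same idea: check a condition repeatedly until it holds or a timeout expires. The sketch below illustrates that polling loop using only the JDK. The class PollingWait is hypothetical and is not Selenium's implementation; it treats a non-null result as success, and it omits features such as ignoring specific exceptions.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Illustrative polling loop, similar in spirit to an explicit or fluent wait. */
class PollingWait {

    /**
     * Calls the condition until it returns a non-null value, sleeping for
     * the given interval between attempts. Fails once the timeout has elapsed.
     */
    static <T> T until(Supplier<T> condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            T result = condition.get();
            if (result != null) {
                return result; // condition satisfied
            }
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException("Condition not met within " + timeout);
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }
}
```

Unlike a fixed Thread.sleep, a loop like this returns as soon as the condition holds, so a fast page does not pay for the full timeout.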

Advanced User Interactions with Selenium WebDriver

Here are some advanced user interactions with Selenium WebDriver:

Using the Actions Class

Actions actions = new Actions(driver);
actions.moveToElement(element).click().perform();

Keyboard and Mouse Events

actions.keyDown(Keys.CONTROL).sendKeys("a").keyUp(Keys.CONTROL).perform();

Drag and Drop

actions.dragAndDrop(source, target).perform();

Taking Screenshots with Selenium WebDriver

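File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
// FileUtils comes from Apache Commons IO (org.apache.commons.io.FileUtils); copyFile throws IOException.
FileUtils.copyFile(screenshot, new File("screenshot.png"));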

Handling File Uploads and Downloads

For uploads:

driver.findElement(By.id("fileUpload")).sendKeys("/path/to/file.txt");

For downloads, browser profile configuration is often required.
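
As one example of such configuration, Chrome reads its download behavior from a preferences map. The sketch below builds that map with two Chrome preference keys; the helper class DownloadPrefs and the directory path are hypothetical, and other browsers need different settings.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that builds Chrome preferences for silent downloads. */
class DownloadPrefs {

    /** Returns preferences that save files to the given directory without prompting. */
    static Map<String, Object> forDirectory(String absoluteDir) {
        Map<String, Object> prefs = new HashMap<>();
        prefs.put("download.default_directory", absoluteDir); // where files are saved
        prefs.put("download.prompt_for_download", false);     // skip the "Save as" dialog
        return prefs;
    }
}

// Possible usage with Selenium (not executed here):
// ChromeOptions options = new ChromeOptions();
// options.setExperimentalOption("prefs", DownloadPrefs.forDirectory("/tmp/downloads"));
// WebDriver driver = new ChromeDriver(options);
```

After triggering the download, the test can check the target directory for the expected file, ideally with a polling wait rather than a fixed sleep.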

Running Selenium WebDriver Tests on Different Browsers

A critical aspect of web application testing is ensuring that the application behaves consistently across multiple browsers. Users access websites using Chrome, Firefox, Safari, Edge, and others, each with its own rendering engine. Selenium WebDriver provides dedicated browser drivers to automate these environments and verify compatibility.

Chrome with ChromeDriver

ChromeDriver is the official driver maintained by the Chromium team. It communicates with the Chrome browser using the WebDriver protocol. A basic example in Java:

WebDriver driver = new ChromeDriver();
driver.get("https://example.com");

Firefox with GeckoDriver

Firefox requires GeckoDriver, which translates Selenium commands into the Marionette automation protocol.

WebDriver driver = new FirefoxDriver();
driver.get("https://example.com");

Edge with EdgeDriver

Microsoft Edge provides its own driver that works similarly to ChromeDriver, as Edge is based on Chromium.

WebDriver driver = new EdgeDriver();
driver.get("https://example.com");

Safari with SafariDriver

SafariDriver comes pre-installed on macOS. To enable automation, “Allow Remote Automation” must be turned on in Safari’s Develop menu.

WebDriver driver = new SafariDriver();
driver.get("https://example.com");

Key Considerations for Multi-Browser Execution

  • Ensure the correct driver version matches the browser version to avoid compatibility issues.
  • Use Selenium Manager (bundled with Selenium 4.6 and later) or WebDriverManager (Java) to download and configure driver binaries automatically.
  • Incorporate browser-specific capabilities when needed (e.g., headless mode in Chrome and Firefox).
  • Validate both UI rendering and functional behavior, as browsers may interpret CSS, JavaScript, and DOM differently.
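
One common way to run the same tests against several browsers is to choose the driver by name at startup. The sketch below shows that selection as a small generic registry; the class BrowserFactory and the "browser" system property are hypothetical, and the type parameter stands in for WebDriver so the logic can be shown without launching a browser.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical registry that creates a driver for a browser chosen by name. */
class BrowserFactory<T> {
    private final Map<String, Supplier<T>> suppliers = new LinkedHashMap<>();

    /** Registers a supplier under a case-insensitive browser name. */
    BrowserFactory<T> register(String name, Supplier<T> supplier) {
        suppliers.put(name.toLowerCase(Locale.ROOT), supplier);
        return this;
    }

    /** Creates a new instance for the named browser, or fails with the known names. */
    T create(String name) {
        Supplier<T> supplier = suppliers.get(name.trim().toLowerCase(Locale.ROOT));
        if (supplier == null) {
            throw new IllegalArgumentException(
                    "Unsupported browser '" + name + "'; known: " + suppliers.keySet());
        }
        return supplier.get();
    }
}

// Possible usage with Selenium (not executed here):
// BrowserFactory<WebDriver> factory = new BrowserFactory<WebDriver>()
//         .register("chrome", ChromeDriver::new)
//         .register("firefox", FirefoxDriver::new);
// WebDriver driver = factory.create(System.getProperty("browser", "chrome"));
```

Because each supplier is only invoked inside create, no browser starts until a test actually asks for one.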

Limitations of Selenium WebDriver

Here are some limitations of Selenium WebDriver:

  • No built-in support for image comparison.
  • CAPTCHA and OTP steps cannot be automated reliably and usually need a test-environment workaround.
  • Requires external libraries for reporting.
  • Maintenance becomes complex for large-scale tests.

Best Practices for Selenium WebDriver Automation

Here are some best practices to be followed for Selenium WebDriver Automation:

  • Use explicit waits over thread sleeps.
  • Keep locators short and resilient.
  • Apply Page Object Model for maintainable scripts.
  • Run tests on real devices in addition to emulators.
  • Leverage CI/CD for continuous execution.

Why Run Selenium WebDriver Tests on Real Devices and Browsers?

Local environments cannot replicate real-world conditions like device hardware, OS-level rendering, and network variations. Running Selenium WebDriver tests on a real device cloud ensures more reliable results compared to emulators/simulators.

BrowserStack Automate is a testing tool that provides access to 3500+ real browsers and devices on the cloud. Teams can run parallel Selenium tests, validate cross-browser behavior, and debug issues using features like video recording and logs. This eliminates the overhead of maintaining local test infrastructure.

Conclusion

Selenium WebDriver has become the backbone of modern web automation. Its flexibility, multi-language support, and broad browser compatibility make it an industry standard. By combining WebDriver with best practices and executing tests on real devices through solutions like BrowserStack Automate, teams can achieve reliable, scalable, and high-quality test automation.
