Find Broken Links on Webpage with Selenium Automation
As an automation tester, you will often need to verify the links on a website. Links are among the most important parts of a web page, so a site should never be left with broken ones. Checking every link by hand is tedious and time-consuming. In this post, we'll learn to automate link testing.
1. What is a Broken Link?
A broken link is a URL that does not work or cannot be reached. A link can fail for several reasons, and the server reports the outcome of every request through an HTTP status code. Each code has a different meaning. Let's take a look at the status codes you are most likely to see.
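- 200 - OK. The request succeeded and the link is working.
- 404 - Not Found. The most common one: the page does not exist.
- 403 - Forbidden. The server refuses access to the page. (A 401 means authentication is required.)
- 400 - Bad Request. The server could not understand the request, for example because the URL is malformed. (A request timeout is reported as 408.)
- 500 - Internal Server Error.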
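The codes above fall into ranges, and a link checker usually cares about the range more than the exact code. As a minimal sketch (the class and method names are illustrative and not part of any library), the ranges can be mapped to a short description:

```java
class StatusCategory {

    /** Maps an HTTP status code to a coarse category based on its range. */
    static String describe(int code) {
        if (code >= 200 && code < 300) {
            return "success";
        } else if (code >= 300 && code < 400) {
            return "redirect";
        } else if (code >= 400 && code < 500) {
            return "client error";
        } else if (code >= 500 && code < 600) {
            return "server error";
        }
        return "other"; // 1xx informational codes or values outside the HTTP range
    }

    public static void main(String[] args) {
        // Print the category of each code listed above
        for (int code : new int[] {200, 404, 403, 400, 500}) {
            System.out.println(code + " - " + describe(code));
        }
    }
}
```

For link testing, the two error ranges (400 to 599) are the ones that mark a link as broken.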
2. Why does a link break?
A link might stop working for many reasons, for example:
- The server hosting the URL is down.
- Human error: the wrong URL was typed into the HTML code by mistake.
3. How to write Selenium Code to find Broken links
Selenium WebDriver can find all the links present on a web page, and a few lines of Java can then check whether each one is working. Doing the same check by hand for every link on a website would be tedious. Let's take a look at the logic we use in our Selenium code:
- Get all the links present on the web page by locating the HTML anchor tag <a>, which is used for creating a link on the webpage.
- Store all the links inside a list.
- Send an HTTP request to each link and verify the response received.
- If the response code is 200, the link is working. Codes of 400 and above indicate a broken link.
Let's divide our solution into two parts:
- Write Selenium Code to get all links on a web page
- Write code to verify whether those links are working
3.1. Selenium code to Get All Links from a Web page
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class GetAllLinks {

    public static void main(String[] args) {
        // Initialize WebDriver object (update the path for your system)
        System.setProperty("webdriver.chrome.driver", "D:\\mydir\\chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        driver.get("https://phptravels.com/demo/");

        // Store all link elements (anchor tag elements in HTML) in a list
        List<WebElement> links = driver.findElements(By.tagName("a"));

        // Print the number of links stored in the list
        System.out.println(links.size());

        // Print the text of every link (list indexes run from 0 to size - 1)
        for (int i = 0; i < links.size(); i++) {
            System.out.println(links.get(i).getText());
        }

        // Close the browser
        driver.quit();
    }
}
Code Explanation:
1. Open the webpage.
2. Create a list of type WebElement and store all elements with the tag name 'a' in it using findElements().
3. Iterate over all the links using list size as its maximum value.
4. Get the text of the link by using getText() and print it.
Now that you have all the links in a list, you can perform further operations and checks on them.
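One such operation is cleaning up the list before any HTTP check is made. Pages often repeat the same URL and contain anchors that cannot be checked over HTTP (no href, mailto:, javascript:). The following is a minimal sketch that assumes the href values have already been read from the elements as strings; LinkFilter and checkableUrls are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class LinkFilter {

    /** Keeps only distinct http(s) URLs, preserving first-seen order. */
    static Set<String> checkableUrls(List<String> hrefs) {
        Set<String> urls = new LinkedHashSet<>();
        for (String href : hrefs) {
            if (href == null) {
                continue; // anchor without an href attribute
            }
            String trimmed = href.trim();
            String lower = trimmed.toLowerCase(Locale.ROOT);
            if (lower.startsWith("http://") || lower.startsWith("https://")) {
                urls.add(trimmed); // the set silently ignores duplicates
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Arrays.asList is used because List.of rejects null elements
        List<String> hrefs = Arrays.asList(
                "https://example.com/", null, "mailto:someone@example.com",
                "javascript:void(0)", "https://example.com/", "http://example.com/about");
        // prints [https://example.com/, http://example.com/about]
        System.out.println(checkableUrls(hrefs));
    }
}
```

Checking each distinct URL once keeps the run shorter and avoids sending the same request repeatedly to the same server.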
3.2. Write code to Find Broken links on a Webpage
Now that we have a collection of all the links, the next task is to check whether any of them are broken. For this purpose, we'll use Java's HttpURLConnection class, which is part of the java.net package.
To check whether a URL works, we open an HTTP connection to it with HttpURLConnection and read the response code, much as we would when testing a REST API. If the response code is 200, the URL is working fine; if it is 400 or greater, the URL is broken.
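The check described above can be sketched as follows. This is an illustrative variant rather than the code used later in this post: it sends a HEAD request so that only the status line and headers are transferred, and it also sets a read timeout. The class name LinkStatus and both method names are assumptions made for this example, and it expects an absolute http or https URL.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class LinkStatus {

    /** A response code of 400 or above marks the link as broken. */
    static boolean isBroken(int responseCode) {
        return responseCode >= 400;
    }

    /** Sends a HEAD request to an http(s) link and returns the HTTP response code. */
    static int responseCode(String link) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(link).toURL().openConnection();
        try {
            connection.setRequestMethod("HEAD"); // headers only, no response body
            connection.setConnectTimeout(5000);  // milliseconds
            connection.setReadTimeout(5000);     // milliseconds
            return connection.getResponseCode(); // connects implicitly
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass one or more URLs on the command line
        for (String link : args) {
            int code = responseCode(link);
            System.out.println(link + " - " + code + (isBroken(code) ? " (broken)" : ""));
        }
    }
}
```

Some servers do not support HEAD and answer it with an error such as 405, so a real checker may need to retry those links with GET before reporting them as broken.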
About HttpURLConnection
As the name suggests, HttpURLConnection is a URLConnection with support for HTTP-specific features.
Each HttpURLConnection instance is used to make a single request but the underlying network connection to the HTTP server may be transparently shared by other instances.
Calling the close() methods on the InputStream or OutputStream of an HttpURLConnection after a request may free network resources associated with this instance but has no effect on any shared persistent connection.
Calling the disconnect() method may close the underlying socket if a persistent connection is otherwise idle at that time.
Now it's time to add the verifyLinks() method to our previous code to complete the program.
Selenium Code to find broken links on a Webpage:
package com.techlistic.testscripts;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class BrokenLinksTest {

    public static void main(String[] args) throws IOException {
        // Update your system's path, where chromedriver.exe is present
        System.setProperty("webdriver.chrome.driver", "D:\\mydir\\chromedriver.exe");

        // Initialize WebDriver object
        WebDriver driver = new ChromeDriver();
        driver.get("https://phptravels.com/demo/");

        // Store all link elements (anchor tag elements in HTML) in a list
        List<WebElement> links = driver.findElements(By.tagName("a"));

        // Print the number of links stored in the list
        System.out.println(links.size());

        // List indexes run from 0 to size - 1
        for (int i = 0; i < links.size(); i++) {
            WebElement elem = links.get(i);

            // Print the text of the link
            System.out.println(elem.getText());

            // Get href attribute
            String linkUrl = elem.getAttribute("href");

            // Skip anchors without an HTTP(S) target (no href, mailto:, javascript:, ...)
            if (linkUrl == null || !linkUrl.startsWith("http")) {
                continue;
            }

            // Call verifyLinks method
            verifyLinks(linkUrl);
        }

        // Close WebDriver
        driver.quit();
    }

    public static void verifyLinks(String websiteLink) throws IOException {
        // Create URL object and pass website link
        URL url = new URL(websiteLink);

        // Create URL connection and get the response code
        HttpURLConnection httpURLConnect = (HttpURLConnection) url.openConnection();
        httpURLConnect.setConnectTimeout(5000);
        httpURLConnect.connect();

        // Verify response code
        if (httpURLConnect.getResponseCode() >= 400) {
            System.out.println(websiteLink + " - " + httpURLConnect.getResponseMessage() + " is a broken link");
        }
        // Otherwise print the response message obtained
        else {
            System.out.println(websiteLink + " - " + httpURLConnect.getResponseMessage());
        }

        // Disconnect URL connection
        httpURLConnect.disconnect();
    }
}
Code Explanation:
We have already explained the code for getting links from the web page, so here we'll explain the verifyLinks() method.
- The verifyLinks() method receives a parameter, websiteLink, which is the URL to be tested.
- We then create an object of the URL class and pass websiteLink to it.
- After that, we open a connection to the URL with the openConnection() method and cast the result to HttpURLConnection.
- We set a connect timeout, so that if the connection cannot be established within that time, a timeout exception (java.net.SocketTimeoutException) is thrown.
- Finally, we read the response code with getResponseCode(); if it is 400 or greater, we report the link as broken.
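Because verifyLinks() declares throws IOException, a single timeout or unresolvable host stops the whole run. One possible refinement, sketched here with hypothetical names (SafeLinkCheck, verdict), turns every failure into a one-line verdict so that the loop over the links can continue:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URI;

class SafeLinkCheck {

    /** Returns a one-line verdict for the link instead of throwing. */
    static String verdict(String link) {
        if (link == null) {
            return "INVALID"; // anchor without an href attribute
        }
        URI uri;
        try {
            uri = URI.create(link);
        } catch (IllegalArgumentException e) {
            return "INVALID"; // not a syntactically valid URI
        }
        String scheme = uri.getScheme();
        if (scheme == null) {
            return "INVALID"; // relative reference, nothing to connect to
        }
        if (!scheme.equalsIgnoreCase("http") && !scheme.equalsIgnoreCase("https")) {
            return "SKIPPED"; // e.g. mailto: or javascript: links
        }
        if (uri.getHost() == null) {
            return "INVALID"; // http(s) URI without a host
        }

        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) uri.toURL().openConnection();
            connection.setConnectTimeout(5000); // milliseconds
            connection.setReadTimeout(5000);    // milliseconds
            int code = connection.getResponseCode();
            return (code >= 400 ? "BROKEN " : "OK ") + code;
        } catch (SocketTimeoutException e) {
            return "BROKEN timeout";
        } catch (IOException e) {
            // Unknown host, refused connection, TLS failure and similar
            return "BROKEN " + e.getClass().getSimpleName();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Pass one or more links on the command line
        for (String link : args) {
            System.out.println(link + " - " + verdict(link));
        }
    }
}
```

Treating an unreachable host as broken is a design choice made for this sketch; a stricter report might list network failures separately from HTTP error codes.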
Conclusion
Links are one of the most important components of a website, and broken links leave a poor impression on users. Link testing is therefore an important part of the test plan, but performing it manually is tedious and time-consuming. It's better to automate it using Selenium and HttpURLConnection.