Let’s break down the problem
To detect broken links in an automated Playwright script, we have to do two things:
- Detect and extract all links on a visited page.
- Make requests to all these URLs and evaluate their status code.
How to extract the href attribute from all links on a page
To extract all these href values, you could think of calling a locator like page.getByRole('link') or page.locator('a'), iterating over all the link elements, and accessing the href attribute…
But unfortunately, this approach won’t work because you can’t iterate over a locator. Playwright locators are lazy and will only be evaluated when combined with an action or assertion.
locator.all(): access the DOM right away
If you’re going beyond the classical end-to-end testing case like we do right now, you can call the locator.all() method to reach into the DOM and receive an array of locators matching the currently present DOM elements. Note that locator.all() doesn’t wait for elements to appear; it returns whatever matches at the moment you call it.
With this method at hand, we can evaluate all the link target URLs.
const links = page.locator("a")
const allLinks = await links.all()
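As a minimal sketch of this step, the helper below collects the raw href values from all links. The name extractHrefs is hypothetical, and page is assumed to be a Playwright Page; only locator(), all(), and getAttribute() are Playwright calls.

```javascript
// Hypothetical helper; `page` is assumed to be a Playwright Page.
async function extractHrefs(page) {
  // locator.all() resolves to one locator per <a> currently in the DOM.
  const links = await page.locator("a").all();
  // getAttribute() resolves to the attribute value, or null if it is missing.
  return Promise.all(links.map((link) => link.getAttribute("href")));
}
```

Inside a test, you would call it as const hrefs = await extractHrefs(page) after navigating to the page you want to check.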
Save requests and remove duplicate link targets
When we evaluate all the links, there’s a high chance that the collection includes duplicates. For example, a link to home (/) will probably be included multiple times. And while these duplicates aren’t a big deal, why should we check a target URL for a good status code multiple times?
Let’s remove the duplicates by relying on a native JavaScript Set. Sets have the wonderful characteristic that they only hold unique values. When we add the same value twice, the second one is automatically ignored. We don’t have to check if a value is already in the set. Nice!
And while we’re iterating over the link targets anyway, we can also remove mailto: and anchor links (#something) in the same go!
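Here is one way the deduplication and filtering could look. The function name uniqueLinkTargets is made up for illustration, and the input is assumed to be the array of raw href values (which may contain null for links without an href).

```javascript
// Hypothetical helper: dedupe href values and drop non-HTTP targets.
function uniqueLinkTargets(hrefs) {
  const targets = new Set();
  for (const href of hrefs) {
    // Skip missing or empty href values in this sketch.
    if (!href) continue;
    // mailto: links and in-page anchors can't be checked with an HTTP request.
    if (href.startsWith("mailto:") || href.startsWith("#")) continue;
    // Adding an existing value is a no-op, so no duplicate check is needed.
    targets.add(href);
  }
  return targets;
}
```

Because a Set keeps insertion order, spreading the result ([...targets]) gives you the unique targets in the order they first appeared on the page.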
Another thing to watch out for is empty links (<a href="">). Clicking these links will only reload the page, and they shouldn’t be on your pages either. But if we filter these out, we won’t know the page has empty links.
Let’s add a soft assertion to our link mapping to get notified about empty links!
Add Playwright’s soft assertions to collect errors but keep the test running
Whenever you use a Playwright assertion with expect, a failing assertion throws an exception and stops your test case right there. For end-to-end test cases, this behavior makes sense. Say you click a button and expect a modal with a form to appear; if the modal doesn’t show, the form-filling Playwright instructions will fail, too. So why continue the test?
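For a link checker, though, we want to keep going after a failure, and that’s what soft assertions (expect.soft()) are for. Soft assertions work the same way as regular ones, but they won’t stop the test on failure. Errors will be collected and displayed at the end of your test case, and the test is still marked as failed.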
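To illustrate, here is a rough sketch of a link mapping that flags empty links without aborting. The helper name collectLinkTargets is hypothetical, and expect is passed in as a parameter only so the sketch stays self-contained; in a real test it would be the expect imported from @playwright/test.

```javascript
// Hypothetical helper; `expect` is assumed to be Playwright's expect.
function collectLinkTargets(hrefs, expect) {
  const targets = new Set();
  for (const href of hrefs) {
    // Soft assertion: a failure is recorded, but the loop keeps running.
    expect.soft(href, "Link has no proper href").not.toBeFalsy();
    // Only keep targets that actually point somewhere.
    if (href) targets.add(href);
  }
  return targets;
}
```

With a regular expect() call, the first empty link would end the test and hide every problem after it; the soft variant reports all of them in one run.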
Normalize local link targets and guarantee absolute URLs
When we extract all the href values, we’ll likely discover local links such as / or /features. If we want to check the status code of the resulting URLs, we can’t request these as they are because they lack a protocol and domain.
To transform relative links into absolute URLs, we can use another native JavaScript goodie: the URL() constructor. I won’t go into much detail here, but URL() is the powerhouse behind most JavaScript URL operations. You can pass it a URL (it doesn’t matter if it’s relative or absolute) and a base URL, and new URL() will do all the URL parsing for you. It’s pretty darn sweet!
If we call new URL() with the extracted link and the current page URL (page.url(), which already returns a string), we can normalize all link targets.
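The normalization step can be sketched like this. The helper name toAbsoluteUrls and the example.com URLs are placeholders; in a test, the second argument would be the string returned by page.url().

```javascript
// Hypothetical helper: resolve every link target against the page URL.
function toAbsoluteUrls(hrefs, pageUrl) {
  const urls = new Set();
  for (const href of hrefs) {
    // Relative targets are resolved against pageUrl;
    // absolute targets ignore the base and are kept as they are.
    urls.add(new URL(href, pageUrl).href);
  }
  return urls;
}

const absolute = toAbsoluteUrls(
  ["/", "/features", "https://example.org/docs"],
  "https://example.com/blog/post"
);
console.log([...absolute]);
// [ 'https://example.com/', 'https://example.com/features', 'https://example.org/docs' ]
```

Collecting the results in a Set again means that two different spellings of the same target (for example /features and the full URL) collapse into one entry.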
And now we’re ready to check if all the URLs return a proper status code!
If you’re looking for the final snippet to extract link target URLs, find it on GitHub.
How to check for broken link URLs
Now that we have a set holding all the URLs, we can start making requests and check for green status codes. We could reach for Playwright’s request fixture, but luckily, the page object also holds a request object for us.
But what’s the difference between the two? page.request will make requests in the context of the current page. For example, if you have a test case that logs in a user, the current page object will hold some session cookies. And if you then make requests with page.request, the HTTP call will include these session cookies, too.
Whenever you want to make API calls on behalf of a logged-in user, page.request is the way to go!
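Putting page.request to work could look roughly like the sketch below. The helper name checkLinkTargets and the failure message are assumptions, and page and expect are passed in as parameters to keep the sketch self-contained; in a real test they come from @playwright/test.

```javascript
// Hypothetical helper; `page` and `expect` are assumed to come from Playwright.
async function checkLinkTargets(page, urls, expect) {
  for (const url of urls) {
    try {
      // page.request shares cookies with the page's browser context.
      const response = await page.request.get(url);
      // ok() is true for status codes in the 200-299 range.
      expect.soft(response.ok(), `${url} has no green status code`).toBeTruthy();
    } catch {
      // get() rejects on network-level errors; report those as failures, too.
      expect.soft(null, `${url} has no green status code`).toBeTruthy();
    }
  }
}
```

The requests run one after another here to keep the sketch easy to follow; whether to parallelize them depends on how many links the page has and how much load the target servers should take.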