How I use Claude Code to create end-to-end tests effortlessly

Feb 15, 2026

As AI agents become more and more capable of writing code, writing clear specifications becomes even more important. Specifications describe how we want an application to behave, which is useful for deciding how to implement it, but also for testing that it actually does what it should!

End-to-end tests (automated GUI tests) are neat because they are “automated manual tests”, simulating how a user actually interacts with your system. However, they sometimes get a bad rap because of the effort required to build (and maintain) them, relative to the benefits they bring. But what if you could create them much faster?

In this post, I’ll illustrate the workflow I use to create end-to-end tests for a web application, using tools like Claude Code, Playwright and its official MCP, and a library called Playwright-BDD. Starting from a description of test scenarios in our own words, we let Claude Code do most of the work and get an end-to-end test implementation, with both a plain English version (for your PM!) and the associated JavaScript/TypeScript code that runs the test.

Motivation

This started out of frustration: as the development of Copilex (AI assistant for lawyers) progressed, I kept adding manual test scenarios to a “Pre-release checks” Notion page. And since we’re a small team with no QA tester, I’m the lucky one to inherit that pleasure… I eventually took a weekend to figure out how to automate this as much as possible.

Part 1: The Toolkit

Before explaining the workflow I use, I’ll first need to introduce a few concepts / tools.

Behavior-Driven Development (BDD)

When I created all my manual test scenarios, I actually did it in a very structured way, based on the Behavior-Driven Development (BDD) approach.

Instead of directly writing tests in code, BDD encourages you to first write specifications in plain language that all stakeholders can understand.

It starts with a description of the feature as a User Story, in the form:

As a [type or role of the user]
I want to [what the user wants to do]
So that [the reason the user wants to do it]

Then, we use Given-When-Then (GWT) as a semi-structured way to write down test scenarios for that feature:

Given (the initial context)
When (an action occurs)
Then (the expected outcome)

Here is a simple example to illustrate:

Feature: User Authentication
  As a user
  I want to sign up and log in
  So that I can access the chat application

  Scenario: User logs in successfully
    Given I am on the login page
    When I fill in "Email" with "PLAYWRIGHT_EMAIL_1"
    And I fill in "Password" with "PLAYWRIGHT_PASSWORD_1"
    And I click the "Sign in" button
    Then I should be logged in
    And I should see "Welcome"

Each step describes a state of the application, an action, or an expected outcome. This streamlines the process of writing tests, as each step is a clear and concise description of what should happen, which is something we can test for. By the way, this syntax is called the Gherkin language.
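To make the structure concrete, here is a minimal TypeScript sketch of how a tool might read the steps of a scenario. It is not how Cucumber or Playwright-BDD actually parse Gherkin (real parsers also handle tags, data tables, doc strings, and more), and `parseSteps` and `ParsedStep` are hypothetical names. The one rule it illustrates is that `And` and `But` inherit the kind of the step before them.

```typescript
// Hypothetical helper for illustration only; not part of Cucumber or Playwright-BDD.
type StepKind = 'Given' | 'When' | 'Then';

interface ParsedStep {
  kind: StepKind; // effective kind, after resolving And/But
  text: string;   // step text without the leading keyword
}

function parseSteps(scenario: string): ParsedStep[] {
  const steps: ParsedStep[] = [];
  let previous: StepKind | undefined;

  for (const rawLine of scenario.split('\n')) {
    const match = rawLine.trim().match(/^(Given|When|Then|And|But)\s+(.+)$/);
    if (!match) continue; // skip blank lines, "Feature:", "Scenario:", etc.

    const keyword = match[1];
    const text = match[2];

    let kind: StepKind;
    if (keyword === 'And' || keyword === 'But') {
      // And/But continue whatever kind of step came just before them.
      if (previous === undefined) {
        throw new Error(`"${keyword}" cannot be the first step of a scenario`);
      }
      kind = previous;
    } else {
      kind = keyword as StepKind;
    }

    previous = kind;
    steps.push({ kind, text });
  }
  return steps;
}
```

Applied to the login scenario above, the two `And` lines after `When` are treated as actions, and the final `And` as an expected outcome.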

Playwright

Playwright is a modern end-to-end testing framework developed by Microsoft. It enables reliable automation of web applications across all modern browsers (Chromium, Firefox, and WebKit) with a single API.

It has nice features like auto-waiting, which removes a common source of flaky tests, powerful selectors to interact with the UI, a built-in test runner with parallel execution…

The Magic Combo: Playwright-BDD

Initially, I planned to combine Cucumber.js (a popular BDD framework) with Playwright by writing Gherkin specifications and implementing Playwright code inside Cucumber step definitions.

However, I discovered something even better: the Playwright-BDD library. This library elegantly bridges the gap between BDD and Playwright by:

Reading your Gherkin feature files
Automatically generating test skeletons
Letting you fill in each step using Playwright code

Here’s how it works in practice.
You write a feature file, similar to the example above but for sign-up:

Feature: User Authentication
 As a user
 I want to sign up and log in
 So that I can access the chat application

 Scenario: User signs up successfully
   Given I am on the login page
   When I click "Sign up instead"
   And I fill in "Email" with "PLAYWRIGHT_EMAIL_1"
   And I fill in "Password" with "PLAYWRIGHT_PASSWORD_1"
   And I click the "Sign up" button
   Then I should be logged in
   And I should see "Welcome"

Then Playwright-BDD can generate a skeleton for the test file, for example:

Given('I am on the login page', async ({}) => {
  // Step: I am on the login page
  // From: tests/e2e/user-authentication.feature:7:5
});

When('I fill in {string} with {string}', async ({}, arg: string, arg1: string) => {
  // Step: I fill in {string} with {string}
  // From: tests/e2e/user-authentication.feature:8:5
});

When('I click the {string} button', async ({}, arg: string) => {
  // Step: I click the {string} button
  // From: tests/e2e/user-authentication.feature:9:5
});

Then('I should be logged in', async ({}) => {
  // Step: I should be logged in
  // From: tests/e2e/user-authentication.feature:10:5
});

...

Which you fill in like this:

// tests/steps/user-authentication.steps.ts

import { createBdd } from 'playwright-bdd';
import { expect } from '@playwright/test';

const { Given, When, Then } = createBdd();

Given('I am on the login page', async ({ page }) => {
  await page.goto('/');
  await expect(page.locator('text=Log in')).toBeVisible();
});

When('I fill in {string} with {string}', async ({ page }, field: string, value: string) => {
  await page.fill(`input[name="${field.toLowerCase()}"]`, value);
});

When('I click the {string} button', async ({ page }, buttonText: string) => {
  await page.click(`button:has-text("${buttonText}")`);
});

Then('I should be logged in', async ({ page }) => {
  await expect(page.locator('button:has-text("Sign out")')).toBeVisible();
});

...

And finally, you use Playwright to run the tests: it runs a headless browser and executes the steps in the test file, checking that the application behaves as expected.

The beauty is that these steps are reusable across different scenarios: you only need to write a step like “I click the {string} button” once, and it can be reused in other scenarios that involve clicking a button with a specific text (at least, as long as the button text is unique throughout the app).
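The reuse works because a step definition is matched against the step text by pattern, with `{string}` capturing a quoted value. The sketch below imitates that matching in plain TypeScript. It is a deliberately simplified stand-in for Cucumber Expressions (the pattern syntax that `{string}` comes from): it only supports `{string}` with double quotes, and `compilePattern` and `matchStep` are hypothetical names.

```typescript
// Simplified illustration only: supports just {string}, and only double-quoted values.
function compilePattern(pattern: string): RegExp {
  const source = pattern
    .split('{string}')
    // Escape regex metacharacters in the literal parts of the pattern.
    .map((literal) => literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    // Each {string} placeholder becomes a capture group for a quoted value.
    .join('"([^"]*)"');
  return new RegExp(`^${source}$`);
}

// Returns the captured arguments, or null if the step text does not match.
function matchStep(pattern: string, stepText: string): string[] | null {
  const match = compilePattern(pattern).exec(stepText);
  return match ? match.slice(1) : null;
}

// One pattern serves every scenario that clicks a button by its text.
const clickPattern = 'I click the {string} button';
const signInArgs = matchStep(clickPattern, 'I click the "Sign in" button');
const signUpArgs = matchStep(clickPattern, 'I click the "Sign up" button');
```

Here `signInArgs` holds the single argument `Sign in` and `signUpArgs` holds `Sign up`: the same definition handles both scenarios, and the captured values are what the step function receives as parameters.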

For more concrete examples, check out the end-to-end tests from my react-sidepanes library project: https://github.com/didmar/react-sidepanes/tree/main/packages/react-sidepanes/tests/e2e

Claude Code (or your favorite AI coding assistant)

Claude Code can work with your codebase, perform web searches to look up documentation, and execute commands. Its agentic nature makes it capable of fixing its own bugs autonomously (well, most of the time).

Note that the workflow I’ll describe below is not inherently specific to Claude Code, as most AI coding assistants will offer similar features.

Part 2: The Workflow

OK, now let me walk you through what I did, step by step.

Install the tools

First, let’s install everything we need. Assuming you are in the root folder of your web app:

# Claude Code
curl -fsSL https://claude.ai/install.sh | bash
claude  # Follow the steps to configure Claude Code, you'll need a subscription

# Install Playwright and Playwright-BDD to our app's dev dependencies
npm install -D @playwright/test playwright-bdd

# Download the headless browsers for Playwright to use
npx playwright install

# Install the Playwright MCP for Claude
claude mcp add playwright npx @playwright/mcp@latest

Then, we create a playwright.config.ts file to configure both Playwright and Playwright-BDD:

// playwright.config.ts

import { defineConfig } from '@playwright/test';
import { defineBddConfig } from 'playwright-bdd';

// Path to your features and steps files.
// Depending on your preferences, you may also put them under different sub-folders.
// Here, BDD features and steps files are located in a `tests` sub-folder for each feature or component, and cross-cutting tests (e.g., accessibility) are in the e2e folder.
const testDir = defineBddConfig({
  features: ['./e2e/', './src/features/*/tests/', './src/components/*/tests/'],
  steps: ['./e2e/*.ts', './src/features/*/tests/', './src/components/*/tests/']
});

export default defineConfig({
  testDir,
  // Global setup runs once before all tests (optional)
  globalSetup: './e2e/global-setup.ts',
  // Configure how the tests are run
  fullyParallel: true,
  // If your tests have some shared state, uncomment this
  // to ensure they all run sequentially
  // workers: 1,
  // Customize the default timeout (in ms) if needed
  // timeout: 60_000,
  reporter: 'html',
  use: {
    // This should match the URL where your app is served
    baseURL: 'http://localhost:5173',
    trace: 'on-first-retry',
    screenshot: 'only-on-failure',
    video: 'retain-on-failure'
  },
  webServer: {
    command: 'npm run dev',
    port: 5173,
    // With true, Playwright reuses a dev server already listening on this port,
    // and starts one with the command above if none is running.
    // With false, it always starts its own server and fails if the port is already in use.
    reuseExistingServer: true,
  }
});

In my package.json file, I added these scripts to streamline the testing workflow:

{
    "scripts": {
        ...
        "e2e": "pnpm exec bddgen && playwright test",
        "e2e:grep": "pnpm exec bddgen && playwright test --grep",
        "e2e:ui": "pnpm exec bddgen && playwright test --ui",
        "e2e:bddgen": "pnpm exec bddgen"
    }
}

e2e: Runs all the tests with the headless browser.
e2e:grep: Runs the tests that match the given grep pattern, e.g. e2e:grep "Sign up".
e2e:ui: Opens Playwright’s test UI. You simply double-click on any test to run it, and see the results in the browser.
e2e:bddgen: Reads your Gherkin feature files and checks all the steps have an implementation, and if not generates the skeleton for those steps. We do it before running the tests to ensure that we did not add something in the feature files that we forgot to implement in the steps files.

Create a Browser Use and Automation skill

Claude skills are a way to extend Claude Code’s capabilities by adding a prompt that will be loaded when Claude is working in a specific context. In our case, we want to create a skill that will be loaded when Claude is working on end-to-end testing of the application.

This takes the form of a markdown file that we put at .claude/skills/browser-use-and-automation/SKILL.md:

---
name: Browser Use and Automation
description: Browser automation with Playwright and Playwright MCP. Use when user wants to create end to end (e2e) tests of the application, or for debugging the UI part of the application or perform any browser-based testing.
version: 1.0.0
author: Didier Marin
tags: [testing, automation, browser, e2e, playwright, web-testing]
---

# Log in a test user with Playwright browser automation

- Navigate to `http://localhost:3000/login`
- use the fill form tool with PLAYWRIGHT_EMAIL_1 as the Email address and PLAYWRIGHT_PASSWORD_1 as the password (those are secret keys that will be automatically replaced by their corresponding value)
- Use the click tool on the "Continue" button
- Keep the browser open for you to continue interacting
- Report success or any errors to the user

# Creating or modifying e2e tests

In that case, please refer to e2e/README.md for explanations about our E2E test setup.
In particular, look at the `# Writing end-to-end tests` section and follow the procedure described there

Explaining the workflow to Claude Code

The e2e/README.md file documents how we do end-to-end testing for the application, for Claude Code to use (and for human developers!).

End-to-end tests with Playwright BDD

Our end-to-end tests are written with the help of the Playwright BDD framework.

The e2e folder (this folder) contains the global setup for the tests:

Global setup file e2e/global-setup.ts that runs once before all tests and resets the test user quotas

Fixtures file e2e/fixtures.ts that automatically logs in the test user before each test

Shared step files e2e/shared.steps.ts that implement shared steps for all features

Each feature or component in our application may have a tests sub-folder which contains:

Gherkin feature files tests/<bdd-feature-name>.feature

Step files tests/<bdd-feature-name>.steps.ts that implement the steps for each corresponding feature file

Shared step files tests/shared.steps.ts that implement shared steps for the feature or component

See also playwright.config.ts for the configuration of the tests, which is located in the parent folder.

Installation

Playwright needs to be installed:

pnpm exec playwright install

Test User Configuration

We use test users that are already signed up and ready to be used for the tests.

The user credentials are stored in environment variables PLAYWRIGHT_EMAIL_1 and PLAYWRIGHT_PASSWORD_1. Load them from .env.e2e.

Note: If tests run sequentially (workers: 1), only one test user (_1) is needed. If the number of workers in playwright.config.ts is higher than 1, add additional test users with corresponding suffixes (_2, _3, etc.).
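A minimal sketch of how a fixture could pick the credentials for the current worker, assuming a 0-based worker index (for instance Playwright's `testInfo.parallelIndex`) and the `_1`, `_2`, ... suffix convention above. `testUserForWorker` and `TestUser` are hypothetical names, and this is not necessarily what e2e/fixtures.ts does:

```typescript
interface TestUser {
  email: string;
  password: string;
}

// Hypothetical helper: maps a 0-based worker index to the _1, _2, ... credentials.
function testUserForWorker(
  workerIndex: number,
  env: Record<string, string | undefined>,
): TestUser {
  const suffix = workerIndex + 1; // worker 0 uses PLAYWRIGHT_EMAIL_1, and so on
  const email = env[`PLAYWRIGHT_EMAIL_${suffix}`];
  const password = env[`PLAYWRIGHT_PASSWORD_${suffix}`];

  if (!email || !password) {
    // Fail early with an actionable message instead of a confusing login failure.
    throw new Error(
      `Missing PLAYWRIGHT_EMAIL_${suffix} or PLAYWRIGHT_PASSWORD_${suffix}: ` +
        'add a test user or lower the number of workers',
    );
  }
  return { email, password };
}
```

In a real fixture, `env` would typically be `process.env` after loading .env.e2e.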

Running tests

Run all end-to-end tests:

pnpm run e2e

Run a specific test:

pnpm run e2e:grep "Name of the scenario"

View test report:

pnpm exec playwright show-report

Debug tests with Playwright UI (see video, console logs, etc.):

pnpm run e2e:ui

# Can also filter which scenarios will be listed (it won't run them directly, only list them), e.g.,
pnpm run e2e:ui -- viewing-and-editing
pnpm run e2e:ui -- viewing-and-editing-documents.feature:10

Writing a new end-to-end test

The user should provide a rough outline of the scenarios that they intend to test.

Create a tests folder in your feature/component directory (if it doesn't exist)

Create a *.feature file with Gherkin syntax to describe the scenarios that they intend to test:
- The file name should be the name of the feature or component, or of the scenario if there are already feature files, e.g., viewing-and-editing.feature in the documents feature.
- Reuse the existing steps whenever possible: run grep "\(Given\|When\|Then\)(" **/shared.steps.ts to list all shared steps that are implemented. Those will be automatically discovered by Playwright BDD, so no need to import them in the step file.

IMPORTANT: Ask the user to review the feature file and provide feedback on the scenarios and steps.

Once the user is happy with the feature file, do NOT write any code yet. Instead, use Playwright MCP to manually perform the scenarios and take notes on the selectors and actions that you need to implement in the step file (write those down in a markdown file). It is very likely that the steps will need to be adjusted to how the UI actually works, so update the feature file accordingly.

Based on your notes, add all necessary ARIA attributes in order to make selection more straightforward, while making the application more accessible at the same time.

Check with the user that the changes made to the feature file are OK.

Run pnpm run e2e:bddgen to generate snippets for the new steps.

Fill in the new steps in the generated .steps.ts file, using your notes from the previous step.

Run the test with pnpm run e2e -- <feature-file-name>.feature or pnpm run e2e:grep "Name of the scenario" for a specific scenario (if there is more than one).

Fix any issues until the test passes. Keep the Playwright MCP open to help you debug any issues quickly, as this will be faster than re-running the tests multiple times!

When the test passes, ask the user for review.

Check any refactoring that could be done to improve the test, e.g. a step that is duplicated in multiple places, a step that is not DRY, etc.

Ask the user for final review, and if they are happy, you may close the Playwright MCP.

Modifying an existing end-to-end test

Follow a similar process to writing a new end-to-end test:

Understand the changes that need to be made

Adapt the feature file to reflect the changes

Manually perform the updated scenarios

Update the feature file and step file to reflect the changes

Playwright Best Practices

Refer to the Playwright Best Practices documentation for more details.

Waiting Best Practices

Never use page.waitForTimeout() - Only for debugging, never in production tests

Trust auto-waiting - Most Playwright actions (click, fill, etc.) already wait for elements to be actionable

Wait for specific conditions, not page states - Wait for observable changes (URL, element, response), not generic "page loaded"
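To illustrate why condition-based waiting beats a fixed sleep, here is a plain TypeScript sketch of the idea. It is not Playwright's implementation, and `pollUntil` is a hypothetical name; in real tests, Playwright's auto-waiting and retrying assertions such as `expect(locator).toBeVisible()` already cover this, so you should rarely need your own loop.

```typescript
// Hypothetical helper: resolves as soon as the condition holds,
// and rejects if it still does not hold when the timeout expires.
async function pollUntil(
  condition: () => boolean | Promise<boolean>,
  { timeoutMs = 5_000, intervalMs = 50 } = {},
): Promise<number> {
  const start = Date.now();
  while (true) {
    if (await condition()) return Date.now() - start; // elapsed time in ms
    if (Date.now() - start >= timeoutMs) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Sleep briefly before checking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A fixed `waitForTimeout(5000)` always costs five seconds and still fails if the app needs six. A condition-based wait returns as soon as the condition is true, and fails with a clear error otherwise.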

Common Patterns

Navigation/Redirects

await page.click('#button')
await page.waitForURL('**/expected-path')

Content Appears

await page.click('#load')
await expect(page.locator('text=Success')).toBeVisible()

API Responses

await Promise.all([
  page.waitForResponse('**/api/endpoint'),
  page.click('#fetch')
])

Element State Changes

await expect(page.locator('#element')).toBeVisible()
await expect(page.locator('#element')).toBeHidden()

Load States (Rarely Needed)

page.goto(url) already waits for 'load' event - usually sufficient

Only specify waitUntil: 'domcontentloaded' if you need faster tests and don't need images

Avoid networkidle - Playwright discourages it for testing, and it may never settle in apps that have continuous network activity (e.g., polling)


The most interesting bit is the section that describes a complete workflow for creating new end-to-end test scenarios, starting from just a rough description (in my case, a voice memo of me rambling about it) and guiding Claude Code through the process. The key part is telling Claude Code to “manually perform the scenarios” and take notes on the selectors and actions that it needs to implement in the step file.

Thanks to this workflow, I can create many end-to-end test scenarios, not just the most critical ones, as Claude Code takes care of the boring work of finding the proper selectors and implementing the steps.

Limitations

Of course, there are limitations to this workflow: it rarely does 100% of the work without some feedback on my part.

One limitation is how Claude tends to create slightly different steps (e.g., “I should remain on the login page” and “I am on the login page”) that are effectively the same.
Similarly, Claude would not refactor Playwright code that was repeated over and over again in various steps, even though I explicitly told it to use a shared step file e2e/shared.steps.ts.
That said, I can guide Claude Code to refactor and consolidate code when needed.
But I think it highlights the importance of having an experienced developer in the loop, and strong guidelines when it comes to how you describe your test scenario in a structured way.
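One way to support that human review is a small script that flags step texts that look suspiciously alike. The sketch below is only a heuristic I would try, not a feature of Playwright-BDD: it compares the word sets of each pair of steps using Jaccard similarity, and the function names and the 0.6 threshold are assumptions to tune for your own project.

```typescript
// Hypothetical review aid: flags pairs of step texts whose word sets overlap
// heavily, so that a human can decide whether they should be merged.
function wordSet(step: string): Set<string> {
  return new Set(step.toLowerCase().match(/[a-z0-9{}]+/g) ?? []);
}

// Jaccard similarity: size of the intersection divided by size of the union.
function jaccard(a: Set<string>, b: Set<string>): number {
  const shared = Array.from(a).filter((word) => b.has(word)).length;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

function findSimilarSteps(
  steps: string[],
  threshold = 0.6, // assumed cut-off, adjust to taste
): [string, string, number][] {
  const pairs: [string, string, number][] = [];
  for (let i = 0; i < steps.length; i++) {
    for (let j = i + 1; j < steps.length; j++) {
      const score = jaccard(wordSet(steps[i]), wordSet(steps[j]));
      if (score >= threshold) pairs.push([steps[i], steps[j], score]);
    }
  }
  return pairs;
}

const flagged = findSimilarSteps([
  'I am on the login page',
  'I should remain on the login page',
  'I should be logged in',
]);
```

With these three steps, only the first two are flagged (they share 5 of 8 distinct words, a score of 0.625). Word overlap cannot tell whether two steps mean the same thing, so the output is a list of candidates to review, not something to merge automatically.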

Conclusion

Claude Code is a powerful tool to generate entire features or components. It works even better with a test feedback loop to guide the implementation. In this post I showed how you can build end-to-end tests for a web application with the help of Claude Code, grounded in a human-readable description of the scenarios you are testing.

As I alluded to in the intro, there are still many possibilities to explore on the way to a full “Specification-Driven Development” workflow, where you start by writing the specifications in a human-readable format, and then let Claude Code generate both the code and the tests.

Pawel Jozefiak

Feb 19

The BDD + Playwright + Claude Code combo is solid. I've been using Playwright for browser automation in my agent workflows and the reusability of steps is a game-changer.

One thing I'd add: when your agent generates and runs tests autonomously, you need a way to see what passed and what broke without digging through terminal output. That visibility problem pushed me to build a proper dashboard for my agent's work: https://thoughts.jock.pl/p/wiz-1-5-ai-agent-dashboard-native-app-2026

The limitation you mention - Claude creating functionally equivalent but different steps - is real. My workaround: strict skill files that enforce naming conventions. Helps with deduplication across sessions.


Didier’s Substack