What Are Flaky Tests?

A flaky test is a test that passes and fails intermittently without any code changes. You run the test suite — it passes. You run it again on the same code — a test fails. You run it a third time — it passes again. This non-deterministic behavior destroys trust in the test suite and wastes enormous amounts of developer time investigating false failures.

Flaky tests are consistently among developers' top complaints about test automation. Google has reported that roughly 1.5% of its test runs produce a flaky result, and that dealing with flakiness consumed 2-16% of its testing compute resources through retries. At scale, even a small percentage of flaky tests has a massive impact.

Root Causes of Flakiness

1. Timing and Synchronization Issues (Most Common)

The test interacts with the UI before an element is ready, or checks a condition before an asynchronous operation completes.

Bad pattern:

await page.click('#submit');
// Grabs the text once; the assertion never retries, so it fails
// whenever the success message has not finished rendering yet
const message = await page.textContent('.success-message');
expect(message).toBe('Order placed');

Fixed pattern:

await page.click('#submit');
await expect(page.locator('.success-message')).toHaveText('Order placed');
// Playwright automatically waits for the element and retries

2. Test Order Dependencies

Tests that depend on other tests running first (shared state, data created by previous tests).

3. Shared Mutable State

Tests that modify global state (database, files, environment variables) without proper isolation.

4. External Service Dependencies

Tests that call real external services which may be slow, rate-limited, or occasionally unavailable.

5. Resource Contention

Tests competing for limited resources (ports, file handles, database connections) during parallel execution.
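
One general way to sidestep contention, sketched here without assuming any particular test framework, is to never hardcode shared resources: let the OS pick a free port and give every test its own temporary directory.

```python
import socket
import tempfile

def get_free_port() -> int:
    """Bind to port 0 and let the OS choose an unused port.
    Note: the port can be reclaimed after the socket closes, so in
    real tests, bind your server to it immediately."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

def make_test_workspace() -> str:
    """Each test writes into a private directory, not a shared path."""
    return tempfile.mkdtemp(prefix="test-")

# Two parallel tests get independent resources instead of racing.
port_a, port_b = get_free_port(), get_free_port()
dir_a, dir_b = make_test_workspace(), make_test_workspace()
```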

6. Time-Dependent Logic

Tests that depend on the current time, day of week, or timezone.

Fixing Flaky Tests

Replace Sleep with Explicit Waits

// BAD — hardcoded sleep
await page.click('#submit');
await page.waitForTimeout(3000);
expect(await page.textContent('.result')).toBe('Success');

// GOOD — wait for condition
await page.click('#submit');
await expect(page.locator('.result')).toHaveText('Success', { timeout: 10000 });

Ensure Test Isolation

@BeforeEach
void isolateTest() {
    database.beginTransaction();
    // Each test gets a clean state
}

@AfterEach
void cleanupTest() {
    database.rollbackTransaction();
    // All changes are undone
}

Mock External Services

await page.route('**/api/external-service/**', route => {
    route.fulfill({
        status: 200,
        contentType: 'application/json',
        body: JSON.stringify({ result: 'mocked response' })
    });
});

Flaky Test Detection Systems

Repeat Mode in PR Pipeline

Run new or modified tests multiple times before merging:

# GitHub Actions example
- name: Run new tests 20 times
  # --repeat-each is Playwright's built-in flag for repeating every
  # matched test; the job fails if any repetition fails
  run: npx playwright test --grep @new --repeat-each 20

Flakiness Tracking Dashboard

Track each test’s pass/fail history over time:

Test: testCheckoutFlow
  Last 100 runs: 96 pass, 4 fail (96% reliability)
  Status: FLAKY (below 99% threshold)
  Last failure: 2024-01-15 — TimeoutError on .payment-confirmation
  Assigned to: @developer-alice
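
The reliability number in such a dashboard is simply passes over total runs; a sketch of the computation, using the same 99% threshold as the entry above:

```python
def reliability(passes: int, failures: int) -> float:
    """Pass rate over the recorded run history."""
    total = passes + failures
    return passes / total if total else 1.0

def status(passes: int, failures: int, threshold: float = 0.99) -> str:
    return "FLAKY" if reliability(passes, failures) < threshold else "STABLE"

# Matches the dashboard entry: 96 passes, 4 failures -> 96%, flagged FLAKY.
assert reliability(96, 4) == 0.96
assert status(96, 4) == "FLAKY"
```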

Automatic Flaky Detection

# Analyze test results across CI runs
def detect_flaky_tests(results_last_30_days):
    for test in results_last_30_days:
        total = test.passes + test.failures
        if total == 0:
            continue  # no runs recorded, nothing to classify
        pass_rate = test.passes / total
        # Between 50% and 99%: intermittent, not consistently broken
        if 0.5 < pass_rate < 0.99:
            mark_as_flaky(test)
            notify_team(test)

Quarantine System

When a test is identified as flaky:

  1. Mark it: Add a @Flaky tag or move to a quarantine suite
  2. Isolate it: Remove from the blocking CI pipeline
  3. Monitor it: Continue running in a separate non-blocking job
  4. Fix it: Assign an owner and set a deadline
  5. Restore it: Once fixed and stable for N runs, move back to the main suite

@Tag("quarantine")
@Flaky(reason = "Intermittent timeout on slow CI runners", ticket = "BUG-123")
@Test
void testCheckoutWithCoupon() {
    // This test is quarantined - it runs but does not block deployment
}
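
Step 5 above (restore after N stable runs) can be automated. A sketch in Python; the newest-first run-history format is an assumption:

```python
def ready_to_restore(run_history: list[str], required_stable_runs: int = 50) -> bool:
    """run_history is newest-first, e.g. ["pass", "pass", "fail", ...].
    Restore only when the most recent N runs all passed."""
    recent = run_history[:required_stable_runs]
    return len(recent) >= required_stable_runs and all(r == "pass" for r in recent)

assert ready_to_restore(["pass"] * 50) is True
assert ready_to_restore(["fail"] + ["pass"] * 49) is False  # recent failure
assert ready_to_restore(["pass"] * 10) is False             # not enough history
```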

Prevention Best Practices

  1. Never use hardcoded waits — always use explicit conditions
  2. Run tests in random order — catches order dependencies early
  3. Repeat new tests — run 20-50 times before merging
  4. Mock external services — eliminate network variability
  5. Use unique test data — avoid conflicts between parallel tests
  6. Set realistic timeouts — long enough for slow CI, short enough to fail fast
  7. Review flaky metrics weekly — make flakiness visible to the team
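
Practice 5 (unique test data) usually means suffixing identifiers with a random token so parallel tests can never collide; a minimal sketch:

```python
import uuid

def unique_email(prefix: str = "testuser") -> str:
    """Generate a collision-free email address for each test run."""
    return f"{prefix}-{uuid.uuid4().hex[:8]}@example.test"

# Two parallel tests get distinct accounts and cannot interfere.
a, b = unique_email(), unique_email()
assert a != b
```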

Exercises

Exercise 1: Diagnose and Fix

Take 3 intentionally flaky tests (timing-dependent, order-dependent, and shared-state) and fix each one. Document the root cause and the fix.

Exercise 2: Build a Quarantine System

  1. Create a @Quarantine tag/label mechanism in your test framework
  2. Configure CI to run quarantined tests separately
  3. Build a script that tracks flaky test history across runs
  4. Set up alerts when a test’s reliability drops below 99%

Exercise 3: Prevention Pipeline

  1. Add repeat-mode testing for new tests in your PR pipeline
  2. Configure random test ordering
  3. Set up a flakiness dashboard tracking reliability per test
  4. Create a team policy document for handling flaky tests