What Is Data Flow Testing?

Data flow testing focuses on the lifecycle of variables: where they are defined (assigned a value), where they are used (read), and where they are killed (go out of scope or are re-assigned). By tracking these events along execution paths, data flow testing reveals defects that other techniques miss.

While control flow testing asks “which paths does the code take?”, data flow testing asks “what happens to the data along those paths?”

Variable States: Define, Use, Kill

Every variable goes through three states:

Define (d): The variable receives a value.

total = 0           # definition of total
user = get_user()   # definition of user

Use (u): The variable’s value is read. Two types:

  • c-use (computation use): Value used in a calculation: result = total * tax_rate
  • p-use (predicate use): Value used in a condition: if total > 100:

Kill (k): The variable ceases to exist (goes out of scope) or is re-defined.

total = 0        # define total
total = total + 5  # use total (c-use), then kill + redefine total
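All three event kinds appear together in one short function; a minimal sketch (apply_tax is a made-up example, not from any library):

```python
def apply_tax(price):           # define price (parameter)
    total = price               # define total; c-use of price
    if total > 100:             # p-use of total
        total = total * 1.2     # c-use of total, then kill + redefine total
    return total                # c-use of total
```

`apply_tax(50)` returns 50 unchanged, while `apply_tax(200)` takes the branch, kills the original definition of total, and returns the redefined value (200 * 1.2).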

Data Flow Anomalies

Data flow anomalies are suspicious patterns that often indicate bugs:

dd anomaly (define-define)

A variable is defined twice without being used between definitions.

price = get_base_price()     # define
price = get_sale_price()     # define again — first definition is wasted
discount = price * 0.1       # use

The first price assignment is a dead store: either the wrong value is being overwritten (an error) or the first call is unnecessary work.

ur anomaly (undefined-reference: use before definition)

A variable is used before being defined.

def calculate_total():
    total = total + tax    # BUG: total read before it is assigned (raises UnboundLocalError in Python)
    return total

du anomaly (define with no use)

A variable is defined but never used.

def process():
    result = expensive_computation()  # define
    return "done"                     # result never used
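Anomalies like this can be caught mechanically. Below is a coarse sketch using Python's ast module; it ignores scoping, del statements, and attributes, so treat it as a teaching aid rather than a real linter (find_du_anomalies is a made-up helper name):

```python
import ast

def find_du_anomalies(source):
    """Return names that are assigned but never read (du anomaly).

    Coarse approximation: any ast.Name with Store context counts as a
    definition, any ast.Name with Load context as a use; scopes, del,
    and attribute accesses are not modeled.
    """
    assigned, read = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                read.add(node.id)
    return assigned - read

code = """
def process():
    result = expensive_computation()
    return "done"
"""
print(find_du_anomalies(code))   # {'result'}
```

This is essentially what tools like Pyflakes do, with real scope tracking added on top.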

Define-Use Pairs (DU Pairs)

A DU pair is a pair (d, u) where:

  • d is a statement where variable v is defined
  • u is a statement where variable v is used
  • There exists at least one path from d to u that does not re-define v (a definition-clear path)

Example

def process_payment(amount, discount_code):
    price = amount                    # Line 1: define price

    if discount_code == "SAVE10":     # Line 2: use discount_code (p-use)
        discount = 0.10               # Line 3: define discount
    elif discount_code == "SAVE20":   # Line 4: use discount_code (p-use)
        discount = 0.20               # Line 5: define discount
    else:
        discount = 0                  # Line 6: define discount

    final = price * (1 - discount)    # Line 7: use price (c-use), use discount (c-use)
    return final                      # Line 8: use final (c-use)

DU pairs in this example:

  • price: (1, 7)
  • discount_code: (param, 2), (param, 4)
  • discount: (3, 7), (5, 7), (6, 7)
  • final: (7, 8)
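Listing definition and use sites can be partially automated. The sketch below records line numbers per name with Python's ast module; it does not follow control flow, so pairing a def line with a use line gives candidate DU pairs only, not definition-clear paths (def_use_sites is a made-up helper name):

```python
import ast
from collections import defaultdict

def def_use_sites(source):
    """Record definition and use line numbers per variable name.

    Rough sketch: a stored ast.Name counts as a definition, a loaded
    one as a use. Control flow is ignored, so this yields candidate
    DU pairs, not verified definition-clear paths.
    """
    defs, uses = defaultdict(list), defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs[node.id].append(node.lineno)
            elif isinstance(node.ctx, ast.Load):
                uses[node.id].append(node.lineno)
    return dict(defs), dict(uses)

source = """price = amount
final = price * (1 - discount)"""
d, u = def_use_sites(source)
print(d)   # {'price': [1], 'final': [2]}
print(u)   # {'amount': [1], 'price': [2], 'discount': [2]}
```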

Data Flow Coverage Criteria

From weakest to strongest:

All-Defs Coverage

For every variable definition, at least one DU pair from that definition is covered.

For discount: test at least one of (3,7), (5,7), or (6,7). One test case suffices.

All-Uses Coverage (All-C-Uses/All-P-Uses)

For every variable definition, every reachable use is covered.

For discount: test ALL of (3,7), (5,7), and (6,7). Three test cases needed.
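All-uses coverage for discount can be demonstrated by running each definition through to its use. A sketch, with process_payment repeated from the DU-pair example ("NONE" is an arbitrary code chosen to take the else branch):

```python
import math

def process_payment(amount, discount_code):   # from the DU-pair example
    price = amount
    if discount_code == "SAVE10":
        discount = 0.10
    elif discount_code == "SAVE20":
        discount = 0.20
    else:
        discount = 0
    final = price * (1 - discount)
    return final

# Each case drives a different definition of discount to its use at
# final = price * (1 - discount), covering all three DU pairs.
cases = [
    (100, "SAVE10", 90.0),    # pair (3, 7)
    (100, "SAVE20", 80.0),    # pair (5, 7)
    (100, "NONE", 100.0),     # pair (6, 7)
]
for amount, code, expected in cases:
    assert math.isclose(process_payment(amount, code), expected)
```

All-defs coverage, by contrast, would be satisfied by any single one of these cases.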

All-DU-Paths Coverage

For every DU pair, every definition-clear path between the definition and use is covered.

This is the strongest criterion but may require many tests if there are multiple paths between a definition and its use.

Practical Data Flow Analysis

In practice, you rarely draw formal data flow graphs. Instead, apply data flow thinking:

  1. Trace each variable from creation to last use
  2. Check for anomalies — is anything defined but not used? Used before defined? Defined twice without use?
  3. Ensure all definitions reach uses — does every path from definition to use behave correctly?

Common Data Flow Bugs

Null pointer from conditional definition:

if condition:
    connection = create_connection()
# BUG: connection undefined if condition is False
connection.execute(query)
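One fix is to give connection a definition on every path and guard the use. A self-contained sketch (FakeConnection and run_query are hypothetical stand-ins for the snippet's create_connection and its caller):

```python
class FakeConnection:
    """Hypothetical stand-in for whatever create_connection returns."""
    def execute(self, query):
        return f"executed: {query}"

def run_query(condition, query):
    # Fix: define connection on every path, then guard the use.
    connection = None
    if condition:
        connection = FakeConnection()
    if connection is None:
        return None
    return connection.execute(query)

print(run_query(False, "SELECT 1"))   # None -- no crash on the False path
print(run_query(True, "SELECT 1"))    # executed: SELECT 1
```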

Stale value after re-assignment:

config = load_config("production")
if is_testing:
    config = load_config("test")
# config has correct value here

setup_database(config)
config = load_config("production")  # re-define — why?
setup_cache(config)                 # always uses production config — bug if testing?
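The straightforward repair is to delete the redefinition so one definition of config reaches both uses. A runnable sketch with stub implementations (load_config, setup_database, and setup_cache here are stand-ins recording their calls, not the real functions):

```python
def load_config(env):
    # stand-in for the real load_config in the snippet above
    return {"env": env}

calls = []
def setup_database(config):
    calls.append(("db", config["env"]))
def setup_cache(config):
    calls.append(("cache", config["env"]))

def configure(is_testing):
    # Fix: a single definition of config reaches both uses,
    # so database and cache always see the same environment.
    config = load_config("test" if is_testing else "production")
    setup_database(config)
    setup_cache(config)

configure(is_testing=True)
print(calls)   # [('db', 'test'), ('cache', 'test')]
```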

Exercise: Data Flow Analysis

Problem 1

Identify all DU pairs and data flow anomalies in this function:

def calculate_grade(scores, curve):
    total = 0                        # Line 1
    count = 0                        # Line 2
    average = 0                      # Line 3

    for score in scores:             # Line 4
        total = total + score        # Line 5
        count = count + 1            # Line 6

    if count > 0:                    # Line 7
        average = total / count      # Line 8

    average = average + curve        # Line 9

    if average >= 90:                # Line 10
        grade = "A"                  # Line 11
    elif average >= 80:              # Line 12
        grade = "B"                  # Line 13
    else:
        grade = "C"                  # Line 14

    return grade                     # Line 15
Solution

DU pairs:

  • total: (1, 5) on the first iteration, (5, 5) on later iterations, (5, 8) after the loop; (1, 8) exists statically but is infeasible — if the loop body never runs, count == 0 and line 8 is skipped
  • count: (2, 6), (6, 6), (2, 7), (6, 7), (6, 8); (2, 8) is statically a pair but infeasible for the same reason
  • average: (3, 9) if count == 0, (8, 9) if count > 0, (9, 10), (9, 12)
  • grade: (11, 15), (13, 15), (14, 15)
  • scores: (param, 4)
  • curve: (param, 9)

Anomaly: Line 3 defines average = 0. If the scores list is empty, count == 0 at line 7, so line 8 is skipped. Line 9 then uses the initial average = 0, so average == curve and the grade is computed from the curve alone. This is likely not intended — a dd anomaly on the non-empty path (lines 3 and 8 both define average with no use in between) and a potential logic bug when scores is empty.

Test cases for all-uses:

  # | scores   | curve | Covers
 ---|----------|-------|------------------------------------------------
  1 | [85, 95] |   5   | Loop executes, count > 0, average computed, avg >= 90
  2 | [80, 90] |   0   | Loop executes, 80 <= avg < 90
  3 | [50, 60] |   0   | Loop executes, avg < 80
  4 | []       |  10   | Empty scores, count == 0 path
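These cases can be executed directly against the function from the exercise; a sketch (the middle-band input uses [80, 90] so the pre-curve average lands at 85):

```python
def calculate_grade(scores, curve):   # function from the exercise
    total = 0
    count = 0
    average = 0
    for score in scores:
        total = total + score
        count = count + 1
    if count > 0:
        average = total / count
    average = average + curve
    if average >= 90:
        grade = "A"
    elif average >= 80:
        grade = "B"
    else:
        grade = "C"
    return grade

assert calculate_grade([85, 95], 5) == "A"   # avg 90, +5 curve = 95
assert calculate_grade([80, 90], 0) == "B"   # avg 85, middle band
assert calculate_grade([50, 60], 0) == "C"   # avg 55
assert calculate_grade([], 10) == "C"        # empty path: average == curve == 10
```

The last case exercises the anomaly: with no scores, the grade depends entirely on the curve.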

Problem 2

Find and fix data flow bugs in this code:

def process_order(items, coupon):
    subtotal = 0
    shipping = 0

    for item in items:
        subtotal += item.price * item.quantity

    if subtotal > 50:
        shipping = 0

    if coupon:
        discount = subtotal * coupon.percent / 100

    total = subtotal - discount + shipping
    return total
Solution

Bug 1: ur anomaly — discount used before definition. If coupon is falsy, discount is never defined, but line total = subtotal - discount + shipping uses it. Fix: initialize discount = 0 before the if.

Bug 2: dd anomaly — shipping always 0. shipping is defined as 0, and then conditionally set to 0 again. The else case is missing — presumably shipping should have a non-zero value when subtotal <= 50.

Fixed code:

def process_order(items, coupon):
    subtotal = 0
    discount = 0          # Fix: initialize discount

    for item in items:
        subtotal += item.price * item.quantity

    if subtotal > 50:
        shipping = 0
    else:
        shipping = 9.99   # Fix: non-zero shipping for small orders

    if coupon:
        discount = subtotal * coupon.percent / 100

    total = subtotal - discount + shipping
    return total
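A quick check that the fixes hold on the path that previously crashed (Item and Coupon are hypothetical stand-ins for the real order and coupon types, which the exercise does not define):

```python
from collections import namedtuple

# Hypothetical stand-ins for the order item and coupon types
Item = namedtuple("Item", "price quantity")
Coupon = namedtuple("Coupon", "percent")

def process_order(items, coupon):     # fixed version from above
    subtotal = 0
    discount = 0                      # defined on every path now
    for item in items:
        subtotal += item.price * item.quantity
    if subtotal > 50:
        shipping = 0
    else:
        shipping = 9.99
    if coupon:
        discount = subtotal * coupon.percent / 100
    return subtotal - discount + shipping

# No coupon (the path that raised NameError before) and a small order:
assert process_order([Item(10, 2)], None) == 20 + 9.99
# Coupon applied, free shipping over 50: 60 - 6.0 + 0
assert process_order([Item(30, 2)], Coupon(10)) == 54.0
```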

Tools for Data Flow Analysis

Most data flow analysis happens during static analysis:

  • SonarQube — Detects dead code, unused variables, null pointer risks
  • SpotBugs (Java) — Finds uninitialized reads, dead stores
  • Pylint/Pyflakes (Python) — Reports unused variables, undefined names
  • ESLint (JavaScript) — no-unused-vars, no-undef rules
  • Coverity — Commercial tool with advanced data flow analysis

Key Takeaways

  • Data flow testing tracks variables through define → use → kill lifecycle
  • DU pairs connect variable definitions to their uses along definition-clear paths
  • Three coverage levels: all-defs (weakest), all-uses, all-du-paths (strongest)
  • Data flow anomalies (dd, ur, du) often indicate real bugs
  • Most common real-world bug: a variable defined on only one branch of a conditional, then used unconditionally afterwards
  • Static analysis tools automate much of data flow anomaly detection
  • Apply data flow thinking during code review even without formal tools