
03 Testing Fundamentals: Boundary Analysis and LLM-Assisted Testing

lecture testing pytest unit-testing test-pyramid equivalence-classes boundary-testing llm-assisted-development

1. Introduction: Welcome Back to Testing - From Theory to Practice

Welcome back! In Part 1 of Chapter 03 (Testing Fundamentals), you learned the fundamentals of software testing:

What We Covered in Part 1:

  1. Why Testing Matters
    • Code quality (Ruff, Pyright) ≠ Code correctness (tests)
    • CI can be green, but your code can still have critical bugs
    • Tests catch logic errors that static analysis tools miss
  2. The Testing Pyramid
    • 70% Unit tests (fast, focused, cheap)
    • 20% Integration tests (moderate speed)
    • 10% E2E tests (slow, expensive)
    • Not the inverted “test cone” anti-pattern!
  3. Unit Testing Fundamentals
    • AAA Pattern: Arrange-Act-Assert
    • Why pytest is the industry standard
    • Writing your first unit test
    • The impossibility of exhaustive testing
  4. Clean Code Principles for Testing
    • One concept per test
    • One assert per test (with pragmatic exceptions)
    • Descriptive test names that tell you what’s broken
  5. Equivalence Classes
    • Simple floats → 5 equivalence classes
    • Multiple parameters → 10+ classes (complexity explosion!)
    • Array inputs → Structural + Value dimensions
    • Complex functions like find_intersection() → 600+ potential combinations!

The Challenge We Left You With:

You now know WHY testing matters and HOW to write basic unit tests. But you're probably wondering: how do you choose the right test inputs without writing hundreds of tests for every equivalence-class combination?

Today’s Mission: From Testing Basics to Testing Mastery

In Part 2, we’re going to solve these problems with three powerful techniques:

  1. Boundary Value Analysis - Where bugs actually hide (not in the middle of equivalence classes!)
  2. LLM-Assisted Testing - Break the “test cone” by using AI for boilerplate (while keeping human oversight)
  3. Integration Testing & Workflow - Make testing a natural part of your development process

What Makes Part 2 Different:

Part 1 was conceptual - understanding what tests are and why they matter.

Part 2 is practical - learning specific techniques to write better tests faster, and integrating testing into your real workflow.

By the end of today's lecture, you'll be able to pinpoint the boundary values where bugs actually hide, use LLMs to generate test boilerplate without surrendering oversight, and make testing a routine part of your workflow.

The Road Ahead:

✅ Part 1: Foundation (Why test? How to write basic unit tests?)
→ Part 2: Mastery (Where do bugs hide? How to test efficiently?)
→ Up next (TDD & CI): Write tests FIRST, automate everything

Let’s dive into the techniques that separate beginners from professionals: boundary value analysis and LLM-assisted testing!


2. Boundary Value Analysis: Where Bugs Hide

Observation: Bugs often lurk at the boundaries between equivalence classes.

Why boundaries? Off-by-one errors, floating-point precision, edge cases in conditional logic.
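A tiny, hypothetical example of the first failure mode, an off-by-one comparison (`is_passing` and the pass mark of 50 are invented for illustration). Values from the middle of each equivalence class behave correctly; only the exact boundary value exposes the bug.

```python
def is_passing(score: int) -> bool:
    """Hypothetical example: passing requires a score of at least 50."""
    return score > 50  # BUG: should be score >= 50


# Mid-class values look fine...
assert is_passing(75) is True
assert is_passing(20) is False

# ...but the boundary value 50 exposes the off-by-one error:
print(is_passing(50))  # the spec says True, but this prints False
```

A test suite that only samples the middle of each class would never catch this.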

Let’s apply boundary analysis to all our examples.


2.1 Boundaries for Simple Float Example: reciprocal(x)

Function: reciprocal(x) = 1/x
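The tests in this section assume a minimal implementation along these lines (a sketch; a real version might add input validation):

```python
def reciprocal(x: float) -> float:
    """Return 1/x. Python raises ZeroDivisionError when x == 0.0."""
    return 1.0 / x
```
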

Equivalence Classes and Their Boundaries:

| Equivalence Class | Boundary Values to Test |
|---|---|
| Positive numbers | x = 0.0001 (near zero), x = 1.0, x = 1000000.0 (very large) |
| Negative numbers | x = -0.0001 (near zero), x = -1.0, x = -1000000.0 |
| Zero | x = 0.0, x = 1e-100 (extremely close to zero) |
| Extreme boundaries | sys.float_info.max, sys.float_info.min, float('inf'), float('nan') |

Python Float Boundaries (Similar to C/C++ FLT_MAX, DBL_MIN):

Python floats are 64-bit IEEE 754 doubles.

What is IEEE 754?

IEEE 754 is the international standard for floating-point arithmetic, defining how computers represent and compute with real numbers. It specifies the binary formats (binary32 "single", binary64 "double"), special values (±infinity, NaN, signed zero), subnormal numbers, and rounding rules.

Why this matters for testing: Understanding IEEE 754 boundaries helps you write tests for edge cases that occur at the limits of numerical precision. All modern programming languages (Python, C, C++, Java, JavaScript) follow this standard.

Official References:

Python provides the sys.float_info module to access platform-specific float limits defined by IEEE 754:

| Python Constant | C/C++ Equivalent | Typical Value | Description |
|---|---|---|---|
| sys.float_info.max | DBL_MAX | 1.7976931348623157e+308 | Maximum representable finite float |
| sys.float_info.min | DBL_MIN | 2.2250738585072014e-308 | Minimum positive normalized float |
| sys.float_info.epsilon | DBL_EPSILON | 2.220446049250313e-16 | Difference between 1.0 and the next representable float |
| math.inf or float('inf') | INFINITY | inf | Positive infinity (non-finite value) |
| -math.inf or float('-inf') | -INFINITY | -inf | Negative infinity (non-finite value) |
| math.nan or float('nan') | NAN | nan | Not a Number (undefined result) |
| math.ulp(0.0) | DBL_TRUE_MIN (C11) | ≈ 5e-324 | Smallest positive subnormal (denormalized) float |
| math.ulp(x) | — | Varies | Unit in the Last Place (spacing to next representable float) |
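A quick sanity check of what epsilon means in practice: it is the smallest step that is still visible when added to 1.0.

```python
import sys

eps = sys.float_info.epsilon

print(1.0 + eps != 1.0)      # True: eps is the smallest visible step at 1.0
print(1.0 + eps / 2 == 1.0)  # True: half an epsilon rounds away entirely
```
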

Critical Distinction: math.inf vs. sys.float_info.max

These are fundamentally different and test different aspects of your code:

  1. math.inf (or float('inf')) - IEEE-754 Positive Infinity
    • Not a finite number: a reserved IEEE-754 bit pattern, not an ordinary mantissa/exponent value
    • A special value that compares larger than every finite float
    • Propagates through many operations: inf + 1 = inf, inf * 2 = inf
    • Result of overflow: sys.float_info.max * 2 = inf
    • Use to test: How your code handles infinity (sentinel logic, clamping, division by zero results)
  2. sys.float_info.max (≈ 1.8e308) - Largest Finite Float
    • The largest finite IEEE-754 double
    • The “upper boundary” before you overflow to infinity
    • Use to test: Finite range limits (overflow/underflow edges)

Example illustrating the difference:

import sys
import math

# These are NOT the same!
print(math.inf > sys.float_info.max)  # True - infinity is bigger than any finite
print(sys.float_info.max * 2)         # inf - overflow!
print(1 / math.inf)                   # 0.0 - reciprocal of infinity
print(1 / sys.float_info.max)         # ≈ 5.6e-309 - tiny but finite!


Other Useful Finite Boundaries (Often Overlooked):

| Boundary Type | Python Expression | Typical Value | When to Test |
|---|---|---|---|
| Smallest positive normalized | sys.float_info.min | ≈ 2.2e-308 | Testing underflow thresholds, relative error near zero |
| Smallest positive subnormal | math.ulp(0.0) | ≈ 5e-324 | Catching denormal handling, flush-to-zero surprises |
| Unit in Last Place (ULP) | math.ulp(x) | Varies by x | Checking "next representable" values in precision-sensitive code |
| Next float toward direction | math.nextafter(x, y) | Varies | Razor-edge boundaries (e.g., stepping from max to infinity) |
| Most negative finite | -sys.float_info.max | ≈ -1.8e308 | Lower bound before overflow to -inf |
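These helpers (math.ulp and math.nextafter, available since Python 3.9) are easy to sanity-check:

```python
import math
import sys

# ulp at 1.0 is exactly machine epsilon
print(math.ulp(1.0) == sys.float_info.epsilon)          # True

# nextafter steps to the adjacent representable float
print(math.nextafter(1.0, 2.0) - 1.0 == math.ulp(1.0))  # True

# one step past the largest finite float IS infinity
print(math.nextafter(sys.float_info.max, math.inf))     # inf
```
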

Practical Guidance: Which Should I Use in Tests?

To test infinity handling (non-finite values):

import math

# Use math.inf (preferred for readability)
def test_reciprocal_handles_infinity():
    result = reciprocal(math.inf)
    assert result == 0.0  # 1/inf = 0

To test finite range limits (overflow/underflow edges):

import sys

def test_reciprocal_handles_max_float():
    result = reciprocal(sys.float_info.max)
    # Reciprocal of largest finite → extremely small
    assert result > 0
    assert result < sys.float_info.min  # Might underflow to subnormal

To test precision limits (ULP testing):

import math

def test_reciprocal_precision_at_boundary():
    # Test the "next" representable value after 1.0
    x = math.nextafter(1.0, 2.0)  # Slightly larger than 1.0
    result = reciprocal(x)
    # Should be slightly less than 1.0
    assert result < 1.0

Tip for NumPy Users:

If your code uses NumPy arrays, use np.finfo() for consistency with the dtype under test:

import numpy as np

# For float64 (Python float equivalent)
fi = np.finfo(np.float64)
print(fi.max)        # Largest finite (≈ 1.8e308)
print(fi.tiny)       # Smallest positive normalized (≈ 2.2e-308)
print(fi.eps)        # Machine epsilon (≈ 2.2e-16)

# Use with nextafter
x_next_to_max = np.nextafter(fi.max, np.inf)  # → inf

# For float32 (if working with single precision)
fi32 = np.finfo(np.float32)
print(fi32.max)      # ≈ 3.4e38 (much smaller than float64!)

This keeps everything consistent with the NumPy dtype you’re actually using.

Comprehensive Boundary Test Suite:

import sys
import math
import pytest

# Define constants for readability (following best practices)
INF = math.inf
NINF = -math.inf
NAN = math.nan
FMAX = sys.float_info.max
FMIN_NORM_POS = sys.float_info.min
FMIN_SUB_POS = math.ulp(0.0)
EPSILON = sys.float_info.epsilon


class TestReciprocalBoundaries:
    """Comprehensive boundary tests for reciprocal(x) function."""

    # BASIC BOUNDARIES (near zero, typical values)

    def test_reciprocal_boundary_near_zero_positive(self):
        """Boundary: Very small positive number"""
        result = reciprocal(1e-6)  # 0.000001
        assert result == pytest.approx(1e6, rel=1e-9)  # Should be 1000000

    def test_reciprocal_boundary_near_zero_negative(self):
        """Boundary: Very small negative number"""
        result = reciprocal(-1e-6)
        assert result == pytest.approx(-1e6, rel=1e-9)

    def test_reciprocal_boundary_exactly_zero(self):
        """Boundary: Exactly zero (should raise exception)"""
        with pytest.raises(ZeroDivisionError):
            reciprocal(0.0)

    def test_reciprocal_boundary_very_large_positive(self):
        """Boundary: Very large positive number"""
        result = reciprocal(1e10)
        assert result == pytest.approx(1e-10, rel=1e-9)

    # FINITE RANGE BOUNDARIES (sys.float_info limits)

    def test_reciprocal_boundary_max_finite_float(self):
        """Boundary: Largest finite float (sys.float_info.max ≈ 1.8e308)

        Tests overflow prevention at upper finite boundary.
        Reciprocal of max finite → extremely small result (may underflow to subnormal).
        """
        result = reciprocal(FMAX)
        assert result > 0, "Reciprocal of max float should be positive"
        assert result < FMIN_NORM_POS, "Result underflows below normalized range (subnormal)"
        # Result is approximately 5.6e-309 (subnormal/denormalized)

    def test_reciprocal_boundary_min_normalized_float(self):
        """Boundary: Smallest positive normalized float (sys.float_info.min ≈ 2.2e-308)

        Reciprocal of min normalized ≈ 4.5e307: huge, but still FINITE.
        1/x only overflows to infinity once x drops below ≈ 5.6e-309
        (the subnormal range near 1/sys.float_info.max).
        """
        result = reciprocal(FMIN_NORM_POS)
        assert math.isfinite(result), "1/min normalized is large but still finite"
        assert result == pytest.approx(1.0 / FMIN_NORM_POS)

    def test_reciprocal_boundary_min_subnormal_float(self):
        """Boundary: Smallest positive subnormal float (math.ulp(0.0) ≈ 5e-324)

        Tests denormalized number handling.
        Reciprocal of smallest representable → massive overflow to infinity.
        """
        result = reciprocal(FMIN_SUB_POS)
        assert result == INF, "Reciprocal of min subnormal overflows to infinity"

    def test_reciprocal_boundary_epsilon(self):
        """Boundary: Machine epsilon (sys.float_info.epsilon ≈ 2.2e-16)

        Tests precision limit behavior.
        Epsilon is smallest value where 1.0 + epsilon != 1.0
        """
        result = reciprocal(EPSILON)
        expected = 1.0 / EPSILON  # ≈ 4.5e15
        assert result == pytest.approx(expected, rel=1e-9)

    def test_reciprocal_boundary_most_negative_finite(self):
        """Boundary: Most negative finite float (-sys.float_info.max ≈ -1.8e308)

        Tests lower finite boundary (there is no sys.float_info.min_negative).
        """
        result = reciprocal(-FMAX)
        assert result < 0, "Reciprocal of most negative finite should be negative"
        assert result > -FMIN_NORM_POS, "Result is tiny negative (subnormal range)"

    # NON-FINITE BOUNDARIES (infinities and NaN)

    def test_reciprocal_boundary_positive_infinity(self):
        """Boundary: Positive infinity (non-finite value)

        Tests special value handling (not overflow prevention).
        1 / inf = 0 (mathematical limit)
        """
        result = reciprocal(INF)
        assert result == 0.0, "Reciprocal of positive infinity should be zero"

    def test_reciprocal_boundary_negative_infinity(self):
        """Boundary: Negative infinity (non-finite value)

        1 / -inf = -0 (negative zero)
        """
        result = reciprocal(NINF)
        # Python treats 0.0 == -0.0 as True (IEEE 754 behavior)
        assert result == 0.0 or result == -0.0, "Reciprocal of -inf should be zero"

    def test_reciprocal_boundary_nan(self):
        """Boundary: Not a Number (undefined/invalid input)

        NaN propagates through operations (doesn't crash).
        Important: nan == nan is always False! Use math.isnan().
        """
        result = reciprocal(NAN)
        assert math.isnan(result), "Reciprocal of NaN should be NaN (propagates)"

    # RAZOR-EDGE BOUNDARIES (using math.nextafter)

    def test_reciprocal_boundary_next_after_max_toward_inf(self):
        """Boundary: Next float after max (toward infinity)

        Uses math.nextafter() to test the exact transition point.
        Next float after FMAX toward infinity IS infinity.
        """
        x_after_max = math.nextafter(FMAX, INF)
        assert x_after_max == INF, "Next float after max toward inf is infinity"
        result = reciprocal(x_after_max)
        assert result == 0.0, "Reciprocal of inf is 0"

    def test_reciprocal_boundary_next_after_one_toward_two(self):
        """Boundary: Next representable float after 1.0

        Tests ULP (Unit in Last Place) sensitivity.
        1.0 + epsilon is the next representable value after 1.0
        """
        x_just_above_one = math.nextafter(1.0, 2.0)
        result = reciprocal(x_just_above_one)
        # Reciprocal should be slightly less than 1.0
        assert result < 1.0, f"Expected < 1.0, got {result}"
        assert result == pytest.approx(1.0, rel=1e-14), "Should be very close to 1.0"

    def test_reciprocal_boundary_ulp_sensitivity(self):
        """Boundary: Testing ULP (Unit in Last Place) around a value

        Demonstrates floating-point granularity.
        """
        x = 1000.0
        ulp_at_x = math.ulp(x)  # Spacing at 1000.0

        result1 = reciprocal(x)
        result2 = reciprocal(x + ulp_at_x)  # Next representable value

        assert result1 != result2, "Adjacent floats should produce different reciprocals"
        assert result1 > result2, "Larger input → smaller reciprocal"

Introducing pytest’s @pytest.mark.parametrize - Avoid Test Duplication

So far, we’ve written separate test functions for each boundary. But what if you need to test the same logic with multiple input values?

The Problem: Repetitive Tests

# ❌ TEDIOUS: Writing 6 nearly-identical tests
def test_reciprocal_max_float():
    result = reciprocal(FMAX)
    assert result > 0 and result < FMIN_NORM_POS

def test_reciprocal_negative_max_float():
    result = reciprocal(-FMAX)
    assert result < 0 and result > -FMIN_NORM_POS

def test_reciprocal_min_norm_float():
    result = reciprocal(FMIN_NORM_POS)
    assert result == INF

# ... 3 more similar tests ...

Problems with this approach:

The Solution: Parametrized Tests

pytest provides @pytest.mark.parametrize to run the same test function with different input data:

# ✅ CLEAN: One test function, multiple inputs
@pytest.mark.parametrize("x, expected_behavior", [
    (FMAX, "underflow to subnormal"),
    (-FMAX, "underflow to negative subnormal"),
    (FMIN_NORM_POS, "large but finite"),
    (FMIN_SUB_POS, "overflow to infinity"),
    (INF, "exact zero"),
    (NINF, "exact zero"),
])
def test_reciprocal_extreme_boundaries_summary(x, expected_behavior):
    """Parametrized test covering all extreme boundaries with descriptions."""
    result = reciprocal(x)

    if "subnormal" in expected_behavior:
        assert math.isfinite(result), f"{expected_behavior}: result should be finite"
        assert abs(result) < FMIN_NORM_POS, f"{expected_behavior}: should be in subnormal range"
    elif "large but finite" in expected_behavior:
        # 1/sys.float_info.min ≈ 4.5e307: huge, but still below FMAX (no overflow)
        assert math.isfinite(result) and abs(result) > 1e300, f"{expected_behavior}: huge but finite"
    elif "overflow to infinity" in expected_behavior:
        assert result == INF, f"{expected_behavior}: should overflow to infinity"
    elif "exact zero" in expected_behavior:
        assert result == 0.0, f"{expected_behavior}: should be exactly zero"

How it works:

  1. Decorator syntax: @pytest.mark.parametrize("param1, param2", [...])
    • First argument: parameter names (comma-separated string)
    • Second argument: list of tuples (one tuple per test case)
  2. Test function receives parameters: def test_function(param1, param2):
    • pytest injects each tuple’s values into the function
  3. pytest runs the test multiple times: Once per tuple in the list
    • Each run is treated as a separate test
    • Test output shows which combination passed/failed

Example pytest output:

$ pytest test_boundaries.py::test_reciprocal_extreme_boundaries_summary -v

test_boundaries.py::test_reciprocal_extreme_boundaries_summary[FMAX-underflow to subnormal] PASSED
test_boundaries.py::test_reciprocal_extreme_boundaries_summary[-FMAX-underflow to negative subnormal] PASSED
test_boundaries.py::test_reciprocal_extreme_boundaries_summary[FMIN_NORM_POS-large but finite] PASSED
test_boundaries.py::test_reciprocal_extreme_boundaries_summary[FMIN_SUB_POS-overflow to infinity] PASSED
test_boundaries.py::test_reciprocal_extreme_boundaries_summary[INF-exact zero] PASSED
test_boundaries.py::test_reciprocal_extreme_boundaries_summary[NINF-exact zero] PASSED

================================== 6 passed in 0.03s ==================================
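One refinement worth knowing: by default pytest derives each test ID from the parameter's repr (a huge float literal, not a name like FMAX). To get readable IDs, attach them explicitly with pytest.param. A sketch, reusing the 1/x reciprocal from earlier:

```python
import math
import sys

import pytest


def reciprocal(x: float) -> float:
    """Function under test (same 1/x sketch as earlier in the lecture)."""
    return 1.0 / x


@pytest.mark.parametrize("x, expected", [
    pytest.param(math.inf, 0.0, id="positive-infinity"),
    pytest.param(-math.inf, 0.0, id="negative-infinity"),
    pytest.param(sys.float_info.max, 1.0 / sys.float_info.max, id="max-finite"),
])
def test_reciprocal_with_readable_ids(x, expected):
    assert reciprocal(x) == pytest.approx(expected)
```

With explicit ids, failures read as `test_reciprocal_with_readable_ids[max-finite]` instead of a 300-digit float.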

Key benefits:

  • One test function covers many inputs; adding a case is a one-line change
  • Each parameter set runs (and reports) as a separate test
  • No copy-paste drift between near-identical test bodies

When to use parametrize:

DO use when:

  • The same assertions apply to every input (only the data changes)
  • You are sweeping many boundary values or equivalence classes

DON'T use when:

  • Each case needs different setup, mocking, or assertions
  • There are only one or two cases (a plain, well-named test reads better)

More examples:

# Testing multiple equivalence classes with same validation
@pytest.mark.parametrize("angle", [-10, -20, -30, -45])
def test_find_intersection_downward_angles_all_succeed(angle):
    """Test that all downward angles find intersection"""
    x, y, dist = find_intersection(x_road, y_road, angle)
    assert x is not None
    assert dist > 0

# Testing boundary values with expected results
@pytest.mark.parametrize("x, expected", [
    (1.0, 1.0),
    (2.0, 0.5),
    (0.5, 2.0),
    (10.0, 0.1),
])
def test_reciprocal_normal_values(x, expected):
    """Test reciprocal with normal values"""
    result = reciprocal(x)
    assert result == pytest.approx(expected, rel=1e-9)


Why Test These Extreme Boundaries?

| Boundary Type | What It Tests | Real-World Scenario |
|---|---|---|
| sys.float_info.max | Finite overflow prevention | Astronomical distances, cosmology calculations |
| sys.float_info.min | Huge-but-finite results, underflow thresholds | Quantum physics, particle simulations |
| math.ulp(0.0) | Subnormal/denormal handling | High-precision scientific computing |
| sys.float_info.epsilon | Precision limits near 1.0 | Financial calculations, numerical stability |
| math.inf | Non-finite value handling | Division by zero results, sentinel values |
| math.nan | Invalid input propagation | Missing data in datasets, undefined operations |
| math.nextafter() | Razor-edge transitions | Verifying exact boundary behavior |

Key Insights:

  • Finite boundaries (FMAX, FMIN_NORM_POS) and non-finite values (inf, NaN) exercise different code paths - test both
  • nan == nan is always False; use math.isnan() in assertions
  • Overflow is asymmetric: 1/FMIN_SUB_POS overflows to infinity, but 1/FMIN_NORM_POS stays finite

Common Pitfall:

# ❌ WRONG: Testing only infinity
def test_reciprocal_large_values():
    result = reciprocal(float('inf'))
    assert result == 0.0

# Problem: This doesn't test finite overflow (sys.float_info.max)!
# A function could handle inf correctly but crash on FMAX.
# ✅ CORRECT: Test BOTH finite and non-finite boundaries
def test_reciprocal_max_finite():
    result = reciprocal(sys.float_info.max)  # Finite
    assert result < sys.float_info.min  # Underflow behavior

def test_reciprocal_infinity():
    result = reciprocal(math.inf)  # Non-finite
    assert result == 0.0  # Special value handling

2.2 Boundaries for Multi-Parameter: reciprocal_sum(x, y, z)

Function: reciprocal_sum(x, y, z) = 1/(x+y+z)
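As with reciprocal(x), the tests below assume a minimal sketch:

```python
def reciprocal_sum(x: float, y: float, z: float) -> float:
    """Return 1/(x + y + z); Python raises ZeroDivisionError when the sum is exactly 0.0."""
    return 1.0 / (x + y + z)
```
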

Boundaries to test:

| Boundary Condition | Test Values | Why Important |
|---|---|---|
| Sum exactly zero | (1.0, -0.5, -0.5) | Division by zero |
| Sum near zero (positive) | (1.0, -0.9999, 0.0) | Large positive result |
| Sum near zero (negative) | (-1.0, 0.9999, 0.0) | Large negative result |
| One parameter zero | (0.0, 1.0, 1.0) | Doesn't affect sum |
| Two parameters zero | (0.0, 0.0, 2.0) | Only one matters |
| All parameters zero | (0.0, 0.0, 0.0) | Division by zero |

Boundary Tests:

def test_reciprocal_sum_boundary_sum_exactly_zero():
    """Boundary: Sum cancels exactly to zero"""
    with pytest.raises(ZeroDivisionError):
        reciprocal_sum(1.0, -0.5, -0.5)

def test_reciprocal_sum_boundary_sum_near_zero_positive():
    """Boundary: Sum is very small positive"""
    result = reciprocal_sum(1.0, -0.9999, 0.0)  # sum = 0.0001
    assert result == pytest.approx(10000.0, rel=1e-5)

def test_reciprocal_sum_boundary_all_zeros():
    """Boundary: All parameters zero"""
    with pytest.raises(ZeroDivisionError):
        reciprocal_sum(0.0, 0.0, 0.0)

def test_reciprocal_sum_boundary_two_zeros():
    """Boundary: Two parameters zero, one non-zero"""
    result = reciprocal_sum(0.0, 0.0, 0.5)  # sum = 0.5
    assert result == pytest.approx(2.0, rel=1e-10)

2.3 Boundaries for Array Example: array_sum(arr)

Function: array_sum(arr) - sum of array elements
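The tests below assume array_sum validates its input. A minimal sketch (the ValueError on empty input is an assumption these examples rely on; np.sum would otherwise return 0.0):

```python
import numpy as np


def array_sum(arr: np.ndarray) -> float:
    """Sum of array elements; rejects empty input instead of silently returning 0.0."""
    if len(arr) == 0:
        raise ValueError("array_sum() requires a non-empty array")
    return float(np.sum(arr))
```
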

Structural Boundaries:

| Boundary | Test Case |
|---|---|
| Empty array | len(arr) = 0 |
| Single element | len(arr) = 1 |
| Two elements | len(arr) = 2 (smallest non-trivial case) |

Value Boundaries:

| Boundary | Test Case |
|---|---|
| All zeros | [0.0, 0.0, 0.0] |
| Contains one zero | [0.0, 1.0, 2.0] |
| All same value | [5.0, 5.0, 5.0] |
| Alternating signs near zero | [1e-10, -1e-10, 1e-10] |

Boundary Tests:

def test_array_sum_boundary_empty_array():
    """Boundary: Empty array (length = 0)"""
    with pytest.raises(ValueError):
        array_sum(np.array([]))

def test_array_sum_boundary_single_element():
    """Boundary: Single element (length = 1)"""
    result = array_sum(np.array([5.0]))
    assert result == 5.0

def test_array_sum_boundary_two_elements():
    """Boundary: Two elements (minimum non-trivial)"""
    result = array_sum(np.array([3.0, 4.0]))
    assert result == 7.0

def test_array_sum_boundary_all_zeros():
    """Boundary: All elements are zero"""
    result = array_sum(np.array([0.0, 0.0, 0.0]))
    assert result == 0.0

def test_array_sum_boundary_alternating_near_zero():
    """Boundary: Values near zero with alternating signs"""
    result = array_sum(np.array([1e-10, -1e-10, 1e-10]))
    assert abs(result - 1e-10) < 1e-15  # Sum should be close to 1e-10

2.3.1 What About Large Arrays?

Great question! You might be wondering: “We tested 0, 1, 2 elements… but what about 1000 or 1,000,000 elements?”

The Short Answer: Unit tests should generally use small, representative arrays (typically 10-100 elements), not massive ones. Large arrays belong in performance tests, not unit tests.

Why Not Test with Huge Arrays in Unit Tests?

| Problem | Impact | Example |
|---|---|---|
| Slow tests | Unit tests should run in milliseconds; large arrays slow them to seconds | Array of 1 million → 100x slower test suite |
| Doesn't find more bugs | If your code works with 10 elements, it usually works with 10,000 | Summation logic doesn't change with size |
| Noise in test output | Hard to debug failures with massive data | "Array mismatch at index 47,293" - good luck! |
| Memory usage | CI servers might run out of memory | 10 tests × 1M floats × 8 bytes = 80 MB per test run |

When DO You Need Large Arrays?

DO test large arrays when:

  1. Algorithm complexity matters (e.g., O(n²) vs O(n))
    def test_sort_performance_scales_linearly():
        """Verify sorting doesn't degrade to O(n²)"""
        import time
    
        # Small array
        arr_small = np.random.rand(100)
        start = time.time()
        sort_function(arr_small)
        time_small = time.time() - start
    
        # Large array (10x bigger)
        arr_large = np.random.rand(1000)
        start = time.time()
        sort_function(arr_large)
        time_large = time.time() - start
    
        # O(n log n) should be ~15x slower (10 * log(1000)/log(100) = 15),
        # NOT ~100x slower (which would indicate O(n²))
        assert time_large < time_small * 20, "Sorting appears to be O(n²)!"
    
  2. Memory allocation bugs (buffer overflows, off-by-one at specific sizes)
    def test_array_processing_handles_power_of_two_sizes():
        """Test sizes like 256, 512, 1024 (common buffer boundaries)"""
        for size in [256, 512, 1024, 2048]:
            arr = np.random.rand(size)
            result = process_array(arr)
            assert len(result) == size, f"Failed at size {size}"
    
  3. Regression test for a specific bug that only appeared with large data
    def test_regression_issue_42_overflow_at_10000_elements():
        """Regression: Integer overflow occurred at exactly 10,000 elements
    
        Bug report: https://github.com/yourproject/issues/42
        Fixed by switching from int32 to int64 accumulator
        """
        arr = np.ones(10_000)  # Specific size that caused the bug
        result = array_sum(arr)
        assert result == 10_000.0, "Overflow regression detected!"
    

Best Practice: Separate Performance Tests

# tests/test_geometry.py (unit tests - FAST)
def test_find_intersection_basic():
    """Unit test: Small representative array"""
    x_road = np.array([0, 10, 20])  # Only 3 points
    y_road = np.array([0, 2, 4])
    x, y, dist = find_intersection(x_road, y_road, -10.0)
    assert x is not None

# tests/test_geometry_performance.py (performance tests - SLOW)
@pytest.mark.slow  # Mark as slow so we can skip during development
def test_find_intersection_large_road():
    """Performance test: Realistic large road dataset"""
    x_road = np.linspace(0, 1000, 10_000)  # 10,000 points
    y_road = np.sin(x_road / 10) * 5

    import time
    start = time.time()
    x, y, dist = find_intersection(x_road, y_road, -10.0)
    elapsed = time.time() - start

    assert x is not None
    assert elapsed < 0.1, f"Too slow: {elapsed:.3f}s for 10k points"

Run performance tests separately:

# Normal development: skip slow tests (runs in seconds)
$ pytest tests/ -v -m "not slow"

# Before merging PR: run all tests, including the slow ones (runs in minutes)
$ pytest tests/ -v

# Just performance tests
$ pytest tests/test_geometry_performance.py -v
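For the @pytest.mark.slow marker to work cleanly with -m selection (and not trigger "unknown marker" warnings), register it once, e.g. in pytest.ini (a minimal sketch):

```ini
# pytest.ini
[pytest]
markers =
    slow: marks tests as slow (deselect with -m "not slow")
```
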

Property-Based Testing with Hypothesis (Advanced)

For thorough testing with varying sizes, use Hypothesis:

import numpy as np
import pytest
from hypothesis import given, strategies as st

@given(st.lists(st.floats(allow_nan=False, allow_infinity=False),
                min_size=0, max_size=100))
def test_array_sum_any_valid_array(arr):
    """Property: array_sum should never crash on valid float arrays

    Hypothesis will automatically generate hundreds of test cases
    with different array sizes and values.
    """
    if len(arr) == 0:
        with pytest.raises(ValueError):
            array_sum(np.array(arr))
    else:
        result = array_sum(np.array(arr))
        assert isinstance(result, (int, float, np.number))

How Large is “Large Enough” for Unit Tests?

Rule of thumb:

| Array Size | When to Use | Example |
|---|---|---|
| 0-2 elements | Boundary testing | Empty, single, pair (edge cases) |
| 3-10 elements | Unit tests (most common) | Enough to test logic without noise |
| 100-1000 elements | Realistic scenario tests | Typical real-world data size |
| 10,000+ elements | Performance tests (separate suite) | Algorithm complexity, memory usage |
| 1,000,000+ elements | Stress tests (rare, CI only) | Production-scale data validation |

Example: Right-Sized Unit Tests

# ✅ GOOD: Representative sizes for unit tests
def test_find_intersection_typical_road():
    """Test with typical road size (~100 points)"""
    x_road = np.linspace(0, 80, 100)  # Realistic: 100 points over 80 meters
    y_road = generate_road_profile(num_points=100)

    x, y, dist = find_intersection(x_road, y_road, -10.0)
    assert x is not None  # Fast test, runs in ~1ms

# ❌ BAD: Unnecessarily large for a unit test
def test_find_intersection_huge_road():
    """This belongs in performance tests, not unit tests"""
    x_road = np.linspace(0, 10000, 1_000_000)  # Overkill!
    y_road = np.random.rand(1_000_000)

    x, y, dist = find_intersection(x_road, y_road, -10.0)
    assert x is not None  # Slow test, runs in ~500ms

Summary: Testing Array Sizes

Key principle: Unit tests verify correctness, performance tests verify speed. Keep them separate!


2.4 Boundaries for Complex Array Example: find_intersection()

Function: find_intersection(x_road, y_road, angle_degrees, camera_x, camera_y)

Array Length Boundaries:

| Boundary | Test Case | Why Important |
|---|---|---|
| len(x_road) = 0 | Empty arrays | No road to intersect |
| len(x_road) = 1 | Single point | Can't form a segment |
| len(x_road) = 2 | Two points | Minimum valid road (one segment) |
| len(x_road) ≠ len(y_road) | Mismatched lengths | Invalid input |

Angle Boundaries:

| Boundary | Test Values | Why Important |
|---|---|---|
| Exactly -90° | -90.0 | Vertical downward (tan = ∞) |
| Near -90° | -89.9, -89.999 | Near vertical |
| Exactly 0° | 0.0 | Horizontal ray |
| Near 0° | -0.1, 0.1 | Nearly horizontal |
| Exactly 90° | 90.0 | Vertical upward (tan = ∞) |
| Near 90° | 89.9, 89.999 | Near vertical |

Camera Position Boundaries:

| Boundary | Test Case | Why Important |
|---|---|---|
| camera_x = x_road[0] | Camera at road start | Edge of road |
| camera_x = x_road[-1] | Camera at road end | Edge of road |
| camera_y = y_road[i] | Camera at road level | Might be tangent |
| camera_y = min(y_road) | Camera at lowest point | Boundary case |
| camera_y = max(y_road) | Camera at highest point | Boundary case |

Intersection Position Boundaries:

| Boundary | Test Case | Why Important |
|---|---|---|
| Intersection at x_road[0] | Ray hits first point exactly | Endpoint handling |
| Intersection at x_road[-1] | Ray hits last point exactly | Endpoint handling |
| Intersection between two segments | Ray hits at segment boundary | Interpolation edge case |

Comprehensive Boundary Tests:

class TestFindIntersectionBoundaries:
    """Boundary value tests for find_intersection()"""

    # ARRAY LENGTH BOUNDARIES

    def test_boundary_empty_arrays(self):
        """Boundary: Empty road arrays (len = 0)"""
        x, y, dist = find_intersection(np.array([]), np.array([]), -10.0)
        assert x is None

    def test_boundary_single_point(self):
        """Boundary: Single point (len = 1)"""
        x, y, dist = find_intersection(np.array([5.0]), np.array([2.0]), -10.0)
        assert x is None  # Can't form segment

    def test_boundary_two_points_minimum_valid(self):
        """Boundary: Two points (len = 2, minimum valid)"""
        x_road = np.array([0.0, 10.0])
        y_road = np.array([0.0, 2.0])
        # -45° guarantees the ray meets the single segment within [0, 10]
        x, y, dist = find_intersection(x_road, y_road, -45.0, camera_x=0.0, camera_y=5.0)
        assert x is not None  # Should work

    # ANGLE BOUNDARIES

    def test_boundary_angle_exactly_negative_90(self):
        """Boundary: Angle exactly -90° (vertical downward)"""
        x_road = np.array([0, 10, 20])
        y_road = np.array([0, 2, 4])
        x, y, dist = find_intersection(x_road, y_road, -90.0)
        # Implementation returns None for vertical - verify this is intentional
        assert x is None

    def test_boundary_angle_near_negative_90(self):
        """Boundary: Angle near -90° (-89.9°)"""
        x_road = np.array([0, 10, 20])
        y_road = np.array([0, 2, 4])
        x, y, dist = find_intersection(x_road, y_road, -89.9, camera_x=0.0, camera_y=10.0)
        # Nearly vertical - hits the road just ahead of x = 0
        assert x is not None

    def test_boundary_angle_exactly_zero(self):
        """Boundary: Angle exactly 0° (horizontal)"""
        x_road = np.array([0, 10, 20])
        y_road = np.array([2.0, 2.0, 2.0])  # Flat road at y=2
        x, y, dist = find_intersection(x_road, y_road, 0.0, camera_x=-5.0, camera_y=2.0)
        assert x is not None  # Horizontal ray should hit flat road

    def test_boundary_angle_near_zero(self):
        """Boundary: Angle near 0° (0.1° - nearly horizontal)"""
        x_road = np.array([0, 10, 20, 30])
        y_road = np.array([0, 1, 2, 3])
        x, y, dist = find_intersection(x_road, y_road, 0.1, camera_x=0.0, camera_y=1.5)
        # Very shallow angle
        assert x is None or dist > 0  # sanity: must not crash; any hit has positive distance

    def test_boundary_angle_exactly_90(self):
        """Boundary: Angle exactly 90° (vertical upward)"""
        x_road = np.array([0, 10, 20])
        y_road = np.array([0, 2, 4])
        x, y, dist = find_intersection(x_road, y_road, 90.0)
        assert x is None  # Implementation returns None for vertical

    # CAMERA POSITION BOUNDARIES

    def test_boundary_camera_at_road_start(self):
        """Boundary: Camera x-position at road start"""
        x_road = np.array([0, 10, 20])
        y_road = np.array([0, 2, 4])
        x, y, dist = find_intersection(x_road, y_road, -10.0, camera_x=0.0, camera_y=10.0)
        assert x is not None

    def test_boundary_camera_at_road_end(self):
        """Boundary: Camera x-position at road end"""
        x_road = np.array([0, 10, 20])
        y_road = np.array([0, 2, 4])
        x, y, dist = find_intersection(x_road, y_road, -10.0, camera_x=20.0, camera_y=10.0)
        # Camera at end, looking down - might or might not intersect
        assert x is None or dist > 0  # either outcome is acceptable, but it must not crash

    def test_boundary_camera_at_road_level(self):
        """Boundary: Camera y-position at road level"""
        x_road = np.array([0, 10, 20])
        y_road = np.array([2.0, 2.0, 2.0])  # Flat at y=2
        x, y, dist = find_intersection(x_road, y_road, 0.0, camera_x=5.0, camera_y=2.0)
        # Camera ON the road, horizontal ray - tangent case
        assert x is None or dist >= 0  # tangent may touch at distance 0; must not crash

    def test_boundary_camera_at_minimum_y(self):
        """Boundary: Camera at lowest point of road"""
        x_road = np.array([0, 10, 20, 30])
        y_road = np.array([5, 2, 3, 6])  # min at x=10, y=2
        x, y, dist = find_intersection(x_road, y_road, -10.0, camera_x=15.0, camera_y=2.0)
        # Camera at same height as lowest road point
        assert x is None or dist > 0  # sanity: must not crash

    # INTERSECTION POSITION BOUNDARIES

    def test_boundary_intersection_at_first_point(self):
        """Boundary: Ray intersects at first road point"""
        x_road = np.array([0, 10, 20])
        y_road = np.array([5, 3, 1])
        # Position camera so ray hits exactly at (0, 5)
        x, y, dist = find_intersection(x_road, y_road, -45.0, camera_x=-5.0, camera_y=10.0)
        if x is not None:
            assert abs(x - 0.0) < 0.1  # Should be near first point

    def test_boundary_intersection_at_last_point(self):
        """Boundary: Ray intersects at last road point"""
        x_road = np.array([0, 10, 20])
        y_road = np.array([0, 2, 4])
        # Position so ray hits last point
        x, y, dist = find_intersection(x_road, y_road, -10.0, camera_x=15.0, camera_y=10.0)
        if x is not None and x > 18:
            assert abs(x - 20.0) < 2.0  # Should be near last point

    def test_boundary_intersection_at_segment_boundary(self):
        """Boundary: Ray intersects exactly where two segments meet"""
        x_road = np.array([0, 10, 20, 30])
        y_road = np.array([0, 5, 5, 10])
        # Ray aimed to hit exactly at (10, 5)
        x, y, dist = find_intersection(x_road, y_road, -30.0, camera_x=5.0, camera_y=10.0)
        if x is not None:
            # Could be reported as end of first segment or start of second
            assert 9.0 <= x <= 11.0

Key Insights for Boundary Testing:

  1. Test at the boundary, just inside, and just outside
    • Angle = 90°, 89.9°, 90.1°
    • Length = 0, 1, 2
  2. Floating-point precision matters
    • Use pytest.approx() for floating-point comparisons
    • Test values near zero (1e-10, 1e-15)
  3. Array boundaries are both structural and positional
    • Length boundaries (empty, single, two)
    • Position boundaries (first element, last element, boundaries between)
  4. Combination boundaries are critical
    • Camera at road level + horizontal ray = tangent case
    • Empty array + any angle = should handle gracefully
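Insight #2 in action - a minimal sketch of why pytest.approx() (and the stdlib's math.isclose) exist:

```python
import math
import pytest

# Exact equality on floats is a trap: 0.1 + 0.2 is not exactly 0.3 in binary.
assert 0.1 + 0.2 != 0.3

# Compare with a tolerance instead:
assert 0.1 + 0.2 == pytest.approx(0.3)   # pytest's idiom (default relative tolerance 1e-6)
assert math.isclose(0.1 + 0.2, 0.3)      # stdlib equivalent outside of tests
```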

General Boundary Testing Strategy:

| Data Type | Boundary Values to Test | Python Tools |
|---|---|---|
| Integers | 0, 1, -1, max_value, min_value | sys.maxsize, -sys.maxsize-1 |
| Floats (Finite) | 0.0, near-zero (1e-10), 1.0, very large (1e10), sys.float_info.max, -sys.float_info.max, sys.float_info.min (smallest normalized), math.ulp(0.0) (smallest subnormal), sys.float_info.epsilon | sys.float_info, math.ulp(x), math.nextafter(x, y) |
| Floats (Non-Finite) | math.inf, -math.inf, math.nan | math.isnan(x), math.isinf(x), math.isfinite(x) |
| Arrays/Lists | Empty ([]), Single element, Two elements, Very large | len(), np.size |
| Angles (degrees) | 0°, ±90°, ±180°, ±360°, values near these (89.9°, 90.1°) | np.deg2rad(), np.rad2deg() |
| Strings | Empty string (""), Single char, Very long string, Unicode edge cases | len(), str.encode() |
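One caveat on the Integers row: pure-Python ints are arbitrary precision, so sys.maxsize is not an overflow boundary inside Python itself - it only matters where values cross into C-backed code (sequence indices and lengths, fixed-width numpy dtypes). A quick demonstration:

```python
import sys

# No overflow at sys.maxsize: Python ints simply keep growing.
big = sys.maxsize + 1
assert big > sys.maxsize  # no OverflowError raised

# sys.maxsize still bounds C-level interfaces, e.g. it is the
# largest value len() can ever return and the largest legal index.
```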

Quick Reference: Python Float Boundary Testing Cheat Sheet

import sys
import math

# CONSTANTS (define once, use everywhere)
FMAX = sys.float_info.max              # Largest finite (≈1.8e308)
FMIN_NORM = sys.float_info.min         # Smallest normalized (≈2.2e-308)
FMIN_SUB = math.ulp(0.0)               # Smallest subnormal (≈5e-324)
EPSILON = sys.float_info.epsilon       # Machine epsilon (≈2.2e-16)
INF = math.inf                         # Positive infinity
NINF = -math.inf                       # Negative infinity
NAN = math.nan                         # Not a number

# TEST CHECKLIST FOR FLOAT FUNCTIONS

# ✅ Finite boundaries (test overflow/underflow)
test_function(FMAX)           # Largest finite
test_function(-FMAX)          # Most negative finite
test_function(FMIN_NORM)      # Smallest normalized positive
test_function(FMIN_SUB)       # Smallest subnormal positive

# ✅ Non-finite values (test special value handling)
test_function(INF)            # Positive infinity
test_function(NINF)           # Negative infinity
test_function(NAN)            # Not a Number

# ✅ Precision boundaries
test_function(EPSILON)        # Machine epsilon
test_function(1.0 + EPSILON)  # Next after 1.0

# ✅ Razor-edge transitions (using math.nextafter)
math.nextafter(FMAX, INF)     # → inf (overflow transition)
math.nextafter(1.0, 2.0)      # Next representable after 1.0
math.nextafter(1.0, 0.0)      # Previous representable before 1.0

# ✅ Checking results
math.isfinite(x)              # True if not inf/nan
math.isinf(x)                 # True if +inf or -inf
math.isnan(x)                 # True if nan (don't use x == nan!)
math.ulp(x)                   # Spacing at x (precision)
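The razor-edge transitions above can be verified directly - a small stdlib-only sanity script demonstrating overflow and underflow behavior:

```python
import math
import sys

FMAX = sys.float_info.max      # largest finite float
FMIN_SUB = math.ulp(0.0)       # smallest subnormal float

# Overflow: one representable step past FMAX is infinity,
# and arithmetic past FMAX also produces infinity (no exception).
assert math.nextafter(FMAX, math.inf) == math.inf
assert math.isinf(FMAX * 2)

# Underflow: halving the smallest subnormal rounds to exactly 0.0.
assert FMIN_SUB / 2 == 0.0
```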

Decision Tree: Which Boundaries Should I Test?

Does your function do arithmetic with floats?
│
├─ YES, division or reciprocal → Test ALL boundaries (finite + non-finite)
│   ├─ sys.float_info.max (finite overflow)
│   ├─ sys.float_info.min (underflow-to-overflow)
│   ├─ math.inf (division by zero result)
│   ├─ math.nan (invalid input)
│   └─ Near zero (1e-10, 1e-100)
│
├─ YES, but simple operations (+, -, *) → Test finite boundaries
│   ├─ sys.float_info.max (overflow detection)
│   ├─ sys.float_info.epsilon (precision loss)
│   └─ Normal range values
│
└─ NO, just comparison/sorting → Test basic boundaries
    ├─ 0.0, 1.0, -1.0
    ├─ math.inf, -math.inf (sorting edge cases)
    └─ math.nan (comparison always returns False!)
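The last branch deserves a demonstration, because it surprises almost everyone the first time: NaN is unordered, so every comparison involving it is False - including equality with itself:

```python
import math

nan = math.nan

# All comparisons with NaN are False - even nan == nan.
assert (nan == nan) is False
assert (nan < 1.0) is False
assert (nan > 1.0) is False

# This is why sorting data containing NaN gives unreliable orderings,
# and why NaN must be detected with math.isnan(), never with ==.
assert math.isnan(nan)
```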

2.5 Boundaries for Discrete Functions: Thresholds Are Boundaries Too!

Common Misconception: “Discrete functions don’t have boundary values - they just have categories.”

Reality: Discrete functions DO have boundaries - they’re actually easier to identify than continuous function boundaries!

What we’ve seen so far:

All boundary examples in this lecture (reciprocal, reciprocal_sum, array_sum, find_intersection) were continuous real-valued functions - they return floats from an infinite range. For these, we had to think carefully about:

  • How close to a problematic value counts as "near" (1e-10? 1e-100?)
  • Special values: math.inf, -math.inf, math.nan
  • Floating-point precision: epsilon, subnormals, overflow and underflow

The good news: Functions with discrete/finite outputs make boundary identification much simpler! The boundaries are explicit thresholds in the code.

Key insight: For discrete functions, boundaries are threshold values where the output changes category. Instead of worrying about “how close to zero?”, you test both sides of each threshold.


2.5.1 Example: Simple Threshold Boundaries - calculate_grade(score)

Let’s start with the simplest case: a function that maps continuous inputs to discrete outputs.

Function Definition:

def calculate_grade(score: int) -> str:
    """
    Calculate letter grade based on score.

    Args:
        score: Integer between 0 and 100

    Returns:
        Letter grade (A, B, C, D, F)

    Raises:
        ValueError: If score is not in valid range
    """
    if score < 0 or score > 100:
        raise ValueError("Score must be between 0 and 100")

    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    elif score >= 60:
        return "D"
    else:
        return "F"

Question: Where are the boundaries for this function?

Analysis:

Looking at the code, we immediately see explicit threshold values:

  • score >= 90 → "A"
  • score >= 80 → "B"
  • score >= 70 → "C"
  • score >= 60 → "D"
  • 0 <= score <= 100 defines the valid input range

Equivalence classes and their boundaries:

| Grade Class | Range | Boundaries to Test | Why These Boundaries? |
|---|---|---|---|
| Grade A | 90 ≤ score ≤ 100 | 90, 100 | Lower threshold (A/B boundary), upper limit |
| Grade B | 80 ≤ score < 90 | 80, 89 | Lower threshold (B/C boundary), upper edge (B/A boundary) |
| Grade C | 70 ≤ score < 80 | 70, 79 | Lower threshold (C/D boundary), upper edge (C/B boundary) |
| Grade D | 60 ≤ score < 70 | 60, 69 | Lower threshold (D/F boundary), upper edge (D/C boundary) |
| Grade F | 0 ≤ score < 60 | 0, 59 | Lower limit, upper edge (F/D boundary) |
| Invalid | score < 0 or score > 100 | -1, 101 | Just outside valid range |

Key insight - Two types of boundary values:

  1. Threshold boundaries - Where output category changes:
    • score = 89 should return “B”
    • score = 90 should return “A”
    • These are critical! Off-by-one errors are common (using > instead of >=)
  2. Range boundaries - Valid input range limits:
    • score = 0 (minimum valid)
    • score = 100 (maximum valid)
    • score = -1 (just below valid range)
    • score = 101 (just above valid range)

Boundary Value Test Code:

import pytest

def test_grade_boundary_A_B_threshold():
    """Boundary: score = 89 (B) vs 90 (A)"""
    assert calculate_grade(89) == "B"  # Just below A threshold
    assert calculate_grade(90) == "A"  # At A threshold

def test_grade_boundary_B_C_threshold():
    """Boundary: score = 79 (C) vs 80 (B)"""
    assert calculate_grade(79) == "C"  # Just below B threshold
    assert calculate_grade(80) == "B"  # At B threshold

def test_grade_boundary_C_D_threshold():
    """Boundary: score = 69 (D) vs 70 (C)"""
    assert calculate_grade(69) == "D"  # Just below C threshold
    assert calculate_grade(70) == "C"  # At C threshold

def test_grade_boundary_D_F_threshold():
    """Boundary: score = 59 (F) vs 60 (D)"""
    assert calculate_grade(59) == "F"  # Just below D threshold
    assert calculate_grade(60) == "D"  # At D threshold

def test_grade_boundary_valid_range_lower():
    """Boundary: score = 0 (valid) vs -1 (invalid)"""
    assert calculate_grade(0) == "F"   # Minimum valid score
    with pytest.raises(ValueError):
        calculate_grade(-1)             # Just below valid range

def test_grade_boundary_valid_range_upper():
    """Boundary: score = 100 (valid) vs 101 (invalid)"""
    assert calculate_grade(100) == "A"  # Maximum valid score
    with pytest.raises(ValueError):
        calculate_grade(101)             # Just above valid range

Test design observations:

  • Each test covers exactly one threshold and asserts both sides of it (value-1 and value)
  • Invalid inputs are verified with pytest.raises(ValueError)
  • Test names state which boundary is exercised, so a failure pinpoints the broken threshold

Contrast with continuous functions:

| Aspect | Continuous (reciprocal) | Discrete (calculate_grade) |
|---|---|---|
| Boundary identification | Must decide: "How close to zero?" (1e-10? 1e-100?) | Explicit in code: 90, 80, 70, 60 |
| Number of boundaries | Depends on your choice of partitioning | Fixed by function logic |
| Boundary precision | Floating-point precision matters (epsilon) | Integer values - no precision issues |
| Testing strategy | Test near-zero, infinity, NaN, epsilon | Test threshold ± 1 |

Why discrete boundaries are easier:

  1. Explicit thresholds - The code literally says if score >= 90, so you know to test 89 vs 90
  2. No precision worries - Integer thresholds mean no floating-point edge cases
  3. Clear pass/fail - Either the grade is correct or it isn’t - no “close enough” judgment calls
  4. Off-by-one detection - Boundary tests immediately catch > vs >= mistakes

2.5.2 Example: Multi-Dimensional Boundaries - calculate_shipping_cost(weight, distance, express)

Now let’s see a more complex case: multiple parameters, each with their own boundaries.

Function Definition:

def calculate_shipping_cost(weight: float, distance: float, express: bool = False) -> float:
    """
    Calculate shipping cost based on weight and distance.

    Args:
        weight: Weight in kg (0.1 to 50)
        distance: Distance in km (1 to 5000)
        express: Whether express shipping is requested

    Returns:
        float: Shipping cost in EUR

    Raises:
        ValueError: If weight or distance is out of range
    """
    if weight < 0.1 or weight > 50:
        raise ValueError("Weight must be between 0.1 and 50 kg")

    if distance < 1 or distance > 5000:
        raise ValueError("Distance must be between 1 and 5000 km")

    # Base cost calculation
    if weight <= 5:
        base_cost = 5.0
    elif weight <= 20:
        base_cost = 10.0
    else:
        base_cost = 20.0

    # Distance multiplier
    if distance <= 100:
        distance_multiplier = 1.0
    elif distance <= 500:
        distance_multiplier = 1.5
    else:
        distance_multiplier = 2.0

    cost = base_cost * distance_multiplier

    # Express shipping adds 50%
    if express:
        cost *= 1.5

    return round(cost, 2)
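To see how the stages combine, trace one call by hand using the bands above:

```python
# 10 kg over 200 km with express shipping:
#   weight 10 kg    -> medium band (5 < w <= 20)       -> base_cost = 10.0
#   distance 200 km -> regional band (100 < d <= 500)  -> multiplier = 1.5
#   express = True  -> extra factor of 1.5
expected = round(10.0 * 1.5 * 1.5, 2)
assert expected == 22.5  # so calculate_shipping_cost(10, 200, True) should return 22.5
```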

Question: Where are the boundaries for this function?

Analysis:

This function has three dimensions, each with its own boundaries:

Dimension 1: Weight boundaries

| Boundary | Threshold Value | Test Cases | Expected Behavior Change |
|---|---|---|---|
| Minimum weight | 0.1 kg | 0.09 kg (error) vs 0.1 kg (valid) | Error → Light category |
| Light/Medium | 5 kg | 5.0 kg (light) vs 5.1 kg (medium) | €5 base → €10 base |
| Medium/Heavy | 20 kg | 20.0 kg (medium) vs 20.1 kg (heavy) | €10 base → €20 base |
| Maximum weight | 50 kg | 50 kg (valid) vs 50.1 kg (error) | Heavy category → Error |

Dimension 2: Distance boundaries

| Boundary | Threshold Value | Test Cases | Expected Behavior Change |
|---|---|---|---|
| Minimum distance | 1 km | 0.9 km (error) vs 1 km (valid) | Error → Local category |
| Local/Regional | 100 km | 100 km (local) vs 101 km (regional) | 1.0× multiplier → 1.5× multiplier |
| Regional/Long-distance | 500 km | 500 km (regional) vs 501 km (long) | 1.5× multiplier → 2.0× multiplier |
| Maximum distance | 5000 km | 5000 km (valid) vs 5001 km (error) | Long-distance → Error |

Dimension 3: Express flag

| Value | Effect |
|---|---|
| False | 1.0× (standard shipping) |
| True | 1.5× (express multiplier) |

Note: Boolean parameters don’t have “boundaries” in the traditional sense - they only have two values. But we still need to test both!

Challenge: Combinatorial explosion

If we tested every combination of boundaries:

  • 4 weight boundaries × 4 distance boundaries × 2 express values = 32 combinations
  • Adding "just inside/just outside" variants for each would push this well past 100 tests

Practical strategy: Test each dimension independently

Instead, we test each dimension’s boundaries while holding other dimensions constant:

import pytest

# ===== Weight boundaries (distance and express fixed) =====

def test_shipping_weight_boundary_minimum():
    """Boundary: weight = 0.1 (valid) vs 0.09 (invalid)"""
    assert calculate_shipping_cost(0.1, 100, False) == 5.0  # Valid
    with pytest.raises(ValueError, match="Weight must be between"):
        calculate_shipping_cost(0.09, 100, False)  # Invalid

def test_shipping_weight_boundary_light_medium():
    """Boundary: weight = 5.0 (light) vs 5.1 (medium)"""
    cost_light = calculate_shipping_cost(5.0, 100, False)
    cost_medium = calculate_shipping_cost(5.1, 100, False)
    assert cost_light == 5.0   # Light: base=5.0, dist_mult=1.0 → 5.0
    assert cost_medium == 10.0  # Medium: base=10.0, dist_mult=1.0 → 10.0

def test_shipping_weight_boundary_medium_heavy():
    """Boundary: weight = 20.0 (medium) vs 20.1 (heavy)"""
    cost_medium = calculate_shipping_cost(20.0, 100, False)
    cost_heavy = calculate_shipping_cost(20.1, 100, False)
    assert cost_medium == 10.0  # Medium: base=10.0, dist_mult=1.0 → 10.0
    assert cost_heavy == 20.0   # Heavy: base=20.0, dist_mult=1.0 → 20.0

def test_shipping_weight_boundary_maximum():
    """Boundary: weight = 50 (valid) vs 50.1 (invalid)"""
    assert calculate_shipping_cost(50, 100, False) == 20.0  # Valid
    with pytest.raises(ValueError, match="Weight must be between"):
        calculate_shipping_cost(50.1, 100, False)  # Invalid

# ===== Distance boundaries (weight and express fixed) =====

def test_shipping_distance_boundary_minimum():
    """Boundary: distance = 1 (valid) vs 0.9 (invalid)"""
    assert calculate_shipping_cost(5.0, 1, False) == 5.0  # Valid
    with pytest.raises(ValueError, match="Distance must be between"):
        calculate_shipping_cost(5.0, 0.9, False)  # Invalid

def test_shipping_distance_boundary_local_regional():
    """Boundary: distance = 100 (local) vs 101 (regional)"""
    cost_local = calculate_shipping_cost(5.0, 100, False)
    cost_regional = calculate_shipping_cost(5.0, 101, False)
    assert cost_local == 5.0     # Local: base=5.0, dist_mult=1.0 → 5.0
    assert cost_regional == 7.5  # Regional: base=5.0, dist_mult=1.5 → 7.5

def test_shipping_distance_boundary_regional_long():
    """Boundary: distance = 500 (regional) vs 501 (long-distance)"""
    cost_regional = calculate_shipping_cost(5.0, 500, False)
    cost_long = calculate_shipping_cost(5.0, 501, False)
    assert cost_regional == 7.5  # Regional: base=5.0, dist_mult=1.5 → 7.5
    assert cost_long == 10.0     # Long: base=5.0, dist_mult=2.0 → 10.0

def test_shipping_distance_boundary_maximum():
    """Boundary: distance = 5000 (valid) vs 5001 (invalid)"""
    assert calculate_shipping_cost(5.0, 5000, False) == 10.0  # Valid
    with pytest.raises(ValueError, match="Distance must be between"):
        calculate_shipping_cost(5.0, 5001, False)  # Invalid

# ===== Express flag (weight and distance fixed) =====

def test_shipping_express_flag():
    """Dimension: express = False vs True"""
    cost_standard = calculate_shipping_cost(5.0, 100, False)
    cost_express = calculate_shipping_cost(5.0, 100, True)
    assert cost_standard == 5.0   # Standard: base=5.0, dist=1.0, express=1.0 → 5.0
    assert cost_express == 7.5    # Express: base=5.0, dist=1.0, express=1.5 → 7.5

Test design observations:

  1. Independent dimension testing - Each dimension is tested while holding others constant
  2. Total: ~12 tests instead of 32 (all combinations) or 100+ (exhaustive)
  3. We still catch boundary bugs - If weight threshold is wrong (weight < 5 instead of weight <= 5), our test will fail
  4. Clear test names - Each test explicitly states what boundary it’s testing

When to test combinations:

You should add combination tests when:

  • Dimensions interact in the code (a condition references two parameters at once)
  • The specification implies interaction (e.g., a surcharge that applies only to heavy AND long-distance shipments)
  • A bug report involves a specific combination of inputs

For this function, dimensions are independent (just multiplication), so independent testing is sufficient.

Key insight for multi-dimensional boundaries:

Test each dimension independently with representative values in other dimensions. This gives you \(O(d \times b)\) tests (d dimensions, b boundaries each) instead of \(O(b^d)\) exhaustive tests!

For shipping cost: 12 tests (3 dimensions × ~4 boundaries) instead of 32 tests (all combinations).


2.5.3 Summary: Boundary Testing Strategy for Discrete Functions

Key principles:

  1. Boundaries are explicit - Look for threshold values in conditional logic (if, elif)
  2. Test both sides of each threshold - value-1 (old category) vs value (new category)
  3. Test range limits - Minimum valid, maximum valid, just outside range
  4. For multi-dimensional functions: Test each dimension independently (unless dimensions interact)

Common threshold patterns to look for:

| Pattern in Code | Boundaries to Test | Example |
|---|---|---|
| if x >= threshold: | threshold-1, threshold | score >= 90 → test 89, 90 |
| if x > threshold: | threshold, threshold+1 | weight > 5 → test 5.0, 5.1 |
| if x < threshold: | threshold-1, threshold | distance < 100 → test 99, 100 |
| if x <= threshold: | threshold, threshold+1 | weight <= 5 → test 5.0, 5.1 |
| if min <= x <= max: | min-1, min, max, max+1 | 0 <= score <= 100 → test -1, 0, 100, 101 |

Comparison: Continuous vs Discrete boundaries

| Aspect | Continuous Functions | Discrete Functions |
|---|---|---|
| Boundary identification | Must choose: "How close to problematic value?" | Explicit thresholds in code |
| Test values | Near-zero, infinity, NaN, epsilon | Threshold ± smallest unit (1 for integers, 0.1 for floats) |
| Precision concerns | Floating-point precision critical | Usually none (integer thresholds) |
| Number of boundaries | Depends on partitioning choice | Fixed by function logic |
| Common bugs caught | Division by zero, overflow, precision loss | Off-by-one errors (> vs >=) |

The good news: Discrete functions make boundary testing easier, not harder! The thresholds are right there in the code - you just need to test both sides of each one.

Final insight:

Every conditional creates a boundary. If you see if x >= 90, you know you need to test x=89 and x=90. If you see if weight <= 5, you know you need to test weight=5.0 and weight=5.1.

Boundary testing for discrete functions is systematic and mechanical - which is why it’s so effective!
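Because the procedure is mechanical, you can even script the test-value derivation. A hypothetical helper (not part of the lecture's codebase) that turns an `x >= threshold` condition into its two probe values:

```python
def boundary_probes(threshold: int) -> tuple[int, int]:
    """For an `x >= threshold` condition: (last value of the old
    category, first value of the new category)."""
    return threshold - 1, threshold

# Applied to calculate_grade's thresholds, this reproduces exactly
# the pairs tested in section 2.5.1:
pairs = [boundary_probes(t) for t in (90, 80, 70, 60)]
assert pairs == [(89, 90), (79, 80), (69, 70), (59, 60)]
```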


3. LLM-Assisted Testing - Breaking the Test Cone

Why don’t developers write tests?

Research and surveys show:

  1. “Writing tests is tedious” (42% of developers)
    • Boilerplate: imports, setup, teardown
    • Repetitive: Similar structure for every function
  2. “I don’t know what to test” (38% of developers)
    • What are the equivalence classes?
    • What are the boundary cases?
    • How many tests do I need?
  3. “Initial setup takes forever” (35% of developers)
    • pytest configuration
    • Test file structure
    • First test is always hardest

Result: Developers procrastinate → Only write E2E tests → Inverted pyramid (test cone)

The Solution: Use LLMs to handle the tedious parts, leaving you to focus on correctness.

The Workflow:

1. Human: Identifies what needs testing (equivalence classes)
2. LLM: Generates test boilerplate (imports, structure, AAA pattern)
3. Human: Reviews and refines (fixes logic, adds edge cases)
4. LLM: Generates more tests based on feedback
5. Human: Verifies assertions are correct
6. Run tests: Catch bugs in actual code (not tests!)

Key insight: LLMs excel at boilerplate, humans excel at domain logic.

Scenario: You need to test find_intersection() from src/road_profile_viewer/geometry.py.

Step 1: Prompt the LLM (in tests/test_geometry.py)

# tests/test_geometry.py
import numpy as np
from numpy.typing import NDArray
import pytest
from road_profile_viewer.geometry import find_intersection

# Prompt: "Write pytest unit tests for find_intersection() covering:
# - Equivalence classes: downward angles, horizontal, upward angles, vertical
# - Boundary cases: empty arrays, single point, angle=90
# - Use AAA pattern (Arrange-Act-Assert)
# - Include proper type hints matching the actual repository code"

Step 2: LLM Generates Initial Tests

Copilot/Claude might generate:

def test_find_intersection_downward_angle() -> None:
    """Test find_intersection with normal downward angle."""
    x_road: NDArray[np.float64] = np.array([0, 10, 20, 30], dtype=np.float64)
    y_road: NDArray[np.float64] = np.array([0, 2, 4, 6], dtype=np.float64)
    x, y, dist = find_intersection(x_road, y_road, -10.0, 0.0, 10.0)
    assert x is not None
    assert y is not None
    assert dist is not None and dist > 0

def test_find_intersection_horizontal() -> None:
    """Test find_intersection with horizontal angle."""
    x_road: NDArray[np.float64] = np.array([0, 10, 20], dtype=np.float64)
    y_road: NDArray[np.float64] = np.array([0, 0, 0], dtype=np.float64)  # Flat road
    x, y, dist = find_intersection(x_road, y_road, 0.0, 0.0, 1.0)
    assert x is not None

def test_find_intersection_vertical() -> None:
    """Test find_intersection with vertical angle (90 degrees)."""
    x_road: NDArray[np.float64] = np.array([0, 10, 20], dtype=np.float64)
    y_road: NDArray[np.float64] = np.array([0, 2, 4], dtype=np.float64)
    x, y, dist = find_intersection(x_road, y_road, 90.0)
    assert x is None  # Current implementation returns None for vertical

def test_find_intersection_empty_road() -> None:
    """Test find_intersection with empty road arrays."""
    x_road: NDArray[np.float64] = np.array([], dtype=np.float64)
    y_road: NDArray[np.float64] = np.array([], dtype=np.float64)
    x, y, dist = find_intersection(x_road, y_road, -10.0)
    assert x is None

Value: 80% of boilerplate written instantly!

Step 3: Human Review - Find the Flaws

Now YOU review with domain knowledge:

❌ Problem 1: Missing edge case

# LLM didn't test: What if camera is BELOW the road?
# Add this test:
def test_find_intersection_camera_below_road() -> None:
    """Test when camera is below road level."""
    x_road: NDArray[np.float64] = np.array([0, 10, 20], dtype=np.float64)
    y_road: NDArray[np.float64] = np.array([5, 5, 5], dtype=np.float64)  # Flat road at y=5
    x, y, dist = find_intersection(x_road, y_road, 45.0, 0.0, 0.0)  # Camera at y=0, ray up at 45°
    # Should still find intersection (ray goes up, crossing the road above at (5, 5))
    assert x is not None

❌ Problem 2: Weak assertions

# LLM wrote:
assert x is not None  # Too weak!

# Better assertion:
assert 0 <= x <= 30, f"Expected x in [0, 30], got {x}"
assert y >= 0, f"Expected y positive, got {y}"

❌ Problem 3: Invalid test data

# LLM wrote:
x_road: NDArray[np.float64] = np.array([0, 10, 20], dtype=np.float64)
y_road: NDArray[np.float64] = np.array([0, 0, 0], dtype=np.float64)  # Flat road at y=0
x, y, dist = find_intersection(x_road, y_road, 0.0, 0.0, 1.0)  # Camera at y=1

# Problem: Horizontal ray from y=1 won't intersect road at y=0!
# Fix: Either raise camera or tilt road
x_road = np.array([0, 10, 20], dtype=np.float64)
y_road = np.array([0, 1, 2], dtype=np.float64)  # Sloped road

Step 4: Iterative Refinement

# You to LLM: "Add tests for upward angles where ray might not intersect"
# LLM generates:
def test_find_intersection_upward_angle_no_intersection() -> None:
    """Test that upward angle with road below returns None."""
    x_road: NDArray[np.float64] = np.array([0, 10, 20], dtype=np.float64)
    y_road: NDArray[np.float64] = np.array([0, 1, 2], dtype=np.float64)
    x, y, dist = find_intersection(x_road, y_road, 45.0, 0.0, 0.0)  # Ray goes up
    # Depending on implementation, might not intersect
    # If None is correct behavior:
    assert x is None

Step 5: Run Tests and Find Real Bugs

$ uv run pytest tests/test_geometry.py -v

Surprise! One test fails:

FAILED test_find_intersection_empty_road - IndexError: index 0 is out of bounds

This is GOOD! The test found a bug in your actual code:

def find_intersection(x_road, y_road, ...):
    # Bug: Doesn't check if arrays are empty before accessing!
    for i in range(len(x_road) - 1):  # Crashes if len(x_road) = 0
        x1, y1 = x_road[i], y_road[i]
        ...

Fix the bug:

def find_intersection(x_road, y_road, ...):
    # Add validation
    if len(x_road) == 0 or len(y_road) == 0:
        return None, None, None

    # ... rest of function

Run tests again:

$ uv run pytest tests/test_geometry.py -v
============================= 8 passed in 0.12s ===============================

All tests pass! Bug fixed before it reached users.

What LLMs are GOOD at:

  • Boilerplate: imports, fixtures, AAA structure
  • Generating many structurally similar tests quickly
  • Descriptive test names and docstrings
  • Suggesting parametrization for repetitive cases

What LLMs are BAD at:

  • Strong, domain-correct assertions (they default to weak checks like assert x is not None)
  • Knowing the expected numeric result of your domain logic
  • Keeping tests free of loops and conditionals
  • Judging when mocking is appropriate (see Section 3.1)

Example of LLM failure #1: Weak assertions

# LLM might generate:
def test_calculate_distance():
    dist = calculate_distance(0, 0, 3, 4)
    assert dist > 0  # Too weak! Just checks it's positive

# Human knows Pythagorean theorem:
def test_calculate_distance():
    dist = calculate_distance(0, 0, 3, 4)
    assert abs(dist - 5.0) < 0.01, f"Expected 5.0, got {dist}"  # 3-4-5 triangle!

Example of LLM failure #2: Logic in tests

LLMs sometimes generate tests with loops, conditionals, or calculations. This makes tests hard to understand and debug.

# ❌ LLM might generate this:
def test_find_intersection_multiple_angles():
    """Test intersection for various angles"""
    angles = [-10, -20, -30, -45]
    for angle in angles:  # ❌ Loop in test!
        x, y, dist = find_intersection(x_road, y_road, angle)
        if x is not None:  # ❌ Conditional in test!
            assert dist > 0
        else:
            assert dist is None  # ❌ Complex logic!

Problems with this test:

  • If one angle fails, the failure message doesn't tell you which one
  • The conditional means some iterations may assert nothing at all
  • The test re-implements decision logic, so a bug shared with the code goes unnoticed

Human fix: Separate tests, no logic

# ✅ Human refactors to simple, clear tests:
def test_find_intersection_returns_positive_distance_for_minus_10_degrees():
    """Test -10° angle returns positive distance"""
    x_road = np.array([0, 10, 20], dtype=np.float64)
    y_road = np.array([0, 2, 4], dtype=np.float64)
    x, y, dist = find_intersection(x_road, y_road, -10.0, 0.0, 10.0)
    assert dist > 0  # No conditionals, no loops!

def test_find_intersection_returns_positive_distance_for_minus_20_degrees():
    """Test -20° angle returns positive distance"""
    x_road = np.array([0, 10, 20], dtype=np.float64)
    y_road = np.array([0, 2, 4], dtype=np.float64)
    x, y, dist = find_intersection(x_road, y_road, -20.0, 0.0, 10.0)
    assert dist > 0

# Or use pytest parametrize (cleaner for multiple similar tests):
@pytest.mark.parametrize("angle", [-10, -20, -30, -45])
def test_find_intersection_returns_positive_distance_for_downward_angles(angle):
    """Test that all downward angles return positive distance"""
    x_road = np.array([0, 10, 20], dtype=np.float64)
    y_road = np.array([0, 2, 4], dtype=np.float64)
    x, y, dist = find_intersection(x_road, y_road, angle, 0.0, 10.0)
    assert dist > 0  # Simple assertion, pytest runs this test 4 times with different angles

Key principle: Tests should be straight-line code (inspired by Google’s testing practices)

Why?

  • A failing test points directly at one broken input - no debugging of the test itself
  • There is no test logic that can itself contain bugs
  • Anyone can read the test top to bottom without mentally simulating control flow

The Human-LLM Loop:

1. Human: "Test find_intersection for edge cases"
2. LLM: Generates 5 tests
3. Human: "This one is wrong - camera below road should still work"
4. LLM: Fixes that test
5. Human: Runs tests, finds bug in actual code (not test)
6. Human: Fixes code
7. Tests pass → Confidence!

3.1 Warning: LLMs Love Mocking (But You Shouldn’t Overuse It)

When you use LLMs to generate tests, they often suggest mocking - replacing real objects with fake ones to verify function calls. This can lead to brittle tests that break when you refactor code.

Key Google Testing Principle: Test State, Not Interactions

There are two ways to verify that code works:

  1. State Testing: Observe the system after invoking it to verify outcomes
  2. Interaction Testing: Verify expected sequences of function calls on collaborators (using mocks)

State testing is less brittle because it focuses on “what” results occurred rather than “how” results were achieved.


3.1.1 What Mocking Looks Like (and When It’s Problematic)

# ❌ LLM-generated test with excessive mocking
from unittest.mock import patch

def test_find_intersection_calls_tan():
    """Test that find_intersection calls np.tan"""
    x_road = np.array([0, 10, 20], dtype=np.float64)
    y_road = np.array([0, 2, 4], dtype=np.float64)

    with patch('numpy.tan') as mock_tan:
        mock_tan.return_value = 0.176  # Mocked slope
        x, y, dist = find_intersection(x_road, y_road, -10.0)
        mock_tan.assert_called_once()  # ❌ Tests HOW, not WHAT

Problem: This test breaks if you:

  • Switch from np.tan to math.tan (or precompute the slope differently)
  • Refactor to a vectorized formulation that calls np.tan a different number of times
  • Cache or inline the computation - even though the returned intersection is identical

This is a brittle test - it fails when implementation changes, even though behavior stays the same.


3.1.2 Better Approach: Test the Result, Not the Method Calls

# ✅ Test the RESULT (state), not the METHOD CALLS (interactions)
def test_find_intersection_returns_correct_intersection():
    """Test that intersection position is geometrically correct"""
    x_road = np.array([0, 10, 20], dtype=np.float64)
    y_road = np.array([0, 2, 4], dtype=np.float64)

    x, y, dist = find_intersection(x_road, y_road, -10.0, 0.0, 10.0)

    # Assert FINAL STATE (outcomes)
    assert x is not None, "Should find intersection for downward angle"
    assert 0 <= x <= 20, f"Intersection x should be within road bounds, got {x}"
    assert y >= 0, f"Intersection y should be above ground, got {y}"
    assert dist > 0, f"Distance should be positive, got {dist}"

    # No assumptions about which numpy functions were called!

Benefits:

  - Survives any refactoring that preserves the geometry
  - Verifies what users actually care about: a correct intersection point
  - The assertion messages report the offending values when something fails


3.1.3 When IS Mocking Appropriate?

Mocking is valuable in specific scenarios:

✅ DO mock:

  - External services: HTTP APIs, databases, message queues
  - Slow operations: network and disk I/O
  - Non-deterministic dependencies: clocks, unseeded randomness

❌ DON’T mock:

  - NumPy, math, and other fast, deterministic libraries
  - Your own pure functions (like find_intersection())
  - Simple value objects and data structures


3.1.4 Example: When Mocking Is Appropriate

# ✅ Good use of mocking: External API
from unittest.mock import patch, Mock

def test_fetch_weather_data_returns_temperature():
    """Test weather fetching without hitting real API"""
    # Mock the external HTTP request (slow, requires network)
    with patch('requests.get') as mock_get:
        mock_response = Mock()
        mock_response.json.return_value = {'temp': 22.5}
        mock_get.return_value = mock_response

        # Now test your function
        temp = fetch_weather_data('Berlin')

        # Assert RESULT (not implementation)
        assert temp == 22.5

Why this mocking is good:

  - The real dependency (the network) is slow, external, and flaky
  - The test still asserts the result (temp == 22.5), not the call sequence
  - It runs fast and deterministically - no API key or internet connection required

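If the requests library isn’t available, the same pattern works with only the standard library. A self-contained sketch - `fetch_weather_data` is a hypothetical helper (not part of the course code), written out here so the mocked test actually runs:

```python
import json
import urllib.request
from unittest.mock import MagicMock, patch


def fetch_weather_data(city: str) -> float:
    """Hypothetical helper: fetch the current temperature for a city."""
    with urllib.request.urlopen(f"https://api.example.com/weather/{city}") as resp:
        return json.load(resp)["temp"]


def test_fetch_weather_data_returns_temperature():
    """Mock the external HTTP call, but still assert on the RESULT."""
    fake_resp = MagicMock()
    fake_resp.read.return_value = b'{"temp": 22.5}'
    fake_resp.__enter__.return_value = fake_resp  # support the with-statement
    with patch("urllib.request.urlopen", return_value=fake_resp):
        temp = fetch_weather_data("Berlin")
    assert temp == 22.5  # state testing: the outcome, not which calls were made
```

The mock replaces only the network boundary; the JSON parsing and return logic still run for real.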

3.1.5 Rule of Thumb: Prefer Real Objects Over Mocks

For find_intersection() and similar functions:

  - They are pure computations on NumPy arrays: fast and deterministic
  - Use real arrays as inputs; there is nothing worth mocking
  - Mocking NumPy internals only couples the test to the implementation
Google’s guideline: “Use real objects when they’re fast and deterministic. Mock only when necessary.”
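A minimal illustration of that guideline: NumPy is the real collaborator, and nothing is mocked. `road_height_at` is a hypothetical helper invented for this sketch:

```python
import numpy as np


def road_height_at(x_road, y_road, x):
    """Hypothetical helper: linearly interpolate the road height at position x."""
    return float(np.interp(x, x_road, y_road))


def test_road_height_interpolates_between_points():
    x_road = np.array([0.0, 10.0, 20.0])
    y_road = np.array([0.0, 2.0, 4.0])
    # Real NumPy, no mocks: fast, deterministic, and survives refactoring
    assert abs(road_height_at(x_road, y_road, 5.0) - 1.0) < 1e-9
    assert abs(road_height_at(x_road, y_road, 15.0) - 3.0) < 1e-9
```

The whole test runs in microseconds - there is simply no speed problem for a mock to solve.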


3.1.6 Summary: State Testing vs. Interaction Testing

| Aspect | State Testing (Preferred) | Interaction Testing (Use Sparingly) |
|---|---|---|
| What it tests | Final results/outcomes | Sequence of function calls |
| Assertion style | `assert x == expected_value` | `mock.assert_called_once()` |
| Brittleness | Low - survives refactoring | High - breaks when implementation changes |
| When to use | Always, when possible | External dependencies only |
| Example | `assert dist > 0` | `mock_api.get.assert_called()` |

Key takeaway: When LLMs suggest mocking, ask yourself: “Is this dependency slow or external?” If not, use the real object and test the state!


Best Practice: Use LLM to START, human to VERIFY and REFINE. Watch out for over-mocking!


4. Part 4: Hands-On Exercise - Test geometry.py

4.1 Exercise: Write Comprehensive Tests for geometry.py

Goal: Test all functions in src/road_profile_viewer/geometry.py using equivalence classes and boundary analysis.

Setup:

# Create test structure if not already done
$ mkdir -p tests
$ touch tests/__init__.py
$ touch tests/test_geometry.py

Task 1: Test find_intersection() - Equivalence Classes

Use an LLM to generate initial tests, then refine:

# tests/test_geometry.py
import numpy as np
from numpy.typing import NDArray
import pytest
from road_profile_viewer.geometry import find_intersection, calculate_ray_line


class TestFindIntersection:
    """Test suite for find_intersection() function."""

    def test_normal_downward_angle(self) -> None:
        """Equivalence class: Normal downward angle (-90° < angle < 0°)"""
        x_road: NDArray[np.float64] = np.array([0, 10, 20, 30], dtype=np.float64)
        y_road: NDArray[np.float64] = np.array([0, 2, 4, 6], dtype=np.float64)
        x, y, dist = find_intersection(x_road, y_road, -45.0, 0.0, 10.0)
        assert x is not None and y is not None and dist is not None

    def test_horizontal_angle(self) -> None:
        """Equivalence class: Horizontal ray (angle = 0°)"""
        # Your test here (let LLM generate, then review)
        pass

    def test_vertical_angle_boundary(self) -> None:
        """Boundary case: Vertical angle (90°)"""
        pass

    def test_empty_road_arrays(self) -> None:
        """Boundary case: Empty arrays"""
        pass

    def test_single_point_road(self) -> None:
        """Boundary case: Road with only one point"""
        pass

Task 2: Test calculate_ray_line() - Boundary Cases

class TestCalculateRayLine:
    """Test suite for calculate_ray_line() function."""

    def test_normal_angle(self) -> None:
        """Test with normal angle"""
        x_ray: NDArray[np.float64]
        y_ray: NDArray[np.float64]
        x_ray, y_ray = calculate_ray_line(-10.0, camera_x=0.0, camera_y=2.0)
        assert len(x_ray) == 2
        assert len(y_ray) == 2
        assert x_ray[0] == 0.0  # Starts at camera
        assert y_ray[0] == 2.0

    def test_vertical_angle(self) -> None:
        """Boundary: Vertical angle (90°)"""
        x_ray, y_ray = calculate_ray_line(90.0)
        # Should handle vertical line
        assert x_ray[0] == x_ray[1]  # Vertical means same x

    def test_zero_angle(self) -> None:
        """Boundary: Horizontal angle (0°)"""
        x_ray, y_ray = calculate_ray_line(0.0, camera_y=2.0)
        # Horizontal line at y=2.0
        assert y_ray[0] == y_ray[1] == 2.0

Task 3: Run Tests and Iterate

$ uv run pytest tests/test_geometry.py -v

# If failures occur:
# 1. Is the test wrong? Fix the test.
# 2. Is the code wrong? Fix the code.
# 3. Unsure? Add a print statement in the test to see actual values.

Success criteria:

  - At least one test per equivalence class of find_intersection()
  - Boundary cases covered: vertical angle, empty arrays, single-point road
  - All tests pass with uv run pytest tests/test_geometry.py -v


5. Part 5: Module and Integration Testing

5.1 What are Module/Integration Tests?

Unit test: Tests one function in isolation.

Module/Integration test: Tests multiple modules working together.

Example:

# Unit test: Test ONLY geometry.find_intersection()
def test_find_intersection() -> None:
    x_road = np.array([0, 10])
    y_road = np.array([0, 2])
    x, y, dist = find_intersection(x_road, y_road, -10.0)
    assert x is not None

# Integration test: Test geometry + road together
def test_road_generation_with_intersection() -> None:
    """Test that road generation produces data that geometry can process."""
    from road_profile_viewer.road import generate_road_profile
    from road_profile_viewer.geometry import find_intersection

    # Generate road using road.py
    x_road: NDArray[np.float64]
    y_road: NDArray[np.float64]
    x_road, y_road = generate_road_profile(num_points=100, x_max=80)

    # Use geometry.py to find intersection
    x, y, dist = find_intersection(x_road, y_road, -10.0, camera_x=0.0, camera_y=2.0)

    # Verify integration works
    assert x is not None, "Generated road should intersect camera ray"
    assert 0 <= x <= 80, "Intersection should be within road bounds"
    assert y >= 0, "Intersection should be above ground"

Key difference:

  - The unit test feeds find_intersection() hand-crafted arrays and isolates a single function
  - The integration test feeds it real output from road.py, so it also verifies the contract between the two modules

5.2 When to Write Integration Tests

Write integration tests to catch:

  1. Interface mismatches
    # road.py returns (x, y)
    # geometry.py expects (x, y)
    # If road.py changes to return (x, y, metadata), tests will catch it!
    
  2. Data format issues
    # What if generate_road_profile() returns list instead of np.array?
    # Integration test will fail!
    
  3. Assumptions about data
    # geometry.py assumes x_road is sorted
    # What if road.py returns unsorted data?
    # Integration test catches this!
    

5.3 Example Integration Test Suite

# tests/test_integration.py
import numpy as np
from numpy.typing import NDArray
import pytest
from road_profile_viewer.road import generate_road_profile
from road_profile_viewer.geometry import find_intersection, calculate_ray_line


class TestRoadGeometryIntegration:
    """Test integration between road and geometry modules."""

    def test_generated_road_intersects_downward_ray(self) -> None:
        """Verify that generated roads can be processed by geometry functions."""
        # Use real road generation
        x_road: NDArray[np.float64]
        y_road: NDArray[np.float64]
        x_road, y_road = generate_road_profile(num_points=100, x_max=80)

        # Verify it works with geometry module
        x, y, dist = find_intersection(x_road, y_road, -10.0, camera_x=0.0, camera_y=10.0)

        assert x is not None, "Should find intersection with generated road"
        assert isinstance(x, (int, float, np.number)), "Should return numeric type"
        assert isinstance(y, (int, float, np.number)), "Should return numeric type"
        assert isinstance(dist, (int, float, np.number)), "Should return numeric type"

    def test_road_data_format_compatible_with_geometry(self) -> None:
        """Verify road.py returns data in format geometry.py expects."""
        x_road: NDArray[np.float64]
        y_road: NDArray[np.float64]
        x_road, y_road = generate_road_profile()

        # Check data types
        assert isinstance(x_road, np.ndarray), "x_road should be numpy array"
        assert isinstance(y_road, np.ndarray), "y_road should be numpy array"

        # Check lengths match
        assert len(x_road) == len(y_road), "Road arrays should have same length"

        # Check data is sorted
        assert np.all(np.diff(x_road) > 0), "x_road should be strictly increasing"

    def test_varying_road_parameters_work_with_geometry(self) -> None:
        """Test that different road generation parameters work with geometry."""
        for num_points in [10, 50, 100, 200]:
            for x_max in [40, 80, 120]:
                x_road, y_road = generate_road_profile(num_points, x_max)
                x, y, dist = find_intersection(x_road, y_road, -10.0)
                # Should not crash, might or might not find intersection
                assert x is None or isinstance(x, (int, float, np.number))

Run integration tests:

$ uv run pytest tests/test_integration.py -v

5.4 E2E Testing: Why We Skip It (For Now)

End-to-End test would mean:

  1. Start the Dash application
  2. Open a browser (using Selenium/Playwright)
  3. Enter angle in input field
  4. Verify graph updates correctly

Why skip it in this course?

  1. Complex setup: Requires Selenium, browser drivers, etc.
  2. Slow: Each test takes seconds to run
  3. Brittle: UI changes break tests frequently
  4. Diminishing returns: Unit + integration tests catch 90% of bugs

For this course:

  - We focus on unit tests (the base of the pyramid) and integration tests
  - Manual checks of the Dash UI stand in for the pyramid’s thin E2E tip

In real projects: Yes, E2E tests matter. But get your pyramid base solid first!


5.5 Test Maintainability: Writing Tests That Don’t Break

You’ve learned how to write unit tests, integration tests, and how to use LLMs to accelerate testing. Now let’s talk about a critical quality: test maintainability.

The Problem:

You write 20 unit tests. They all pass. ✅

You refactor find_intersection() to improve performance (no behavior change). Suddenly 10 tests fail. ❌

Question: Is this good or bad?

Bad! These tests are brittle - they break when implementation changes, even though behavior stayed the same.


5.5.1 The Brittleness Problem

Brittle tests fail in response to unrelated production code changes that introduce no real bugs. They:

  - Waste engineering time on fixing tests instead of fixing code
  - Erode trust in the suite (“it’s probably just the tests again”)
  - Discourage refactoring, because every cleanup breaks dozens of tests

Google’s insight: In large codebases, brittle tests are a major productivity killer. Engineers spend more time fixing tests than fixing actual bugs!


5.5.2 Example: Brittle Test (Tests Implementation Details)

# ❌ BRITTLE: Tests HOW the function works internally
import numpy as np
from unittest.mock import patch

from road_profile_viewer.geometry import find_intersection

def test_find_intersection_uses_tan_for_slope():
    """Test that function uses np.tan to calculate slope"""
    x_road = np.array([0, 10, 20], dtype=np.float64)
    y_road = np.array([0, 2, 4], dtype=np.float64)

    # Mock np.tan to verify it's called
    with patch('numpy.tan') as mock_tan:
        mock_tan.return_value = 0.176  # Mocked slope value
        find_intersection(x_road, y_road, -10.0)
        assert mock_tan.called  # ❌ Breaks if you change slope calculation!

Problem: If you refactor to use a different trigonometry approach (e.g., using atan2 or pre-computed lookup tables), this test fails even though:

  - The function still returns the correct intersection
  - The observable behavior is completely unchanged

This test is testing the IMPLEMENTATION, not the BEHAVIOR.


5.5.3 Better: Robust Test (Tests Public Behavior)

# ✅ ROBUST: Tests WHAT the code does, not HOW it does it
import numpy as np

from road_profile_viewer.geometry import find_intersection

def test_find_intersection_returns_correct_position_for_downward_angle():
    """Test that intersection position is geometrically correct for downward ray"""
    # Arrange
    x_road = np.array([0, 10, 20], dtype=np.float64)
    y_road = np.array([0, 2, 4], dtype=np.float64)

    # Act
    x, y, dist = find_intersection(x_road, y_road, -10.0, 0.0, 10.0)

    # Assert behavior: intersection should be within road bounds
    assert x is not None, "Should find intersection for downward angle"
    assert 0 <= x <= 20, f"Intersection x should be in road bounds [0, 20], got {x}"
    assert 0 <= y <= 4, f"Intersection y should be in road bounds [0, 4], got {y}"
    assert dist > 0, f"Distance should be positive, got {dist}"

    # No assumptions about HOW it calculated this!
    # Works with tan, atan2, lookup tables, or any other implementation

Benefits:

  - Passes with tan, atan2, lookup tables, or any other correct implementation
  - Fails only when the intersection is actually wrong - every failure is a real bug
  - Follows the AAA pattern, so the intent is obvious at a glance


5.5.4 The Key Principle: Test Via Public APIs

Google’s guideline: “Write tests that invoke the system being tested in the same way its users would.”

What does this mean for find_intersection()?

The public API (what users see):

x, y, dist = find_intersection(x_road, y_road, angle_degrees, camera_x, camera_y)

Public contract (what users expect):

  - Given a road profile and a downward angle, return the first intersection as (x, y, dist)
  - Return (None, None, None) when the ray cannot hit the road (e.g. a vertical ray)
  - dist is the positive distance from the camera to the intersection point

✅ DO test:

  - Return values for representative angles (downward, horizontal, vertical)
  - The (None, None, None) contract for rays that miss the road
  - Edge cases: empty arrays, single-point roads

❌ DON’T test:

  - Which NumPy functions are called internally (tan vs. atan2)
  - The search strategy (linear vs. binary search)
  - Intermediate variables or private helpers
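Put together, a contract test exercises only that public API. The sketch below uses a simplified stand-in implementation so the snippet runs on its own - it is NOT the course’s geometry.py - but the test function itself would run unchanged against the real code:

```python
import numpy as np


def find_intersection(x_road, y_road, angle_degrees, camera_x=0.0, camera_y=1.5):
    """Simplified stand-in (NOT the course implementation): first crossing of a
    straight camera ray with a piecewise-linear road profile.

    Angle is measured from horizontal; negative means downward.
    Returns (None, None, None) when no crossing exists.
    """
    if len(x_road) < 2:
        return None, None, None
    angle_rad = np.deg2rad(angle_degrees)
    if abs(np.cos(angle_rad)) < 1e-12:  # vertical ray: no slope form
        return None, None, None
    slope = np.tan(angle_rad)
    for i in range(len(x_road) - 1):
        x1, x2 = float(x_road[i]), float(x_road[i + 1])
        y1, y2 = float(y_road[i]), float(y_road[i + 1])
        seg_slope = (y2 - y1) / (x2 - x1)
        denom = slope - seg_slope
        if abs(denom) < 1e-12:  # ray parallel to this segment
            continue
        # Solve: camera_y + slope*(x - camera_x) == y1 + seg_slope*(x - x1)
        x = (y1 - camera_y + slope * camera_x - seg_slope * x1) / denom
        if x1 <= x <= x2 and x >= camera_x:
            y = camera_y + slope * (x - camera_x)
            return x, y, float(np.hypot(x - camera_x, y - camera_y))
    return None, None, None


def test_contract_holds_for_downward_angles():
    """Contract only: intersection exists, lies on the road, distance positive."""
    x_road = np.array([0.0, 10.0, 20.0])
    y_road = np.array([0.0, 2.0, 4.0])
    for angle in (-5.0, -10.0, -20.0, -45.0):
        x, y, dist = find_intersection(x_road, y_road, angle, 0.0, 2.0)
        assert x is not None
        assert 0.0 <= x <= 20.0
        assert np.isclose(y, np.interp(x, x_road, y_road))  # point is ON the road
        assert dist > 0.0
```

Nothing in the test cares whether the search is linear, binary, or vectorized - only the contract.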


5.5.5 Four Categories of Code Changes (Google’s Framework)

Google categorizes production code changes and how tests should respond:

| Change Type | Example | Should Tests Change? | Why |
|---|---|---|---|
| Pure Refactoring | Optimize find_intersection() algorithm | ❌ NO | Tests verify behavior remains constant |
| New Features | Add find_all_intersections() function | ✅ Add new tests only | Existing tests stay unchanged |
| Bug Fixes | Fix crash on empty arrays | ✅ Add new test case | Test the bug to prevent regression |
| Behavior Changes | Return list of intersections instead of first | ✅ Modify existing tests | Contract changed, tests must reflect new behavior |

The ideal: Tests only change when behavior changes (category 4). All other changes should leave tests untouched!
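To make the bug-fix row concrete, a hedged sketch: suppose the empty-array crash was fixed with an early guard. The regression test below pins the fixed behavior; `find_intersection_safe` is a simplified stand-in, not the course’s actual code:

```python
import numpy as np


def find_intersection_safe(x_road, y_road, angle_degrees,
                           camera_x=0.0, camera_y=1.5):
    """Simplified stand-in illustrating only the guard added by the bug fix."""
    if len(x_road) < 2:  # the fix: empty or single-point roads can't intersect
        return None, None, None
    # ... the real intersection search would follow here
    return None, None, None


def test_empty_road_returns_none_instead_of_crashing():
    """Regression test (category 3): pins the fixed empty-array behavior."""
    empty = np.array([], dtype=np.float64)
    x, y, dist = find_intersection_safe(empty, empty, -10.0)
    assert x is None and y is None and dist is None
```

Note that the existing tests stay untouched; the fix only adds this new case.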


5.5.6 Real-World Example: Refactoring Scenario

Scenario: You want to optimize find_intersection() by using a faster algorithm.

Before (current implementation):

def find_intersection(x_road, y_road, angle_degrees, camera_x=0, camera_y=1.5):
    """Find intersection using linear search"""
    angle_rad = -np.deg2rad(angle_degrees)

    # Check vertical
    if np.abs(np.cos(angle_rad)) < 1e-10:
        return None, None, None

    slope = np.tan(angle_rad)

    # Linear search through segments
    for i in range(len(x_road) - 1):
        x1, y1 = x_road[i], y_road[i]
        x2, y2 = x_road[i + 1], y_road[i + 1]
        # ... intersection calculation

After (optimized implementation):

def find_intersection(x_road, y_road, angle_degrees, camera_x=0, camera_y=1.5):
    """Find intersection using binary search (10x faster!)"""
    angle_rad = -np.deg2rad(angle_degrees)

    # Different vertical check using sine
    if np.abs(np.sin(angle_rad - np.pi/2)) < 1e-10:  # Changed!
        return None, None, None

    # Use atan2 instead of tan (more numerically stable)
    direction = np.array([np.cos(angle_rad), np.sin(angle_rad)])  # Changed!

    # Binary search through segments (faster for large roads)
    left, right = 0, len(x_road) - 1
    # ... binary search logic (completely different!)

What happens to tests?

❌ Brittle tests (testing implementation):

def test_find_intersection_uses_tan():
    with patch('numpy.tan') as mock_tan:
        find_intersection(...)
        assert mock_tan.called  # ❌ FAILS! We use atan2 now!

✅ Robust tests (testing behavior):

def test_find_intersection_returns_correct_position():
    x, y, dist = find_intersection(x_road, y_road, -10.0)
    assert 0 <= x <= 20  # ✅ PASSES! Still finds correct intersection!

Result:

  - The brittle test fails and must be rewritten, even though nothing is broken
  - The robust test passes and confirms the refactoring preserved behavior

This is the difference between productive testing and test hell!


5.5.7 Practical Guidelines for Maintainable Tests

✅ DO:

  1. Test public API only - Call functions the same way users would
  2. Test final outcomes - Assert on return values, not internal state
  3. Use real objects when fast - Avoid mocking NumPy, math functions
  4. Test behaviors, not methods - One test per behavior (not one per function)
  5. Expect tests to be stable - Good tests only change when requirements change

❌ DON’T:

  1. Mock internal implementation - Don’t mock np.tan() in your own code
  2. Assert on private state - Don’t check internal variables
  3. Test method call sequences - Don’t verify tan was called before cos
  4. Couple tests to algorithm - Don’t assume linear vs. binary search
  5. Test performance in unit tests - Speed is separate from correctness
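Point 4 (test behaviors, not methods) in practice: several small, named tests instead of one catch-all test per function. `generate_straight_road` is a trivial stand-in (not the course’s road.py) so the snippet runs on its own:

```python
import numpy as np


def generate_straight_road(num_points: int = 100, x_max: float = 80.0):
    """Trivial stand-in: a flat, evenly spaced road profile."""
    x = np.linspace(0.0, x_max, num_points)
    y = np.zeros_like(x)
    return x, y


# One behavior per test - each name tells you exactly WHAT broke,
# and none of them care HOW the road is generated.
def test_road_starts_at_origin():
    x, y = generate_straight_road()
    assert x[0] == 0.0 and y[0] == 0.0


def test_road_x_is_strictly_increasing():
    x, _ = generate_straight_road()
    assert np.all(np.diff(x) > 0)


def test_road_arrays_have_matching_lengths():
    x, y = generate_straight_road(num_points=50)
    assert len(x) == len(y) == 50
```

When one behavior regresses, exactly one test fails, which keeps debugging fast.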

5.5.8 Summary: Maintainable vs. Brittle Tests

| Aspect | Maintainable Tests | Brittle Tests |
|---|---|---|
| What they test | Public behavior (WHAT) | Implementation details (HOW) |
| Assertion style | `assert x is not None` | `mock_tan.assert_called()` |
| Refactoring impact | Tests still pass ✅ | Tests break ❌ |
| Developer experience | "Tests just work!" | "Ugh, fix tests again..." |
| When they fail | Real bug found | Often false alarm |

Key takeaway: The best tests are the ones you never have to touch until a real bug appears. Test WHAT your code does, not HOW it does it!


6. Part 6: Applying Feature Branch Workflow

Let’s add these tests using the professional workflow from Chapter 02 (Feature Development)!

6.1 Step 1: Create Feature Branch

$ git checkout main
$ git pull origin main
$ git checkout -b feature/add-unit-tests

6.2 Step 2: Create Test Files

$ mkdir -p tests
$ touch tests/__init__.py
$ touch tests/test_geometry.py
$ touch tests/test_road.py
$ touch tests/test_integration.py

6.3 Step 3: Write Tests (Using LLM Assistance)

Use Copilot/Claude to generate initial tests, then refine:

# tests/test_geometry.py
# (See examples from Part 5 above)

# tests/test_road.py
import numpy as np
from numpy.typing import NDArray
import pytest
from road_profile_viewer.road import generate_road_profile


class TestGenerateRoadProfile:
    def test_default_parameters(self) -> None:
        """Test road generation with default parameters."""
        x: NDArray[np.float64]
        y: NDArray[np.float64]
        x, y = generate_road_profile()
        assert len(x) == 100  # Default num_points
        assert x[0] == 0.0
        assert y[0] == 0.0  # Road starts at origin

    def test_custom_num_points(self) -> None:
        """Test road generation with custom number of points."""
        x, y = generate_road_profile(num_points=50)
        assert len(x) == 50
        assert len(y) == 50

    def test_road_is_continuous(self) -> None:
        """Verify generated road has no gaps."""
        x, y = generate_road_profile(num_points=100)
        assert np.all(np.diff(x) > 0), "x should be strictly increasing"

6.4 Step 4: Run Tests Locally

$ uv run pytest tests/ -v
============================= test session starts ==============================
tests/test_geometry.py::TestFindIntersection::test_normal_downward_angle PASSED
tests/test_geometry.py::TestFindIntersection::test_empty_road_arrays PASSED
tests/test_road.py::TestGenerateRoadProfile::test_default_parameters PASSED
...
============================== 15 passed in 0.25s ===============================

6.5 Step 5: Commit Your Tests

$ git add tests/
$ git commit -m "Add unit tests for geometry and road modules

- test_geometry.py: 10 tests covering equivalence classes and boundaries
- test_road.py: 5 tests for road generation
- test_integration.py: 3 integration tests
- All tests use AAA pattern (Arrange-Act-Assert)
- Tests generated with LLM assistance, refined by human review

Total: 18 tests, all passing"

6.6 Step 6: Push and Create PR

$ git push -u origin feature/add-unit-tests

$ gh pr create --title "Add unit tests for geometry and road modules" \
  --body "Adds comprehensive test suite using pytest.

**Coverage:**
- geometry.py: 10 unit tests (equivalence classes + boundaries)
- road.py: 5 unit tests
- Integration: 3 tests verifying modules work together

**Testing approach:**
- Used LLM to generate initial test structure
- Human review to fix assertions and add edge cases
- All tests pass locally

**Next steps:**
- Chapter 03 (TDD and CI) will add CI integration
- TDD workflow for new features"

6.7 Step 7: CI Will Run (in Chapter 03)

In the next lecture, we’ll update CI to run tests automatically!


7. Summary: What You’ve Accomplished

7.1 Before This Lecture

✅ Ruff (style)
✅ Pyright (types)
❌ No tests → Logic bugs slip through

7.2 After This Lecture

✅ Ruff (style)
✅ Pyright (types)
✅ Pytest (correctness) → 18 tests catch bugs!

7.3 Key Concepts You’ve Mastered

1. Testing Pyramid

  - 70% unit, 20% integration, 10% E2E - not the inverted test cone

2. Unit Testing Skills

  - AAA pattern, equivalence classes, and boundary value analysis
  - Descriptive test names that tell you exactly what broke

3. LLM-Assisted Testing

  - LLMs generate boilerplate; humans verify and refine
  - Watch out for over-mocking: test state, not interactions

4. Practical Skills

  - pytest, integration tests, and the feature branch workflow applied to tests

7.4 The Difference Tests Make

Without tests:

Developer: *Changes find_intersection()*
Developer: *Manually tests by clicking UI*
Developer: "Looks good!" *Pushes to main*
User: *Discovers bug with vertical angle* "App crashed!"

With tests:

Developer: *Changes find_intersection()*
Developer: $ pytest tests/
Developer: "❌ Test failed! Bug caught before commit"
Developer: *Fixes bug, tests pass*
Developer: *Pushes with confidence*
User: *Everything works!*

8. Key Takeaways

Remember these principles:

  1. Code quality ≠ Code correctness - Ruff catches style, tests catch bugs
  2. Testing pyramid, not cone - More unit tests, fewer E2E tests
  3. Equivalence classes + boundaries - Test smart, not exhaustive
  4. AAA pattern - Arrange, Act, Assert for readable tests
  5. LLMs assist, humans verify - Use LLMs for boilerplate, not correctness
  6. Fast tests = frequent testing - Unit tests run in milliseconds
  7. Tests give confidence - Refactor without fear
  8. Feature branch workflow applies to tests - Tests are a feature too!

You’re now ready for Chapter 03 (TDD and CI): Test-Driven Development!

In the next lecture, we’ll flip the script: write tests BEFORE code (TDD), integrate tests into CI, and make failing tests block merges.


9. Further Reading

On Testing:

On the Testing Pyramid:

On Equivalence Classes:

Interactive Learning:


Last Updated: 2025-11-04
Prerequisites: Lectures 1-4 (especially Chapter 02 (Refactoring) - Modular Code and Chapter 03 (Testing Basics))
Next Lecture: Chapter 03 (TDD and CI) - Test-Driven Development & CI Integration

© 2026 Dominik Mueller