This cheat sheet has (obviously?) been made by a language model from the transcript of the PyBay 2025 talk by Pamela Fox.
Use LMs as fast test scaffolding, not as oracles: humans still define invariants, edge cases, and what “correct” means.
Quick Checklist
1. Reduce Redundancy
Use parametrization instead of near-duplicate tests, and centralize setup with fixtures that also clean up.
Key ideas:
- Parametrize when only a few values differ.
- Keep shared setup/teardown in fixtures, not copy-pasted into every test.
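A minimal sketch of both ideas (the normalize_name function and the contacts fixture are illustrative, not from the talk): one parametrized test replaces several near-duplicates, and the code after yield in the fixture is the teardown.

```python
import pytest

# Illustrative only: a tiny pure function to test.
def normalize_name(raw: str) -> str:
    return " ".join(raw.split()).title()

@pytest.fixture
def contacts():
    store = {"ada": "ada@example.com"}  # shared setup lives in one place
    yield store
    store.clear()                       # teardown runs after each test

@pytest.mark.parametrize(
    "raw, expected",
    [
        ("  ada   lovelace ", "Ada Lovelace"),
        ("GRACE HOPPER", "Grace Hopper"),
        ("élodie durand", "Élodie Durand"),
    ],
)
def test_normalize_name(raw, expected):
    assert normalize_name(raw) == expected

def test_lookup_known_contact(contacts):
    assert contacts["ada"] == "ada@example.com"
```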
2. Make Fake Data Look Real and Diverse
Prefer data-generation libraries such as Faker over hand-rolled strings; seed them for reproducibility and control locales.
Also explicitly test edge cases:
- Names with non-ASCII characters
- Names with multiple middle names
- Names with accents or hyphens
- Very long or single-character names
- Addresses and locales outside your usual region
LMs tend to give “John Doe” / “Jane Smith”; your tests shouldn’t.
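A seeded, multi-locale Faker sketch (the make_profile helper and the checks are illustrative):

```python
from faker import Faker

Faker.seed(0)                               # same fake data on every run
fake = Faker(["en_US", "ja_JP", "es_MX"])   # draw names/addresses from several locales

def make_profile():
    # Illustrative helper: whatever your code builds from user-supplied data.
    return {"name": fake.name(), "address": fake.address()}

def test_profile_accepts_names_from_any_locale():
    for _ in range(20):
        profile = make_profile()
        assert profile["name"].strip()       # never empty or whitespace-only
        assert "\n" not in profile["name"]   # sanity check, not a business rule
```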
3. Assert Behavior, Not Just Shape
Don’t only check that keys exist—validate values and constraints. For JSON APIs, snapshot full responses when that’s more expressive.
Guidelines:
- Snapshot only stable fields (no timestamps, random IDs, etc.), or filter before snapshotting.
- Treat snapshot diffs like code changes: review them, don’t click “accept” blindly.
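A sketch of the snapshot idea using the syrupy pytest plugin (one option among several; the route, fields, and client fixture are assumptions, not from the talk):

```python
# Assumes a `client` fixture (e.g. FastAPI TestClient) and the syrupy plugin,
# which provides the `snapshot` fixture. Route and field names are illustrative.
def test_bee_detail_matches_snapshot(client, snapshot):
    data = client.get("/bees/1").json()
    data.pop("updated_at", None)   # strip unstable fields before comparing
    assert data == snapshot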
4. Treat Coverage as a Tool, Not a Target
Coverage highlights what you forgot to test; it does not prove correctness.
- Run line and branch coverage.
- Inspect uncovered lines and branches, especially around conditionals and error handling.
- Add tests that exercise those paths with meaningful assertions.
Even at ~100% line coverage, you may still be missing behaviors—property-based tests (§5) and fuzzing (§6) help there.
5. Break the Happy Path with Property-Based Testing
Use Hypothesis to explore tricky inputs (empty/huge/negative/unicode, weird floats, etc.).
You can also model realistic numeric ranges:
```python
from hypothesis import given, strategies as st

@given(
    lat=st.floats(min_value=-90, max_value=90, allow_nan=False),
    lon=st.floats(min_value=-180, max_value=180, allow_nan=False),
)
def test_bees_active_accepts_valid_coords(client, lat, lon):
    res = client.get("/bees/active", params={"lat": lat, "lon": lon})
    assert res.status_code in {200, 400, 404}  # but not 500
```
Add input validation so invalid inputs yield clear 4xx responses, not 500s.
6. Fuzz APIs End-to-End from the Spec
Given an OpenAPI/Swagger spec, use Schemathesis (built on Hypothesis) to generate inputs across routes/params and reproduce failures with minimal examples or curl commands.
Conceptual example:
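(A sketch in the Schemathesis 3.x style; the URL is a placeholder and exact API names vary between versions.)

```python
import schemathesis

# Point Schemathesis at the running app's OpenAPI spec (placeholder URL).
schema = schemathesis.from_uri("http://localhost:8000/openapi.json")

@schema.parametrize()          # one generated test per operation in the spec
def test_api_conforms_to_schema(case):
    case.call_and_validate()   # sends the request and checks the response
                               # against the spec (no 500s, schema violations, etc.)
```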
This explores your API like Hypothesis explores functions, and gives you a concrete request to reproduce each failure.
7. Make Tests Deterministic and Re-Runnable
- Seed Faker, random, NumPy, and Hypothesis.
- Freeze time if logic depends on “now”.
- Ensure fixtures clean DB, files, and any in-memory/global state.
- Run the suite twice and in parallel (locally/CI) to reveal hidden shared state.
- Pin dependency versions and control environment variables (TZ, feature flags, API base URLs).
If a test passes sometimes and fails sometimes, assume there is a bug until proven otherwise.
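A conftest.py-style sketch of the seeding and time-freezing pieces (assumes the freezegun package; the seed value and fixture are illustrative):

```python
import random
from datetime import datetime

import pytest
from faker import Faker
from freezegun import freeze_time

SEED = 1234  # arbitrary, but fixed

@pytest.fixture(autouse=True)
def _seed_everything():
    random.seed(SEED)
    Faker.seed(SEED)
    # If you use NumPy: numpy.random.seed(SEED)
    # For Hypothesis, use settings profiles (e.g. derandomize=True) instead.
    yield

@freeze_time("2025-01-01 12:00:00")
def test_report_date_is_stable():
    assert datetime.now().year == 2025
```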
8. Prefer Higher-Value Test Levels When Appropriate
For user-facing web apps, bias toward integration/E2E tests (e.g., Playwright) that cover real user journeys:
- “User signs up, logs in, creates X, sees Y”
- “User searches with weird filters and still gets a reasonable result or a clear error”
Support these with a smaller set of focused unit tests for core pure logic (parsers, calculations, transformations).
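A sketch of one such journey with pytest-playwright (the URL, form labels, and success text are placeholders for your app):

```python
from playwright.sync_api import Page, expect

# Assumes the pytest-playwright plugin, which provides the `page` fixture.
def test_user_signs_up_and_sees_dashboard(page: Page):
    page.goto("http://localhost:8000/signup")
    page.get_by_label("Email").fill("ada@example.com")
    page.get_by_label("Password").fill("correct-horse-battery-staple")
    page.get_by_role("button", name="Sign up").click()
    expect(page.get_by_text("Welcome")).to_be_visible()
```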
9. Use LMs as Generalists
Let an LM scaffold, but enforce specialist practices (pytest features, Faker, Hypothesis, snapshots), and always review/refactor.
Prompt stub (adapt as needed):
Write pytest tests for a FastAPI app. Use fixtures from conftest.py. Parametrize similar cases; use fixtures for DB setup/teardown; use Faker with a fixed seed and explicit locale; add at least one Hypothesis test per route’s inputs; write one snapshot test per JSON route (only snapshot stable fields); run coverage and add tests for uncovered lines/branches; ensure tests are idempotent and re-runnable.
LMs are good at volume; humans are responsible for correctness.
10. Practical Workflow
- Establish robust fixtures (app/client/DB) with cleanup.
- Generate a first pass with an LM using the prompt above.
- Refactor: parametrize, centralize factories, seed Faker, add snapshots.
- Add Hypothesis on risky inputs; fix validation to return 4xx, not 500.
- Coverage pass: close gaps for important lines/branches.
- Add E2E smoke tests for key flows (login, main user journeys).
- Ensure determinism: seed, freeze time, run twice/parallel, pin deps.
- Merge gates: green CI, coverage threshold met, snapshot diffs reviewed, human review of LM-generated tests.