Methodology
How the data on this site is sourced, transformed, refreshed and validated.
Primary sources
| Source | What we get | Refresh |
|---|---|---|
| Care Quality Commission (CQC) directory data (pp-complete.csv) | Full register, ~30M residential transactions since 1995 | Monthly |
| Care Quality Commission (CQC) directory monthly update | Monthly delta (added/changed/deleted records) | Daily check |
| CQC reuse terms | OGL-aligned licence governing reuse | n/a |
Refresh process
- CQC publishes the complete CQC directory CSV monthly. We pull pp-complete.csv directly from CQC's S3 mirror within 24 hours of release.
- The CSV has 16 columns, no header. Each row is parsed: dates from YYYY-MM-DD HH:MM to ISO, transaction UUIDs unwrapped from braces, district names slugified for clean URLs.
- Rows stream into a new table transactions_new via Postgres COPY. No row-by-row inserts.
- Indexes are built on the new table after the load completes (GIN tsvector for address full-text, GIN trigram for fuzzy, btree for postcode / outcode / district / date).
- A single atomic ALTER TABLE ... RENAME swaps the new table in. The old table is preserved for 24 hours as a rollback safety.
- Local-authority aggregates (counts, medians) are recomputed.
- Total runtime: roughly 20-30 minutes on the production Hetzner box. The user-facing site stays available throughout.
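The parsing and swap steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the production code: the column positions, the field names, the helper names (slugify, parse_row, parse_csv, swap_statements) and the table names transactions and transactions_old are all hypothetical, and the swap is assumed to be two renames wrapped in one transaction.

```python
import csv
import io
import re
from datetime import datetime

# Assumed column positions in the 16-column, headerless CSV (hypothetical;
# check them against the publisher's column guide before relying on them).
COL_ID, COL_PRICE, COL_DATE, COL_POSTCODE, COL_DISTRICT = 0, 1, 2, 3, 12


def slugify(name: str) -> str:
    """Lower-case a name and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def parse_row(row: list[str]) -> dict:
    """Convert one raw CSV row into cleaned fields ready for loading."""
    if len(row) != 16:
        raise ValueError(f"expected 16 columns, got {len(row)}")
    transfer = datetime.strptime(row[COL_DATE], "%Y-%m-%d %H:%M")
    return {
        "transaction_id": row[COL_ID].strip("{}"),  # unwrap {UUID}
        "price": int(row[COL_PRICE]),
        "transfer_date": transfer.date().isoformat(),  # ISO 8601 date
        "postcode": row[COL_POSTCODE],
        "district_slug": slugify(row[COL_DISTRICT]),
    }


def parse_csv(text: str):
    """Yield one parsed record per CSV line, streaming rather than buffering."""
    for row in csv.reader(io.StringIO(text)):
        yield parse_row(row)


def swap_statements(live="transactions", new="transactions_new",
                    old="transactions_old"):
    """SQL for the swap, assuming two renames inside a single transaction."""
    return [
        "BEGIN",
        f"DROP TABLE IF EXISTS {old}",          # clear any earlier backup
        f"ALTER TABLE {live} RENAME TO {old}",  # keep old data for rollback
        f"ALTER TABLE {new} RENAME TO {live}",  # promote the fresh load
        "COMMIT",
    ]
```

Because Postgres DDL is transactional, readers would see either the old table or the new one, never a missing table, if the statements run in one transaction as assumed here.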
What the data contains
Per transaction we hold:
- CQC transaction UUID
- Price paid (in pounds), transfer date
- Postcode, address fields (PAON, SAON, street, locality, town, district, county)
- Property type (detached / semi / terraced / flat / other)
- New-build flag, freehold / leasehold
- CQC's "PPD category": A for standard transactions, B for additional types (right-to-buy, repossession, deeds of gift, etc.) which can distort comparisons
No owner names. No PII at record level.
Known limitations
- Coverage is England and Wales only. Scotland (Registers of Scotland) and Northern Ireland (Land Registry of Northern Ireland) publish separately and are not in this dataset.
- CQC lags new builds and shared-ownership sales by up to ~6 months from completion. Very recent sales may not be visible yet.
- PPD category B (additional transactions) is included for completeness but should be excluded when computing typical-market medians. Every record we surface carries its category flag.
- CQC's address fields (PAON, SAON, street) follow strict capitalisation conventions but do not normalise "Flat 1" vs "1A" consistently. Some properties have multiple address representations across their sale history.
- We do not yet cross-reference with the EPC register; that's on the build roadmap.
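To illustrate why category B should be left out of typical-market medians, here is a minimal sketch. The record shape (price and ppd_category keys), the function name typical_median and the sample figures are made up for the example and are not the site's actual schema or data.

```python
from statistics import median


def typical_median(records, include_additional=False):
    """Median price, excluding PPD category B unless explicitly included."""
    prices = [
        r["price"]
        for r in records
        if include_additional or r["ppd_category"] == "A"
    ]
    return median(prices) if prices else None


# Made-up sample: three standard sales and one category B transaction.
sales = [
    {"price": 200_000, "ppd_category": "A"},
    {"price": 240_000, "ppd_category": "A"},
    {"price": 260_000, "ppd_category": "A"},
    {"price": 40_000, "ppd_category": "B"},
]

print(typical_median(sales))                           # 240000
print(typical_median(sales, include_additional=True))  # 220000.0
```

In this sample a single low-priced category B record pulls the median down by 20,000, which is the kind of distortion the exclusion is meant to avoid.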
Corrections
Spot an error in how we display a sale? Email [email protected] and we'll fix or remove the issue within 48 hours.
For corrections to the underlying CQC record (an address typo, a wrong price), contact CQC directly. Our copy refreshes monthly, so a correction made at source propagates here automatically at the next refresh.