Leeds EPC / Property age Data Exploration

Imports:

Load in England & Wales EPC data

Extract EPC data for Leeds only:

Read in EPC data for Leeds:

We can see that there is 90 columns in the data, many of which contain lots of 'null' values. Let's limit the data to a subset of columns:

Load ONS Postcode lookup file and merge with the EPC data:

Grab ONS Postcode Lookup file (2020):

Read in the ONS Postcode Lookup file & drop unnecessary columns:

Cleanup the EPC Data

Merge the EPC data with the ONS postcode lookup (removing any rows with invalid postcodes):

Remove entries where the LSOA isn't in Leeds

Sort by EPC date and remove outdated EPC's (ie. if a property has multiple EPC's)

Create a table to store summarised data by LSOA:

Calculate average (mode) current & potential EPC ratings by LSOA:

Calculate average mode property age bands by LSOA from the EPC Data:

Replace blank age band values with 'UNKNOWN':

Determine what the current Age Band values are in the EPC data:

Standardise the naming of these age bands:

Let's look at the total number of properties in each age band:

Now let's calculate the number of properties in each age band by LSOA (from the EPC data):

Calculate total number of properties and mode age band by LSOA:

Now let's look at the number of LSOA's per average age band:

Now let's compare the EPC property age data with data from the VOA Housing Stock 2020 data

Get VOA 2020 Housing Stock data to compare:

Filter to Leeds LSOA's only:

Drop unnecessary columns and standardise age band naming:

Convert column datatypes to numerical:

Calculate total properties and mode age band by LSOA:

Calculate percentage of total properties with a published EPC:

Sort by LSOA and save as CSV file:

Grab the LSOA 2011 Boundaries Shapefile from the ONS and filter to Leeds only:

Merge the LSOA boundaries table with our LSOA summary table to create a geographic summary:

Save as GeoJSON: