Heating, Transaction Types & Green Deal

Analysis Prep

Packages

Load Data

Read in Leeds EPC and UK postcode data:

Analysis: Heating Types

Data Cleaning

Missing Values

There are no missing values. This is probably because the field is compulsory, although missing data may also be recorded with other indicators such as "NODATA" or "NA". We check for these when we look at unique values.
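A check of this kind could be sketched as follows. This is a minimal illustration, not the code used in the analysis: the column name `MAINHEAT_DESCRIPTION`, the marker set, and the sample rows are all assumptions.

```python
# Hypothetical sentinel strings; the real dataset may use others.
MISSING_MARKERS = {"", "NA", "NODATA"}


def count_missing(rows, column):
    """Count values that are None or match a sentinel string (case-insensitive)."""
    count = 0
    for row in rows:
        value = row.get(column)
        if value is None or str(value).strip().upper() in MISSING_MARKERS:
            count += 1
    return count


# Made-up sample rows, for illustration only.
sample = [
    {"MAINHEAT_DESCRIPTION": "Boiler and radiators, mains gas"},
    {"MAINHEAT_DESCRIPTION": "NODATA"},
    {"MAINHEAT_DESCRIPTION": None},
]
missing = count_missing(sample, "MAINHEAT_DESCRIPTION")
```

Counting sentinel strings alongside true nulls avoids reporting a column as complete when gaps are merely encoded as text.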

Duplicate Observations

Remove observations with duplicated building reference numbers, keeping only the most recent record for each, since we only want to look at the most recent records. (Note that this may limit the outputs, since a building reference number can refer to a building or to flats within it.)
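One way to keep only the latest record per building reference number is sketched below. The column names `BUILDING_REFERENCE_NUMBER` and `INSPECTION_DATE`, and the use of the inspection date to decide which record is most recent, are assumptions made for this illustration.

```python
def keep_most_recent(rows, key="BUILDING_REFERENCE_NUMBER", date="INSPECTION_DATE"):
    """Return one row per key: the row with the latest date.

    Assumes dates are ISO strings (YYYY-MM-DD), which sort correctly as text.
    """
    latest = {}
    for row in rows:
        kept = latest.get(row[key])
        if kept is None or row[date] > kept[date]:
            latest[row[key]] = row
    return list(latest.values())


# Made-up sample rows, for illustration only.
certs = [
    {"BUILDING_REFERENCE_NUMBER": "1", "INSPECTION_DATE": "2012-05-01"},
    {"BUILDING_REFERENCE_NUMBER": "1", "INSPECTION_DATE": "2018-03-15"},
    {"BUILDING_REFERENCE_NUMBER": "2", "INSPECTION_DATE": "2015-07-20"},
]
deduplicated = keep_most_recent(certs)
```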

Invalid Postcodes

Join with the postcode dataframe to remove outdated postcodes and add LSOAs, then filter out all observations that do not have the Leeds ladcd code.
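The join and filter might look roughly like this. The lookup column names (`pcds`, `lsoa11cd`, `ladcd`), the Leeds code `E08000035`, and all sample postcodes and LSOA labels are assumptions for this sketch and should be checked against the actual postcode file.

```python
LEEDS_LADCD = "E08000035"  # assumed local authority district code for Leeds


def attach_lsoa(certs, postcode_lookup, ladcd=LEEDS_LADCD):
    """Inner-join certificates to the postcode lookup and keep one district.

    Certificates whose postcode is absent from the lookup (e.g. outdated
    postcodes) are dropped, as are those outside the chosen district.
    """
    by_postcode = {entry["pcds"]: entry for entry in postcode_lookup}
    joined = []
    for cert in certs:
        match = by_postcode.get(cert["POSTCODE"])
        if match is not None and match["ladcd"] == ladcd:
            joined.append({**cert, "lsoa11cd": match["lsoa11cd"]})
    return joined


# Made-up sample data, for illustration only.
lookup = [
    {"pcds": "LS1 1AA", "lsoa11cd": "LSOA_A", "ladcd": "E08000035"},
    {"pcds": "BD1 1AA", "lsoa11cd": "LSOA_B", "ladcd": "E08000032"},
]
certs = [
    {"POSTCODE": "LS1 1AA"},   # valid and in Leeds: kept
    {"POSTCODE": "BD1 1AA"},   # valid but outside Leeds: dropped
    {"POSTCODE": "LS99 9ZZ"},  # not in the lookup: dropped
]
leeds_certs = attach_lsoa(certs, lookup)
```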

Unique Values

There are a lot of unique values in the Main Heating Description column. This is probably due to it combining descriptions of heating fixtures and overarching heating systems or fuels.

Since the heating system and fuel generally seem to come after a comma, we decided to remove the text before the comma to retain only the overarching heating types and fuels.
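As a small sketch of this step, the function below drops everything up to the first comma and leaves descriptions without a comma unchanged. Splitting on the first comma (rather than the last) is an assumption, and the sample descriptions are illustrative.

```python
def strip_fixture(description):
    """Keep the text after the first comma; return comma-free text unchanged."""
    _, separator, after = description.partition(",")
    return after.strip() if separator else description.strip()


# Illustrative descriptions only.
examples = [
    "Boiler and radiators, mains gas",
    "Room heaters, electric",
    "Electric storage heaters",
]
stripped = [strip_fixture(text) for text in examples]
```

Descriptions with no comma pass through untouched, which is why some fixture-only values remain after this step.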

This left us with a more manageable number of unique values (which also show no sign of missing-value indicators):

Still, some values show only heating fixtures (e.g. radiators), because the original descriptions contain no additional information.

Grouping Unique Values

Create a dataframe that only contains relevant columns:

Create an extra column to rename unique values according to reasonable subgroups. We chose to leave some unique values that only hinted at the fuel type (e.g. Electric Storage Heaters) in their own groups. We did this to give an unbiased overview of what EPC data offers.
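A lookup-based grouping could be sketched as below. The mapping entries are hypothetical and are not the groups used in the analysis; values without an entry keep their own name, mirroring the choice to leave some values in their own groups.

```python
# Hypothetical mapping from cleaned descriptions to subgroup labels.
HEATING_GROUPS = {
    "mains gas": "Mains Gas",
    "electric": "Electric",
    "electric storage heaters": "Electric Storage Heaters",
}


def heating_group(value):
    """Return the subgroup label, or the trimmed value itself if unmapped."""
    cleaned = value.strip()
    return HEATING_GROUPS.get(cleaned.lower(), cleaned)


# Made-up sample rows, for illustration only.
rows = [{"heating": "mains gas"}, {"heating": "Electric storage heaters"}, {"heating": "radiators"}]
for row in rows:
    row["heating_group"] = heating_group(row["heating"])
```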

Tables

We created two tables: one containing the frequencies of the Heating Types for visualisation, and one containing Heating Types by LSOA for the Data Mapper.
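The two tables can be illustrated with simple counters. The column names `heating_group` and `lsoa11cd` and the sample rows are assumptions for this sketch.

```python
from collections import Counter, defaultdict


def frequency_table(rows, column):
    """Count how often each value of `column` occurs."""
    return Counter(row[column] for row in rows)


def table_by_lsoa(rows, column, lsoa="lsoa11cd"):
    """Count values of `column` separately within each LSOA."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[lsoa]][row[column]] += 1
    return {area: dict(counts) for area, counts in table.items()}


# Made-up sample rows, for illustration only.
rows = [
    {"lsoa11cd": "LSOA_A", "heating_group": "Mains Gas"},
    {"lsoa11cd": "LSOA_A", "heating_group": "Electric"},
    {"lsoa11cd": "LSOA_B", "heating_group": "Mains Gas"},
]
overall = frequency_table(rows, "heating_group")
per_lsoa = table_by_lsoa(rows, "heating_group")
```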

Visualisation

Create a bar chart for the Heating Types:

This approach results in a somewhat unhelpful graph. However, it shows how much the EPC creation process would benefit from standardisation.

Extra: Grouping according to ONS report

The ONS report on the EPC data also contains a visualisation of Heating Systems and Fuel distribution. They chose to heavily summarise the groupings by only using "Mains Gas", "Electric", "Community Scheme", "Oil", "Other" and "Unknown".

Grouping

We grouped smaller subgroups as 'Other' and made the assumption that observations that showed only electrical fixtures could be grouped into 'Electric'.
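A keyword-based version of this grouping is sketched below. The six labels come from the ONS categories listed above, but the matching rules are assumptions and may differ from the rules actually applied. Whole-word matching is used so that "oil" does not match inside "boiler".

```python
import re


def ons_group(value):
    """Assign a cleaned heating description to one of the six broad groups."""
    words = set(re.findall(r"[a-z]+", value.lower()))
    if not words:
        return "Unknown"
    if {"mains", "gas"} <= words:
        return "Mains Gas"
    if "electric" in words:
        return "Electric"
    if "community" in words:
        return "Community Scheme"
    if "oil" in words:
        return "Oil"
    return "Other"


# Illustrative descriptions only.
examples = ["mains gas", "Electric storage heaters", "community scheme", "oil", "LPG", ""]
grouped = [ons_group(text) for text in examples]
```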

Tables

Again, we created two tables: one containing the frequencies of the Heating Types for visualisation, and one containing Heating Types by LSOA for the Data Mapper.

Visualisation

Create a bar chart for the Heating Types:

This creates a much less cluttered visual of the Heating Systems and Fuels that are being used in Leeds. However, it also heavily simplifies the groupings, which may bias potential further analysis into heating cost differences.

Analysis: Transaction Types

Data Cleaning

This uses the cleaned dataframe 'cert_final' from the Heating Types - Data Cleaning Section.

Missing Values

Again, there are no missing values.

Unique Values

A look into the unique values of this column shows that it is more standardised and missing entries are summarised as "unknown".

Grouping Unique Values

There are a few values that clutter the data. The long and unhelpful private rental value is therefore removed, and "not recorded" cases are renamed to "unknown". "None of the above" is not grouped into "unknown", since it does not necessarily mean that the data is unavailable.
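These rules could be sketched as follows. This assumes that "removed" means dropping the affected rows, and the legacy label prefix used to spot the long private rental value is an assumption about how it is worded in the data.

```python
# Assumed start of the long legacy private rental label.
LEGACY_PREFIX = "rental (private) - this is for backwards compatibility"


def clean_transaction_types(values):
    """Drop the legacy rental label and fold 'not recorded' into 'unknown'."""
    cleaned = []
    for value in values:
        text = value.strip().lower()
        if text.startswith(LEGACY_PREFIX):
            continue  # assumption: these observations are dropped entirely
        if text == "not recorded":
            text = "unknown"
        cleaned.append(text)  # "none of the above" is deliberately kept as-is
    return cleaned


# Illustrative values only.
raw = [
    "marketed sale",
    "not recorded",
    "none of the above",
    "rental (private) - this is for backwards compatibility only and should not be used",
]
transaction_types = clean_transaction_types(raw)
```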

Tables

Visualisation

Extra: Analysis for all observations (so including duplicates)

We can also look at all reasons for EPC inspections in Leeds rather than just the most recent ones.

Cleaning

For this, we only clean out invalid postcodes, create a table with only relevant columns, and rename and group variables as before.

Tables

Visualisation

Green Deal

We decided to take a more detailed look at the results of the Green Deal (GD) scheme. This is relatively well reflected in the EPC data, since the Transaction Types include "Assessment for Green Deal" and "Following Green Deal".

Data Cleaning

This time, we do not work with the previously cleaned cert_final data frame, since we want to keep duplicates and invalid postcodes. This is because we want to compare the effect of GD initiatives on the same properties. Since assessments for the GD may be older and may contain invalid postcodes, we keep these to avoid losing observations.

First, we drop every observation that does not contain GD in its Transaction Type.

We keep only duplicated building reference numbers since we want to compare before and after for the same building reference numbers.

We then remove older duplicates among the observations classed as "assessment for GD" and newer duplicates among those classed as "following GD". This lets us compare the state of properties as close to the implementation of GD measures as possible. This is done by creating two separate dataframes.
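The split into two tables might be sketched like this: the newest assessment and the oldest follow-up are kept per building reference number. Column names, the exact transaction type strings, and the sample rows are assumptions for illustration.

```python
def split_green_deal(rows):
    """Return (assessments, followups), each keyed by building reference number.

    Keeps the latest 'assessment for green deal' record and the earliest
    'following green deal' record. Assumes ISO date strings (YYYY-MM-DD).
    """
    assessments, followups = {}, {}
    for row in rows:
        ref = row["BUILDING_REFERENCE_NUMBER"]
        kind = row["TRANSACTION_TYPE"].strip().lower()
        if kind == "assessment for green deal":
            kept = assessments.get(ref)
            if kept is None or row["INSPECTION_DATE"] > kept["INSPECTION_DATE"]:
                assessments[ref] = row
        elif kind == "following green deal":
            kept = followups.get(ref)
            if kept is None or row["INSPECTION_DATE"] < kept["INSPECTION_DATE"]:
                followups[ref] = row
    return assessments, followups


# Made-up sample rows, for illustration only.
rows = [
    {"BUILDING_REFERENCE_NUMBER": "1", "TRANSACTION_TYPE": "assessment for green deal", "INSPECTION_DATE": "2013-01-10"},
    {"BUILDING_REFERENCE_NUMBER": "1", "TRANSACTION_TYPE": "assessment for green deal", "INSPECTION_DATE": "2013-06-02"},
    {"BUILDING_REFERENCE_NUMBER": "1", "TRANSACTION_TYPE": "following green deal", "INSPECTION_DATE": "2014-02-01"},
    {"BUILDING_REFERENCE_NUMBER": "1", "TRANSACTION_TYPE": "following green deal", "INSPECTION_DATE": "2015-09-30"},
]
assessments, followups = split_green_deal(rows)
```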

The two dataframes created in the last step are then joined to compare them directly:

Interestingly, there are some building reference numbers where the inspection date of the GD assessment is later than that of the inspection classed as "following GD". This may indicate that several GDs took place in these properties, in which case removing duplicates in the last step may have caused these inconsistencies. This could be looked into in further analyses. For now, however, these observations are dropped, since the assessment date for the GD should not reasonably fall after the GD has been implemented.
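Joining the two tables and dropping the inconsistent pairs could look roughly like this. The inputs are dictionaries keyed by building reference number; the column name `INSPECTION_DATE` and the sample records are assumptions.

```python
def pair_before_after(assessments, followups):
    """Pair records by reference number, dropping pairs whose assessment
    is dated after the follow-up. Assumes ISO date strings (YYYY-MM-DD)."""
    pairs = []
    for ref in sorted(assessments.keys() & followups.keys()):
        before, after = assessments[ref], followups[ref]
        if before["INSPECTION_DATE"] <= after["INSPECTION_DATE"]:
            pairs.append((ref, before, after))
    return pairs


# Made-up sample records, for illustration only.
assessments = {
    "1": {"INSPECTION_DATE": "2013-06-02"},
    "2": {"INSPECTION_DATE": "2016-01-01"},  # later than its follow-up
    "3": {"INSPECTION_DATE": "2013-03-03"},  # no follow-up record
}
followups = {
    "1": {"INSPECTION_DATE": "2014-02-01"},
    "2": {"INSPECTION_DATE": "2014-05-05"},
}
pairs = pair_before_after(assessments, followups)
```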

Calculating Difference Variables

In this step, the differences between the different levels of Energy Efficiency are calculated so that changes can be illustrated.

Changes in 'Current Energy Efficiency' before and after Green Deal

Difference between 'Potential Energy Efficiency' before Green Deal and 'Current Energy Efficiency' after

Difference between Past and Present: Current and Potential Divide
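The difference variables could be computed along these lines. The column names, the variable names, and in particular the sign conventions are assumptions chosen for this sketch, and the sample scores are made up.

```python
def difference_variables(before, after):
    """Compute difference variables for one before/after pair of records."""
    gap_before = before["POTENTIAL_ENERGY_EFFICIENCY"] - before["CURRENT_ENERGY_EFFICIENCY"]
    gap_after = after["POTENTIAL_ENERGY_EFFICIENCY"] - after["CURRENT_ENERGY_EFFICIENCY"]
    return {
        # Positive: current efficiency improved after the Green Deal.
        "current_change": after["CURRENT_ENERGY_EFFICIENCY"] - before["CURRENT_ENERGY_EFFICIENCY"],
        # Positive: the dwelling still falls short of its earlier potential.
        "shortfall_vs_old_potential": before["POTENTIAL_ENERGY_EFFICIENCY"] - after["CURRENT_ENERGY_EFFICIENCY"],
        "gap_before": gap_before,
        "gap_after": gap_after,
        # Negative: the current-potential gap narrowed.
        "gap_change": gap_after - gap_before,
    }


# Made-up scores, for illustration only.
before = {"CURRENT_ENERGY_EFFICIENCY": 50, "POTENTIAL_ENERGY_EFFICIENCY": 80}
after = {"CURRENT_ENERGY_EFFICIENCY": 65, "POTENTIAL_ENERGY_EFFICIENCY": 82}
differences = difference_variables(before, after)
```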

Means of Difference Variables

'Current Energy Efficiency' before and after Green Deal:

This means that Energy Efficiency on average improved by roughly 1 band upon the implementation of GD measures.

'Potential Energy Efficiency' before Green Deal and 'Current Energy Efficiency' after:

This shows that the Energy Efficiency following the implementation of GD measures was generally 1 band under the proposed potential prior to GD measures.

'Potential Energy Efficiency' and 'Current Energy Efficiency' before Green Deal:

The Energy Efficiency before the implementation of GD measures was on average 2 bands lower than its respective potential. Combined with the previous average, this means that the gap between Energy Efficiency and the old potential has approximately halved.

Past and Present: Current-Potential Divide:

This comparison may not be strictly necessary, but it shows that the difference between current and potential Energy Efficiency after GD implementation is generally lower than before it.

Table

Visualisation of Differences

'Current Energy Efficiency' before and after Green Deal:

This graph shows that the Energy Efficiency of dwellings following GD implementation is mostly higher than before. There are some cases at the upper end of the spectrum where the Energy Efficiency has improved greatly, but there are also a few cases where Energy Efficiency has decreased. These outliers should be explored further to identify the underlying reasons or to determine whether they are errors.

'Potential Energy Efficiency' before Green Deal and 'Current Energy Efficiency' after:

This visualisation shows that the Energy Efficiency that dwellings achieved through the GD sometimes met the potential proposed prior to GD implementation, but generally fell just short of it. Some dwellings even surpassed their previous potential following GD measures.

'Potential Energy Efficiency' and 'Current Energy Efficiency' before Green Deal:

This chart offers a good point of comparison for the previous one. It shows that the gap between the current energy efficiency preceding GD measures and the respective potential was generally bigger than following GD implementation. It may be interesting to look into changes in Energy Efficiency for individual dwellings, especially at the lower end of the spectrum to test the magnitude of changes and improvements.

Past and Present: Current and Potential Divide:

This final graph visualises the difference between the current-potential gaps in Energy Efficiency before and after GD implementation. It shows that, after dwellings have taken advantage of GD measures, the proposed potentials are often about as far from the respective current Energy Efficiency values as they were before GD implementation. However, the current-potential gap is generally smaller following the GD than before it.