Across our research programme, the teams are looking for increasing amounts of monitored building performance data. But what is this monitored data and why do we want it? Let’s look at this in a bit more detail, and at why it is important in enabling the UK’s transition to net zero.

Firstly, let’s start with what makes up the data we are seeking. I suggest that there are six main characteristics of any monitored building energy dataset:

  • The variables monitored
  • The time resolution
  • The level of metadata
  • The sample size
  • The monitoring period
  • The data quality

Let us consider these in turn.


Variables monitored

The variables monitored are generally energy demands and/or temperatures. Energy may be in the form of heat or electricity. These variables could be monitored at any level from the whole building down to individual rooms, appliances, sockets, radiators, and taps.


Time resolution

The time resolution could be anything from annual to sub-minutely. Such high-resolution data would not have been obtainable in the past as we simply didn’t have the technology to capture and store it. Although high-resolution data can be very useful for some purposes (such as the analysis of electricity distribution networks), the amount of data involved can become burdensome and costly to manage.
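To give a feel for how quickly the volume grows, here is a minimal sketch that counts the readings produced in one non-leap year at different time resolutions. The interval choices, the function name `readings_per_year` and the example of three channels across 1,000 buildings are my own illustrative assumptions, not figures from any particular dataset.

```python
# Rough count of readings generated in one non-leap year.
# All names and interval choices here are illustrative assumptions.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

# Sampling intervals in seconds.
INTERVALS = {
    "annual": SECONDS_PER_YEAR,
    "daily": 24 * 60 * 60,
    "half-hourly": 30 * 60,
    "minutely": 60,
    "10-second": 10,
}


def readings_per_year(interval_seconds: int, channels: int = 1, buildings: int = 1) -> int:
    """Number of readings stored in one non-leap year.

    `channels` is the number of monitored variables per building
    (for example space heating, hot water and electricity).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (SECONDS_PER_YEAR // interval_seconds) * channels * buildings


# Print a small comparison table for a single channel in a single building.
for name, seconds in INTERVALS.items():
    print(f"{name:>12}: {readings_per_year(seconds):>10,} readings per channel per year")
```

On these assumptions, half-hourly monitoring gives 17,520 readings per channel per year, and three channels across 1,000 buildings gives over 52 million readings per year, before any metadata is stored.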


Metadata

Under metadata, I include information on the building construction, dimensions, heating system and location, as well as information on occupants, such as number, age, employment status and other socio-demographic data, where appropriate. Any data from surveys or interviews with occupants could also be included here, such as occupant satisfaction with the building and heating system, environmental attitudes, or the use of windows and secondary heating. Such metadata provides added richness and may help to explain the patterns of energy use seen in the monitored data. Datasets vary widely in the level of metadata included, which determines the types of analysis that can be conducted. Privacy concerns may also limit the metadata which is available.

Sample size

The sample size can range from individual buildings up to thousands of buildings. Typically, larger datasets include less detail. It is also important to consider the diversity of the sample: are all the buildings of a particular type or in a particular location?

Monitoring period

The length of the monitoring period and the range of weather conditions experienced during it are also important. The weather can be a challenge when collecting data, as the chosen monitoring period may turn out to be unusually cool or warm, and this cannot be known in advance!
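One way to check after the event whether a monitoring period was unusually cool or warm is to compare its heating degree days with a long-term average. The sketch below uses the simple daily-mean-temperature method and assumes a base temperature of 15.5 °C, a common UK convention. The function names and the choice of method are my assumptions for illustration, not a description of how any particular dataset has been analysed.

```python
from typing import Sequence

# Assumed base temperature in degrees Celsius (a common UK convention).
BASE_TEMP_C = 15.5


def heating_degree_days(daily_mean_temps_c: Sequence[float], base_c: float = BASE_TEMP_C) -> float:
    """Sum of (base - daily mean temperature) over days colder than the base.

    This is the simple mean-temperature method; more refined methods
    use daily minimum and maximum temperatures instead.
    """
    return sum(max(0.0, base_c - temp) for temp in daily_mean_temps_c)


def relative_to_normal(period_hdd: float, long_term_hdd: float) -> float:
    """Fractional difference between the monitored period and the long-term average.

    Positive values mean the period was colder than normal (more heating needed),
    negative values mean it was milder.
    """
    if long_term_hdd <= 0:
        raise ValueError("long_term_hdd must be positive")
    return (period_hdd - long_term_hdd) / long_term_hdd
```

For example, a result of 0.1 from `relative_to_normal` would indicate a period with 10% more heating degree days than the long-term average, which is worth knowing before drawing conclusions about typical heat demand.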


Data quality

The quality of the data refers to the presence of errors in the data. This could include the frequency and magnitude of errors, as well as the level of certainty with which they can be identified and rectified. For example, redundancy in the monitoring set-up can make identifying errors easier. With smaller datasets, the researcher can “eyeball” the data to look for problems, whereas with larger datasets the data cleaning process must be automated. Indeed, with some smaller datasets, it might be possible to visit the building in question and ascertain what is really happening. There are numerous possible sources of error: incorrect sensor installation, degradation of monitoring equipment, data logging and transmission problems, building users moving or tampering with equipment, and so on.
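As a flavour of what automated cleaning might involve, here is a minimal sketch that flags missing, negative and implausibly high readings, and measures the longest run of identical values (which can hint at a stuck sensor or logger). The function names, the flag labels and the plausibility threshold are hypothetical choices of mine; real cleaning rules would need to be tailored to the sensors and the dataset in question.

```python
from typing import List, Optional, Sequence


def flag_readings(readings_kwh: Sequence[Optional[float]], max_plausible_kwh: float) -> List[str]:
    """Label each reading as 'missing', 'negative', 'too_high' or 'ok'.

    `max_plausible_kwh` is an assumed upper bound per interval, chosen by the analyst.
    """
    flags = []
    for value in readings_kwh:
        if value is None:
            flags.append("missing")
        elif value < 0:
            flags.append("negative")
        elif value > max_plausible_kwh:
            flags.append("too_high")
        else:
            flags.append("ok")
    return flags


def longest_constant_run(readings_kwh: Sequence[Optional[float]]) -> int:
    """Length of the longest run of identical consecutive non-missing readings."""
    longest = 0
    current = 0
    previous = None
    for value in readings_kwh:
        if value is not None and value == previous:
            current += 1  # same value as the previous reading: extend the run
        else:
            current = 1 if value is not None else 0  # start a new run, or reset on a gap
        longest = max(longest, current)
        previous = value
    return longest
```

Flagging rather than deleting suspect readings keeps the decision about how to treat them with the analyst, which matters when the same dataset is to be used for several purposes.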


Clearly there are trade-offs: a larger dataset is likely to include less detail on each building, both in terms of the variables monitored and the level of metadata, but a smaller dataset may not include enough buildings to cover the range of energy demand patterns of interest. The monitoring period may or may not happen to include the range of weather conditions of interest. The greater the quantity of data, the more difficult it is to investigate any anomalies, and the longer the computer processing time.

Obtaining monitored data is expensive. Therefore, it makes sense to ensure that the data which is collected can be used in as many ways as possible. This requires those carrying out any monitoring to understand the ways the data might be used, and the requirements that these users might have.

Using monitored data

For our work on the Active Building model at Loughborough, we would ideally like a dataset which is monitored at half-hourly or higher resolution, providing space heating, domestic hot water, and electricity demand separately, together with metadata on building size, fabric efficiency and number of occupants. The larger the sample size, the more value it provides, especially if it includes a wide range of occupants and a significant number of low-energy buildings.

Such an ideal dataset is not currently available. Instead, we are using the SERL dataset, which is an excellent resource. It provides smart meter gas and electricity demand and good metadata for a large sample of dwellings. We are using this dataset to construct a model of half-hourly energy demand from groups of Active Buildings, taking into account the patterns of energy use found in real homes with real occupants.

It is appealing to imagine that there could be one perfect dataset, which would be ideal for all purposes! I am not sure whether this is possible, given the range of purposes for which monitored data can be used, but with care, datasets could be made useful for a wider range of users and purposes than at present. There is also some ingenuity in figuring out how to make best use of the datasets that are available. This is perhaps a better use of time than idly pondering the perfect dataset!

What would your ideal dataset be like? How might building energy datasets be made useful for a wider range of purposes?

Stephen Watson is a Research Associate at Loughborough University whose academic career has focused on modelled heat pump electricity demand and analysis of monitored energy demand data.