March 2018 

The first article in this series focussed on 'scoping the model'. This second article looks at the importance of ensuring that models, particularly those used for demand and capacity planning within healthcare, are underpinned by robust baseline data.

Baseline data is essential to enable stakeholders to monitor and track changes… and may be used later to provide a comparison for assessing program outcomes or impacts (World Bank Institute)

I’m not going to dwell on data collection at this point, other than to suggest a couple of key considerations. The first is to ensure that the requested data is provided in sufficient detail to allow you to model the assumptions that were scoped with the customer. For example, if one of the assumptions pertains to gender, you will need to ensure that the data provides for such a breakdown. The second is proportionality: as data governance is taking on greater prominence, particularly in the healthcare sector due to the sensitive nature of the data, you should ensure that any data request is proportional to the modelling requirements. If in doubt, check with colleagues or with sources such as the Information Commissioner’s Office (UK specific), particularly in the context of the General Data Protection Regulation (GDPR), which comes into force in May 2018.

Data requests should be proportional to the modelling requirements

Once you have processed the raw data and populated your model with it, the next step, and perhaps one of the most crucial in the entire modelling process, is to validate the resulting baseline with your customer. Building a model without an agreed baseline is akin to building a house on sand: it will eventually crumble and fall down. The baseline is the foundation of the model and as such needs to be solid and robust.

Experience has taught me that it can often be challenging to reach an agreed baseline position within a healthcare environment. In many cases, such challenges can be avoided by bringing clarity and transparency to the process. There are typically a handful of reasons why a baseline is rejected:

1. The source data is inaccurate and doesn’t reflect what actually happened.

2. The data request has been processed inaccurately.

3. The model has processed the data incorrectly (but the data is accurate).

4. There is a mismatch between what actually happened and the perception of stakeholders.

Sometimes the data is simply wrong, or has been processed incorrectly, but in my experience this is not all that common. Where it is the case, however, you need to go back round the loop to understand why and to take remedial action.

If the model is wrong, then further work is required to understand the reasons for this. A simple comparison of raw data and the model output can help point you towards a solution.
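
As a minimal sketch of such a comparison, the Python below counts records in a raw extract by group and lists every group where the model reports a different total. The field name `specialty`, the sample rows, and the `reconcile` helper are all hypothetical, chosen purely for illustration; a real reconciliation would use whatever groupings were agreed with the customer.

```python
from collections import Counter


def reconcile(raw_records, model_counts, key):
    """Compare raw record counts with the totals reported by the model.

    Returns {group: (raw_count, model_count)} for every group that differs.
    """
    raw_counts = Counter(record[key] for record in raw_records)
    groups = set(raw_counts) | set(model_counts)
    return {
        group: (raw_counts.get(group, 0), model_counts.get(group, 0))
        for group in sorted(groups)
        if raw_counts.get(group, 0) != model_counts.get(group, 0)
    }


# Hypothetical raw extract: one row per admission.
raw = [
    {"specialty": "Cardiology"},
    {"specialty": "Cardiology"},
    {"specialty": "Orthopaedics"},
    {"specialty": "Urology"},
]

# Hypothetical baseline totals read back from the model.
model = {"Cardiology": 2, "Orthopaedics": 2}

mismatches = reconcile(raw, model, key="specialty")
for group, (raw_n, model_n) in mismatches.items():
    print(f"{group}: raw={raw_n}, model={model_n}")
```

Groups that appear on only one side are reported with a count of zero on the other, which tends to surface lookup or mapping problems quickly.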

This article is, however, predicated on the assumption that the data and the model are inherently sound, and thus focusses on the last of the four points above. I have found that a common reason for lack of agreement on the baseline is that different stakeholders hold divergent opinions about what the baseline counts. This can often result in a situation where people are all too quick to rubbish the data, which creates mistrust in any model.

So why does this happen? Clinicians may, for example, count patient procedures and diagnoses. Financial managers are more likely to count activity that generates revenue or incurs cost (and experience tells me that they may not even count activity which contributes to neither). Performance and general managers are perhaps more likely to count a patient’s entire stay (e.g. a spell). All of the above could be relevant, depending on the situation and the questions we’re asking the model to answer.

To overcome problems with the definition of the baseline, there needs to be clear agreement on the rules for how the data has been processed and on any caveats that have been applied. Be explicit about what you are counting and what the currencies are. For example, are you using spells or episodes, attendances or appointments, procedures or sessions?
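
To illustrate why the choice of currency matters, here is a small Python sketch in which the same extract gives three different 'baseline' totals depending on what is counted. It assumes one row per episode and that each spell can contain one or more episodes. The field names (`patient`, `spell_id`, `episode_no`) and the `count_baseline` helper are hypothetical.

```python
# Hypothetical extract: one row per episode; a spell may contain several episodes.
episodes = [
    {"patient": "A", "spell_id": "S1", "episode_no": 1},
    {"patient": "A", "spell_id": "S1", "episode_no": 2},
    {"patient": "B", "spell_id": "S2", "episode_no": 1},
    {"patient": "A", "spell_id": "S3", "episode_no": 1},
]


def count_baseline(rows, currency):
    """Count activity in the agreed currency."""
    if currency == "episodes":
        return len(rows)
    if currency == "spells":
        return len({row["spell_id"] for row in rows})
    if currency == "patients":
        return len({row["patient"] for row in rows})
    raise ValueError(f"Unknown currency: {currency}")


for currency in ("episodes", "spells", "patients"):
    print(currency, count_baseline(episodes, currency))
```

In this made-up extract the totals are 4 episodes, 3 spells and 2 patients. None of these is wrong, but stakeholders who each have a different one in mind will not agree on the baseline.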

Be explicit about what you are counting

It is important to use common language, being clear on the rules and lookup references that have been applied to raw data. You should be clear, for example, about how you are defining ‘patient types’. This is generally straightforward for day case and elective patients, but may be slightly more complex for non-elective patients. There are no hard and fast rules, but it is important to clearly articulate and agree the rules that will be used.

From a modelling perspective, it can be important to determine the level of aggregation required to make sense of the data (the more aggregation, the higher the likely level of efficiency, which may be relevant for more complex models). Stakeholders will typically assess the baseline at a level of aggregation which makes sense to them. You should therefore consider developing baseline outputs which present different views of the data to enable this. One approach may be to cascade the outputs, from headline totals down to finer breakdowns, so that you build a story. In doing this, you are also building in functionality which may identify areas where the data could be genuinely misleading or inaccurate.
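
One way to picture a cascade of outputs is sketched below in Python: the same rows are summarised at three levels (overall total, by patient type, then by patient type and specialty), with a check that every level adds up to the same total. The field names, sample rows and `aggregate` helper are assumptions made for the example, not a prescribed structure.

```python
from collections import Counter

# Hypothetical baseline rows: one row per spell.
rows = [
    {"patient_type": "Elective", "specialty": "Orthopaedics"},
    {"patient_type": "Elective", "specialty": "Orthopaedics"},
    {"patient_type": "Day case", "specialty": "Urology"},
    {"patient_type": "Non-elective", "specialty": "Cardiology"},
    {"patient_type": "Non-elective", "specialty": "Orthopaedics"},
]


def aggregate(data, keys):
    """Count rows grouped by the given tuple of field names."""
    return Counter(tuple(row[k] for k in keys) for row in data)


# Cascade from the headline total down to finer breakdowns.
levels = [(), ("patient_type",), ("patient_type", "specialty")]
cascade = [aggregate(rows, keys) for keys in levels]

for keys, counts in zip(levels, cascade):
    print("Grouped by:", keys or "total")
    for group, n in sorted(counts.items()):
        print("  ", group, n)

# Every level should reconcile to the same overall total.
totals = [sum(counts.values()) for counts in cascade]
```

Because each view is derived from the same rows, a stakeholder who recognises the total but not a particular breakdown can be pointed to the exact level at which their view and the data part company.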

Ensure there is clarity on the time period that the baseline covers and how activity is counted. Is it, for example, based on admissions or discharges; are there open activities which may not be included? It is also important to ensure that the baseline period is current – the greater the elapsed time between the baseline end date and the processing date, the greater the risk of divergence of opinion, which could simply be down to memory recall.
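
The effect of the counting basis can be shown with a short Python sketch. It assumes a calendar-year baseline period and a handful of made-up spells, and compares counting on admission date with counting on discharge date, as well as listing spells still open at the period end. All dates, identifiers and field names are hypothetical.

```python
from datetime import date

# Hypothetical baseline period.
PERIOD_START = date(2017, 1, 1)
PERIOD_END = date(2017, 12, 31)

# Hypothetical spells; discharged=None means the patient is still in hospital.
spells = [
    {"id": "S1", "admitted": date(2016, 12, 28), "discharged": date(2017, 1, 3)},
    {"id": "S2", "admitted": date(2017, 3, 10), "discharged": date(2017, 3, 12)},
    {"id": "S3", "admitted": date(2017, 12, 30), "discharged": date(2018, 1, 4)},
    {"id": "S4", "admitted": date(2017, 11, 1), "discharged": None},
]


def in_period(d):
    """True if the date is known and falls inside the baseline period."""
    return d is not None and PERIOD_START <= d <= PERIOD_END


by_admission = [s["id"] for s in spells if in_period(s["admitted"])]
by_discharge = [s["id"] for s in spells if in_period(s["discharged"])]

# Spells admitted by the period end but not discharged within it.
open_at_end = [
    s["id"]
    for s in spells
    if s["admitted"] <= PERIOD_END
    and (s["discharged"] is None or s["discharged"] > PERIOD_END)
]

print("Counted on admission:", by_admission)
print("Counted on discharge:", by_discharge)
print("Open at period end:", open_at_end)
```

Here the admission basis counts three spells and the discharge basis counts two, with only one spell common to both. Agreeing the basis up front, and stating how open activity is treated, removes one source of dispute.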

Also linked to the time period is the question of whether anything occurred during the baseline period which makes it potentially unrepresentative of normal and expected trends. Although the data may accurately reflect what happened, consider whether any adjustments need to be made to the assumptions to reflect ‘normal’ activity levels in subsequent years. My own take is that the baseline itself should not be adjusted; it should always reflect what actually happened.

It is important to know who has the authority to sign off the baseline. Although it is recommended that you get formal sign-off from the customer, consider whether other stakeholders are also in agreement, to avoid problems with ownership and with embedding the model within the wider organisation. I have always found that a model which has the approval of relevant clinicians is much more likely to be implemented successfully.

In summary, arriving at an agreed baseline position is a vital step in the modelling process. Save yourself time and energy later by ticking this off before expending time developing the functionality of the model. Without a robust baseline, no model, however elaborate, will be accepted.


This article was written by a Consulting Manager for GE Healthcare Partners specialising in data analytics.


