Analysis: Care.data – where next?

Following the public debate surrounding the NHS Care.data plan, it was unsurprising the topic dominated the UK’s largest health IT event

Following the debate surrounding the NHS’s care.data plans last month, it was unsurprising that the topic dominated the UK’s largest gathering of health IT professionals, HC2014, last week.

The controversies surrounding the plans to expand the collection of patient care data from hospitals to include general practices came to a head in February when failure to explain the benefits to the general public forced the NHS to put the plans on hold for six months.

The arguments for and against care.data are clear, and during HC2014 most agreed on the benefits: a better understanding of health needs, and data that can be used to improve quality of care and by researchers to identify patterns in diseases.

The debate comes a year after the Francis Report, which looked into the failings of the Mid Staffordshire hospital between January 2005 and March 2009, and called for better data to be available to prevent repeat failings elsewhere.

Kingsley Manning, chair of the Health and Social Care Information Centre (HSCIC), said it would also help in redesigning NHS services: “The combination of data and technology constitutes our major hope of making the NHS affordable and sustainable.”

Important questions

Patient information from care.data would also answer some fundamental questions about the NHS. Geraint Lewis, chief data officer at NHS England, told delegates at HC2014 that the NHS does not know how many people have had chemotherapy in the last year. “We just don’t have the data,” he said.

There are also many black holes in information about patients and the quality of care they receive after leaving hospital. The care.data plan is to expand the NHS’s dataset from the current collection of hospital data (HES – Hospital Episode Statistics) to include data from GP practices.

“We can’t join up datasets,” said Lewis. “No one can tell you the average time presenting to a GP with a symptom, versus the diagnosis and to compare between practices and hospitals.”

“The main benefits are not just from the collection of data itself, but the power of it will be starting to link the data together so we can start asking questions across different care settings and geographies.”

Independent consultant Ian Herbert agreed during HC2014, and said that there was an absolute need for high-quality NHS data. “Reluctantly, we need rich consistent databases as proposed by care.data,” he said. “There are huge economic advantages and it’s convenient, but we need to make sure convenience doesn’t trump everything else.”

Mark Whitehorn, emeritus professor of analytics at Dundee University, told Computer Weekly that the big problem for the NHS and HSCIC would be to marry up the disparate datasets.

He said the proliferation of software across the NHS could cause a problem, as GP surgeries and hospitals may be using different fields and formats to store data. “One system could have codes for heart conditions that don’t quite match to each other for example,” he said. “This sounds trivial, but the trouble is healthcare systems store a lot of complex details. And it is understanding the meaning of the data and making not-quite-compatible datasets work together that is a huge problem in this kind of work.”

Trust us with your data

Critics of care.data have expressed concerns about the protections around data and its ability to be used for purposes other than anonymised medical research, in particular for commercial purposes such as insurance.

In the run-up to the care.data debate, the NHS had an attitude of “trust us with your data”, but the public, recently shaken by the NSA and Snowden revelations and by banking scandals, was not going to sit back and allow this to happen without explanation.

“Large sections of the public fully understand the power of information and have genuine concerns about how their data is used by large institutions,” said Manning. “And there is a general perception that those institutions aren't very good at keeping your data safe, as evidenced by Edward Snowden and Bradley Manning.”

Over the next few months, the HSCIC – the body responsible for collecting patient data – and the wider NHS will be announcing new initiatives in areas of transparency, effectiveness and security to reassure the general public about sharing their data.

Care.data definitions

Red data: Contains identifiers including date of birth, postcode and NHS number. Strictly controlled within the law and only disclosed by HSCIC where there is a legal basis, eg public health emergency or patient approval.

The Data Access Advisory Group, an independent group, considers applications.

Amber data: Contains a unique pseudonym for each person, with all identifiers stripped out. Used to track individuals over time, but because the data is so rich a hacker could try to re-identify the data. This means the HSCIC could never publish the data, and it is only available under strict circumstances and after following ICO safeguards.

Green data: Contains aggregated or anonymous data and is safe to publish. Safeguards are in place to ensure it is not re-identifiable, eg small number suppression, which replaces numbers less than five with an asterisk, or injecting fake data.

Anonymised data: Data is stripped of identifiable information.

Pseudonymised data: Data is stripped of main identifiers such as postcode, NHS number, date of birth, and then an algorithm is used to produce a pseudonymised identifier.

Type 1 opt-out: Prevents patient data flowing from GP practice to HSCIC.

Type 2 opt-out: Prevents the flow of red data only.
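
The small number suppression mentioned under green data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the practice labels and the counts are invented, and treating zero as a suppressible value is an assumption rather than anything the definitions above specify.

```python
from typing import Dict, Union

# Counts below this value are replaced, following the "less than five" rule
# quoted in the green data definition.
THRESHOLD = 5


def suppress_small_numbers(
    counts: Dict[str, int], threshold: int = THRESHOLD
) -> Dict[str, Union[int, str]]:
    """Return a copy of `counts` with every value below `threshold` shown as '*'."""
    return {
        label: (value if value >= threshold else "*")
        for label, value in counts.items()
    }


# Hypothetical aggregated counts per practice (made-up numbers).
example_counts = {"Practice A": 120, "Practice B": 3, "Practice C": 5}
published = suppress_small_numbers(example_counts)
print(published)
```

Suppression of this kind only protects the individual cells. If row or column totals are published alongside them, a suppressed value can sometimes be worked out by subtraction, so it would normally be one safeguard among several.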

At the time of publication, the Independent Advisory Group (IAG) has only approved the flow of data from GP practices into the HSCIC central database and then out in green or amber publication format. Nothing else has yet been approved, and amber data would be used for commissioning use only.

Meanwhile, new legislation to amend the Care Bill, which aims to put additional restrictions on the dissemination of information by the HSCIC, has reached the House of Commons. The amendment aims to ensure that the HSCIC could only disseminate information to requesting organisations if “disseminating the information would be for the purposes of the provision of health care or adult social care.”

The new communications and awareness lead of care.data, Tim Carter, admitted the NHS had failed to explain in real terms what the risks and benefits would be to NHS England patients.

A leaflet was sent out to 26.5 million English households in January to educate the public about the proposed plans, but fewer than a third remember receiving it. According to a BBC poll, around 45% of people remain unaware of the plan to share some data from GP medical records.

Automatic opt-in

The current proposals for the care.data programme dictate an automatic opt-in of all NHS England patients. Patients who do not want their data included would have to tell their GP surgery before the spring-time deadline – which has since been delayed until October. And as more of the debate became headline news over the past few weeks, there has been a fear that too many people will opt out.

Whitehorn said if a large number of people decide to opt out of the care.data programme this could negatively affect the quality of the datasets.

“If a random 50% of the population decided to opt out, you would still have more than enough data to make sensible decisions,” he explained. “The problem comes if people start opting out in large numbers, which is not at random. They may be elderly, females, or healthy people for instance, that would bias your sample, and would create a real problem when making general decisions about health care due to a bad sample.”

He said that the decisions could be influenced by bias in the numbers. A take-up of 98% would not introduce much bias, but below that the effect is gradual: the lower the take-up, the less can be said definitively about the dataset.
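
Whitehorn’s distinction between random and non-random opt-outs can be illustrated with a small simulation. The sketch below is hypothetical throughout: the population size, the share of elderly people, the condition rates and the opt-out probabilities are all invented for the example and are not figures from the NHS or from Whitehorn.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic population of (is_elderly, has_condition) pairs.
# Assumed rates: 20% elderly; condition in 30% of elderly, 5% of everyone else.
population = []
for _ in range(200_000):
    is_elderly = rng.random() < 0.20
    has_condition = rng.random() < (0.30 if is_elderly else 0.05)
    population.append((is_elderly, has_condition))


def prevalence(people):
    """Share of people in the list who have the condition."""
    return sum(1 for _, has_condition in people if has_condition) / len(people)


def remaining_after_opt_out(people, opt_out_rate):
    """Keep each person unless they opt out.

    `opt_out_rate` maps is_elderly (True/False) to an opt-out probability.
    """
    return [p for p in people if rng.random() >= opt_out_rate[p[0]]]


true_rate = prevalence(population)

# Case 1: a random 50% of everyone opts out.
random_rate = prevalence(
    remaining_after_opt_out(population, {True: 0.5, False: 0.5})
)

# Case 2: opt-outs are skewed, with elderly people far more likely to opt out.
skewed_rate = prevalence(
    remaining_after_opt_out(population, {True: 0.8, False: 0.2})
)

print(f"whole population: {true_rate:.3f}")
print(f"random opt-out:   {random_rate:.3f}")
print(f"skewed opt-out:   {skewed_rate:.3f}")
```

Under these assumptions the random opt-out leaves the estimated rate close to the true one even though half the records are gone, while the skewed opt-out understates it, because the group most likely to have the condition is also the group most likely to be missing.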

Pseudonymised data

The pseudonymisation of data would make people less identifiable in datasets, but it was also one of the big communication failures by the NHS, as the leaflet failed to explain what this term would mean for patient anonymity.  

“Rather than talk pseudonymised, we need to explain in real terms,” said Carter. “We need to explain the benefits, but have to put the risks alongside that.”

Once the data reaches the HSCIC’s central database, it is pseudonymised by stripping out its main identifiers, such as postcode, NHS number and date of birth. Following that, the HSCIC will apply an industry-standard encryption algorithm from the Secure Hash Algorithm group of cryptographic hash functions.

The HSCIC will encrypt this data as it is collected, and before it is released it will also be pseudonymised per customer and per purpose.
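
As a rough illustration of pseudonymising “per customer and per purpose”, the Python sketch below strips the main identifiers from a record and derives a pseudonym with a keyed SHA-256 hash. It is not the HSCIC’s actual method, which the article does not detail. The use of HMAC, the secret key, the field names and the customer and purpose labels are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical secret held by the data controller; a real key would be
# generated securely and never stored in source code.
SECRET_KEY = b"example-only-key"

# The main identifiers named in the article, under assumed field names.
IDENTIFIERS = ("nhs_number", "postcode", "date_of_birth")


def pseudonym(nhs_number: str, customer: str, purpose: str,
              key: bytes = SECRET_KEY) -> str:
    """Derive a pseudonym that differs for each customer and purpose."""
    message = "|".join((nhs_number, customer, purpose)).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def pseudonymise(record: dict, customer: str, purpose: str) -> dict:
    """Strip the main identifiers and attach the derived pseudonym."""
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFIERS}
    cleaned["pseudonym"] = pseudonym(record["nhs_number"], customer, purpose)
    return cleaned


# Made-up record with an obviously fake NHS number.
record = {
    "nhs_number": "0000000000",
    "postcode": "ZZ1 1ZZ",
    "date_of_birth": "1970-01-01",
    "diagnosis_code": "EXAMPLE",
}

release_a = pseudonymise(record, customer="university-x", purpose="research")
release_b = pseudonymise(record, customer="university-x", purpose="audit")
print(release_a["pseudonym"] == release_b["pseudonym"])
```

A keyed hash is used here rather than a bare hash because identifiers with a short, known format can be guessed by hashing every candidate value. Mixing the customer and purpose into the input means the same person receives a different pseudonym in each release, so two releases cannot be joined on the pseudonym alone.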

The benefit of pseudonymised data is that the NHS can link up different datasets to provide valuable insights for research purposes.

Phil Booth, coordinator of medConfidential, said pseudonymisation is a useful technique, but on its own it won’t protect patient-level linked data.

He said pseudonymised data could be open to “jigsaw attacks” where multiple datasets can be used to identify individuals.
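
The idea behind a jigsaw attack can be shown with a toy example. In the Python sketch below, a pseudonymised extract still carries a few descriptive attributes, and a second, outside dataset lists named people with the same attributes. Every value, field name and person is invented, and the choice of attributes is an assumption made for illustration only.

```python
from collections import defaultdict

# Hypothetical pseudonymised extract: no names, but some descriptive fields.
pseudonymised_rows = [
    {"pseudonym": "a1f3", "age_band": "40-49", "area": "North", "admitted": "2014-01-12"},
    {"pseudonym": "b7c2", "age_band": "60-69", "area": "South", "admitted": "2014-01-12"},
    {"pseudonym": "c9d4", "age_band": "60-69", "area": "South", "admitted": "2014-01-12"},
]

# Hypothetical outside information, for example gathered from public sources.
outside_rows = [
    {"name": "Person One", "age_band": "40-49", "area": "North", "admitted": "2014-01-12"},
    {"name": "Person Two", "age_band": "60-69", "area": "South", "admitted": "2014-01-12"},
]

QUASI_IDENTIFIERS = ("age_band", "area", "admitted")


def reidentify(pseudo_rows, known_rows, keys=QUASI_IDENTIFIERS):
    """Return {name: pseudonym} for people who match exactly one record."""
    index = defaultdict(list)
    for row in pseudo_rows:
        index[tuple(row[k] for k in keys)].append(row)

    matches = {}
    for person in known_rows:
        candidates = index.get(tuple(person[k] for k in keys), [])
        if len(candidates) == 1:  # a unique combination pins down one record
            matches[person["name"]] = candidates[0]["pseudonym"]
    return matches


matches = reidentify(pseudonymised_rows, outside_rows)
print(matches)
```

In this toy data only the person whose combination of attributes is unique can be matched to a pseudonym. The second person shares their combination with another record and stays ambiguous, which is why richer data, with more attributes per record, tends to make re-identification easier.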

Critics argue a way to add an extra level of security would be to pseudonymise the data “at source” so raw data would never leave the GP surgery. But Tim Kelsey, national director for patients and information at NHS England, told BBC Radio Shropshire last month that the technology to replace individuals with a pseudonym is not ready to be used on a localised basis.

Risk vs benefit

Kelsey said at the conference that the NHS was at a turning point when it comes to sharing data.

“I think this is a turning point for a different kind of contract between us and the citizens of this country about a properly shared understanding of benefits and risk on the one hand of data sharing and the guarantees on the other about what the use of the data will be,” he said.

Even the critics of care.data couldn’t dismiss the benefits during a panel debate at the conference, but panellists stressed that the risks must be proportionate to the benefits that could be reaped.

Phil Koczan, chief clinical information officer (CCIO) at UCL Academic Health Science Network, said: “We need to be clear of the use and benefits, but also articulate those risks and try to quantify them.”

He said that he would like to see the programme move forwards and not “let the baby get thrown out with the bathwater”.

“It’s been a helpful and interesting discussion in the media, now let’s make sure we get those messages very clear, define uses of data and have a clear consent model,” he said.

Booth agreed, saying transparency would be key. “We need to be very very specific about what the system will do endpoint to endpoint,” he said. “Then we need to communicate it properly and clearly, explain the risks, the benefits and give people the choice and the means to exercise that choice.”
