Rationale
Compliance with the General Data Protection Regulation (GDPR) and other data protection laws, together with enforcing the ethical use of data, is paramount to gaining the general public’s support for and trust in the HRIC.
Existing ethics and legal framework
Unblocking the legal and administrative barriers for sharing human research data across geographical and organizational boundaries will, if the trust of research participants is preserved, pave the way for continent-scale cohorts in life-science research. This will represent a significant innovation, as the sharing and joint analysis of sensitive data has until now been severely limited by the different restrictions inherent to the different classes of sensitive data. By using a federated database model, with a metadata repository within the HRIC cloud encrypted environment, data security is maintained while innovative data analyses can be performed by bringing the algorithms to the data rather than centralizing the data [32]. Federation, rather than full integration of all available resources, poses an important challenge for the implementation and deployment of an effective HRIC. The set-up and functioning of a HRIC requires a robust foundation of legal agreements, ethics rules and procedures, and security and data protection compliance protocols. Importantly, these elements must be introduced during the conception and design phase as part of HRIC governance. Indeed, to enable different HRIC actors to provide access to their data sources, manage these resources within the cloud, and access these resources, it is essential to incorporate policy requirements into the design of the HRIC itself and to manage the complexity involved by implementing simple and intuitive user interfaces and project portals. This may be challenging considering the heterogeneity of health systems and health market access across Europe, and will require a commonly agreed vision across EU member states.
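To make the idea of bringing the algorithms to the data more concrete, the following minimal Python sketch computes a pooled mean across several sites without moving any raw values. It is an illustration only and not a description of the HRIC architecture: the `Site` class, the `LocalSummary` record, and the `min_count` threshold are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocalSummary:
    """Aggregate statistics that a site is willing to release."""
    count: int
    total: float


class Site:
    """A hypothetical data holder; raw values never leave this object."""

    def __init__(self, name: str, values: List[float]):
        self.name = name
        self._values = list(values)  # stays inside the site

    def run_summary(self, min_count: int = 5) -> LocalSummary:
        """Run the shipped algorithm locally and return aggregates only.

        Sites with fewer than min_count records release nothing; this
        threshold is an assumed safeguard, not a legal requirement.
        """
        if len(self._values) < min_count:
            return LocalSummary(count=0, total=0.0)
        return LocalSummary(count=len(self._values), total=sum(self._values))


def federated_mean(sites: List[Site]) -> float:
    """Combine per-site aggregates into a pooled mean."""
    summaries = [site.run_summary() for site in sites]
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no site released data")
    return sum(s.total for s in summaries) / n
```

Only counts and sums cross organizational boundaries in this sketch. A real deployment would additionally need authentication, disclosure control, and the legal agreements discussed in this section.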
Ethical, societal, and privacy considerations for (re)using health-related data have been outlined in the Code of Practice on Secondary Use of Medical Research Data, which was developed in the European Translational Information and Knowledge Management Services (eTRIKS) project funded by the Innovative Medicines Initiative (IMI) [33]. In addition to clear and explicit consent, explicit dissent may need to be considered for the use or re-use of data. Ultimately, each citizen and patient must be able to access her/his own data and know when and where it has been used and for what purpose. In addition, the difficult question of the business model for using those data should be discussed at various levels from ethical, social, and economic standpoints, taking into account the potential future development of products and services using personal medical data. Furthermore, the goal of providing citizens with personalized services requires technical advancements in the collection and analysis of data (for example, in data analytics and machine learning). For this type of usage, simple consent mechanisms might not be sufficient. For example, how should a clear data-collection purpose statement be defined if data are collected for multiple usage scenarios across a distributed/federated cloud, in which actors from different geographical and legislative environments will need to interact and cooperate? Would an excessive number of consent requests reduce data provision for research or clinical applications? Another level of complexity is introduced by the heterogeneity of data protection and privacy regulations when the data originate from states with federated national health systems (e.g., Germany and Italy). Development of large-scale European access mechanisms will require open consultation and engagement with national policymakers, patient organizations, and wider society to build the trust and confidence needed for widespread adoption and sustainable operations.
In addition to technical, ethical, and legal specifications, a global integrated governance model needs to be established for the HRIC that is in line with that of the EOSC, regulating the roles and responsibilities of all contributing institutions and users, and procedures for authentication and access control to individual resources. Principles, with specific guidelines on implementation in a HRIC environment, will need to be developed in order to manage and regulate aspects such as ownership, access, transparency, sharing, integration, standardization of data and metadata formats, tools, and frameworks, while ensuring confidentiality and sustainability. All of these principles need to be developed with the overarching objective of providing benefit to and preserving the trust of patients and the general public.
Most health data are sensitive and need to be managed in a way that preserves the trust of patients, research participants, and the general public, respects social norms, and complies with data protection laws, notably the EU GDPR [15]. Although the GDPR directly applies across the EU and its provisions prevail over national laws, EU member states retain the ability to introduce their own national legislation under certain derogations provided for by the GDPR itself. The GDPR also introduces the notions of ‘Privacy by Design’, which means that any organization that processes personal data must ensure that privacy is built into a system during the whole life cycle of the system or process; and ‘Privacy by Default’, which means that the strictest privacy settings should apply by default, without any manual input from the end user. In addition, any personal data provided by the user to enable the optimal use of a given health dataset should only be kept for the amount of time necessary to provide the intended product or service [15].
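As an illustration of how ‘Privacy by Default’ and limited retention might translate into software, the minimal Python sketch below defines a settings object whose defaults are the most restrictive options, together with a retention check. The field names and the 30-day retention period are assumptions made for this example; neither is prescribed by the GDPR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class PrivacySettings:
    """Hypothetical per-user settings; every default is the strictest option."""
    share_with_third_parties: bool = False
    allow_secondary_use: bool = False
    allow_contact_for_studies: bool = False
    retention: timedelta = timedelta(days=30)  # assumed value, not from the GDPR


def is_expired(collected_at: datetime, settings: PrivacySettings,
               now: datetime) -> bool:
    """True once a record has outlived its retention period and should be deleted."""
    return now - collected_at > settings.retention


# A user who never opens the settings page gets the strictest configuration.
defaults = PrivacySettings()
collected = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(defaults.allow_secondary_use)
print(is_expired(collected, defaults, collected + timedelta(days=31)))
```

The design choice here is that any loosening of the settings requires an explicit action by the user, whereas inaction leaves the strictest configuration in place.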
Thus, successfully linking and accessing biomedical and health data across Europe will require many different disciplines and specialists working together in a coordinated effort that encompasses controlled access mechanisms to ensure compliance with privacy and data protection regulations. Data providers need logging and monitoring functionalities to comply with the GDPR and to enable tracking of data and methods within the system, as well as control instances and routines that check adherence to predefined standards and formats to guarantee data integrity. Access mechanisms need to be developed that help researchers, data producers, and data analysts to request permissions and fulfill the reporting requirements for data use in national and international research projects; this is a significant regulatory, political, and sustainability challenge [34]. Such mechanisms include, in particular, considerations about the rights of patient donors and research participants, taking into account the data protection aspects of various legal systems and local regulations. Researchers have to face differences in how the right to data protection is understood in those different regional or national European ecosystems.
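One possible shape for the logging functionality mentioned above is an append-only access log in which each entry stores the hash of its predecessor, so that later modification of an entry can be detected. The Python sketch below is a minimal illustration under that assumption; the `AuditLog` class and its field names are hypothetical and do not describe any existing HRIC component.

```python
import hashlib
import json
from typing import Dict, List


class AuditLog:
    """Append-only access log; each entry stores the hash of the previous one."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    @staticmethod
    def _digest(body: Dict[str, str]) -> str:
        # Serialize with sorted keys so the hash is reproducible.
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def record(self, user: str, dataset: str, purpose: str, timestamp: str) -> None:
        """Append one access event, chained to the previous entry."""
        prev = self.entries[-1]["hash"] if self.entries else ""
        body = {"user": user, "dataset": dataset, "purpose": purpose,
                "timestamp": timestamp, "prev": prev}
        self.entries.append({**body, "hash": self._digest(body)})

    def verify(self) -> bool:
        """Recompute every hash; False if any entry was altered or reordered."""
        prev = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Such a log records who accessed which dataset and for what purpose, which is the information needed for the reporting requirements discussed above. It does not by itself prevent tampering by a party who can rewrite the whole chain.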
There is an urgent need for standardized, usable, data-protection-policy-compliant solutions for sensitive data sharing which are capable of integrating and analyzing health data from different sources, organizations, and potentially from different research disciplines. These aspects are subject to ongoing discussions and debates in the EOSC initiative [35]; for instance, progress has been made in the Human Brain Project (HBP) through its Ethics and Society sub-project in collaboration with the project platforms [36, 37]. Other examples of data sharing that are compliant with data protection policies can be found in the recent literature [38,39,40,41,42,43,44,45]. Furthermore, there is the issue of capacity, with the amount of data starting to strain the infrastructure of any individual hospital or research institute. Thus, the interplay between privacy, data security, and access control on one hand and access (including cost-recovery models) to storage, computational, and analysis resources on the other hand will be a defining element of the policy and technology development of a decentralized digital health infrastructure. The evolution of a cloud model that could be used in European health research will also have to take into account other specific aspects of the GDPR [15]. For instance, the European Commission intends to facilitate the free flow of non-personal data in the European Digital Single Market and, for health-related research participants, the GDPR codifies the ‘right to be forgotten’. This stipulates that patient donors should be able to retain control over their data regardless of technological developments. A European HRIC could be important in enabling researchers to comply with these requirements.
For example, once certain conditions are met between European and international partners, including those pertaining to data protection and use, federated and hybrid clouds could facilitate the deletion of data sets once a donor exercises her/his ‘right to be forgotten’, which could minimize the necessary transfer of large raw data sets across borders, as the deletion can be performed in the original dataset and easily propagated to the relevant federated data sources.
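The propagation of a deletion from the original dataset to federated sources can be sketched as follows. This minimal Python example assumes that every node keys its records by the same pseudonymous donor identifier; the `FederatedNode` class and the `propagate_erasure` function are hypothetical names introduced for illustration only.

```python
from typing import Dict, List


class FederatedNode:
    """Hypothetical node holding records keyed by a pseudonymous donor ID."""

    def __init__(self, name: str, records: Dict[str, dict]):
        self.name = name
        self.records = dict(records)

    def erase(self, donor_id: str) -> bool:
        """Delete the donor's record locally; True if something was removed."""
        return self.records.pop(donor_id, None) is not None


def propagate_erasure(donor_id: str, origin: FederatedNode,
                      replicas: List[FederatedNode]) -> List[str]:
    """Erase at the origin first, then at each replica.

    Returns the names of the nodes that actually held data for the donor,
    which could feed a confirmation sent back to the donor.
    """
    affected = []
    for node in [origin, *replicas]:
        if node.erase(donor_id):
            affected.append(node.name)
    return affected
```

Only the donor identifier travels between nodes in this sketch, so no raw data needs to cross borders to honor the request. Handling of backups, derived results, and unreachable nodes is deliberately left out.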