Towards a data sharing Code of Conduct for international genomic research

Data sharing is increasingly regarded as an ethical and scientific imperative that advances knowledge and thereby respects the contributions of the participants. Because of this and the ever-increasing number of data access requests currently filed around the world, three groups have decided to develop data sharing principles specific to the context of collaborative international genomics research. These groups are: the international Public Population Project in Genomics (P3G), an international consortium of projects partaking in large-scale genetic epidemiological studies and biobanks; the European Network for Genetic and Genomic Epidemiology (ENGAGE), a research project aiming to translate data from large-scale epidemiological research initiatives into relevant clinical information; and the Centre for Health, Law and Emerging Technologies (HeLEX). We propose seven different principles and a preliminary international data sharing Code of Conduct for ongoing discussion.


Introduction
As early as 2002, the International Ethics Committee of the Human Genome Organization (HUGO) stated that human genomic databases should be considered as global public goods [1]. In this statement, global public goods were defined as goods 'whose scope extends worldwide, are enjoyable by all with no groups excluded, and when consumed by one individual, are not depleted for others' [2]. Buttressed by the Bermuda Principles of 1996 [2] and mirrored in the Fort Lauderdale rules of 2003 [3], the common philosophy of sharing resources was reaffirmed in the 2008 International Summit on Proteomics Data Release and Sharing Policy in Amsterdam [4] and in the Toronto International Data Release Workshop of 2009 [5].
Finally, in January 2011, 17 major health funding agencies signed a joint statement on sharing research data to promote and improve public health [6]. However, the challenge is to take these fundamental values of sharing and access and to develop guiding principles and procedures that can be used as a basis for emerging practice.
To begin, we consider data sharing as a form of data processing as defined by the EU Directive 95/46/EC on data protection [7]. In this directive, data processing refers to: 'any operation or set of operations which is performed upon personal data, whether or not by automatic means, such as […] retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available […]' [7]. Data can include raw data, genotype/phenotype data and data included within governmental health administrative databases. Theoretically, personal medical records could be subsumed under this term, but we have not specifically addressed such data because their regulation is jurisdiction-specific. The code's principles, however, remain pertinent to such data. For the terms 'coded' and 'anonymized', we use the definitions provided by the 2007 International Conference on Harmonization [8].
Data sharing is regarded as essential for enabling and promoting genomic research in a way that will maximize the benefits to public health [6] and society [9]. All countries, funders and investigators are aware of the need for research ethics and governance mechanisms in research, but currently there is little policy guidance that is specific to the international sharing of genomic research data. In view of the recent calls for the development of common principles applying to data access and use [7,10], the Public Population Project in Genomics (P3G) [11], the European Network for Genetic and Genomic Epidemiology (ENGAGE) [12] and the Centre for Health, Law and Emerging Technologies (HeLEX) [13] are working on an international data sharing Code of Conduct (Box 1). This has a dual purpose: to elucidate shared values and to provide guidance on the basic obligations flowing from it. Given the varied disciplinary backgrounds of researchers working in genomic research, it can no longer be presumed that all the scientists engaged in data sharing are bound by the same medical or other professional deontological frameworks or can be subject to disciplinary action for a breach. Therefore, the proposed international Code of Conduct for data sharing in genomic research seeks to provide common guidance on the basis of two fundamental values: (i) mutual respect and trust between scientists, stakeholders and participants; and (ii) a commitment to safeguarding public trust, participation and investment. The elaboration and eventual implementation of such a code should be the object of ongoing discussion and will begin with a series of consultative discussions at international, European and national fora.

Box 1. International Data Sharing Code of Conduct Preamble
This proposed international data sharing Code of Conduct seeks to promote greater access to and use of data in ways that are (as proposed by the joint statement by funders of health research [6]): 'Equitable: any approach to the sharing of data should recognize and balance the needs of researchers who generate and use data, other analysts who might want to reuse those data, and communities and funders who expect health benefits to arise from research.
Ethical: all data sharing should protect the privacy of individuals and the dignity of communities, while simultaneously respecting the imperative to improve public health through the most productive use of data.
Efficient: any approach to data sharing should improve the quality and value of research and increase its contribution to improving public health. Approaches should be proportionate and build on existing practice and reduce unnecessary duplication and competition.'

Quality
Irrespective of the discipline, scientists involved in data sharing should be bona fide researchers.
Proof of academic or other recognized peer reviewed standing is essential.
Harmonization of data collection and archiving methods and tools ensures validation of scientific quality.
Collaboration promotes efficiency, sustainability and comparability.

Accessibility
Facilitation of both the deposit of data and secure access to data are the foundations of data sharing.
Curators of databases should promote sharing to generate maximum value.
Harmonization of deposit, access procedures and use promotes accessibility, equity and transparency.

Responsibility
Responsible governance should be shared between funders, generators and users of data.
Investments in databases require coordination, strategy and long-term core funding.
Mechanisms for building interoperability should be encouraged and appropriate management anticipated.
Capacity building and recognition of all the data generators contribute to best practices.

Security
Trust and the promotion of data sharing rely on data management and security mechanisms and also on oversight of their functioning.
Mechanisms for identifying and tracking data generators and users should be international.

Transparency
Key policies on publications, intellectual property, and industry involvement should be public.
Websites that are accessible to the general public serve to provide feedback on progress and general results.

Accountability
Inter-agency co-operation and funding foster streamlined and efficient monitoring and good governance.
Provisions should be made for ongoing public engagement that is tailored to the nature of the database and local cultures.

Integrity
Mutual respect between all stakeholders is founded on personal and professional integrity.
Prevention of harms and anticipation of public concerns and scientific needs through foresight mechanisms encourage the development of common, prospective policies.
Irresponsible research practices should be reported.
Sanctions for breach of this Code or of other legal or ethical obligations must be clear.

Principles and procedures: background and rationale
Although we are not attempting to prioritize or in any way create a hierarchy among various principles in the field of data sharing, they all derive from a shared belief in maximizing both scientific quality (Box 1, point 1) and public benefit through rapid release and public accessibility to data (Box 1, point 2) [14].
The assurance of quality is sine qua non for ethical science. Making it an explicit requirement reiterates its importance and mandates comparison, validation and replication, thereby ensuring appropriate and common standard operating procedures and the use of accredited facilities. Prospectively harmonizing procedures to facilitate interoperability and comparability is likely to promote such quality and accessibility.
There is no doubt that maximizing public benefit, investment and participation is facilitated through data sharing. Not only should access be equitable for researchers in both the public and private sectors, but ethics reviewers should have the proper training and tools to evaluate international requests. The datasets themselves may be derived from the contributions of multiple sources from different countries and projects. The current legal and ethical constraints and bottlenecks to access are obvious. Indeed, multiplicity of ethics review may well be the Achilles heel for efficient sharing.
The tripartite responsibility of the data producers, users and funders lays the foundation for data sharing (Box 1, point 3). We see data sharing, which is often a condition of funding, as part of the efficient and proper stewardship of public funds. It also binds eventual users in the recognition of a just return on public investment and participation. This responsibility is chiefly expressed both in the security mechanisms that translate the principle into the construction of information technology tools and firewalls and in the governance framework.

Security mechanisms
Security mechanisms (Box 1, point 4) go well beyond the application of firewalls or deidentification techniques, such as coding or anonymization. Indeed, unique, digital identifiers (IDs) for biobanks [15,16] and for researchers [17] have been proposed not only for security purposes but to facilitate access. Such IDs would enable verification and validation of the identities and credentials of researchers by institutions and would become a mechanism for allowing, tracking and auditing access, as well as attributing contributions.
Digital identifier systems allow data tracing and prospectively limit the potential for malicious activities involving reidentification of participants. This transparency of data flow, access and use also curtails the possibility of prepublication scooping between producers and users (Box 1, point 5). Prepublication data release depends on the respect by users and journals of publication moratoria that allow data producers to share data openly while giving them a period of time to analyze and publish their own data before secondary users do so. Proper acknowledgement of the use of data resources also allows funders to track their 'investments'. It allows the public to see that their altruistic participation has led to fruitful scientific endeavors. Most importantly, data users agree not to use intellectual property protection in ways that would prevent or block access to, or use of, any element of the dataset or any conclusion drawn directly from it [18]. This does not prevent further research with attendant intellectual property rights in downstream discoveries provided that the best practices for licensing policies for genomic inventions are followed.
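To make the role of identifiers more concrete, the following is a minimal illustrative sketch, not a description of any existing system, of how unique researcher and dataset IDs could support allowing, tracking and auditing access, together with a simple publication moratorium check. All names (AccessRegistry, Dataset, may_publish, the ID strings and the dates) are hypothetical and chosen only for illustration; a real implementation would depend on the identifier schemes proposed in [15-17] and on local governance rules.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class AccessEvent:
    researcher_id: str  # hypothetical unique researcher ID
    dataset_id: str     # hypothetical unique biobank/dataset ID
    day: date


@dataclass
class Dataset:
    dataset_id: str
    producer_id: str       # ID of the data producer
    moratorium_ends: date  # last day of the producers' publication priority


@dataclass
class AccessRegistry:
    approved: set = field(default_factory=set)  # (researcher_id, dataset_id) pairs
    log: list = field(default_factory=list)     # audit trail of AccessEvent records

    def approve(self, researcher_id: str, dataset_id: str) -> None:
        """Record that an access request has been approved."""
        self.approved.add((researcher_id, dataset_id))

    def access(self, researcher_id: str, dataset: Dataset, day: date) -> AccessEvent:
        """Log an access; refuse researchers without an approved request."""
        if (researcher_id, dataset.dataset_id) not in self.approved:
            raise PermissionError(f"{researcher_id} is not approved for {dataset.dataset_id}")
        event = AccessEvent(researcher_id, dataset.dataset_id, day)
        self.log.append(event)
        return event

    def users_of(self, dataset_id: str) -> list:
        """Return the sorted IDs of everyone who accessed a dataset (for audit or attribution)."""
        return sorted({e.researcher_id for e in self.log if e.dataset_id == dataset_id})


def may_publish(researcher_id: str, dataset: Dataset, day: date) -> bool:
    """Producers may publish at any time; secondary users only after the moratorium."""
    return researcher_id == dataset.producer_id or day > dataset.moratorium_ends
```

In this sketch the audit log doubles as an attribution record: the same entries that let an institution trace who accessed a dataset also let funders and producers see how a resource has been used.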

Governance framework
Good governance underpins a system of data sharing that depends on trust. Approaches to governance necessarily vary between contexts and countries. Irrespective of these differences, governance should be flexible in the oversight and monitoring systems put in place. This is crucial because public trust, which is increasingly translated through broad consents, is counterbalanced by both security systems and governance. It could be asked whether in considering the longevity of large international datasets, including samples, separate governance models should be developed as distinct from local institutional mechanisms or those applicable to the oversight of clinical trials.
Good governance assures the public and funders of proper accountability and ethics review (Box 1, point 6). Although local laws and ethics review systems vary, the ethics norms and biobank policies applicable to large data repositories are beginning to emerge [19,20]. These common norms are increasingly mirrored in model material transfer and access agreements [10]. Contractual in nature, they serve to bind researchers and their institutions. Implicit in such agreements are the very principles under discussion here. By making them explicit by using such contracts, researchers, policymakers and ethics committees have tools to work with that are more transparent. For scientific integrity (Box 1, point 7) to be viable, discussion on the nature of such principles and their procedural translation in different contexts will necessarily vary. Nevertheless, mutual respect between all stakeholders and participants can be built on these fundamental principles and procedures. Integrity also entails the prevention of harms, anticipation of public concerns and scientific needs as well as the reporting of irresponsible research practices and the creation of appropriate sanctions [21].
Most importantly, ongoing communication with the public on the 'reality' of data sharing principles and procedures is essential. Thus, lay summaries of the research proposals accessing and using data repositories should be publicly posted. Although there is no personal benefit to participants, such a public registry of research uses ultimately allows participants to withdraw if they disagree with the direction of the research. There are also other mechanisms of communication, such as bulletins and websites. Population studies recontact their participants for updates, or to take new measurements, thereby keeping ongoing consent alive and valid.
The most telling aspect of the developments described above, however, is that the underlying values presented here come from the current approaches promoted and used by the scientists and funders themselves. Concern for scientific integrity and mutual respect are then not imposed by legislative or professional fiat but rather reveal an already existing shared ethos on the proper foundations for international science in the 21st century. This augurs well for the future viability of the preliminary version of our proposed international data sharing Code of Conduct in genomic research (Box 1).

Conclusion
Addressing the issue of data sharing in the context of international genomic research requires not only a holistic approach, but also the fair balancing of the interests, rights and duties of various stakeholders involved in collaborative endeavors. We have highlighted the need for equitable, ethical and efficient access to data and proposed a Code of Conduct (Box 1) that incorporates seven principles: quality, accessibility, responsibility, security, transparency, accountability and integrity. We trust that this code will foster broader discussion involving multiple stakeholders.