CCSG

Data Dissemination

Peter Granda and Emily Blasczyk, 2016

Introduction

Guidelines

1. Make a dissemination and data preservation plan that includes archiving, publishing, and distribution, early in the project lifecycle.

2. Preserve sustainable copies of all key data and documentation files produced during the data collection process, as well as those made available for secondary analyses.

3. Conduct effective disclosure analysis to protect respondent confidentiality.

4. Consider the production of both public- and restricted-use data files.

5. Produce data files that are easy for researchers to use.

6. Develop finding aids to guide users in their quest to locate data collections they want to use.

7. Create comprehensive training, outreach, and user support programs to inform the research community about the dataset.

8. Produce comprehensive documentation for all public- and restricted-use data files.

9. Consider disseminating research findings.

10. Make quality control an integral part of all dissemination steps.

References

Further Reading

Introduction

Dissemination is the process by which producers of microdata from surveys and from public and official statistics make their data available to other users. These users may include government officials, academic researchers, policymakers, and the general public. Data may be disseminated publicly without any restrictions (as public-use files) or only to certain users under specific conditions (as restricted-use files). The availability of microdata is often dependent on national laws and regulations. Data and documentation may be disseminated in various formats, but the goal should be to provide complete information in a non-proprietary format that is amenable to long-term preservation.

Several aspects of making data and documentation files available to analysts require special consideration. More is involved in the dissemination process than merely providing data access to interested researchers. Data producers and archivists must assure analysts that the data they provide accurately reflect the efforts of the data collection process, are trustworthy and fully documented, raise no confidentiality concerns, and are securely preserved for future use. Disseminating data from multinational, multicultural, or multiregional surveys, which we refer to as '3MC' surveys, can involve specific processes, such as standardization, harmonization, and multi-lingual documentation, that may not apply to surveys conducted in a single country.

An additional aspect of dissemination is how to share research findings with interested parties. Determining who is using the data and why is an important part of a comprehensive dissemination strategy. Many international organizations, social science data archives, and survey research projects also embrace these objectives. Although focused on macroeconomic data, the [zotpressInText item="{2265844:NUAM2PXP}" format="%a% (%d%)"], for example, established a set of guidelines for member countries to follow in order to provide the public with “comprehensive, timely, accessible, and reliable economic, financial, and socio-demographic data” [zotpressInText item="{2265844:THXKPNFI}"].


Guidelines

Goal: To ensure that survey and statistical research teams in all cultures and countries involved in a 3MC survey follow accepted standards for the long-term preservation and dissemination of data to the social science research community and the wider public.


1. Make a dissemination and data preservation plan that includes archiving, publishing, and distribution, early in the project lifecycle.

Rationale

Dissemination is an integral part of the survey research process. It involves the documentation of major steps in the data lifecycle, from initial planning to the production of final data files. This includes, when available and appropriate, detailed information about the survey process (paradata), all data editing steps, and protocols that determine which types of data and documentation files are made available to which users.

Procedural steps

1.1 For multi-lingual surveys, decide on the standard documentation language to be used.

1.2 Identify any documents that should be published in their original language, such as individual country questionnaires, codes, verbatim responses, and nation-specific data files.

1.3 Have a system in place to preserve all major planning and operational documents as soon as they are created.

1.4 Consider including information about the survey process when disseminating data, documentation, and reports. Producers may want to balance the amount of paradata they release with the need to maintain proprietary information about the data collection process.

Lessons learned

1.1 All studies must develop a system for preserving and storing materials. Various methods can be used, many of which rely on a centralized repository. Some examples of preservation and dissemination strategies follow.

1.2 Round 4 of the [zotpressInText item="{2265844:Y3GGKRR2}" format="%a% (%d%)"] strongly recommends that participating countries scan their completed paper-and-pencil questionnaires. Hard copies are acceptable where circumstances (e.g., cost) prevent scanning. National partners are responsible for either the scanning or storing of their own questionnaires. Each national partner is responsible for entering and cleaning their own data and delivering a clean SPSS data set.

1.3 The Demographic and Health Surveys Program (DHS) provides 'standard recode' datasets to users. The recode datasets contain the same data as the raw datasets, but in a standardized format where variable names and definitions are, wherever possible, consistent across all surveys. DHS also provides a step-by-step introduction to using DHS data, a tabulation plan, and many other resources for analyzing DHS data.

1.4 All documents related to each round of the [zotpressInText item="{2265844:WYYYNFA5}" format="%a% (%d%)"] are uploaded to a server. This includes, but is not limited to, original unedited (raw) data, fieldwork documents, metadata, and population statistics for coverage and response rates.

1.5 Documentation of [zotpressInText item="{2265844:56MZPUKP}" format="%a% (%d%)"] survey methods and data files is sent to a central data archive no later than nine months after fieldwork is completed. Data are to be sent unweighted, but descriptions of weighting procedures should accompany the datasets.

1.6 Master copies of all important Living Standards Measurement Study (LSMS) files are kept in a separate, backed-up archive.

1.7 Documentation for the World Mental Health (WMH) Survey is done using the Survey Metadata Documentation System designed by the WMH Data Collection Coordination Centre [zotpressInText item="{2265844:YQSMJYNV}"].

1.8 Countries participating in the World Values Survey are required to submit documentation of their survey methods and data to a central data archive no later than three months after fieldwork has been completed. Documentation must include a completed methodology questionnaire, a report of any questions omitted or added to the original official questionnaire, a report of additional and/or country-specific codes to any questions, official demographic statistics, weights used, and a copy of the original country questionnaire.

1.9 Many institutions that provide research grants for data collection now strongly recommend that grantees prepare a data sharing plan as part of the proposal process. The [zotpressInText item="{2265844:C54KDRPS}" format="%a% (%d%)"] in the United States provide the following justification for their emphasis on dissemination: “data sharing promotes many goals of the NIH research endeavor. It is particularly important for unique data that cannot be readily replicated. Data sharing allows scientists to expedite the translation of research results into knowledge, products, and procedures to improve human health. There are many reasons to share data from NIH-supported studies. Sharing data reinforces open scientific inquiry, encourages diversity of analysis and opinion, promotes new research, makes possible the testing of new or alternative hypotheses and methods of analysis, supports studies on data collection methods and measurement, facilitates the education of new researchers, enables the exploration of topics not envisioned by the initial investigators, and permits the creation of new datasets when data from multiple sources are combined.” This policy has resulted in more data becoming available in the public domain.

1.10 The [zotpressInText item="{2265844:TRE6EAUS}" format="%a% (%d%)"] conducted an informal Web survey of institutional data policies in the social sciences in 2013. IFDO found growing awareness of and interest in data sharing; however, implementation varies across countries and research funders. The results indicate that the social sciences have more developed policies than the medical and health sciences.

1.11 More than ten years ago, the [zotpressInText item="{2265844:NUAM2PXP}" format="%a% (%d%)"] began to develop a set of dissemination standards “to guide countries in the provision to the public of comprehensive, timely, accessible, and reliable economic, financial, and socio-demographic data.” These standards were considered best practices, but their implementation was entirely voluntary, depending on the policies and wishes of each nation. The Fund published a report [zotpressInText item="{2265844:4WUEFGS3}"] on the success of the initiative over its first ten years. It concluded that more nations than ever before are now producing more accurate and reliable statistical information, but it also recognized that dissemination mechanisms are not fully developed in many locations. Nations also face internal challenges in meeting dissemination goals, including resource constraints, shifting priorities, and limits on their ability to generate periodic and timely statistical data.

1.12 The [zotpressInText item="{2265844:WHW7WN2P}" format="%a% (%d%)"] has produced guidelines for access to research data from public funding. The guidelines aim to help governments, research support and funding organizations, research institutions, and researchers themselves deal with the challenges of improving international access to and sharing of research data.


2. Preserve sustainable copies of all key data and documentation files produced during the data collection process, as well as those made available for secondary analyses.

Rationale

Preservation is an important part of the survey lifecycle, a prerequisite for long-term access to valuable physical objects and digital materials. The materials that need to be preserved and kept available to members of the research community include such objects as public-use data and documentation files (including key files used in their construction), copies of the data collection instruments, user guides, information about the data collection process, and reports on field operations. Since dissemination policies may differ among countries, it is important that data producers take the necessary steps to make their collections as accessible as possible to members of the research community. If appropriate repositories are not available, producers may need to organize dissemination of their materials themselves.

Procedural steps

2.1 Define the long-term preservation standards and protocols to be used. Consider digitizing physical objects, commonly-used questionnaires, or other administrative materials documenting the whole data lifecycle, including the design phase of the project.

2.2 Several digital preservation metrics can be used to assess digital repositories. Two examples are the Trustworthy Repositories Audit & Certification (TRAC) checklist and the Trustworthy Digital Repository (TDR) checklist, published as ISO 16363 [zotpressInText item="{2265844:NQWRR77B}"].

2.3 Protect digital materials by storing multiple copies in multiple locations. An ideal preservation storage arrangement includes several offsite copies of digital materials that are backed up on a regular schedule. If it is not possible to store materials at multiple sites, preserve at least one copy in a different location.

2.4 Make certain that digital materials remain retrievable by regularly refreshing the media on which they are stored. This is particularly important if removable media (e.g., tapes) are used for storage, since media formats and the machines required to read them change quickly over time.

2.5 Implement a system of version control to maintain older versions of important data and documentation files. Users should be able to follow the changes made from one version to the next. Version control is necessary for users to replicate previous analysis or to test analysis done by others.
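The change information users need can be generated automatically. As a minimal sketch, assuming each release's codebook is available as a simple dictionary of variable names and labels (a hypothetical structure, not a standard format), the Python function below lists the variables added, removed, or relabeled between two versions. A project would more likely keep the files themselves under a version-control tool such as Git and publish a report like this with each release.

```python
def diff_codebooks(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two hypothetical {variable_name: label} codebooks."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # Variables present in both versions whose labels changed.
    relabeled = sorted(
        name for name in old.keys() & new.keys() if old[name] != new[name]
    )
    return {"added": added, "removed": removed, "relabeled": relabeled}


# Hypothetical codebooks for two releases of the same study.
v1 = {"AGE": "Age of respondent", "INC": "Household income", "REG": "Region"}
v2 = {"AGE": "Age of respondent (years)", "INC": "Household income", "EDU": "Education"}

changes = diff_codebooks(v1, v2)
for kind, names in changes.items():
    print(f"{kind}: {', '.join(names) or '(none)'}")
```

Run on the two sample releases, the report lists EDU as added, REG as removed, and AGE as relabeled.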

2.6 At a minimum, store a copy of all data and metadata files in software-independent formats (e.g., ASCII or XML) which, with proper accompanying documentation, can be read into all major statistical packages.
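As an illustration, the Python sketch below writes a small dataset to comma-separated ASCII text with the standard csv module, and its variable and value labels to an XML file with xml.etree.ElementTree. The records, file names, and XML element names are hypothetical; the XML layout is a simplified stand-in, not the schema of the Data Documentation Initiative (DDI), a widely used XML metadata standard for social science data.

```python
import csv
import xml.etree.ElementTree as ET

# Hypothetical records and metadata, for illustration only.
records = [
    {"RESPID": 1, "AGE": 34, "SEX": 2},
    {"RESPID": 2, "AGE": 57, "SEX": 1},
]
variables = {
    "RESPID": ("Respondent identifier", {}),
    "AGE": ("Age of respondent in years", {}),
    "SEX": ("Sex of respondent", {1: "Male", 2: "Female"}),
}


def write_csv(path, rows, fieldnames):
    # Plain-text, comma-delimited data that any statistical package can read.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def write_codebook_xml(path, meta):
    # Simplified, hypothetical XML layout; not a DDI-conformant schema.
    root = ET.Element("codebook")
    for name, (label, value_labels) in meta.items():
        var = ET.SubElement(root, "variable", name=name)
        ET.SubElement(var, "label").text = label
        for code, text in value_labels.items():
            ET.SubElement(var, "category", code=str(code)).text = text
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


write_csv("study_v1.csv", records, list(variables))
write_codebook_xml("study_v1_codebook.xml", variables)
```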

2.7 Investigate the protocols and standards of digital repositories, such as their provisions for data extraction, multi-site storage, and security, as well as their costs.

2.8 Make test runs of copied data to ensure error-free copy processes.
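One common way to test a copy is to compare checksums of the original and the copy. The Python sketch below, assuming SHA-256 as the checksum algorithm and using hypothetical file and folder names, copies a file and confirms that the copy is byte-for-byte identical. Storing the checksum alongside each preserved file also allows the same fixity check to be repeated after media are refreshed.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: Path, dest_dir: Path) -> bool:
    """Copy src into dest_dir and report whether the copy matches byte for byte."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    return sha256_of(src) == sha256_of(dest)


# Hypothetical master file and copy location, for illustration only.
master = Path("study_master.csv")
master.write_text("RESPID,AGE\n1,34\n2,57\n", encoding="utf-8")
print(copy_and_verify(master, Path("offsite_copy")))
```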

2.9 If possible, work with a trusted digital repository, such as a national or public social science data archive, to preserve all study materials. In doing so, data producers do their best to ensure that their data collections will remain available to the research community.

2.9.1 Such repositories make an explicit commitment to preserving digital information by:

- Complying with the Open Archival Information System (OAIS) reference model (ISO 14721) and with the digital preservation standards and practices of individual countries [zotpressInText item="{2265844:KHEHZKZY},{2265844:5CZRAUYN},{2265844:J6UQIBY9}"].
- Ensuring that digital content can be provided to users and exchanged with archives without damaging its integrity.
- Participating in the development and promotion of digital preservation community standards, practice, and research-based solutions.
- Developing a reliable, sustainable, and auditable digital preservation repository that has the flexibility to grow and expand.
- Managing the hardware, software, and storage media components of the digital preservation function in accordance with environmental standards, quality control specifications, and security requirements.

2.10 If no national or public social science data archives exist, consider depositing data with an archive in another country, or investigate the possibility of doing so with a national statistical agency or certified provider. Consider archiving collections in one archive, which would keep master copies of files in several locations but minimize the possibility of conflicting versions of data and documentation files.

Lessons learned

2.1 The [zotpressInText item="{2265844:H77SEMSI}" format="%a% (%d%)"] requires data to be archived for a minimum of 10 years as part of its anti-fraud activities.

2.2 Some earlier studies, such as older Eurobarometer surveys, did not preserve individual country data, and thus issues about harmonization emerging some decades later could not be easily settled.

2.3 Data producers should make every effort to rescue data stored on media that may no longer be easy to read. Too many data files have been irretrievably lost because they were never copied to newer types of media.


3. Conduct effective disclosure analysis to protect respondent confidentiality.

Rationale

Any plan to disseminate survey data must include very specific procedures for understanding and minimizing the risk of breaching the promise of confidentiality that is made to respondents at the time of data collection. The key goal of disclosure risk analysis and processing is to ensure that the data maintain the greatest potential usefulness while simultaneously offering the strongest possible protection to the confidentiality of the individual respondents. Disclosure analysis has become increasingly important as more and more datasets become available online and as the possibility of linking survey data to other contextual and administrative databases has grown exponentially [zotpressInText item="{2265844:ZR3KYVCE},{2265844:5DB6L8RD}"].

Procedural steps

3.1 Be aware of, and adhere to, the disclosure control legislation in each country.

3.2 Disclosures can be categorized as either identity disclosure or attribute disclosure [zotpressInText item="{2265844:8I7528QG}"].

3.2.1 Identity disclosure occurs when an individual respondent can be identified, either from a single identifying characteristic or a combination of such characteristics (e.g., name and address), or from a combination of indirect characteristics (e.g., an individual in a specific region whose household size is an outlier and who has an unusual marital history, which together reveal the individual's identity).

3.2.2 Attribute disclosure occurs when sensitive information about an individual (e.g., income or health status) can be inferred from the released data, whether or not the individual's own record has been identified.

3.3 Implement a disclosure protocol. A proper disclosure protocol includes an analysis of the most likely outside sources which might allow for the identification of respondents or households.

3.4 Search the data file systematically for sensitive information, such as transcripts of open-ended answers, detailed occupation codes (e.g., International Standard Classification of Occupations [ISCO] variables), identifiers of primary sampling units (PSUs), birth dates, income, or housing and dwelling information.

3.5 Search also for unusual characteristics and for cells in tables with very low frequencies.
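Searching for rare combinations can be partly automated. The Python sketch below counts how many records share each combination of chosen indirect identifiers and flags combinations held by fewer than k records, in the spirit of k-anonymity. The variable names, sample records, and threshold k = 3 are illustrative assumptions; the appropriate identifiers and threshold depend on the study and on national rules.

```python
from collections import Counter


def rare_combinations(records, keys, k=3):
    """Return combinations of the given indirect identifiers shared by fewer than k records."""
    counts = Counter(tuple(r[key] for key in keys) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical records; the last respondent has an unusual profile.
sample = [
    {"REGION": "North", "HHSIZE": 4, "MARITAL": "Married"},
    {"REGION": "North", "HHSIZE": 4, "MARITAL": "Married"},
    {"REGION": "North", "HHSIZE": 4, "MARITAL": "Married"},
    {"REGION": "South", "HHSIZE": 12, "MARITAL": "Widowed"},
]
print(rare_combinations(sample, ["REGION", "HHSIZE", "MARITAL"], k=3))
# {('South', 12, 'Widowed'): 1}
```

Flagged combinations are candidates for the masking steps that follow, not automatic proof of disclosure risk.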

3.6 Undertake both practical and statistical steps to identify at-risk cases and variables. This makes it possible to identify areas or variables that need further masking to prevent the identification of subjects, either through analysis or by matching study data with data from external databases. After deciding which variables present unacceptable risks, mask the relevant information.

3.7 Evaluate data files once those cases and variables are identified. In virtually every case, the data can be masked in various ways that make it possible for public-use data to be distributed, usually through a Web-based system.

3.8 Use appropriate masking procedures to preserve respondent confidentiality while also trying to optimize the usefulness of the resultant data file for analysis. These procedures might include top- or bottom-coding of key demographic variables such as income, removing data for very sensitive variables, and swapping data values between similar cases [zotpressInText item="{2265844:YBDU6ZXY}"].
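As a minimal sketch of top- and bottom-coding, the Python function below caps values at chosen percentiles of their distribution, using the nearest-rank percentile definition (one of several conventions). The percentile cut-offs and sample incomes are hypothetical. Some producers instead replace top-coded values with the mean of the top-coded cases, which preserves the variable's total; that variant is not shown.

```python
import math


def top_bottom_code(values, lower_pct=1.0, upper_pct=99.0):
    """Bottom- and top-code values at the given percentiles.

    Returns the coded values plus the lower and upper thresholds used.
    Percentiles use the nearest-rank method, one of several conventions.
    """
    ordered = sorted(values)
    n = len(ordered)

    def nearest_rank(pct):
        rank = max(1, math.ceil(pct * n / 100))
        return ordered[rank - 1]

    low, high = nearest_rank(lower_pct), nearest_rank(upper_pct)
    coded = [min(max(v, low), high) for v in values]
    return coded, low, high


# Hypothetical monthly incomes; the last respondent is an extreme outlier.
incomes = [12, 15, 18, 20, 22, 25, 30, 35, 40, 950]
coded, low, high = top_bottom_code(incomes, lower_pct=10, upper_pct=90)
print(coded, low, high)
```

Here the outlying income of 950 is replaced by the 90th-percentile value of 40, while the rest of the distribution is unchanged.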

3.9 Document all confidentiality assurance processes and make a final assessment about the anonymity of the data file.

Lessons learned

3.1 With the enhanced emphasis on privacy in almost all countries, confidentiality reviews of microdata are increasingly important, if not indispensable, to assuring the future availability of public-use data.

3.2 A 2011 experiment used individual-level reoffending and sentencing data in the UK to demonstrate the possibility of disclosure prior to public release. Disclosure resulted from matching the data against information published on a local news website [zotpressInText item="{2265844:F97XKZL7}"].

3.3 The practice of reporting examples of privacy violations, particularly in the healthcare field in the United States, has increased awareness of this issue [zotpressInText item="{2265844:88BMWBK6}"].


4. Consider the production of both public- and restricted-use data files.

Rationale

In order to ensure that researchers have access to the greatest amount of data possible without compromising respondent confidentiality, data producers, when appropriate, must make every effort to create both public- and restricted-use data and documentation files, and make these files available to the research community through secure and predictable channels.

Procedural steps

4.1 Make data files fully available to the research community as soon as possible, within the confines of how the project is organized and financed. If general distribution is not feasible, establish clear rules under which researchers can obtain the data.

4.2 Remain cognizant of any copyright restrictions the data may have. In some cases, even after dissemination, the ownership of the data remains with the principal investigators.

4.3 Provide access directly by the data producer if resources permit, but always send copies to a trusted digital repository for permanent preservation as well, in case the data producer should cease to provide access at some point in the future.

4.4 Consider the creation of less thoroughly masked versions that can be distributed under restricted-use contracts or made available within a research data center or 'enclave' (i.e., a secure environment in which the user has access to restricted data and analytic outputs under controlled conditions).

4.5 Establish clear policies for how researchers may access restricted data files by creating a set of application materials and restricted-use data agreements that specify how researchers can obtain and use such data [zotpressInText item="{2265844:73YJ2YAE}"].

4.6 Distribute restricted files through signed data use agreements. These may incorporate data protection plans, formal licenses, and travel to a special facility at which researchers can access the data in a very controlled environment.

4.7 Create special files for researchers that cannot be matched with public-use files (for example, provide finer-grained local information and simultaneously change respondents’ IDs and other matching variables).
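Changing respondent IDs can be done by assigning each case a new random identifier. The Python sketch below uses the standard secrets module to build such a crosswalk; the choice of 4 random bytes (8 hexadecimal characters) per ID is an assumption. The crosswalk itself must be stored securely by the data producer and never released, since it is exactly the link the new IDs are meant to break.

```python
import secrets


def rekey_ids(original_ids, n_bytes=4):
    """Return a crosswalk {original ID: new random hexadecimal ID}.

    New IDs are drawn with the secrets module, so they carry no information
    about the original IDs. n_bytes=4 (8 hex characters) is an assumption.
    """
    crosswalk = {}
    used = set()
    for original in original_ids:
        if original in crosswalk:
            continue  # the same respondent keeps one new ID
        new_id = secrets.token_hex(n_bytes)
        while new_id in used:  # redraw on the rare collision
            new_id = secrets.token_hex(n_bytes)
        used.add(new_id)
        crosswalk[original] = new_id
    return crosswalk


# Hypothetical public-use respondent IDs.
crosswalk = rekey_ids([1001, 1002, 1003])
restricted_ids = [crosswalk[i] for i in (1001, 1002, 1003)]
```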

Lessons learned

4.1 Consider making clear agreements on data heritage (i.e., copyright transfer after the original principal investigator retires). A German elite study was nearly lost to the academic public due to heritage issues.

4.2 Most survey data collection is already paid for with taxpayer money or by foundations. Thus, foundations and public funders often ask for free data access (i.e., they do not recognize the principal investigator’s sole ownership of the collected data).

4.3 Despite general agreement on the advantages of making data accessible to other researchers, as well as strong data-sharing cultures in many nations, too few social science data collections are effectively preserved. Data archives should do as much as possible to facilitate the deposit process by contacting principal investigators and data producers as they prepare data and documentation files.


5. Produce data files that are easy for researchers to use.

Rationale

An effective data processing strategy focuses on the production of data files that will provide optimal utility for researchers. Such files have been thoroughly checked and cleaned, possess uniform and consistent coding strategies, use common formats, and address the potential research needs of secondary analysts.

Procedural steps

Processors should perform a series of steps to ensure the integrity and maximum utility of public-use files. Such steps include:

5.1 Address the various ways data may be utilized by creating tools within a Web-based system that permits online analysis, subsetting, and access to documentation. Be aware that online analysis must use fully anonymized data. Data users may be policymakers seeking summary information, analysts browsing for new data sources, or individuals seeking summary analytic information or wanting to quickly download specific variables.

5.2 In order to provide optimal utility for researchers, produce a variety of products for varied constituencies.

5.2.1 Produce setup files and ready-to-use ‘portable’ files in SAS, SPSS, and Stata to address the needs of those who seek to do intensive statistical analyses with particular software packages.

5.2.2 Consider disseminating data on removable media (e.g., CD-ROM or DVD) if appropriate.

5.2.3 Clearly identify the master version and provide access to any previously released versions.
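The setup files mentioned in 5.2.1 are syntax files that read raw ASCII data into a statistical package and apply variable and value labels. The Python sketch below generates an SPSS setup file for a hypothetical fixed-width layout. Only SPSS is shown, the layout and labels are invented for illustration, and the generated commands should be checked against the SPSS Command Syntax Reference before release; equivalent SAS and Stata setup files could be produced from the same metadata.

```python
def _quote(text: str) -> str:
    # SPSS string literal in single quotes; embedded apostrophes are doubled.
    return "'" + text.replace("'", "''") + "'"


def spss_setup(data_file, layout, var_labels, value_labels):
    """Build SPSS syntax that reads a fixed-width file and applies labels.

    layout: list of (name, start_col, end_col); columns are 1-based and inclusive.
    """
    cols = " ".join(
        f"{name} {start}" if start == end else f"{name} {start}-{end}"
        for name, start, end in layout
    )
    lines = [f"DATA LIST FILE={_quote(data_file)} FIXED", f"  / {cols}."]
    lines.append("VARIABLE LABELS")
    var_parts = [f"{name} {_quote(label)}" for name, label in var_labels.items()]
    lines.append("  " + "\n  /".join(var_parts) + ".")
    if value_labels:
        lines.append("VALUE LABELS")
        val_parts = [
            name + " " + " ".join(f"{code} {_quote(text)}" for code, text in codes.items())
            for name, codes in value_labels.items()
        ]
        lines.append("  " + "\n  /".join(val_parts) + ".")
    lines.append("EXECUTE.")
    return "\n".join(lines)


# Hypothetical layout and labels.
example = spss_setup(
    "study.dat",
    [("RESPID", 1, 4), ("AGE", 5, 6), ("SEX", 7, 7)],
    {"RESPID": "Respondent identifier", "AGE": "Age of respondent in years",
     "SEX": "Sex of respondent"},
    {"SEX": {1: "Male", 2: "Female"}},
)
print(example)
```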

5.3 Format the data files so that they can be read by a wide variety of statistical packages, all of which should produce the same results however complex the requested analysis. Pay particular attention to variables for which decimal precision is important.
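One practical precision check is to confirm that the number of decimal places written to a text export is enough to reproduce every value. The Python sketch below, assuming floating-point values such as survey weights, finds the smallest number of decimals that reproduces each value exactly as Python's shortest round-trip repr prints it. The weights are hypothetical.

```python
import math
from decimal import Decimal


def decimals_needed(values):
    """Smallest number of decimal places that reproduces every finite value.

    Uses Python's shortest round-trip repr of each float as the reference.
    """
    needed = 0
    for v in values:
        if not math.isfinite(v):
            continue  # NaN and infinities need separate missing-data handling
        exponent = Decimal(repr(v)).normalize().as_tuple().exponent
        needed = max(needed, -exponent)
    return needed


# Hypothetical survey weights.
weights = [0.8731456, 1.25, 2.0]
d = decimals_needed(weights)
exported = [f"{w:.{d}f}" for w in weights]
print(d, exported)
```

Exporting the weights with fewer decimals than this (for example, a fixed two-decimal format) would silently round them, and analyses run on the rounded file could differ from those run on the original.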

5.4 Consider creating simplified versions of datasets for use by the wider public, such as journalists and policymakers (e.g., by creating recoded variables such as age or income in groups, removing detailed information such as household lists, and setting missing data codes properly). Make such datasets accessible via Web-based analysis.
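Grouped recodes of this kind are simple to produce and document. The Python sketch below assigns ages to groups with the standard bisect module; the group boundaries and labels are hypothetical, and treating negative values as missing assumes that the file uses negative missing-data codes.

```python
from bisect import bisect_right

# Hypothetical group boundaries: each number is the lowest age in the next group.
AGE_BREAKS = [18, 30, 45, 60, 75]
AGE_LABELS = ["Under 18", "18-29", "30-44", "45-59", "60-74", "75+"]


def age_group(age):
    """Return the public-use age group label, or None for missing ages.

    Assumes missing ages are stored as None or as negative missing-data codes.
    """
    if age is None or age < 0:
        return None
    return AGE_LABELS[bisect_right(AGE_BREAKS, age)]


print([age_group(a) for a in (17, 18, 44, 45, 80, -9)])
```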

5.5 Thoroughly investigate any undocumented codes or inconsistent responses and resolve them whenever possible; if there is no alternative, assign labels such as ‘not ascertained.’
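A first pass at finding undocumented codes can compare every value in the data with the codebook. In the Python sketch below, the codebook and records are hypothetical; any value without a label, including a blank (None) where a code should be, is counted so that it can be investigated.

```python
from collections import Counter


def undocumented_codes(records, codebook):
    """Count values found in the data that have no label in the codebook."""
    problems = {}
    for var, labels in codebook.items():
        # r.get() returns None for a blank, which is also reported.
        counts = Counter(r.get(var) for r in records if r.get(var) not in labels)
        if counts:
            problems[var] = dict(counts)
    return problems


# Hypothetical codebook and records.
codebook = {
    "SEX": {1: "Male", 2: "Female", -9: "Not ascertained"},
    "MARITAL": {1: "Married", 2: "Never married", 3: "Divorced", 4: "Widowed"},
}
records = [{"SEX": 1, "MARITAL": 1}, {"SEX": 3, "MARITAL": 7}, {"SEX": 2, "MARITAL": 7}]
print(undocumented_codes(records, codebook))
# {'SEX': {3: 1}, 'MARITAL': {7: 2}}
```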

5.6 Standardize all missing data values, unless it is not possible to do so because of different cultural understandings (flag such issues carefully). Analysts will appreciate finding that each type of missing response, such as 'does not apply,' 'don’t know,' 'refused,' and 'no data available,' is coded the same way throughout the data file.
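A harmonization step of this kind can be expressed as a lookup from each national convention to one standard scheme. In the Python sketch below, the standard codes (-1, -7, -8, -9), the country labels AA and BB, and the variable TRUST are all hypothetical. Each type of missing response keeps its own code, and the mapping is defined per country and variable because the same number can be a valid answer in one variable and a missing-data code in another.

```python
# Hypothetical harmonized scheme: each kind of missing response keeps its own code.
STANDARD_MISSING = {
    "does not apply": -1,
    "refused": -7,
    "don't know": -8,
    "no data available": -9,
}

# Hypothetical national conventions, defined per (country, variable) because
# a code such as 8 may be a valid answer in one variable and missing in another.
SOURCE_MISSING = {
    ("AA", "TRUST"): {96: "no data available", 97: "refused", 98: "don't know", 99: "does not apply"},
    ("BB", "TRUST"): {0: "does not apply", 7: "refused", 8: "don't know"},
}


def harmonize(value, country, variable):
    """Recode a national missing-data code to the standard scheme; keep valid answers as-is."""
    if value is None:
        # A blank where a code should be signals an interviewer or editing error;
        # leave it for investigation rather than silently recoding it.
        return None
    kind = SOURCE_MISSING.get((country, variable), {}).get(value)
    return STANDARD_MISSING[kind] if kind is not None else value


print([harmonize(v, "AA", "TRUST") for v in (3, 97, 98, 99)])
print([harmonize(v, "BB", "TRUST") for v in (3, 7, 8, 0)])
```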

5.7 Create complete and concise variable and value labels which will provide researchers with clear descriptions of their analytic results.

5.8 Provide a printable questionnaire that contains all variable names and values in an appropriate format.

5.9 Consider producing ancillary files for those data collection efforts which cover multiple waves of respondents or several geographic areas. Such files may include recoded variables to summarize information contained in many questions or special constructed variables that producers feel will aid researchers in their analyses.

5.10 Create special subsets of data that take advantage of the longitudinal richness of long-term collections and provide unique opportunities to study important social, political, and economic issues from different perspectives, particularly with regard to the changing characteristics of the sampled respondents. Some examples include:

5.10.1 The [zotpressInText item="{2265844:U7PGAXS9}" format="%a% (%d%)"] project integrated a subset of data from the Demographic and Health Surveys for women of childbearing age and their children from 18 countries.

5.10.2 The [zotpressInText item="{2265844:56MZPUKP}" format="%a% (%d%)"] created modules on specific topics that integrated data for repeating years and across countries. Example modules include 'Religion,' 'Role of Government,' and 'Leisure Time and Sports'.

5.11 In 3MC surveys, make individual country datasets available whenever possible and practical.

Lessons learned

5.1 Users increasingly expect data files to come in a variety of formats that work easily with their statistical package of choice. In some settings, an SPSS portable file may suffice, but in others, data producers and/or archives might need to create the same file in several formats, particularly if a data conversion package such as Stat/Transfer is not available.

5.2 Be very clear about coding responses that mean 'item response refused,' 'item response does not apply due to filtering,' 'can’t choose,' or 'don’t know,' and especially 'no code in data file where a code should be.' All of these have different meanings and must be assigned different values. The 'no code in data file' case indicates either an interviewer error or an error in data editing.

5.3 'Don’t know'/'can’t choose' responses may have different meanings in different countries based on different response styles. Treating all of these responses as missing data may lead to unwarranted conclusions about the attitudes of whole populations [zotpressInText item="{2265844:WFHIMYSQ}"].

5.4 Established 3MC studies share their data in a variety of ways:

5.4.1 The [zotpressInText item="{2265844:Y3GGKRR2}" format="%a% (%d%)"] publicly releases all data and documentation via their website, one year after the completion of fieldwork.

5.4.2 The [zotpressInText item="{2265844:WYYYNFA5}" format="%a% (%d%)"] releases anonymized data through their public website within one year of the onset of data collection.

5.4.3 The [zotpressInText item="{2265844:56MZPUKP}" format="%a% (%d%)"] makes individual national and/or combined datasets available to the scientific community through the Data Archive one year after the calendar year to which the data relate.

5.4.4 Living Standards Measurement Study (LSMS) data are usually available within twelve months of the end of fieldwork and are published on the World Bank’s LSMS website, as well as on each country’s statistics office website.

5.4.5 Survey of Health, Ageing, and Retirement in Europe (SHARE) data are distributed through their Research Data Center.

5.4.6 The World Values Survey provides data only to participating countries for two years after fieldwork has been completed; after this period, the data are made available to the worldwide social science community through data archives.


6. Develop finding aids to guide users in their quest to locate data collections they want to use.

Rationale

The capability to query for specific information is critical to all data dissemination systems, from individual data producers with only a few data collections to social science archives with thousands of such collections.

Procedural steps

6.1 Create a robust search engine to query the fielded metadata so that the user can find variables of interest efficiently.
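The core of such a search facility is an inverted index over the metadata fields. The Python sketch below indexes each variable's name, label, and question text separately, so that a query can search all fields or just one. The metadata are hypothetical, and a production system would more likely rely on a dedicated search engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(variables):
    """Build a fielded inverted index: field -> word -> set of variable names."""
    index = defaultdict(lambda: defaultdict(set))
    for name, fields in variables.items():
        for field, text in {"name": name, **fields}.items():
            for word in tokenize(text):
                index[field][word].add(name)
    return index


def search(index, query, field=None):
    """Return variables matching every query word, optionally within one field."""
    words = tokenize(query)
    fields = [field] if field else list(index)
    if not words:
        return []

    def matches(word):
        return set().union(*(index.get(f, {}).get(word, set()) for f in fields))

    return sorted(set.intersection(*(matches(w) for w in words)))


# Hypothetical variable-level metadata.
variables = {
    "V101": {"label": "Trust in parliament",
             "question": "How much do you trust the national parliament?"},
    "V102": {"label": "Trust in police",
             "question": "How much do you trust the police?"},
    "V230": {"label": "Household income",
             "question": "What is your total monthly household income?"},
}
index = build_index(variables)
print(search(index, "trust police"))
print(search(index, "national", field="label"))
```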

6.2 Allow the search engine to run against a study’s bibliography to enable two-way linking between variables and publications based on analyses of those variables.
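Two-way linking amounts to keeping the citation data in both directions. The Python sketch below, assuming a hypothetical record of which variables each publication analyzed, inverts it so that a variable page can list its publications and a publication page can list its variables.

```python
from collections import defaultdict


def link_bibliography(citations):
    """Invert {publication: variables} into two lookups for two-way linking."""
    variables_by_pub = {pub: set(vars_) for pub, vars_ in citations.items()}
    pubs_by_variable = defaultdict(set)
    for pub, vars_ in variables_by_pub.items():
        for var in vars_:
            pubs_by_variable[var].add(pub)
    return dict(pubs_by_variable), variables_by_pub


# Hypothetical publication IDs mapped to the variables each one analyzed.
citations = {
    "PUB-001": ["V101", "V102"],
    "PUB-002": ["V102", "V230"],
}
pubs_by_variable, variables_by_pub = link_bibliography(citations)
print(sorted(pubs_by_variable["V102"]))
```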

6.3 Display the abstracts of the publications with links to the full text whenever possible, in order to realize the full potential of the online research environment.

6.4 Dedicate staff time to continuously searching journals and online databases to discover new citations where the data have been used. Many search engines have the ability to set up 'alerts' that notify a user when new items are found based on a query.

6.5 Encourage data archives to create metadata records for surveys they do not preserve and distribute these records to facilitate their discovery and use.

Lessons learned

6.1 Data usage increases when the data are easy to find and when users know of the publications scholars have produced from them. Many datasets would interest secondary analysts if only the analysts knew about them. For example, many surveys conducted in Latin America and Africa in the 1960s and 1970s might offer opportunities for interesting comparative analyses with the more recent and much more popular Latinobarómetro and Afrobarometer surveys. These older surveys are not always visible to researchers, however, because their substantive or methodological interest may not be immediately obvious.


7. Create comprehensive training, outreach, and user support programs to inform the research community about the dataset.

Rationale

Training and support of users will increase usage of the data and encourage comprehensive analyses. It is very important that major survey research producers or archives reach out to the user community effectively, in order to explain the structure of new datasets and to encourage the greatest possible use. The most straightforward way to reach out is to develop an effective online presence, ensuring that the data are easily located and acquired and that metadata and bibliographical citations are also available. Good user support helps prevent obvious misuse or misunderstanding of the structure and content of the dataset.

Procedural steps

7.1 Soon after the data are released, organize workshops at relevant professional organizations or attend conferences where 3MC research is a focus, in order to bring early users together to discuss important preliminary results, to ensure that the data are used effectively, and to recognize and correct any problems with the data.

7.2 Maintain a presence at professional meetings even after the data have been available for a long time. Staff from the project can describe the data, distribute documentation and sample data, and encourage researchers to make use of the data.

7.3 Hold training workshops in different countries to ensure that novice users have a chance to learn about the data from experts and, if possible, from the data production team itself. Users should learn about specific issues involved in data collected in their own countries, as well as how comparable the data collection experience was in other countries.

7.3.1 Without specialized instruction and training, analyses of cross-cultural longitudinal data and repeated cross-sectional data are particularly challenging.

7.3.2 These training courses can be brief half-day or one-day sessions at the time of professional meetings, or they can continue for longer periods (e.g., three- or five-day sessions with a more detailed focus).

7.3.3 Provide the training materials online so that people who are unable to attend can still have access to the information.

7.4 Provide easy access to user support through phone, email, online chat, user forums, and tutorials.

7.5 Track all user questions in a database that creates an accumulating knowledge base and that can also serve to generate frequently asked questions (FAQs).

7.6 Create tutorials, some of which may be offered in video format, to provide help in using the data, the online analysis system, and the major statistical software packages.

7.7 Establish moderated user forums to provide the foundation for an online community of researchers and students who can discuss their experiences using data and learn from each other.

7.8 While all of these procedures can increase the effective use of 3MC datasets, each country must decide which steps would be most beneficial for its own research community.
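Step 7.5 does not prescribe any particular tool. As one illustration, the sketch below logs user questions in a SQLite table and promotes topics that are asked repeatedly, and that already have at least one staff answer, to FAQ candidates. The table layout, column names, topic labels, the `min_count` threshold, and the helper functions (`create_knowledge_base`, `log_question`, `build_faq`) are all hypothetical choices made for this example. They are not part of any particular archive's support system.

```python
import sqlite3


def create_knowledge_base(conn):
    """Create a hypothetical table for logging user-support questions."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS user_questions (
               id INTEGER PRIMARY KEY,
               received TEXT NOT NULL,   -- ISO date, e.g. '2016-02-10'
               channel TEXT NOT NULL,    -- phone, email, chat, forum
               topic TEXT NOT NULL,      -- e.g. 'weights', 'missing codes'
               question TEXT NOT NULL,
               answer TEXT               -- NULL until staff respond
           )"""
    )


def log_question(conn, received, channel, topic, question, answer=None):
    """Record one incoming question (and its answer, if already known)."""
    conn.execute(
        "INSERT INTO user_questions (received, channel, topic, question, answer) "
        "VALUES (?, ?, ?, ?, ?)",
        (received, channel, topic, question, answer),
    )


def build_faq(conn, min_count=2):
    """Return FAQ candidates: topics asked at least `min_count` times that
    have at least one answered question, most frequently asked first."""
    topics = conn.execute(
        "SELECT topic, COUNT(*) FROM user_questions "
        "GROUP BY topic HAVING COUNT(*) >= ? AND COUNT(answer) > 0 "
        "ORDER BY COUNT(*) DESC, topic",
        (min_count,),
    ).fetchall()
    faq = []
    for topic, times_asked in topics:
        # Use the most recently answered question as the FAQ entry for the topic.
        question, answer = conn.execute(
            "SELECT question, answer FROM user_questions "
            "WHERE topic = ? AND answer IS NOT NULL "
            "ORDER BY received DESC, id DESC LIMIT 1",
            (topic,),
        ).fetchone()
        faq.append(
            {"topic": topic, "times_asked": times_asked,
             "question": question, "answer": answer}
        )
    return faq


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_knowledge_base(conn)
    # Hypothetical entries, for illustration only.
    log_question(conn, "2016-01-05", "email", "weights",
                 "Which weight should I use for pooled analyses?",
                 "See the weighting section of the codebook.")
    log_question(conn, "2016-02-10", "forum", "weights",
                 "Do I need a different weight for each country?",
                 "See the weighting section of the codebook.")
    log_question(conn, "2016-02-11", "phone", "missing codes",
                 "What does code 99 mean?")
    for entry in build_faq(conn):
        print(entry["topic"], entry["times_asked"], entry["question"])
```

The query only identifies candidates. Staff would still edit the selected answers before publishing them as FAQs.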

Lessons learned

7.1 For participants to benefit fully from the experience, training programs must be well planned and must draw on a high level of substantive, methodological, and technical expertise. While data producers are usually the people who best understand their data, they may not have the resources or desire to provide ongoing user support for the research community. Some may delegate this task to a data archive, but a joint approach often works best, with data archives providing basic user support and data producers addressing more complicated substantive questions. In countries where national data archives do not exist, data producers may want to partner with university social science departments or research centers to increase awareness and use of important datasets.

7.2 Complex datasets often require specialized training. Data collection methods or sampling frames often change between waves or across countries, and weighting variables may require extensive description. In this context, there is no real substitute for intensive training and ongoing user support.

7.3 The Demographic and Health Surveys have an online user forum for users to post and discuss issues.

8. Produce comprehensive documentation for all public- and restricted-use data files.

Rationale

High-quality documentation is essential for effective data use in all surveys, but particularly in 3MC datasets, because of the need to provide comparable information from all countries or study populations. As resources permit, data producers must strive to provide documentation, commonly referred to as metadata, on all aspects of the survey or statistical lifecycle, from initial planning through final data production and release to the research community. For more information on the data processing techniques used before dissemination, see Data Processing and Statistical Adjustment.

Procedural steps

8.1 Keep detailed records from the very beginning of the project, and make every attempt to record important project events as they occur. This will help analysts understand the goals and purpose of each survey.

8.2 Update documentation continually during the entire lifecycle of the project, and preserve old versions of key files.

8.3 For 3MC surveys, provide complete information about how the survey was conducted in each country or study population, and describe specific procedures and practices involving data collection and data processing activities.

8.4 Consider adopting the [zotpressInText item="{2265844:TRLIEN7U}" format="%a% (%d%)"] standard for producing metadata. The use of this standard, which is based on the use of Extensible Markup Language (XML), allows for specification of each metadata element (e.g., title of the survey, name of the principal investigators, type of sampling) for storage and future searching.

8.4.1 Define a database structure that will be used to store XML elements.

8.4.2 Identify appropriate tools that let staff create and access XML-coded information in a natural-language environment rather than by editing raw XML, such as a Web-based form displayed in a browser.

8.5 XML metadata markup gives data producers a structured way to create their documentation and offers several advantages to users of the documentation:

8.5.1 All information that the analyst needs is available in a core document, from which other products (such as text files that contain the necessary information to run statistical analyses in software programs) can be produced.

8.5.2 The XML file can be viewed with Web browsers and lends itself to Web display and navigation.

8.5.3 Because the content of each field of the documentation is tagged, the documentation can serve as the foundation for extraction and analysis programs, search engines, and other software agents written to assist the research process.

8.5.4 Preparing documentation in DDI format at the outset of a project means that the documentation will also be suitable for archival deposit and preservation, because it will contain all of the information necessary to describe every aspect of the corresponding data files. Ideally, DDI XML should be generated by the computer-assisted interviewing (CAI) system used to collect the data, but it can also be created for paper-and-pencil surveys from the information in the original questionnaire.

8.5.5 There are many examples of projects that use DDI-compliant metadata, both at the individual study level and in multi-study data repositories [zotpressInText item="{2265844:TRLIEN7U}"]. These projects illustrate the value of these standards for purposes such as:

- The presentation of instrument documentation, so that users can track the logic of the questionnaire.
- The creation of question banks, comprising all items asked in multi-year studies, the years in which items were asked, differences in question wording, and so on. XML-marked-up information reaches its full potential when coupled with a database management system and powerful front-end tools.
- The establishment of links to the documentation of related surveys (e.g., those conducted in other countries), with variable text viewable in the native languages, to assist analysts who want to study relationships among all of the survey items.
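The following sketch illustrates steps 8.4.1 and 8.5.1 under stated assumptions. It loads a small XML record into two SQLite tables, then generates SPSS `VARIABLE LABELS` and `VALUE LABELS` commands from those tables. The record's element names (`codeBook`, `stdyDscr`, `dataDscr`, `var`, `labl`, `qstnLit`, `catgry`, `catValu`) are modeled loosely on DDI Codebook. The record is simplified, omits namespaces and required elements, and is not claimed to validate against the DDI schema. The study title, variables, table layout, and function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata record. Element names are modeled loosely on
# DDI Codebook; namespaces and required elements are omitted (not schema-valid DDI).
DDI_LIKE_XML = """<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt><titl>Example Cross-National Survey, Wave 1</titl></titlStmt>
    </citation>
  </stdyDscr>
  <dataDscr>
    <var name="q1">
      <labl>Trust in parliament</labl>
      <qstn><qstnLit>How much do you trust parliament?</qstnLit></qstn>
      <catgry><catValu>1</catValu><labl>A lot</labl></catgry>
      <catgry><catValu>2</catValu><labl>Not at all</labl></catgry>
    </var>
    <var name="age">
      <labl>Respondent's age</labl>
      <qstn><qstnLit>How old are you?</qstnLit></qstn>
    </var>
  </dataDscr>
</codeBook>"""


def load_metadata(xml_text, conn):
    """Store variable-level metadata in relational tables; return the study title."""
    root = ET.fromstring(xml_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS variables (name TEXT PRIMARY KEY, label TEXT, question TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS categories "
        "(var_name TEXT REFERENCES variables(name), value TEXT, label TEXT)"
    )
    for var in root.iter("var"):
        name = var.get("name")
        # findtext("labl") matches only the direct child, i.e. the variable label,
        # not the category labels nested inside <catgry>.
        conn.execute(
            "INSERT INTO variables VALUES (?, ?, ?)",
            (name, var.findtext("labl", default=""), var.findtext("qstn/qstnLit", default="")),
        )
        for cat in var.findall("catgry"):
            conn.execute(
                "INSERT INTO categories VALUES (?, ?, ?)",
                (name, cat.findtext("catValu"), cat.findtext("labl")),
            )
    return root.findtext("stdyDscr/citation/titlStmt/titl", default="")


def spss_quote(text):
    """Quote a string for SPSS syntax, doubling any embedded apostrophes."""
    return "'" + text.replace("'", "''") + "'"


def spss_label_syntax(conn):
    """Derive SPSS label commands from the stored metadata, in document order."""
    variables = conn.execute("SELECT name, label FROM variables ORDER BY rowid").fetchall()
    lines = [f"VARIABLE LABELS {name} {spss_quote(label)}." for name, label in variables]
    for name, _ in variables:
        pairs = conn.execute(
            "SELECT value, label FROM categories WHERE var_name = ? ORDER BY rowid", (name,)
        ).fetchall()
        if pairs:  # variables without categories get no VALUE LABELS command
            body = " ".join(f"{value} {spss_quote(label)}" for value, label in pairs)
            lines.append(f"VALUE LABELS {name} {body}.")
    return "\n".join(lines)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_metadata(DDI_LIKE_XML, conn))
    print(spss_label_syntax(conn))
```

Because the labels live in one core source, the same tables could also feed setup files for other statistical packages or a question bank. Only the output function would change.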

Lessons learned

8.1 Many 3MC studies provide extensive documentation online. Some examples include:

8.1.1 The Demographic and Health Surveys (DHS) Program provides its questionnaires and manuals on its website.

8.1.2 The [zotpressInText item="{2265844:WYYYNFA5}" format="%a% (%d%)"] produces an annual survey documentation report, as well as a report summarizing fieldwork and any deviations for each round.

8.2 Even though the amount of documentation that 3MC studies provide has increased in recent years, there is still a need to provide users with more information about the entire survey lifecycle, particularly through detailed quality profiles (see Survey Quality).

9. Consider disseminating research findings.

Rationale

Dissemination is more than storing or archiving data. Presenting research findings, in addition to making the data files available to other users, is an important step in quality dissemination practices. This section of the chapter discusses dissemination in terms of presenting results of the study and considering who will use the information and why. This guideline is based on the guidelines written by the Community Advisory Board of the University of California San Francisco Center for AIDS Prevention Studies (CAB CAPS), and is adapted for the 3MC context [zotpressInText item="{2265844:77MUVBVK}"].

Procedural steps

9.1 Create a dissemination plan.

9.1.1 Include the cost of presenting findings in the study's initial budget. This may include salary, translation, printing, mailing, and/or meeting costs (see Tenders, Bids, and Contracts and Translation: Management and Budgeting).

9.1.2 Create a team that will organize and create dissemination materials.

9.1.3 Get input from study participants, community representatives, and other potentially interested parties on the preferred forum for viewing findings, such as press releases, websites, newsletters, or conferences. Consider offering multiple venues, if possible.

9.1.4 Remember that there may be a need to disseminate findings several times, as new information is collected and updated.

9.2 Make research results accessible to the desired audience(s). Potential audiences and effective methods include:

9.2.1 Study participants:

- Ask participants if and how they would want to receive results. This can be incorporated as a question in the survey instrument.
- Create a newsletter for participants.
- Write any information disseminated in accessible language, and keep in mind the literacy and language needs of the study population.

9.2.2 Community members/target populations:

- Consider multiple methods, including newspaper articles, radio, and TV news, in order to reach many people.
- As with study participants, consider the language needs of the community.
- Explore how research results from cross-national surveys can be disseminated to as many participating countries as possible. Different dissemination strategies may need to be employed in different countries/cultures.

9.2.3 Agencies and service providers:

- Prioritize contacting agencies that assisted with participant recruitment and/or serve the target population.
- Emphasize practical uses of the study results.

9.2.4 Policymakers:

- Evaluate whether the research results could influence policy.
- Send newsletters, articles, or reports to local and national government representatives.

9.3 Consider the ethical and legal policies within each country and culture. Individual countries may have different rules on sharing data within and between countries. See Ethical Considerations for further discussion.

Lessons learned

9.1 Traditionally, researchers disseminate their work in peer-reviewed journals. However, practitioners, as well as the general public, rarely have the time, or even the ability, to read these types of articles. The CAB CAPS guidelines were created by a committee of activists, teachers, and other stakeholders. Committee members who had participated in research studies were concerned about the lack of accessible findings and developed the above points to address dissemination needs. Disseminating results in this way provides more benefit to those who funded the research project and encourages discussion about the strengths and weaknesses of the original data.

9.2 The [zotpressInText item="{2265844:Y3GGKRR2}" format="%a% (%d%)"] issues reports or bulletins within three months of the end of fieldwork. An advance briefing is offered to top policymakers in the executive and legislative branches of participating countries; immediately thereafter, results are released publicly to the national and international media, civil society, and donors. Releases must be approved by a core partner. Similarly, data from the World Mental Health Survey are available to policymakers in participating countries [zotpressInText item="{2265844:YQSMJYNV}"].

10. Make quality control an integral part of all dissemination steps.

Rationale

The value of data depends on the quality of the data themselves. Dissemination requires strict compliance with archiving, editing, publishing, and distribution protocols. It also requires that data and documentation files remain available over the long term, despite constant updates to hardware and software and possible changes in management and staff. Clear procedures must be in place to ensure that all files remain readable as statistical and word processing software systems change over time.

Procedural steps

10.1 Establish a quality compliance protocol: an overall plan for regularly monitoring the integrity and validity of all data and documentation files that are available for secondary use.

10.2 Consult with institutions, research associations, and analysts to develop appropriate quality standards. Standards should be developed with researchers to ensure they meet the needs of the relevant discipline [zotpressInText item="{2265844:WHW7WN2P}"].

10.3 Check every step of dissemination production as it is carried out.

10.4 Test archived files periodically to verify that they remain accessible to users.

10.5 Establish procedures early in the survey lifecycle to ensure that all important files are preserved.

10.6 Create digitized versions of all project materials whenever feasible.

10.7 Develop specific procedures for assessing disclosure risk to respondents, and execute these procedures whenever public-use files are produced.

10.8 If applicable, develop and implement procedures for distributing restricted-use files.

10.9 Provide data files in all the major statistical software package formats, and test all content thoroughly before the files are made available for dissemination. In addition, provide data in a non-proprietary format so that users can work in the statistical package of their choice.

10.10 Designate resources to provide user support and training for secondary researchers.

10.11 Discuss with users their experiences working with the data. Methods may include surveying users, holding conference presentations, and collecting user data.
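Steps 10.1 and 10.4 call for regular checks on the integrity of archived files. One common way to make such checks routine is a fixity manifest, although these guidelines do not prescribe it: a SHA-256 checksum is recorded for each file at deposit and then recomputed on a schedule. The sketch below uses only the Python standard library. The function names and the JSON manifest layout are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 checksum, reading in chunks to handle large files."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(archive_dir):
    """Map each file's path (relative to archive_dir) to its checksum."""
    root = Path(archive_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def save_manifest(manifest, manifest_path):
    """Write the manifest as JSON; keep it outside the archive directory."""
    Path(manifest_path).write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")


def load_manifest(manifest_path):
    return json.loads(Path(manifest_path).read_text(encoding="utf-8"))


def check_fixity(archive_dir, manifest):
    """Compare current files against a stored manifest and report discrepancies."""
    current = build_manifest(archive_dir)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(n for n in manifest if n in current and current[n] != manifest[n]),
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

At deposit, `build_manifest` and `save_manifest` record the baseline. Each scheduled check then calls `check_fixity` against the loaded manifest and flags any non-empty list for follow-up. A matching checksum shows only that the bytes are unchanged. It does not show that the files still open in current statistical or word processing software, so periodic test reads (10.4) remain necessary.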

Lessons learned

10.1 The [zotpressInText item="{2265844:LH34UMPN}" format="%a% (%d%)"] in the United States worked with other federal agencies to study Web-based systems for disseminating health data and produced a Guide for Public Health Agencies Developing, Adopting, or Purchasing Interactive Web-based Data Dissemination Systems. The Guide was developed from the experiences of many health agencies in disseminating their data and attempts to establish a set of general standards and practices. It includes a checklist to guide agencies in developing a comprehensive Web dissemination system.

References [zotpressInTextBib style="apa" sortby="author"]