UCLA Research Data Collection Policy

PURPOSE

This document describes the policies and practices followed by the Data Archive with respect to acquisitions and archiving of research data. It will be used to work with faculty and research units where surveys and other data formats are generated in the course of research and where there is a desire for long term management of these materials.

Introduction

Over the years, the ability to manage a collection of raw data files used in social research has been directly linked to the technology available for doing so. Data archives need a robust computing infrastructure to carry out routine procedures on the data files to ensure their viability and usability. When computing platforms and operating systems change, when statistical packages change, and when tools for creating descriptive information and metadata about data change, the archival operation, including its policies, must change too. Therefore this document on acquisitions and archiving will evolve over time to adapt to a changing technological environment.

In addition, issues with respect to respondent privacy and confidentiality, copyright, ownership, and reuse of data are evolving and changing in terms of what is ethical, acceptable and legal. This document will also be revised according to changes in university and regulatory policies and laws and will reflect established best practices in social research and data curation and preservation.

The policies described in this document include sections on the kinds of research data we will accept for deposit, formats, details on collection size limitations, metadata and responsibilities for producing metadata, depositor eligibility, data quality, confidentiality, rights and ownership, categories of access to deposited data, and preservation plans.

This document will be reviewed annually and revised. Version changes will be noted.

Content Coverage

This section describes the kinds and formats of materials accepted for deposit.

Scope

The Archive will manage materials focused on social research broadly defined: Scientific inquiry which measures, describes, explains and predicts changes in social and economic structures, attitudes, values and behaviours and the factors which motivate and constrain individuals and groups in society.

Materials are accepted in any language. Depositors will be encouraged to provide translations of non-English materials into English. Depositors should provide bibliographic level metadata in English.

The Archive reserves the right to refuse to accept materials which will be difficult or impossible to process given the resources, staff, facilities or capacities of the Archive operation.

Kinds Of Research Data

The Archive may accept

  • Models and simulations, consisting of two parts: the model with associated metadata and the computational data arising from the model (see also file size limitations, Section 1f.)
  • Observations where the data will usually constitute a unique and irreplaceable record, including surveys, censuses, voting records, field recordings, etc.
  • Derived data resulting from processing or combining ‘raw’ or other data (where care may be required to respect the rights of the owners of the raw data)
  • Accompanying material including codebooks, data collection instruments, questionnaires, data collection methods and techniques, coding instructions, interviewer guides, graphic flow charts of data collection, summary statistics, database dictionaries, project summaries and descriptions, bibliographies of publications pertaining to the data, and links to virtual or online tools.
  • All materials must be in electronic format. No paper items accepted. No completed questionnaires accepted.

Status Of The Research Data

The Archive may accept

  • Raw data accompanied by complete documentation (See also metadata types and sources, section 2c.)
  • Data ready to be used by persons unfamiliar with the original project
  • Data that are publicly available; data should be de-identified
  • Data that can be used if certain owner-specified conditions are met
  • Summary/tabular data associated with publications
  • Derived data

Versions

The Archive maintains information about each version of a dataset.

  • Version details are recorded in the master file for the data
  • Version details may be recorded in a DDI metadata record
  • Multiple copies of the same data in different statistical formats may be kept
  • Version details are recorded for different copies of files or materials held in different formats
  • All versions of the same data are publicly available unless the depositor requests that an earlier version be restricted from access
  • Persistent identifiers, when assigned, will always point to the latest and most complete version and will be linked to the original file(s) where this ensures accurate manipulation of the data
  • The original copy of data file(s) and documentation are retained as deposited
  • Supplementary digital objects are stored with the original data file(s)
  • Subsets or supplementary data file(s) are stored with the original data file(s)

Data File Formats

Statistical (quantitative) data may be accepted as:

  • Portable system files in SAS, SPSS and/or STATA.
  • ASCII format files are preferred as long as they are accompanied by data definition statements for specific software applications.
  • Excel or other spreadsheet format files should be converted to tab- or comma-delimited text

Non-statistical data in electronic format (no paper files) are accepted for:

  • Textual qualitative data in TXT, PDF, or word processing format
  • Supplemental materials in PDF or word processing software
  • Image files in JPEG, GIF format
  • Audio files in AIF, WMA or WAV format
  • Video files in MPEG2 or MPEG4 format
  • Compressed files are accepted as long as they can be uncompressed using commonly available software such as 7-Zip or Winzip.

Format caveats:

  • Every attempt will be made to convert files to newer versions of proprietary software (e.g., SPSS, STATA, Word) or to new formats. However, the Archive cannot guarantee that specific formats will always be converted or remain readable and useful.
  • Preference is given to commonly used file formats that are easily converted to open or non-proprietary formats meeting ISO standards.
  • Statistical files may be converted to ASCII as an additional file format.
  • Persistent identifiers, when assigned, will always link to the latest and most complete version.
  • Where there are multiple formats, the Archive will maintain records to describe each version and format.

Volume And Size Limitations

  • There are currently no restrictions on file sizes, or number of files. However, the Archive reserves the right to refuse to accept materials which are of a volume which may make it difficult or impossible to process given the resources, staff, facilities or capacities of the Archive operation.
  • When necessary, materials may be stored in compressed format.

Metadata

This section concerns various forms of metadata, including administrative, descriptive, technical, structural, and preservation, using established standards to ensure adequate description and control over the long term.

  • Anyone may access the metadata free of charge. Access to some or all of the metadata may be controlled.
  • Use and reuse of metadata is governed by policies and procedures of the University of California.
  • Metadata created by the Archive may be used in another medium with written permission and provided there is a link to the original metadata and/or this repository is mentioned.
  • Using metadata created by the Archive for commercial purposes is not permitted.
  • The Archive's repository system will aim to allow other institutions to harvest metadata describing datasets using OAI-PMH or other harvesting protocols, provided written permission is obtained before harvesting and the Archive's resources permit.
  • The Archive may produce full descriptive metadata using the DDI XML standard for uniquely held items. This metadata may be re-usable when written permission is obtained and provided there is a link to the original metadata and/or this repository is mentioned.
  • Depositors of data to be uniquely held by the Archive should allow the reuse of metadata.
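A harvesting institution would typically issue an OAI-PMH `ListRecords` request once permission has been granted. The sketch below only builds such a request URL. The base URL is a hypothetical placeholder, since this policy does not name an endpoint, and `oai_dc` (unqualified Dublin Core) is assumed as the metadata prefix because OAI-PMH requires repositories to support it.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: this policy does not publish an OAI-PMH base URL.
BASE_URL = "https://archive.example.edu/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    'verb' and 'metadataPrefix' are required by the protocol;
    'from' and 'set' are optional selective-harvesting arguments.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL, from_date="2020-01-01"))
```

Building the URL separately from sending it keeps the example free of network access; an actual harvester would also need to handle resumption tokens for long result lists.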

Metadata Types And Sources

Data deposited by researchers should be accompanied by detailed documentation that describes the data and the processes used to create it. Examples of documentation include but are not limited to codebooks for data files, code for software programs, format specifications, and technical reports describing sampling, weighting and other protocols or methods. Data files not accompanied by documentation are not accepted. The following types of metadata will be created from the accompanying documentation:

Descriptive metadata
The Archive may create MARC and Dublin Core compatible records for inclusion in online library catalogs, such as the UCLA Library, or OCLC WorldCat. Additional finding aids and guides will be created as appropriate using available technology and resources. The Archive will maintain full information relating to the content, structure, context and source of the data; information about the methods, instruments, and techniques used in the creation or collection of the data. Where possible, the Archive will record bibliographic information about publications using the data and will provide links to online versions of those publications.
Administrative metadata
The Archive will maintain details on the data file storage format, access conditions or limitations, copyright and licensing information, and any other information needed for long term maintenance.
Structural metadata
Where appropriate the Archive may maintain details about data files that link together in some logical way. The Metadata Encoding and Transmission Standard (METS) will be used with multiple format collections where appropriate and where resources permit. METS will be used to encourage the reuse and repurposing of multiple format collections but will not be used as a substitute for full descriptive metadata using the DDI XML or other format standards for uniquely held items.

Metadata Schemas

  • The Archive will employ the DDI XML metadata schema for uniquely held items.
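As an illustration of how a title and version statement might be carried in a DDI record, the sketch below builds a minimal XML fragment. The element names (`codeBook`, `stdyDscr`, `citation`, `titlStmt`, `titl`, `verStmt`, `version`) follow the DDI Codebook vocabulary as commonly documented, but the fragment omits the namespace and most required content and has not been validated against the schema. The function name and sample values are hypothetical, and this is not a description of the Archive's actual records.

```python
import xml.etree.ElementTree as ET


def build_ddi_stub(title, version, version_date):
    """Return a minimal, unvalidated DDI Codebook-style XML string.

    Only the study title and a version statement are included;
    a real record would carry far more descriptive content.
    """
    code_book = ET.Element("codeBook")
    study = ET.SubElement(code_book, "stdyDscr")
    citation = ET.SubElement(study, "citation")

    title_statement = ET.SubElement(citation, "titlStmt")
    ET.SubElement(title_statement, "titl").text = title

    version_statement = ET.SubElement(citation, "verStmt")
    version_element = ET.SubElement(version_statement, "version", {"date": version_date})
    version_element.text = version

    return ET.tostring(code_book, encoding="unicode")


# Sample values are invented for illustration only.
print(build_ddi_stub("Example Survey", "2.0", "2020-01-15"))
```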

Submission Of Data (Ingest)

This section describes policies for depositing data and covers who is eligible to deposit data, data quality, confidentiality of data, embargo, and depositor rights.

Eligible Depositors

  • UCLA faculty, students and staff are eligible to deposit data types and formats which are described in the Content Coverage (section 1a through 1f) portion of this document.
  • Potential depositors from outside the UCLA community may be considered when materials offered meet the Content Coverage policies outlined in this document.
  • Acceptance of materials is not guaranteed.
  • Depositors may only deposit their own work.
  • Depositors will be asked to sign a Data Transfer Agreement and should have legal authority to do so.

Moderation By Repository

  • Items proposed for deposit are reviewed for eligibility of the depositor, relevance to the scope of the collection and collection formats, resources required to provide access and long term maintenance, and overall appropriateness for long term management.
  • Submitted materials will be checked by the Archive at the time of deposit to ensure that data integrity has been fully maintained during the transfer process.
  • Data sets are compared with data documentation to ensure that there is a match between variables, variable labels, values and value labels and that the number of observations and variables matches the stated numbers in study documentation.
  • Digital videos are viewed in spots. Descriptive information on title, participants, dates, and recording details is verified with the depositor.
  • Digital audio recordings are listened to in spots and where necessary converted to a sharable or preservation format. Descriptive information on title, participants, dates, and recording details is verified with the depositor.
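One common way to confirm that a file survived transfer unchanged is to compare a checksum computed after receipt with one supplied by the depositor. The policy does not say which method the Archive uses, so the sketch below is an assumption throughout: SHA-256 is chosen for illustration, and the function names and the idea of a depositor-supplied checksum are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(path, expected_checksum):
    """Return True when the received file matches the checksum sent with it."""
    return sha256_of_file(path) == expected_checksum.strip().lower()
```

A call such as `transfer_is_intact("survey_2020.sav", checksum_from_depositor)` would return `False` if even one byte changed in transit; both argument values here are placeholders.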

Data Quality Requirements

Responsibility

  • Depositors of data (Data Producers) are responsible for the quality of their research data. The Data Archive will be responsible for the quality of storage and access to data.
  • The Data Archive accepts no responsibility for mistakes, omissions, or legal infringements with the deposited data.

Quality Assessment

  • The Data Archive reserves the right to evaluate data quality in order to make decisions about whether to accept content or not.
  • Key considerations are whether the data are suitable for reuse through secondary analysis and whether the Archive has the resources to curate and preserve the data over the long term.
  • Materials should be accompanied by full and complete documentation to enable reuse by others not involved in the original research project.

Confidentiality And Disclosure

  • All depositors should ensure that data meet requirements of confidentiality and non-disclosure for data collected from human subjects.
  • Datasets should be free of direct and indirect identifiers.
  • Direct identifiers include but are not limited to names, addresses, or numbers for telephone, social security, driver’s license, etc.
  • Indirect identifiers are items which, when used with other variables in the data set, may provide enough detail to identify the respondent.

Examples of indirect identifiers:

  • Geographic detail (zip code, census tract, block number, etc.)
  • Membership in clubs, groups, organizations, etc.
  • Names of schools attended by the respondent or respondent’s family
  • Job titles, positions held in organizations, elected offices, etc.
  • Personal information such as income, events, certain medical procedures, etc.
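Depositors preparing a file may find it useful to screen variable names for likely identifiers before deposit. The sketch below flags column names containing keywords drawn from the examples above. The keyword lists and the function name are hypothetical, and a name-based scan is only a first pass: it cannot detect identifying values hidden under neutral variable names, so it does not replace a disclosure review.

```python
import csv
import io

# Illustrative keyword lists based on the examples in this section; not exhaustive.
DIRECT_HINTS = ("name", "address", "phone", "ssn", "license")
INDIRECT_HINTS = ("zip", "tract", "block", "school", "job_title", "income")


def flag_columns(header):
    """Map each suspicious column name to 'direct' or 'indirect'."""
    flags = {}
    for column in header:
        lowered = column.lower()
        if any(hint in lowered for hint in DIRECT_HINTS):
            flags[column] = "direct"
        elif any(hint in lowered for hint in INDIRECT_HINTS):
            flags[column] = "indirect"
    return flags


# Invented sample data standing in for the first lines of a deposited CSV file.
sample = io.StringIO("resp_id,last_name,zip_code,q1_score\n1,Doe,90095,4\n")
header = next(csv.reader(sample))
print(flag_columns(header))  # {'last_name': 'direct', 'zip_code': 'indirect'}
```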

Embargo Status

  • Data may be deposited with an embargo on use by others outside of the principal investigator(s) and designated members of the research team.
  • The length of embargo and embargo conditions are negotiated jointly with the principal investigator/depositor.
  • Embargoed data will receive the same processing and checking as other deposited materials at the time of deposit.
  • Metadata may be created but will not be made public nor shared with others.
  • Depositors will be contacted when the embargo period is ending.
  • An extension of the embargo may be negotiated between the Archive and the depositor/principal investigator.

Rights And Ownership

The Data Archive

  • The Data Archive may translate or reformat datasets to ensure their future preservation and accessibility.
  • Depositors may embargo deposited data under conditions negotiated jointly between the principal investigator/depositor and the Data Archive.
  • The Data Archive may make copies of deposited materials for security and backup.
  • The Data Archive may share deposited materials with other archives under conditions negotiated jointly between the principal investigator/depositor and the Data Archive. The Data Archive may levy fees for the transfer in order to cover costs.
  • The Data Archive may include metadata or documentation in public access catalogs such as the UCLA Library, or OCLC WorldCat. Additional finding aids and guides may be made publicly available as appropriate using available technology and resources.
  • The Data Archive is not obligated to reproduce a dataset in the same software as that in which it was originally created.
  • The Data Archive will take every care to curate and preserve datasets; however, the Archive assumes no liability for loss or damage to the data sets or any other data while it is stored in the … [Archive] or a repository to which the dataset is subsequently migrated.
  • Depositors retain the right to deposit items elsewhere in current or future versions of formats.

The Depositor

  • Data collected by UCLA faculty is governed by policies of the University of California. General University Policy Regarding Academic Appointees, APM 020, Section II.5 Publicity of Results states: “Notebooks and other original records of research are the property of the University.”
  • The Depositor will sign a Depositor Agreement.
  • The Depositor will state that the deposited dataset does not breach any law.
  • The Depositor will state that the dataset is not derived from a licensed or commercial product.
  • The Depositor will state that the dataset is an original work and does not infringe the copyright of any other person, organization or institution.
  • If the dataset does contain copyrighted material, the Depositor will be responsible for securing permission from the copyright holder to include this material in the dataset.

OR

  • The Depositor is responsible for deleting any copyrighted or third party material from the dataset before deposit.
  • The Depositor will state that all obligations to a sponsoring agency have been fulfilled.

Access And Reuse Of Data

This section is focused on how deposited data may be accessed and used by the depositor and others not connected to the original data collection.

Access To Data Objects

The Data Archive promotes the philosophy and practice of Open Access. The policies and procedures of the Data Archive adhere to the idea that “Digital research data should be easy to find, and access should be provided in an environment which maximizes ease of use; provides credit for and protects the rights of those who have gathered or created data; and protects the rights of those who have legitimate interests in how data are made accessible and used.”

Datasets and accompanying materials held by the Data Archive are generally publicly available; however there may be applicable restrictions on the reuse of specific items. (See section 4b)

Controlled Access

  • Access to datasets may be limited to specific users, specific campus units, and/or specific external users or organizational units.
  • Access to datasets may be restricted by IP address or other technical authentication technique.
  • Access to datasets may be limited by the number of concurrent users.
  • Controlled access datasets will not be identified in any online archive or library catalogs.

Restricted Access

  • The Data Archive does not maintain a secure non-networked server, nor a Data Enclave for the storage of highly confidential datasets or other restricted format information. The Data Archive will assist depositors in locating such facilities.

Registration

  • Depositors may request that users of datasets register to gain access. When resources permit, the Data Archive may create web forms for registration and will maintain lists of registered users. Such lists will only be shared with the depositor for the purposes of contacting users with news, updates or corrections for deposited datasets.
  • Datasets are downloaded via the Data Archive website. All users will read and agree to terms and conditions of access before downloading.
  • Users of the Data Archive web site will always gain access through their UCLA BOL identification and UCLA campus authentication technology.

Access Methods

  • Datasets may be downloaded directly by users via the Data Archive website.
  • Some datasets may be prepared for use with the Survey Documentation and Analysis (SDA) system. “SDA is a set of programs for the documentation and web based analysis of survey data. There are also procedures for creating customized subsets of datasets.”
  • In some cases, the Data Archive may provide a link to an external source for a dataset which has not been deposited.

Use And Reuse Of Data Objects

Access And Use

  • The Data Archive will ask all users to read and agree to terms and conditions of access to/use of datasets held by the Data Archive.
  • The terms and conditions of access to/use of datasets held by the Data Archive include a stipulation for the data to be used in accordance with standards for ethical and responsible research practices.
  • The Data Archive will limit use and reuse of deposited datasets to non-commercial, research and instructional purposes.
  • The Data Archive will prohibit users from redistributing datasets (whole or in part) provided by the Data Archive.
  • The Data Archive will permit users to include small portions of datasets, charts, tables and the like in publications.

Instructional Uses

  • The Data Archive will permit faculty to place complete datasets or parts of datasets onto course websites, provided access is limited to those registered for the course and users are authenticated through UCLA BOL identification.

Citation Of Use

  • The Data Archive will ask users of deposited data to acknowledge and cite the original data producer in any published or unpublished research.
  • The Data Archive will ask depositors to cite the Data Archive as the repository of record when referring in research to specific datasets that are held uniquely by the Data Archive.

Copies

  • Copies of data may only be made for personal or instructional use.
  • Copies of data may not be sold or otherwise used for commercial purposes.

Preservation Of Data

This section deals with activities and tasks carried out by the Data Archive to ensure continued access to deposited materials.

Retention Period

  • Deposited items will be retained for the lifetime of the Data Archive, unless otherwise specified by the depositor, or as mandated by policies of the University of California, funding agencies or by government regulations.

Functional Preservation

  • The Data Archive will make every effort to carry out procedures and tasks to enable the continued access to datasets deposited with the Archive.
  • Some file formats may be proprietary and long term preservation may be limited or not possible.
  • Some datasets may be held in an ‘as is’ condition.
  • Some datasets may be transferred to other repositories where doing so will better support the long term curation and preservation of the data.

File Curation

  • The Data Archive will follow established best practices for managing datasets over the long term.
  • The Data Archive will attempt to coordinate data deposit throughout the research lifecycle.
  • Datasets may be stored in a compressed format; details on the compression software used and compression ratio will be stored with the metadata about the dataset.
  • The Data Archive will regularly back up datasets in accordance with established best practices.
  • Datasets will be stored in a secure, climate controlled environment.
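The compression details mentioned above could be captured at the moment a dataset is compressed. The sketch below uses gzip from the Python standard library purely as an example. The policy does not name the compression software the Archive uses, and the record's field names are hypothetical rather than part of any metadata standard.

```python
import gzip


def compress_with_record(data: bytes):
    """Compress bytes with gzip and return them with a small metadata record."""
    compressed = gzip.compress(data)
    record = {
        # Field names are illustrative, not drawn from a metadata schema.
        "compression_software": "gzip (Python standard library)",
        "original_bytes": len(data),
        "compressed_bytes": len(compressed),
        "compression_ratio": round(len(data) / len(compressed), 2),
    }
    return compressed, record


# Invented sample content standing in for a delimited data file.
payload = b"id,score\n" + b"1,4\n" * 5000
compressed, record = compress_with_record(payload)
print(record)
```

Storing the original byte count alongside the ratio lets a later check confirm that decompression restores a file of the expected size.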

Curation procedures may entail:

  • reformatting datasets from proprietary software to software-independent formats
  • migration to newer versions of proprietary software
  • conversion to accepted archival storage formats
  • recording of curation tasks in metadata about individual datasets

Fixity and Authenticity

  • The Data Archive checks deposited datasets to ensure that the data and the documentation match in terms of variables, variable names, values, value labels, and column number and width.
  • The Data Archive does not currently carry out other routine fixity checks or authentication procedures.
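The data-versus-documentation check described in the first item can be partly automated for delimited text files. The sketch below compares variable names and the observation count against a codebook summary. The dictionary layout for the codebook and the function name are hypothetical, and the check does not cover value labels or column widths, which would need format-specific handling.

```python
import csv
import io


def compare_with_codebook(data_text, codebook):
    """List mismatches between a CSV file's contents and its codebook summary.

    'codebook' is a hypothetical dict with 'variables' (list of names)
    and 'n_observations' (int). An empty result means no mismatch was found.
    """
    reader = csv.reader(io.StringIO(data_text))
    header = next(reader)
    n_rows = sum(1 for _ in reader)

    problems = []
    missing = [name for name in codebook["variables"] if name not in header]
    extra = [name for name in header if name not in codebook["variables"]]
    if missing:
        problems.append(f"documented but not in data: {missing}")
    if extra:
        problems.append(f"in data but not documented: {extra}")
    if n_rows != codebook["n_observations"]:
        problems.append(
            f"expected {codebook['n_observations']} observations, found {n_rows}"
        )
    return problems


# Invented sample data and codebook for illustration.
sample_data = "id,age,q1\n1,34,2\n2,51,4\n"
sample_codebook = {"variables": ["id", "age", "q1", "q2"], "n_observations": 3}
for problem in compare_with_codebook(sample_data, sample_codebook):
    print(problem)
```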

Withdrawal of Data and Succession Plans

This section addresses when and how datasets may be withdrawn or removed and provides details on how datasets will be handled should the Data Archive cease operation.

Conditions for Withdrawal or Removal of Datasets

Datasets may be withdrawn for any of the following reasons:

  • Copyright violation
  • Legal requirements and/or proven violations of legal requirements
  • National security
  • Falsified research
  • Confidentiality concerns

Datasets may be withdrawn at the request of the principal investigator/depositor. The following conditions may apply as negotiated between the Data Archive and the principal investigator/depositor:

  • Datasets and documentation may be entirely removed from the Data Archive
  • Datasets and documentation may be removed from public view/access but retained by the Data Archive
  • Datasets and documentation may be transferred to another archive
  • Metadata may be retained by the Data Archive but may not be searchable or available for public view/access

Closure And Succession

  • If the Data Archive ceases operation, every effort will be made to transfer data to another appropriate archive, in negotiation with the principal investigator/depositor and in accordance with policies and regulations of funding agencies and the University of California.
  • If the Data Archive ceases operation, data may be returned to the depositor, in negotiation with the principal investigator/depositor and in accordance with policies and regulations of funding agencies and the University of California.