DataCite released its first public data file with metadata for over 52 million DOIs

From metadata harvesters, to research institutions, to bibliometricians, everyone is welcome to use the DataCite public data file.


DataCite DOI metadata has always been openly available. In line with our commitment to the POSI principles, we make all metadata registered with DataCite part of the public domain through a CC0 copyright waiver. Our metadata retrieval services (the DataCite REST API, OAI-PMH service, and GraphQL API) allow anyone to retrieve DataCite DOI metadata to enable discovery, promote reuse, and understand the research landscape.

As the number of DataCite DOIs continues to increase, harvesting the complete set of records via our existing tools inevitably takes longer than it once did. Compared with using our APIs, downloading the data file is a faster way to retrieve DataCite DOI metadata: instead of requesting the list of DOIs page by page, users can now download a single (compressed) file in one go.
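To illustrate why page-by-page harvesting gets slower as the number of records grows, here is a minimal Python sketch of a cursor-style harvest loop. It is an illustration under stated assumptions, not an official client: the endpoint URL, the `page[cursor]` and `page[size]` parameters, and the `data` and `links.next` response fields are assumptions about the REST API's response shape, and the helper names `fetch_json` and `harvest` are hypothetical.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the REST API lists DOIs at this endpoint and supports
# cursor-based paging through these query parameters.
FIRST_PAGE = "https://api.datacite.org/dois?" + urlencode(
    {"page[cursor]": 1, "page[size]": 1000}
)


def fetch_json(url: str) -> dict:
    """Fetch one page over HTTP and decode the body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def harvest(
    first_url: str = FIRST_PAGE,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield DOI records one page at a time.

    Assumes each page carries its records under "data" and the URL of
    the following page under "links" -> "next" (absent on the last page).
    """
    url = first_url
    while url:
        page = fetch(url)
        records = page.get("data", [])
        if not records:
            break  # stop on an empty page, even if a next link is present
        yield from records
        url = page.get("links", {}).get("next")
```

Every page costs one HTTP round trip, so the number of requests grows with the number of DOIs. Downloading a single file avoids that loop entirely.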

The public data file contains metadata for all DataCite DOIs. Specifically, this first release contains metadata records in JSON format for all DataCite DOIs in Findable state that were registered up to the end of 2023. Each DOI has descriptive metadata for research outputs and resources structured according to the DataCite Metadata Schema. Many of these records include links to other persistent identifiers (PIDs) for works (DOIs), people (ORCID iDs), and organizations (ROR IDs). Read more about the details of the data file format and structure in our support documentation. Going forward, we plan to release a complete public data file on an annual basis.
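As a rough sketch of how these records might be explored once downloaded, the Python example below counts links to other PIDs. It rests on several assumptions that you should check against the support documentation: that a part of the data file is a gzip-compressed file with one JSON record per line, and that each record keeps its metadata under `attributes` with the field names `relatedIdentifiers`, `creators`, `nameIdentifiers`, and `affiliation`. The function names `read_records` and `count_pid_links` are hypothetical.

```python
import gzip
import json
from collections import Counter
from typing import Iterable, Iterator


def read_records(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def count_pid_links(records: Iterable[dict]) -> Counter:
    """Count links to DOIs, ORCID iDs, and ROR IDs across the given records.

    The field names used here are assumptions about the record layout.
    """
    counts: Counter = Counter()
    for record in records:
        attrs = record.get("attributes") or {}
        for related in attrs.get("relatedIdentifiers") or []:
            if related.get("relatedIdentifierType") == "DOI":
                counts["doi"] += 1
        for creator in attrs.get("creators") or []:
            for identifier in creator.get("nameIdentifiers") or []:
                if identifier.get("nameIdentifierScheme") == "ORCID":
                    counts["orcid"] += 1
            for affiliation in creator.get("affiliation") or []:
                # Affiliations may be plain strings; only objects carry a scheme.
                if (
                    isinstance(affiliation, dict)
                    and affiliation.get("affiliationIdentifierScheme") == "ROR"
                ):
                    counts["ror"] += 1
    return counts
```

Because `read_records` is a generator, records are processed one line at a time and the whole file never has to fit in memory.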

You can request a direct download link for the public data file through a new portal.

For more information and how to give feedback, go here.