Preserving software: the Software Heritage Archive

The Software Heritage Archive project sets out to collect and preserve software in source code form.


The goal of the Software Heritage initiative is to collect all publicly available software in source code form together with its development history, replicate it massively to ensure its preservation, and share it with everyone who needs it. The project team curates the software and makes it accessible “because only by sharing it we can guarantee its preservation in the very long term.”

The Archive project team emphasises that all the source code it collects will be:

  • available - the code will be stored, preserved and made accessible in the long term
  • traceable - each software component will get a unique identifier that can be relied upon in the long term
  • uniform - despite the great variety of origins, all of the source code collected in the archive will be accessed through the same uniform API

The project infrastructure is built on three principles:

  • transparency – open architecture, free/open source software and collaborative development
  • intrinsic unique identifiers – unique references that can be used in textbooks, documentation, build instructions and many other places to build a consistent web of knowledge, with no need to rely on a third party to know whether a given identifier corresponds to a given artefact
  • distributed and multi-stakeholder infrastructure – no single point of failure; a multi-stakeholder network of peers
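
To make the idea of an intrinsic identifier concrete, here is a minimal Python sketch. It assumes a Git-style content hash (SHA-1 over a `blob <length>` header followed by the file bytes) and a `swh:1:cnt:` prefix for file contents. Both are assumptions, since the text above does not spell out the scheme, and the function names `intrinsic_content_id` and `matches` are hypothetical, chosen for illustration only. The point is that the identifier is computed from the artefact itself, so anyone holding the bytes can check it without asking a third party.

```python
import hashlib


def intrinsic_content_id(data: bytes) -> str:
    """Derive an identifier from the content alone (assumed Git-style blob hash)."""
    # Git-style header: the object type, the length in bytes, and a NUL byte.
    header = b"blob %d\0" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    # The "swh:1:cnt:" prefix is an assumption about how content IDs are written.
    return "swh:1:cnt:" + digest


def matches(identifier: str, data: bytes) -> bool:
    """Check locally, with no registry lookup, that an identifier fits the data."""
    return intrinsic_content_id(data) == identifier


if __name__ == "__main__":
    source = b"int main(void) { return 0; }\n"
    ident = intrinsic_content_id(source)
    print(ident)
    print(matches(ident, source))         # same bytes, so the check succeeds
    print(matches(ident, source + b" "))  # any change yields a different ID
```

Because the identifier depends only on the bytes, two archives that hold the same file will arrive at the same identifier independently, which is what lets such references be used consistently across textbooks, documentation and build instructions.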

For more information on the Archive, visit the Software Heritage website.