Archiving and preserving tweets using a Library Management System

The Welsh Government's Information and Archive Service carried out a mini-pilot project to explore making tweets available via its Soutron Library Management System.


Twitter is used by government departments, Members of Parliament and millions of businesses, non-government organisations and individuals in the UK. It is free to use with a relatively low impact on resources and has the potential to deliver many benefits that support the Welsh Government’s communications objectives.

The Welsh Government uses three main Twitter feeds on a day-to-day basis:

  • @WelshGovernment - The English language account
  • @LlywodraethCym - The Welsh language account
  • @FMWales - The First Minister of Wales’ account

Capturing and archiving social media for re-use is a major challenge. Making the data searchable and interpretable is difficult but necessary, especially given that many people use Twitter as one of their main sources of news, information and communication. It is important to find ways of incorporating social media into knowledge structures and archives. Social media records can be used for research, especially when used in context with other digital resources such as email and Word documents (i.e. “linked information”).

Tweets could also enrich our collection of consultation documents and provide additional context to the collections.

About our pilot project

On 1 December 2015, the Human Transplantation (Wales) Act came into full effect. It introduced a soft opt-out system for consent to organ and tissue donation. The Welsh Government (WG) used Twitter extensively in its PR campaign to increase organ donation registrations and to improve attitudes and awareness. It is estimated that the Welsh Government reached a total audience of around 1.5 million via its “Organ Donation Wales” social media campaign.

The collection

We decided to trial hosting approximately 178 tweets and associated metadata on our LMS. The tweets that we identified for capturing on our LMS were held in the following file formats:

  • A PDF screenshot of the tweet in question
  • A TXT file containing just the text from the tweet in question
  • An HTML file including the metadata of the tweet in question

In addition, we used The National Archives’ (TNA) file profiling tool DROID to help profile the file formats in the collection, and to create a simple CSV metadata spreadsheet consisting of a file name, short description and type of tweet.
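As an illustration of the kind of spreadsheet described above, the following is a minimal sketch in Python. It does not use DROID and is not the method used in the pilot. The function names, the column names and the assumption that the PDF, TXT and HTML versions of a tweet share one file stem are all hypothetical.

```python
import csv
import io
from pathlib import PurePath

# The three file formats held for each tweet (extensions are an assumption).
KNOWN_SUFFIXES = {".pdf", ".txt", ".html"}

def build_manifest(file_names, descriptions, tweet_types):
    """Return one row per file: file name, short description, type of tweet.

    `descriptions` and `tweet_types` are keyed by file stem, which is assumed
    to be shared by the PDF, TXT and HTML versions of the same tweet.
    """
    rows = []
    for name in sorted(file_names):
        path = PurePath(name)
        if path.suffix.lower() not in KNOWN_SUFFIXES:
            continue  # skip anything that is not one of the three formats
        rows.append({
            "file_name": name,
            "description": descriptions.get(path.stem, ""),
            "tweet_type": tweet_types.get(path.stem, "unknown"),
        })
    return rows

def manifest_to_csv(rows):
    """Serialise manifest rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["file_name", "description", "tweet_type"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping one row per file, rather than one per tweet, means each of the three versions of a tweet can be found individually while still sharing a description and type.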

Complexities and challenges

We explored a range of issues and encountered several challenges during our pilot, including:

  • How to process and organise tweets as well as how to physically store them
  • How to provide useful means of access and retrieval
  • Policy challenges – e.g. appropriate access controls; should any information be censored or restricted?
  • What is the minimum/maximum amount of metadata that should be captured with each tweet?
  • Is Soutron (our Library Management System/LMS) capable of providing access to these tweets?
  • How much work is involved in “processing” tweets? What type of indexing and processing is required by information professionals and/or technology experts to make the collection accessible and re-usable?
  • Moving to second generation digital archiving, what type of sophisticated access tools might be required to provide a “basic level of access” for researchers and users of the collection?
  • Broader ethical considerations of the very existence of such a collection (i.e. should there be any access and/or content restrictions and if so would a time limited “take-down” policy be sufficient?)
  • Are there any privacy concerns about creating a permanent archive of government tweets and are there any GDPR related issues?
  • Should we do this at all or will we simply be preserving a mountain of worthless information?

Copyright issues

The general consensus seems to be that most tweets fall outside copyright because very few meet the threshold for copyright protection. Nevertheless, we decided not to capture other people’s reply tweets to Welsh Government’s original tweets for the pilot.

The data

We worked with Hanzo to extract our previously harvested and archived Twitter presence from 2014 to 2016. This involved Hanzo technicians writing code to extract relevant tweets.

Four separate searches were undertaken using the following search terms:

  • organ donation
  • rhoi organau (Welsh language term)
  • #organdonation
  • #rhoiorganau
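The code written by Hanzo is not described in this article. As a rough illustration only, a search of this kind could be approximated with a case-insensitive match of each term against the tweet text. The function name and the matching rule below are assumptions.

```python
# The four search terms listed above.
SEARCH_TERMS = ["organ donation", "rhoi organau", "#organdonation", "#rhoiorganau"]

def matching_terms(text, terms=SEARCH_TERMS):
    """Return the search terms that occur in `text`, ignoring case."""
    lowered = text.casefold()
    return [term for term in terms if term.casefold() in lowered]
```

A tweet would be selected if `matching_terms` returns a non-empty list. Simple substring matching like this treats the phrase and the hashtag as separate terms, which is why both forms appear in the list.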

Hanzo provided the data via download URLs that allowed us to transfer the collection to our systems using WinZip. After analysing the data, we decided that the majority of Tweets and Retweets were worthy of retention, whilst the Feeds, Hashtags and Replies were less likely to have any research or re-use value.
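A first pass at this retention decision could be sketched as follows. This is a hypothetical illustration, not the process used in the pilot: the record layout and the `type` field are assumptions. Because only the majority of Tweets and Retweets were judged worth keeping, the retained set would still need manual appraisal.

```python
# Types treated as candidates for retention; all other types are set aside.
RETAINED_TYPES = {"tweet", "retweet"}

def select_for_retention(records):
    """Split records into (kept, set_aside) based on their 'type' field."""
    kept, set_aside = [], []
    for record in records:
        if record["type"].lower() in RETAINED_TYPES:
            kept.append(record)
        else:
            set_aside.append(record)
    return kept, set_aside
```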
