Archiving and preserving tweets using a Library Management System

The Welsh Government's Information and Archive Service carried out a mini-pilot project to explore making tweets available via its Soutron Library Management System.


Storage of tweets for research and re-use on LMS

The next step was to work with Soutron to upload, store and provide access to tweets, irrespective of whether they were saved as attachments, images or links, and to store the metadata accompanying each tweet.

Soutron started by examining the different variants of metadata and PDFs that were being harvested from the selected Welsh Government Twitter accounts. The first thing that stood out was that each PDF contained only an image of the tweet and no metadata. However, a “text” file was also provided with each PDF, sharing a common file name, which meant that Soutron could load it alongside the PDF as part of an automated load function. Manually typing the words of each tweet into the LMS would have made the project unworkable.
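The pairing of each PDF with its text file by common name can be illustrated with a minimal Python sketch. It is not Soutron's load function: the function name pair_tweet_files, the .pdf and .txt extensions and the example file names are all assumptions made for illustration.

```python
from pathlib import PurePath


def pair_tweet_files(filenames):
    """Pair each PDF with the text file that shares its base name.

    Hypothetical helper: the .pdf/.txt extensions are assumed, not
    taken from the actual harvest.
    """
    by_stem = {}
    for name in filenames:
        path = PurePath(name)
        suffix = path.suffix.lower()
        if suffix in (".pdf", ".txt"):
            # Group files under their shared base name, e.g. "tweet_001".
            by_stem.setdefault(path.stem, {})[suffix] = name
    # Keep only base names that have both a PDF and a text file.
    return {
        stem: (files[".pdf"], files[".txt"])
        for stem, files in by_stem.items()
        if ".pdf" in files and ".txt" in files
    }


# Example file names are invented for illustration.
pairs = pair_tweet_files(["tweet_001.pdf", "tweet_001.txt", "tweet_002.pdf"])
```

A PDF without a matching text file is left out of the result, so it can be reviewed by hand rather than loaded with no tweet text.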

There was additional metadata that accompanied the Twitter text file and image file, but it was in HTML format, which is not ideal to work with. Soutron were able to quickly clean this metadata and load it into Excel so that it could be used. This extra data was important as it contained the URLs of the tweet as well as the referral URL. Critically, it also contained the name of the PDF file containing the image of the tweet, which meant that Soutron were able to link the metadata with the PDF using the standard Soutron importer tool.
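As a rough illustration of this kind of clean-up, the sketch below converts an HTML table into CSV text that Excel can open, using only the Python standard library. The article does not describe the layout of the HTML metadata, so the table structure, the column names pdf_file and tweet_url, and the function html_table_to_csv are all assumptions.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html_text):
    """Return the table cells found in html_text as CSV text."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Sample markup and URL are invented for illustration.
sample = (
    "<table><tr><th>pdf_file</th><th>tweet_url</th></tr>"
    "<tr><td>tweet_001.pdf</td><td>https://example.org/status/1</td></tr></table>"
)
csv_text = html_table_to_csv(sample)
```

With the PDF file name in its own column, each metadata row can be matched to its PDF on import.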

In addition, the simple CSV file created by the Welsh Government’s Information and Archive Services, containing the language and description of each tweet, provided Soutron with further metadata that could be used for the collection.
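One way such a CSV could be combined with the other metadata is sketched below in Python. The article does not say which field the two sources had in common, so joining on a pdf_file column is an assumption, as are the function name merge_descriptions and the sample values.

```python
import csv
import io


def merge_descriptions(metadata_rows, csv_text, key="pdf_file"):
    """Add the CSV columns to each metadata row that shares the same key.

    Hypothetical helper: the shared key column is an assumption.
    """
    extra = {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}
    merged = []
    for row in metadata_rows:
        combined = dict(row)
        # Rows with no match in the CSV are kept unchanged.
        combined.update(extra.get(row[key], {}))
        merged.append(combined)
    return merged


# Sample values are invented for illustration.
metadata_rows = [{"pdf_file": "tweet_001.pdf", "tweet_url": "https://example.org/status/1"}]
descriptions = "pdf_file,language,description\ntweet_001.pdf,cy,Example description\n"
records = merge_descriptions(metadata_rows, descriptions)
```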

We jointly determined which fields were needed and created the corresponding record and field structure using the standard menu-driven facilities in Soutron that control the database structure. This included customising a new dedicated Search Result template for tweets.

Testing the results

A test database was used to set up these tasks and to load initial test data sets. The result was very positive and the original objectives were all achieved.

Experimenting and working with this data set made it clear that there is further potential. Using Twitter’s APIs, it may be possible to automatically extract and index data from specific Twitter accounts, archiving important data that might otherwise be lost.


"It is exciting to capture and preserve data in this new medium. Librarians and Information Professionals have always been at the forefront of technologies and play a pivotal role to manage vital information, more and more of which is outside of traditional print material".

This is an edited version of an article to be published in eLucidate, the journal of UKeiG.

Marlize Palmer is Head of Information and Archive Services, Welsh Government; James Dawes works in the Information and Archive Services of the Welsh Government.


To explore how Soutron can use their LMS to archive your social media content, get in touch with them today at

