Big data, big deal?

The increased volume and variety of data bring both opportunities and challenges. Big issues and big ideas were discussed at the Eduserv Symposium.

The big data phenomenon

The 2012 Eduserv Symposium looked at the phenomenon of 'Big Data', unravelling whether it represents a challenge or an opportunity and how we can best make use of it. This year's event was held at the Royal College of Physicians.

In his introduction, Andy Powell of Eduserv stressed the importance of uncoupling the issues of Big Data from the industry hype. He discussed the increase in the velocity, volume and variety of data, and noted the rise of a new profession to tackle these issues: the Data Scientist, who draws upon disciplines from both computer science and vertical industries to process, analyse and store Big Data.

Dramatic data volumes

Rob Anderson, CTO EMEA of Isilon / EMC, spoke about his experience of Big Data, examining how dramatically data volumes have risen over the last decade. Rob observed that during this decade the 'digital universe' will grow from 0.92 Zettabytes to 35.2 Zettabytes, a 44-fold increase, and that 90% of it will be unstructured data. This data will come from positioning information, mobile sensors, utilities, gene sequencing, video surveillance and a wealth of other sources.
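
As a rough sanity check on the growth figures quoted above, the short Python sketch below computes the ratio between the two endpoints and the compound annual growth rate they imply. The function names are illustrative, and the ten-year span is an assumption based on the phrase "during this decade". Taken at face value, the two endpoints give a ratio of roughly 38, somewhat below the quoted 44-fold figure; the summary of the talk does not say which baseline the 44-fold figure uses.

```python
def growth_factor(start, end):
    """Return how many times larger `end` is than `start`."""
    if start <= 0:
        raise ValueError("start must be positive")
    return end / start


def annual_growth_rate(start, end, years):
    """Return the compound annual growth rate implied by start -> end."""
    if years <= 0:
        raise ValueError("years must be positive")
    return growth_factor(start, end) ** (1 / years) - 1


# Figures as quoted in the talk, in zettabytes.
START_ZB = 0.92
END_ZB = 35.2
YEARS = 10  # assumption: "this decade" is taken to mean ten years

if __name__ == "__main__":
    factor = growth_factor(START_ZB, END_ZB)
    rate = annual_growth_rate(START_ZB, END_ZB, YEARS)
    print(f"Growth factor: {factor:.1f}x")
    print(f"Implied annual growth: {rate:.1%}")
```

On these assumptions the implied growth is a little over 40% a year, compounded.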

Dr Guy Coates, from the Sanger Institute / Wellcome Trust, then looked at Big Data within gene sequencing. His team alone generates a terabyte of data each day, and Dr Coates noted that Moore's Law is insufficient to save organisations from Big Data. He also looked at how cloud computing can assist with managing Big Data, stating that whilst cloud, with its rapidly provisioned infrastructure, should be the saviour of Big Data, there are a number of barriers.

Chief among these is data transfer: many cloud providers are not on the high-speed JANET network, which frequently makes moving large volumes of data to the cloud unfeasible. At present, Dr Coates observed, it still makes more sense to retain and analyse data in-house, but he was keenly looking forward to a time when cloud could reduce costs and increase the speed of data handling.
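
To give a feel for why transfer is the sticking point, here is a minimal Python sketch estimating how long it would take to move one day's output (a terabyte, as quoted above) over links of different speeds. The link speeds, the decimal definition of a terabyte and the `efficiency` parameter are illustrative assumptions, not figures from the talk, and real-world throughput is usually well below the nominal line rate.

```python
TERABYTE = 10**12  # bytes, decimal definition (assumption)


def transfer_hours(size_bytes, link_bits_per_second, efficiency=1.0):
    """Estimate the hours needed to move `size_bytes` over a link.

    `efficiency` is the fraction of the nominal line rate actually
    achieved (1.0 means the ideal, best-case figure).
    """
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = (size_bytes * 8) / (link_bits_per_second * efficiency)
    return seconds / 3600


# Hypothetical link speeds, in bits per second.
LINKS = {
    "100 Mbit/s": 100 * 10**6,
    "1 Gbit/s": 10**9,
    "10 Gbit/s": 10 * 10**9,
}

if __name__ == "__main__":
    for name, speed in LINKS.items():
        hours = transfer_hours(TERABYTE, speed)
        print(f"{name}: {hours:.2f} hours per terabyte")
```

Even in the best case, a 100 Mbit/s link needs most of a day to move a single day's data, so such a transfer would barely keep pace with production.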

Other highlights

  • Prof. Anthony Brookes, from the University of Leicester, looked at the discipline of Knowledge Engineering and how developing and funding it could help to optimise and personalise medicine. He stressed the current gap between research and medical practice, and how knowledge engineering could not only help to close this gap, but also create a positive feedback loop that continuously improves medical practice.
  • Graham Pryor, from the Digital Curation Unit, argued that we have not only a Big Data problem but also a much more fundamental data management problem across the industry, which must be addressed before we can even begin to harness Big Data.
  • A 'lightning panel', made up of myself, Simon Hodson from JISC and Simon Metson from Cloudant, gave short views on Big Data. We each addressed different issues, from the current inadequacy of solutions for analysing social media, to the need for research data management, to the difficulty of processing, rather than storing, Big Data.
  • Max Wind-Cowie from Demos gave a view on how Big Data can empower public servants, including how Big Data can dissolve some of the 'toxicity' around public services such as JobCentres by providing a more bespoke, individual experience akin to that provided by commercial services such as Google and Amazon.
  • Anthony Joseph, from the University of California, Berkeley, gave a view on the scope of Big Data and on AMPLab's ambition to unify the algorithms, machines and people which currently process Big Data. He looked at applications for this, including microsimulation of the urban environment, for example to plan for the long-term effects of bridge sites on cities and traffic.

Andy Powell concluded the day with a few challenges for the next generation of 'Big Data Scientists'. First, not all data that is big is Big Data; in other words, a large, fast or varying amount of data does not necessarily present a Big Data problem. Secondly, there seems to be confusion about whether dealing with Big Data involves curation, management or analysis: whilst purists would emphasise the last of these, many speakers called for best practice in the first two.

Despite these challenges, Andy was positive about the future for Big Data and ended by thanking the speakers for what was a very constructive and insightful day.

Devin Gaffney is a Research Assistant at the University of Oxford.

Picture courtesy of D Sharon Pruitt via Flickr.