Upgrading the Scientific Method

Our lives have become entwined with the maelstrom of digital connections happening all around us. Having given us the digital age, kicked off the web and brought about our modern world of rolling news, Twitter trends and Facebook ‘likes’, computer scientists are starting to offer something back to traditional science: the tools and methodologies it needs to conduct science in the 21st century.

The problem with science

Science has served us well for thousands of years. It embodies humanity’s eternal pursuit to further our understanding of the world around us. The scientific method is a continual, analytical, structured process for generating new knowledge while attempting to remain as objective as possible. While the scientific method strives for impartiality, the output of science, knowledge, is very much intended to be used in the real world. Up until the last century this knowledge was only obtainable from a small number of sources: peer-reviewed articles, published books or direct conversations with scientists. However, this is changing, and changing fast. In our modern world the outputs of science can be seen everywhere: TV, radio, newspapers, magazines, websites, discussion boards, blogs and so on. The problem faced by science is in communicating this knowledge to the public in a consistent, accurate and comprehensible way. Academics are often particularly bad at explaining their research to those untrained in the mathematical, statistical and scientific techniques needed to fully understand the concepts being discussed.

Previously, journalists attempted to act as mediators between the world of academia and the wider public at large. The problem is that all too often journalists don’t have the time to fully understand the arguments before deadlines and catchy headlines beckon. In today’s world, even journalists have been superseded by a myriad of websites, bloggers, vloggers, Facebook posters and Twitterati.

By breaking down the barriers between academics and non-academics, we risk exposing those who are not well adapted to handling such publicity. The public misunderstanding of science is an important concern and can have far-reaching consequences if not handled carefully, as examples such as Climategate, the MMR vaccine controversy and the GM food debate highlight. However, there is a growing body of work emanating from the field of computer science that is coming to the scientist’s aid. Stepping back and looking at the wider picture, one might almost conclude that this movement is attempting to improve the scientific process itself.

Tools for Scientists

New initiatives in the digital realm such as High Performance Computing, eScience and Web Science are starting to take a sustained look at how science is conducted on the web. In doing so, they are also generating tools to aid the wider scientific community. For example, the myExperiment site is a social network that enables researchers to publish and share their experiments online with each other. The aim is to make it easy for one scientist to upload their procedures (called workflows) onto the web, and for other scientists to find and download these workflows. This enables researchers to re-run experiments, reduces the time it takes to get up and running with an experiment, and makes it easier to validate other people’s work. Science often operates on timescales of months and years, but myExperiment gives anyone, anywhere, the opportunity to instantly download the complete package of information relating to a process and start building on the research themselves.
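To make this concrete, here is a minimal sketch of what pulling shared workflows off the web might look like, assuming myExperiment exposes a public XML listing at /workflows.xml; the exact endpoint and response structure are assumptions for illustration rather than a definitive client.

```python
# Minimal sketch: listing publicly shared workflows from myExperiment.
# The endpoint path and XML layout are assumptions for illustration only.
import urllib.request
import xml.etree.ElementTree as ET

BASE = "https://www.myexperiment.org"

def list_workflows(limit=5):
    """Fetch the public workflow listing and print a few titles with their URIs."""
    with urllib.request.urlopen(f"{BASE}/workflows.xml") as resp:
        root = ET.parse(resp).getroot()
    for wf in root[:limit]:
        # Each child element is assumed to carry a 'resource' URI and a title as text.
        print(wf.get("resource"), "-", (wf.text or "").strip())

if __name__ == "__main__":
    list_workflows()
```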

Another problem facing science is that as our tools and equipment improve, so too does the amount of data we generate. Projects such as the Human Genome Project, Wikipedia and the Large Hadron Collider are generating vast quantities of data, and as the data increases, so too does the science. In the data arena there is a lot of activity around the concept of sharing data, not just between scientists but openly with anyone. The Linked Data movement is all about making datasets publicly available in a structured format. Moreover, rather than simply publishing your data, the community encourages dataset providers to provide links between datasets, in much the same way web pages link to other pages. Linked Data has found a home with governmental organisations, which are increasingly under pressure from their own citizens to increase transparency. Efforts such as the US’s data.gov and the UK’s data.gov.uk are starting to publish public sector data for anyone to access, in areas such as crime rates, education statistics and census data. How this data is then used is up to anyone. Linking data in this way is not a solution to any particular problem, but rather a mindset for publishing and interconnecting data in a structured way.
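As a rough illustration of the idea, the sketch below publishes a tiny dataset as RDF using Python’s rdflib and links one of its resources to an independently published dataset; the URIs, properties and the crime figure are all made up for the example.

```python
# Toy Linked Data example with rdflib: publish a small dataset as RDF and
# link one of its resources to an external dataset (DBpedia here).
# All URIs and the figure below are illustrative, not real data.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/crime-stats/")

g = Graph()
g.bind("ex", EX)

area = EX["southampton"]
g.add((area, RDF.type, EX.Area))
g.add((area, RDFS.label, Literal("Southampton")))
g.add((area, EX.recordedCrimes2011, Literal(12345)))  # made-up figure

# The "linked" part: assert that our local resource denotes the same thing
# as a resource in another, independently published dataset.
g.add((area, OWL.sameAs, URIRef("http://dbpedia.org/resource/Southampton")))

print(g.serialize(format="turtle"))
```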

However, the act of publishing data raises further concerns. Scientists need more than just data; they need quality data. It’s no good compiling data from an incomplete census report or analysing sales figures that might have been fabricated. Scientists need to know the origins of each piece of data: the who, when, where and how are critical in assessing its reliability. In an information-rich land of openly published, linked datasets, provenance is king. To tackle this problem, the W3C are currently working on definitions of what provenance means, models for expressing provenance and ways of storing it in addition to the data itself.
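As a sketch of what such provenance records might look like, the example below uses the W3C PROV-O vocabulary to state who generated a dataset, by what activity, and when; the dataset, activity and agent names are hypothetical.

```python
# Sketch: recording the who, how and when of a dataset with the W3C PROV-O
# vocabulary. The dataset, activity and agent URIs are hypothetical.
from datetime import datetime, timezone

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("http://example.org/")

g = Graph()
g.bind("prov", PROV)

dataset = EX["census-summary.csv"]
activity = EX["aggregation-run-42"]
agent = EX["alice"]

g.add((dataset, RDF.type, PROV.Entity))
g.add((activity, RDF.type, PROV.Activity))
g.add((agent, RDF.type, PROV.Agent))

# Who produced the data, by what process, and when: the core questions a
# reader needs answered before trusting it.
g.add((dataset, PROV.wasGeneratedBy, activity))
g.add((dataset, PROV.wasAttributedTo, agent))
g.add((dataset, PROV.generatedAtTime,
       Literal(datetime(2011, 5, 1, tzinfo=timezone.utc), datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```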

It is not just the forms of data sharing and management that are undergoing changes. The pace of the modern world far outstrips that of traditional scholarly peer-review publishing. Scientists can’t wait for an annual, quarterly or even monthly journal publication to keep up to date with the advances of their colleagues around the world. A decade ago, the then-revolutionary concept of publishing scientific articles before they were peer-reviewed (pre-prints) caused a stir in scientific communities and publishing houses alike. Today people are starting to ask if we should be looking beyond the traditional paper-based, peer-review process itself. The alt-metrics movement asks whether there is a quicker and more encompassing approach to judging the quality of scientific output, one that can keep pace with the modern world. Techniques for analysing the new ways people discuss and cite scholarly work on the web, along with methods for leveraging these new forms of communication, need to be developed so that we can judge the impact of scientific output more accurately.
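A toy sketch of the idea: combine counts of where a paper is being mentioned on the web into one weighted score. The sources, weights and sample counts below are invented for illustration and do not reflect how any real alt-metrics service actually scores output.

```python
# Toy alt-metrics sketch: fold counts of web mentions of a paper into a
# single weighted score. Sources, weights and counts are all invented.
from dataclasses import dataclass

# Rough weights expressing how strong a signal each kind of mention is assumed to be.
WEIGHTS = {
    "blog_posts": 5.0,
    "news_stories": 8.0,
    "tweets": 0.25,
    "bookmarks": 1.0,      # e.g. reference-manager saves
    "wikipedia_refs": 3.0,
}

@dataclass
class WebMentions:
    blog_posts: int = 0
    news_stories: int = 0
    tweets: int = 0
    bookmarks: int = 0
    wikipedia_refs: int = 0

def altmetric_style_score(m: WebMentions) -> float:
    """Weighted sum of mention counts: a crude stand-in for a real alt-metric."""
    return sum(WEIGHTS[name] * getattr(m, name) for name in WEIGHTS)

# Example: a paper with a handful of blog posts and a burst of tweets.
paper = WebMentions(blog_posts=3, news_stories=1, tweets=120, bookmarks=40, wikipedia_refs=2)
print(round(altmetric_style_score(paper), 1))
```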

Where do we go from here?

These developments, while radical, are still not the end of the story. We are also missing the tools that enhance dissemination and at the same time increase our understanding of this work. With more humans on the planet than ever before, the wealth of human knowledge is increasing at a pace almost impossible to keep up with. Tools such as Wikipedia, Wolfram Alpha, translation services and semantic search engines provide small pieces of this jigsaw, but we still need more guidance. How do we know what questions to ask of these services? How do we interpret their results? Where else can we go, and who else can we ask?

The answers will no doubt appear in time. My view is that, as the last 10 years of computer science research shows, people are asking the right questions. Now more than ever, scientists need to be able to adapt and evolve to work in the 21st century. The computer science community needs to support researchers and encourage better ways of working. Together we can drag the scientific world into the digital age.

“Welcome!”
“It’s good to see you.”
“What took you so long?”