Mathematics and computation

Big data

Throughout 2019, we engaged people across the UK and Ireland in discussion about the ways in which big data, through the lens of physics, affects us all.

What is big data?

Part of what makes big data so mysterious to the public is that it has so many definitions. Big data can be defined as a dataset that exceeds a petabyte (1 million gigabytes). Or it can refer to the exponential growth in the volume, availability and variety of data in society.
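As a quick sanity check on the unit conversion above, here is a minimal Python sketch. The figures are the standard SI (decimal) definitions of the prefixes, not numbers taken from the article:

```python
# SI definitions of the storage prefixes (decimal, not binary):
GIGABYTE = 10**9   # 1 GB = 10^9 bytes
PETABYTE = 10**15  # 1 PB = 10^15 bytes

# How many gigabytes fit in a petabyte?
gigabytes_per_petabyte = PETABYTE // GIGABYTE
print(gigabytes_per_petabyte)  # 1000000, i.e. 1 million gigabytes
```

Note that storage vendors and the SI use these decimal prefixes; binary prefixes (1 PiB = 2^50 bytes) give slightly larger values.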

Some characterise big data by the 3Vs: the extreme volume of data, the wide variety of data types and the velocity at which the data must be processed. More recently, additional Vs have been added, including veracity, value and variability. To many, though, big data simply describes datasets so large and complex that storing and analysing them is a ‘big’ challenge.

Big data and physics

Whether it is the European Space Agency’s Gaia satellite, currently measuring the positions and motions of more than a billion stars in the Milky Way, or a small research team trying to improve cancer diagnosis, physics is generating a tsunami of information worldwide. Physicists working in astrophysics, particle physics, climate science and many more disciplines are at the forefront of understanding how we can interrogate, analyse and store big data for the benefit of science and society.  

Big data and the IOP

Throughout 2019, we engaged people across the UK and Ireland with the ways in which physics and big data touch our lives.

We led this initiative through our national public engagement programme, involving our member-run nations and branches and partner organisations across the UK and Ireland.

This theme continues to be explored through other avenues, notably our work in education and through engagement with our membership. For example, we're getting schoolchildren talking about big data and meeting physicists working in data science, through the schools outreach project I’m a Scientist Get Me Out of Here.

Focus areas for 2019

Big data issues that affect us all

There is little doubt big data has benefited society in many ways. From fitness trackers and smart home devices to more targeted and personalised news and shopping experiences, it has made our lives healthier, safer, greener and more efficient. Yet there are also real concerns surrounding the use and misuse of big data. 

Central to this is the issue of consent. In early 2018, it emerged that Cambridge Analytica had harvested the personal data of millions of Facebook users without their consent for political advertising purposes. The scandal catapulted big data into the limelight, prompting public discussion on data misuse and the ethical standards to which technology companies should adhere.

Months later, the European Union’s General Data Protection Regulation (GDPR) – among the world’s strongest sets of data protection rules – came into force. GDPR gives people stronger rights to access the information companies hold about them, places data-management obligations on businesses and sets enhanced standards for ensuring people consent to their data being collected and analysed.

However, GDPR is not the end of the debate. As more and more data are collected, how much of that data should be considered public, and when should personal data be considered private? For example, it is far from clear if or when a tech company should turn a user’s personal data over to law enforcement authorities. And opinions vary widely on whether governments should analyse data without consent to understand terrorist activities.

The latter is directly related to another worrying consequence of big data analytics: discrimination. When algorithms and artificial intelligence are used to interrogate big data for automated decision-making, how can we ensure that those decisions are not biased? Many cases have been reported in which historical bias in the datasets has led to unfair outcomes in sentencing or recruitment, discriminating on the basis of age, gender, race and disability.

For some people, the only answer is to accept that fighting for privacy in the 21st century is a losing battle. They call for consumer-based tools similar to those wielded by businesses to allow people to make sense of the data they generate, and restore some level of control over their digital footprint. 

Yet even then, big data poses questions for society. How do we secure our cultural memory when we are awash in so much data? What is the future of telecommunications when the radio-frequency spectrum over which we transmit information is full? And what roles should be left to humans in an increasingly automated and data-driven world?

Small stories about our big universe

Physics has been dealing with huge volumes of data since before ‘big data’ became a buzzword. In the 1960s, high-energy physicists looking to understand the particles and forces that make up the universe pioneered the use of computers for data acquisition, simulation and analysis. Later, efforts to share and understand the wealth of data generated at CERN – the largest particle physics laboratory in the world – eventually led to the development of the World Wide Web in 1989. 

Today, the Large Hadron Collider (LHC) – CERN’s flagship particle accelerator – produces 1 billion particle collisions per second. These collisions provide a window into the fundamental constituents of the universe, generating more than 30 petabytes of data per year. These data are accessed by a community of thousands of physicists around the globe in near real time through the world’s largest distributed computing infrastructure, the Worldwide LHC Computing Grid.
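To put the 30-petabytes-per-year figure in perspective, a back-of-the-envelope Python sketch of the average throughput it implies. Note this is the stored-data figure from the text, averaged over a year; it is far below the raw detector output, since trigger systems discard most collision events before anything is written to storage:

```python
# Back-of-the-envelope arithmetic on the LHC figure quoted above:
# 30 PB of stored data per year, expressed as an average data rate.
PETABYTE = 10**15                   # bytes (SI definition)
SECONDS_PER_YEAR = 365 * 24 * 3600  # ~3.15 x 10^7 seconds

stored_per_year = 30 * PETABYTE     # bytes written to the Grid per year
avg_throughput = stored_per_year / SECONDS_PER_YEAR

print(f"{avg_throughput / 10**9:.2f} GB/s on average")  # 0.95 GB/s on average
```

Roughly a gigabyte every second, around the clock, for a whole year – which is why a single data centre cannot serve the analysis load and the data are distributed worldwide.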

Though this computing infrastructure serves CERN well now, future upgrades and potential successors to the LHC will produce orders of magnitude more data. How will CERN scientists cope with this data deluge?

The same question faces many other large physics facilities generating and analysing big data: from the upcoming Square Kilometre Array telescope, which aims to answer fundamental questions about the origin and evolution of the Universe, to the Human Brain Project, which is building a vast collaborative research infrastructure to help researchers across Europe advance neuroscience, computing and brain-related medicine.

By working with industry and openly sharing knowledge, these and many other big science projects are expected to lead to developments in computing. This might mean more advanced supercomputers and big data analysis techniques capable of predicting the effects of climate change or finding new drugs for hard-to-treat diseases. Or it could mean more energy-efficient computing techniques that reduce the carbon footprint of data centres and digital society. How physics experiments generating big data can best benefit society is therefore a key question for the modern world. 

Physics, big data and ourselves

Computational science has expanded scientists’ toolkit far beyond what we could hope to observe with our limited vision and short lifespans. For example, astrophysicists can now build entire simulated universes to test relativity, running them from soon after the Big Bang to today and beyond.

But the changing way in which we collect, store and analyse this information is also changing the nature of what it is to be a physicist. The result is a new way of doing science – theory and experiment are now joined by computation. This new way of working brings with it important questions. Can we trust the results of code incorporating AI and machine learning when we could never hope to understand exactly how it generated those results? How do we ensure reproducibility? And can we only advance physics in big multidisciplinary collaborations? 

Though big data is posing new challenges for physicists, it is also providing new opportunities for all. Project managers, computer scientists, cloud software engineers and various other experts are now needed just to maintain collaborative large-scale physics code bases. At the same time, quantum physicists, neuroscientists and others from diverse fields are all contributing to develop new computing ideas that can accelerate scientific progress. What innovative and unexpected ideas are shaping the future of computing? And what skills will we need to work in physics in the future?