By Jonas Svenstrup Hansen on July 8, 2020
Machine learning is transforming the wind energy industry, driven by the requirement for scalable data analysis to reduce costs. Observation of the universe is a domain increasingly dominated by huge telescope projects that produce vast quantities of complex data. Both fields thus share a common problem: There is too much data for the data scientists to analyze.
Scalable data analysis is key to understanding the universe
Much of modern astronomy is centered around a few specialized space telescopes. These telescopes continuously generate enormous amounts of data, the analysis of which requires substantial expert resources. The sky catalog alone for TESS, a space telescope launched in 2018, comprises more than a billion stars. This vast scope requires a new approach in both hardware and software. I had the opportunity to contribute to the software side for TESS during my master’s studies.
Data analysis has been an integral part of astronomy since before "computer" became a term used to describe digital machines. Today, very advanced and scalable algorithms run on supercomputers and excel at extracting information about the universe from images. But recent increases in data volume demand that this approach be applicable to a broader range of topics, including some formerly reserved for expert eyes.
Among these topics are exoplanet hunting, the discovery of planets orbiting stars other than our Sun, and asteroseismology, the study of vibrations in stars.
Vibration data enables us to look inside distant stars
It is a lot easier to visit a wind turbine than it is to visit a star. We cannot install a vibration sensor on a humongous ball of glowing plasma far away in space. Instead, we rely on capturing small changes in the emitted light that are caused by vibrations.
Analysis of these vibrations can reveal the hidden insides of stars, a field of study known as asteroseismology. It has been revolutionized by the Kepler space telescope (2009 to 2018), which allowed for consistent and sustained observations with low noise. That kind of data is key to a high-quality vibration spectrum, one of the most important tools for vibration analysis of both stars and wind turbines.
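To make the idea of a vibration spectrum concrete, here is a minimal sketch in Python. It is not the method used for Kepler or TESS data. It assumes an evenly sampled brightness measurement and uses a plain Fourier transform, whereas real light curves have gaps and are commonly analyzed with a Lomb-Scargle periodogram instead. The function name `power_spectrum` and all numbers in the simulated star (cadence, oscillation frequency, amplitude, noise level) are made up for illustration.

```python
import numpy as np


def power_spectrum(time_s, flux):
    """Return frequencies (Hz) and power for an evenly sampled light curve."""
    flux = np.asarray(flux, dtype=float)
    dt = time_s[1] - time_s[0]          # sampling interval in seconds
    detrended = flux - flux.mean()      # remove the constant brightness level
    spectrum = np.fft.rfft(detrended)
    freqs = np.fft.rfftfreq(len(flux), d=dt)
    power = np.abs(spectrum) ** 2 / len(flux)
    return freqs, power


# Hypothetical star: 10 days of one-minute samples with a single
# oscillation at 100 microhertz buried in noise of similar size.
rng = np.random.default_rng(0)
time_s = np.arange(0.0, 10 * 86400.0, 60.0)
true_freq = 100e-6
flux = (
    1.0
    + 1e-4 * np.sin(2 * np.pi * true_freq * time_s)
    + rng.normal(0.0, 1e-4, time_s.size)
)

freqs, power = power_spectrum(time_s, flux)
peak_freq = freqs[np.argmax(power[1:]) + 1]  # skip the zero-frequency bin
print(f"Strongest oscillation near {peak_freq * 1e6:.1f} microhertz")
```

The oscillation is invisible in the raw brightness values, where it is no larger than the noise, but stands out as a sharp peak in the spectrum. This is why long, uninterrupted, low-noise observations matter so much: the longer the time series, the finer the frequency resolution and the higher the peak rises above the noise.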
In their later years, as they run out of hydrogen to fuse, stars swell up to become red giants while their insides contract. This will also happen to the Sun, which in a few billion years will expand to fill the sky and engulf the Earth. A part of this process is the ignition of helium inside the stellar core, and knowing when this happens is key to understanding how red giants evolve.
The transition is difficult to see on the surface of the star, but it changes the interior structure of the star and thus how it vibrates. By examining these vibrations, we can discern what happens in the core of a giant star thousands of light-years away. But analysis of the vibration signals requires experts, just like in the wind turbine industry. And there are thousands of stars to analyze, just like there are thousands of wind turbines. A great case for machine learning.
A group of astronomers in Sydney has demonstrated this in recent years. With a convolutional neural network they have classified thousands of red giant stars using data from the Kepler space telescope.
With advanced understanding of so many stars we can begin to better understand the dynamics of not only our own stellar backyard, but of our entire galaxy. Combined with sufficient high-quality data and the infrastructure to supply it, machine learning can scale advanced analysis to enable great improvements in our understanding of the universe.
Finding New Worlds
The thousands of exoplanets that have been discovered in the last decade have been detected primarily through the transit method. Analogous to a solar eclipse, an exoplanet blocks out a tiny part of the light from its host star when it passes directly between the telescope and the star during its orbit. The brightness of the star will thus decrease in a way that reveals what type of exoplanet it has, as illustrated in the video below.
Credit: NASA/Goddard Media Studios
The blocked-out portion of light is, however, tiny indeed: An alien observer would have to look for a 0.01% dip in the light from the Sun to discover Earth using the transit method, and the dip would occur for only half a day each year. But with extensive and accurate observations from, among others, the space telescopes Kepler and TESS, such discoveries are possible.
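The numbers for Earth can be checked with a short back-of-the-envelope calculation. The sketch below assumes the simplest possible geometry: the dip depth is the ratio of the two disc areas, and the duration is the time a planet on a circular orbit needs to cross the full stellar diameter, ignoring the planet's own size. The function names are made up for this example, and the radii and orbital distance are rounded textbook values.

```python
import math

R_SUN_KM = 695_700.0        # solar radius
R_EARTH_KM = 6_371.0        # mean Earth radius
AU_KM = 149_597_870.7       # Earth-Sun distance
YEAR_HOURS = 365.25 * 24


def transit_depth(planet_radius, star_radius):
    """Fraction of starlight blocked: ratio of the two disc areas."""
    return (planet_radius / star_radius) ** 2


def central_transit_duration_hours(star_radius, orbit_radius, period_hours):
    """Time to cross the stellar diameter on a circular orbit.

    Assumes the planet passes over the center of the star and
    neglects the size of the planet itself.
    """
    return period_hours / math.pi * math.asin(star_radius / orbit_radius)


depth = transit_depth(R_EARTH_KM, R_SUN_KM)
duration = central_transit_duration_hours(R_SUN_KM, AU_KM, YEAR_HOURS)

print(f"Dip in brightness: {depth * 100:.4f} %")
print(f"Transit duration:  {duration:.1f} hours")
```

Under these assumptions the dip comes out at a little under 0.01% and lasts roughly 13 hours, in line with the figures above. A Jupiter-sized planet, with about ten times the radius, would block about a hundred times more light, which is why large planets were the first to be found this way.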
Many stars, however, exhibit natural variability that can be hard to distinguish from the tiny signal of an exoplanet. The final classification is therefore always done by a human expert. Before that, however, millions of stars must be weeded out. One solution that has been applied to this task is to use non-astronomers for initial classification through citizen science projects. Another, more easily scalable method is machine learning.
Automatic detection of potential transiting exoplanets has been a part of the Kepler mission data pipeline. However, with traditional algorithms the number of false positives, and thus the amount of verification required from human experts, is massive.
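A toy example shows how easily a simple rule produces false positives. The detector below is deliberately naive and is not the algorithm used in the Kepler pipeline: it flags a star whenever its brightness drops below the median by more than a fixed threshold. The function name `has_dip`, the threshold, and the three simulated stars are all made up for illustration.

```python
import numpy as np


def has_dip(flux, depth_threshold=5e-4):
    """Naive detector: flag the star if any measurement falls below
    the median brightness by more than the threshold."""
    flux = np.asarray(flux, dtype=float)
    return bool(np.min(flux) < np.median(flux) - depth_threshold)


time_days = np.linspace(0.0, 30.0, 3000)

# Hypothetical star with a planet: a 0.1 % box-shaped dip lasting
# 0.2 days, repeating every 10 days.
in_transit = (time_days % 10.0) < 0.2
transit_star = np.where(in_transit, 1.0 - 1e-3, 1.0)

# Hypothetical star without a planet, but with 0.1 % natural
# variability on a 3-day period.
variable_star = 1.0 + 1e-3 * np.sin(2 * np.pi * time_days / 3.0)

# Hypothetical star with perfectly constant brightness.
quiet_star = np.ones_like(time_days)

for name, flux in [
    ("transit", transit_star),
    ("variable", variable_star),
    ("quiet", quiet_star),
]:
    print(f"{name:>8}: flagged = {has_dip(flux)}")
```

The detector flags the star with the planet, but it also flags the variable star, which has no planet at all. Telling the two apart requires looking at the shape and regularity of the dips rather than their depth alone, and that is the kind of pattern a trained model or a human expert can pick up.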
With the aid of machine learning, the accuracy of algorithms that detect exoplanets can be vastly improved. The aptly named Robovetter has since 2015 screened Kepler data for exoplanet-like signals. This has resulted in many exoplanet discoveries that have been verified by experts.
The Robovetter algorithm has not replaced astronomers, but it has enabled them to analyze data more efficiently, finding new worlds in less time.
Challenges of Machine Learning
The application of machine learning in astronomy is an ongoing area of research. Its potential is inherently dependent on the amount and quality of the available data. While there are millions of stars bright enough for detailed study, it is a demanding task to align observations made with different telescopes. But this alignment is required in order to get enough data for the interesting rare cases, be that an Earth-like exoplanet or a peculiar type of star that can challenge a scientific theory.
Thus, to cut even more time and costs, machine learning models should be able to learn from very different types of data. Wind turbine fault analysis has the same issue in learning across turbine types. This is an exciting challenge that needs to be addressed by anyone looking to unleash the power of modern deep learning methods on complex data at an affordable price.
If you want to get hands-on with machine learning on astronomy data, the Galaxy Zoo data set is a good place to start. You can even contribute to this data set by classifying some galaxies yourself!
About the author:
Jonas Svenstrup Hansen is a junior Data Scientist at Vertikal AI and an astronomy graduate from Aarhus University. During his master’s thesis he contributed to the development of a data analysis pipeline for the asteroseismology branch of the TESS space telescope scientific community.