Data's Big Bang: Applying Analytics to Astronomy
The volume of astronomical data continues to expand at an explosive rate. The application of analytics to so much data offers new opportunities and even career options to space and data enthusiasts.
If you're an analytics geek with your head in the stars -- by which I mean you have a fascination for stars, planets, moons, galaxies, black holes, and other astronomical objects -- astronomical research may offer you a dream career path. It might require some background in the field of astronomy -- but it's an option for indulging the two interests of analytics and astronomy.
The astronomical research field is expanding by leaps and bounds. It should come as no surprise that astronomical science has been involved in the big-data/analytics revolution (actually, gathering and processing large volumes of data have long been hallmarks of astronomical research). But what's jaw-dropping is the payoff in terms of how fast and how enormously our knowledge of the universe is expanding.
The Atlantic magazine's article How Big Data Is Changing Astronomy (Again) noted five years ago that "This isn't your grandfather's stargazing," and explained that "the amount of data we have on our universe is doubling every year thanks to big telescopes and better light detectors."
Other whiz-bang systems are paying off as well. The Laser Interferometer Gravitational-Wave Observatory (LIGO) detected gravitational waves for the first time just over a year ago; the waves emanated from a collision of black holes about 1.3 billion light years away. That discovery was the culmination of roughly 40 years of observation and data collection.
The analytics revolution is intensely influencing the direction and future of astronomy, according to a paper by Michael Garrett of ASTRON (the Netherlands Institute for Radio Astronomy) and University of Leiden. Garrett presented the paper titled Big Data Analytics and Cognitive Computing: Future Opportunities for Astronomical Research at a 2014 radio-telescope astronomy conference. Here's an excerpt of what Garrett wrote:
"The days of the lone astronomer with his optical telescope and photographic plates are long gone: Astronomy in 2025 will not only be multi-wavelength, but multi-messenger, and dominated by huge data sets and matching data rates. Catalogues listing detailed properties of billions of objects will in themselves require a new industrial-scale approach to scientific discovery, requiring the latest techniques of advanced data analytics and an early engagement with the first generation of cognitive computing systems. Astronomers have the opportunity to be early adopters of these new technologies and methodologies: the impact can be profound and highly beneficial to effecting rapid progress in the field."
Kirk Borne, currently the principal data scientist at the Strategic Innovation Group at Booz Allen Hamilton, weighed in on the topic in a 2014 Data Science Weekly interview, back when he was professor of astrophysics and computational science at George Mason University. Borne recounts how he realized the importance of data in astronomical research in 1998, while he was working for Raytheon in NASA's Astrophysics Data Facility.
"... I realized that the huge increase in data volumes [was] leading to huge potential for new discoveries. To achieve those discoveries, we needed the special machine learning algorithms that are used in data mining. I began devoting all of my research time to data mining research, initially on the very same colliding galaxies that I had previously studied 'one at a time' but now 'many at a time.'"
Borne also describes how his team "secured grants to discover unusual super-starbursting galaxies in large astronomy data sets." To provide the analytics, the team adapted a neural network model originally designed to identify wildfires in remote sensing satellite images of the Earth. That work was then turned into an analytical model that runs data mining algorithms on distributed data -- "one of the first successful examples of 'ship the code, not the data.'"
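For readers unfamiliar with the phrase, here is a minimal sketch of the "ship the code, not the data" idea. It is not Borne's actual system: the archive names, the brightness values, and the helper functions are all hypothetical, and the "remote" archives are simulated in a single process. The point is only that a small function travels to each archive and a small summary comes back, while the bulk data stays where it is.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for remote archives. In a real deployment each list
# would live at a different site and would be far too large to copy around.
ARCHIVES: Dict[str, List[float]] = {
    "archive_a": [0.2, 0.9, 3.1, 0.4],
    "archive_b": [5.2, 0.1, 0.3],
    "archive_c": [0.7, 0.8],
}


def count_outliers(values: List[float], threshold: float = 3.0) -> int:
    """The 'code' being shipped: count measurements above a threshold."""
    return sum(1 for v in values if v > threshold)


def run_at_each_archive(task: Callable[[List[float]], int]) -> Dict[str, int]:
    """Apply the task where the data lives and collect only the small results.

    Here this is an in-process loop; a real system would send the task over
    the network and execute it on each archive's own hardware.
    """
    return {name: task(data) for name, data in ARCHIVES.items()}


results = run_at_each_archive(count_outliers)
total = sum(results.values())
print(results, total)
```

Only a handful of integers cross the (simulated) network, no matter how large each archive grows, which is what makes the approach attractive for catalogues too big to move.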
Examples of modern astronomical research projects illustrate the magnitude of typical datasets and some approaches to analytics. The Next Generation Virgo Cluster Survey (NGVS), described in a 2012 InfoWorld article, seeks to map the approximately 2,000 galaxies of the Virgo Cluster, the closest large galaxy cluster to Earth's own Milky Way galaxy. InfoWorld reported that "the data collection process adds terabytes per week, yielding hundreds of terabytes to analyze." Facing this daunting task, project leaders turned to the Canadian Advanced Network for Astronomical Research (CANFAR) -- "the first dedicated cloud computing platform for astronomy, used to store, share, and analyze the data for astronomers worldwide." Researchers further "determined that machine learning, a type of advanced analytics with origins in artificial intelligence, would provide the most productive approach to accurately identify galaxies and generating the full Virgo Cluster map."
Another major project, the Square Kilometre Array (SKA), which aims to increase the speed of sky surveying by 10,000 times, has enlisted the Murchison Widefield Array (MWA) radio telescope, located in Murchison Shire (some 800 km north of Perth, Australia). As described in a 2015 Phys.org news report titled Radio astronomy backed by big data projects, the MWA is "designed to look back in time, to study the formation of the first stars and galaxies in the universe, less than one billion years after the Big Bang," which itself occurred about 13.8 billion years ago. So what sizes of datasets are involved?
In just 18 months, the MWA collected over 4 petabytes of data. The Murchison systems produce data streams at approximately 60 gigabits per second, processed on-site in real time using GPU-based (graphics processing unit) signal processing "as the first stage in a hierarchical data processing strategy." Output data streams are transmitted over a dedicated 10 Gbps optical fiber network to the Pawsey Supercomputing Centre in Perth.
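To put those figures side by side, here is a rough back-of-the-envelope calculation. It is only a sketch under stated assumptions: decimal units (1 petabyte = 10^15 bytes), an average month of 30.44 days, and a raw stream that runs around the clock, none of which the report specifies. The function names are illustrative.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30.44  # assumption: average month length

# Figures quoted in the article
RAW_STREAM_GBPS = 60.0      # on-site data stream, gigabits per second
FIBER_LINK_GBPS = 10.0      # dedicated link to the Pawsey Centre
COLLECTED_PETABYTES = 4.0   # data collected ...
COLLECTION_MONTHS = 18      # ... over this many months


def gbps_to_terabytes_per_day(gbps: float) -> float:
    """Convert a sustained rate in gigabits/s to terabytes/day (decimal units)."""
    bytes_per_second = gbps * 1e9 / 8
    return bytes_per_second * SECONDS_PER_DAY / 1e12


def average_gbps(petabytes: float, months: float) -> float:
    """Average rate in gigabits/s needed to accumulate a volume over a period."""
    seconds = months * DAYS_PER_MONTH * SECONDS_PER_DAY
    return petabytes * 1e15 * 8 / seconds / 1e9


raw_tb_per_day = gbps_to_terabytes_per_day(RAW_STREAM_GBPS)
link_tb_per_day = gbps_to_terabytes_per_day(FIBER_LINK_GBPS)
stored_gbps = average_gbps(COLLECTED_PETABYTES, COLLECTION_MONTHS)

print(f"Raw stream:      {raw_tb_per_day:.0f} TB/day if run continuously")
print(f"Fiber link cap:  {link_tb_per_day:.0f} TB/day")
print(f"Stored average:  {stored_gbps:.2f} Gbps")
print(f"Raw-to-stored ratio: about {RAW_STREAM_GBPS / stored_gbps:.0f}x")
```

Under these assumptions the raw stream would amount to several hundred terabytes a day, while the archived volume averages well under 1 Gbps. That gap is consistent with the article's point that on-site processing is the first stage of a hierarchy that shrinks the data before it ever reaches the fiber link.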
These examples represent just a minute sampling of the multitude of truly exciting astronomical research projects across the globe that are using analytics to process ever-larger datasets and broaden humanity's knowledge of our own universe. The data analytics-related career opportunities could be significant. Candidates for technical positions would likely need some background in the given technical field -- but that background might take many different forms and levels.
Some positions might require college-level degrees, minors, or significant hours in astronomy-related studies, perhaps along with some additional hours in data analytics. Others might call for the reverse: for example, a degree in data analytics with a minor in an astronomy-related field. There may also be opportunities for consultants -- for example, helping to build more effective systems for large-volume data processing -- as well as demand for college-level courses and contractual training services that acquaint astronomical research personnel with advanced data analytics methods for processing big data on a mind-boggling scale.