Data is the new oil, and in today’s era of Big Data, it is critical for companies not only to store that data efficiently but also to process and analyse it, and to glean insights from it that help their businesses grow.
This is where data science comes in. Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract patterns and meaningful information from vast volumes of structured and unstructured data to help make business decisions.
Ramya Ragupathy is one such data science practitioner. As a spatial data engineer, she works in the domain of open data and machine learning. She currently develops data analytics and data export tools at the Humanitarian OpenStreetMap Team (HotOSM), a nonprofit that works on disaster response and disaster preparedness using open source spatial data.
“Spatial data, in simpler terms, is location data. Whatever you see in the real world, imagine all of that being digitised — every building, every point of interest, like restaurants or ATMs. A lot of human effort goes into building this data. What you see on Google Maps, I’m doing it on an open-source scale. OpenStreetMap is the bedrock of all the work I do,” says Ramya.
Computer science background
Before her transition to data science, Ramya, who comes from a computer science background, began her career as a build engineer. Her work involved using many proprietary tools. “After a year or two, it became very mundane because with proprietary tools, the scope for growth was very limited,” she says.
A search for open source alternatives to those tools introduced her to a wider open source community, where she could interact with people from across the world. Between 2008 and 2011, she learnt about the OpenStreetMap project and also began contributing to WikiMaps.
“I did it in my spare time while continuing to work. It also didn’t require much effort, because all the information was available on the Internet through YouTube videos,” she says.
Data traverses multiple fields, domains and industries. Therefore, data scientists need to have domain expertise, programming skills, and knowledge of information science, mathematics and statistics. They apply machine learning principles to numbers, text, images, video, audio, and more to produce artificial intelligence (AI) systems that can make sense of the deluge of data.
Students have a wide range of career opportunities, with areas of specialisation including data engineering, data analysis, data visualisation and modelling, data mining, public policy and data journalism.
Today, there are many paid, one-year certificate courses. “Enrolling in such courses would provide you access to lots of datasets, which would otherwise be hard to come by as a student. This would be ideal for someone going on a traditional data science path,” Ramya says.
But for those who want to learn data science through open source tools, there is no dearth of free YouTube videos and online tutorials.
Ramya also advises students to keep tabs on organisations that do open source work and reach out to the community at large.
Data Meet and Data Kind are two such groups. They upload clean open datasets to public GitHub repositories (online storage spaces that hold a project's files and their full version history), which beginners can easily understand and build on.
For those from a non-programming background, there are initiatives like Free Code Camp (FCC) that are open to all, regardless of skill level.
While FCC focuses specifically on programming, Coursera offers plenty of free courses in data science, computer vision, data engineering and basic programming. A digital certificate is the only difference between the free and paid courses.
Today, unlike 10 years ago, data science is a booming field. With several ride aggregators and delivery services switching to OpenStreetMap, companies are on the lookout for people who can better interpret location-based data, says Ramya.
The challenges, however, remain. Data science practitioners face language and cultural barriers and a lack of diversity within the community.
“All open source projects have the same predominant profile — white and male, resulting in greater interaction only from that demographic within a forum. So, a diverse and inclusive team becomes important to check bias in the data and algorithms we build,” Ramya says.
For those wishing to take up or transition into data science, Ramya offers three keywords: curiosity (to learn something new and go beyond a limited scope), consistency (in building skills and habits by reading and making regular GitHub contributions) and communication (reaching out to groups and communities).
Mapping Niches is a fortnightly series that sheds light on careers that are off the beaten track, through the eyes of professionals working in a particular field.