With the emergence of big data and its importance to businesses’ daily and long-term operations, the need for employees who can process data and construct predictive models has never been higher. It therefore stands to reason that data scientists will be highly sought after, and well paid too. In 2019, data scientist was named ‘Best job in America’ in Glassdoor’s annual roundup, with an average salary of $108,000. But could data scientists be on the brink of extinction?
AI is having an impact in many industries, and the giant leaps in its development are thanks in no small part to the work of data scientists. Ironically, however, some now believe that data science is going to be yet another career rendered obsolete by AI.
“Thanks to its increasing sophistication, AI has now shown a propensity to threaten white-collar professions”
This is not necessarily the common view, however. Data science is an extremely broad discipline, with an equally broad set of skills employed within it. It is also an often misunderstood profession, with business leaders expecting their data scientists to fulfil a wide variety of functions.
To find out whether the risk was overstated, BDJ caught up with Alex Gude, a data scientist at Intuit who has previously worked as a high-energy particle physicist at CERN and a cosmologist at Lawrence Berkeley Labs.
The Current State of Data Science
A misconception surrounding machine learning-driven automation is that data scientists are fundamentally opposed to it. In reality, data scientists owe much of their appeal to the type of automation that machine learning advancements have afforded.
A data scientist’s role is multifaceted and requires several different stages before any kind of effective predictive model can be produced to generate insights for a company. Firstly, a data infrastructure must be established, whereby relevant, high-quality data can be easily and efficiently gathered and stored. The data must then be cleaned and evaluated before any kind of model can be developed, trained and then deployed.
Furthermore, machine learning algorithms are as diverse as the data that goes into them. As such, data scientists do not immediately know which algorithm will be the best fit for the task at hand. It is therefore necessary to test a variety of algorithms until the best-performing one is found. In addition, before a model can be put into production, it may need to be deconstructed and edited several times to ensure that it is production-safe.
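To make the idea of testing a variety of algorithms concrete, here is a minimal sketch in Python. It is an illustration only, not a method described by anyone interviewed for this piece: the synthetic data, the three toy models and every function name (`make_data`, `cv_mse`, `select_best` and so on) are assumptions made for the example. Each candidate is scored by k-fold cross-validation, and the one with the lowest average error on held-out data is chosen.

```python
import random
from statistics import mean


def make_data(n=200, seed=0):
    """Synthetic, already-clean data (an assumption): y is roughly 2x + 1 plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


def fit_mean(xs, ys):
    """Baseline: always predict the average target seen in training."""
    avg = mean(ys)
    return lambda x: avg


def fit_linear(xs, ys):
    """Ordinary least squares for a single feature."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    variance = sum((x - mx) ** 2 for x in xs)
    slope = covariance / variance
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_nearest(xs, ys):
    """1-nearest-neighbour: copy the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def cv_mse(fit, xs, ys, k=5):
    """Mean squared error over k folds; each point is held out exactly once."""
    n = len(xs)
    squared_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        squared_errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test_idx)
    return mean(squared_errors)


def select_best(candidates, xs, ys):
    """Score every candidate and return the name with the lowest error."""
    scores = {name: cv_mse(fit, xs, ys) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical candidate set; a real project would compare far richer models.
CANDIDATES = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}

if __name__ == "__main__":
    xs, ys = make_data()
    best, scores = select_best(CANDIDATES, xs, ys)
    for name, score in sorted(scores.items(), key=lambda item: item[1]):
        print(f"{name:8s} cross-validated MSE = {score:.3f}")
    print("selected:", best)
```

The sketch starts from a clean table of data, which is exactly the part of the job that is hard to obtain in practice. The comparison loop itself is mechanical and is usually handled by an established library rather than written by hand.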
“In general, I find most of the challenges in data science are ‘early’ in the process,” says Alex. “That is, things like finding the data in your organisation, getting the data to the place where you can work with it, cleaning it and doing feature engineering on top of it, and labelling it if needed. Once you've done all that and have a nice table of clean, labelled data, training a model is more straightforward. ‘More straightforward’ is why I think I've seen a lot of work targeting automating that part of the process.”
The different elements of data science are currently open to different levels of integration with machine learning optimisation. However, general ignorance about the scope of tasks a data scientist performs may be adding to the idea that machine learning is in a position to replace data scientists as a whole.
“I think people who work closely with data scientists understand sufficiently, but not everyone in a company will really get what the job is,” says Alex. “I still run into people who have a ‘and then a data scientist sprinkles some data or machine learning on it and magic happens!’ viewpoint. Often this leads to them overestimating how quickly something can be done, and sometimes how well.”
How Will Data Scientists’ Roles be Altered by Automation?
This is where the issue of data scientists being superseded by machine learning becomes less of a tangible threat. The nature of developing and implementing machine learning algorithms is one of dedicated hand-holding from their developers, with processes like data labelling taking up a large amount of resources, and still requiring human insight to infer meaning and trends from the hard data itself.
It is up to the data scientist to use intuition and showcase a flexible enough approach to cater to the specific needs of the business. Of course, this does not mean that other people within the business understand the multidisciplinary nature of the profession, which may lead to uninformed calls to replace the in-house data science team with a cure-all automated solution.
“Most of the changes I've seen in the last four years have been more on the ‘engineering’ automation, and less on the ‘data science’ side,” Alex says. “Automatic training and deployment through CI/CD pipelines is becoming pretty standard, but I haven't seen a lot of changes in the early stages of the process, like feature engineering.”
This measured approach to automation makes sense, as it is unrealistic to fully automate any complex, multi-stage process in one fell swoop. Data processing is complex and time-consuming, and the automation currently available to help isn’t sufficient to make a sizable impact.
Data Scientists Don’t Necessarily Resist Automation
There is a common misconception that any form of automation that eats into an individual’s current role and responsibilities will be met with opposition. It is the natural fear of being rendered obsolete by a more efficient replacement. However, the complexity of data science, and its ability to produce predictive models across various industries, puts it in the enviable position of being an adaptable profession in the age of automation.
“I think ML advances can aid data scientists most by removing the routine, menial work and letting them focus on the really hard stuff,” says Alex. “In the long term, of course, data science will be replaced by AI, but I think that's on the same timescale as software development will be replaced by AI.
“Bits and pieces will be automated, but end-to-end is too complicated for current systems. As automation starts to take chunks out of the workflow (with auto-ML, for example) I think it will free up the humans to spend more time on the hard stuff, like talking to stakeholders, understanding customers, and really diving deep into the data.”
The sheer computational power of machine learning algorithms makes them an essential tool in discovering patterns and trends amongst vast quantities of data. Data scientists working in tandem with machine learning could lead to big data being utilised in more diverse and impactful ways than previously imagined.
Due to the speed of machine learning’s development, it is difficult to accurately predict the scope of the automation of data science that it can provide, or indeed the speed at which it will be implemented across the industry. For the time being, data scientists like Alex are left to speculate about the future of their profession, but with cautious optimism.
“Freeing up humans to do the really creative work will make data scientists not just happier, but also more impactful. I guess I look at it like tractors: they didn't replace farmers (yet at least), but they made them able to do much more.”
There have been a number of recent examples of AI behaving badly due to bad data. For example, Microsoft’s Tay became aggressively bigoted after spending just a day learning from dialogue on Twitter.
But while the consequences of bad data in examples like Tay are relatively harmless, bad data can cause far more dire problems when it shows up in areas like healthcare. It can be the difference between someone getting the treatment they need and the AI overlooking them altogether. As with any new technology, the key must be a gentle and measured introduction into working practices. For the time being, machine learning projects need a guiding hand to quality check their outcomes, particularly to ensure that conscious or unconscious bias doesn’t derail them. Clean AI needs clean data, and until the latter can be achieved, humans will still have to play an active role.
Illustrations by Kseniya Forbender
Editor responsible for this story: Margarita Khartanovich