Is 'data labeling' the new blue-collar job of the AI era?
Last year, a factory in China replaced 90% of its workers with robots. In call centers across the world, AI voices are replacing human customer service agents. Eventually, taxi and Uber drivers could be replaced by self-driving cars.
The displacement of workers by technological advances is nothing new. Media theorist Douglas Rushkoff’s new book Throwing Rocks at the Google Bus traces the origins of “digital industrialism,” which has increasingly removed humans from the equation, granting power to corporations and stakeholders instead.
“Things have become incrementally worse,” Rushkoff told TechRepublic. “It’s harder to find a job, or everybody’s working more hours for less money. Technology just seems to put us in this always on state where our labor and our data and our time are being extracted from us.”
A major fear has been that not only will jobs disappear, but the disappearance will disproportionately affect lower-skilled workers, and that all of the newly created jobs will go to an elite cadre of technology employees such as software developers, AI researchers, and cybersecurity experts.
But Guru Banavar, the head of the team at IBM responsible for creating Watson, the AI system that mastered Jeopardy, told TechRepublic that this isn’t necessarily the case.
Banavar thinks that there will be “all kinds of jobs available” in the AI era, for workers at all skill levels. And for lower-skilled workers, data processing offers a new area of possibility.
“Data labeling” is what Banavar calls it. “It will be the curation of data, where you take raw data and you clean it up and you have to kind of organize it for machines to ingest,” he said. “If you look at any of the complicated analytical jobs we have today, 70% of that job is probably about the organizing and cleaning of data.”
“I don’t think people had something called data labelers in the past,” said Banavar. “I think of it as data engineering.”
Banavar is seeing the growth of these types of jobs at IBM. “We are hiring people that we were not hiring even five years ago,” he said. “People who just sit down and label data.”
Why is this important? Machines need labeled data in order to learn.
“Without labeling, you cannot train a machine with a new task,” he said. “Let’s say you want to train a machine to recognize planes, and you have a million pictures, some of which have planes, some of which don’t have planes. You need somebody to first teach the computer which pictures have planes and which pictures don’t have planes.” So IBM hires labelers, or outsources the work.
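To make the idea concrete, here is a minimal sketch of learning from labeled examples. It is not how IBM or any production image system works. It assumes each picture has already been reduced to two made-up numeric features, and it uses a very simple technique (nearest centroid) in place of deep learning. The data, feature values, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: each "picture" is reduced to two made-up numeric
# features, and a human labeler has tagged it "plane" or "no_plane".
labeled_examples = [
    ((0.9, 0.8), "plane"),
    ((0.8, 0.9), "plane"),
    ((0.7, 0.7), "plane"),
    ((0.1, 0.2), "no_plane"),
    ((0.2, 0.1), "no_plane"),
    ((0.3, 0.3), "no_plane"),
]


def train(examples):
    """Average the feature vectors for each human-assigned label."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (x, y), label in examples:
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    return {
        label: (s[0] / counts[label], s[1] / counts[label])
        for label, s in sums.items()
    }


def predict(centroids, features):
    """Return the label whose average example is closest to the new input."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Without the human-provided labels above, train() would have nothing to learn from.
centroids = train(labeled_examples)
prediction = predict(centroids, (0.85, 0.75))
```

The point of the sketch is that the program never receives a rule for what a plane looks like. Everything it knows comes from the labels a person attached to the examples.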
This kind of work can also extend outside the office.
The “environmental data creation of sensors [is] huge,” Banavar said. “Think about all the sensors that exist and are growing and think about how they can measure the environment.” Gathering information about weather patterns is one example. “You have to go out and put out sensors.”
“If you look at the data lifecycle, think about the devices that generate data and then the devices that collect the data and then all of the processing that you have to go through before you feed it to a cognitive system which is over here.”
Toby Walsh, Professor of AI at The University of New South Wales, sees data labeling as a reality for the future, as well, although he’s less excited about it.
“This is a somewhat depressing job for the future,” said Walsh.
“All the impressive advances we see with deep learning have come about using what is called ‘supervised learning’ where the data is labelled ‘good’ or ‘bad,’ or ‘Bob’ and ‘Carol,’” said Walsh.
It’s a necessary part of machine learning. “We can’t do unsupervised learning as well if the data is unlabelled,” said Walsh. “The human brain is excellent at this task. And deep learning needs lots and lots of labelled data. It’s likely though a very repetitive and undemanding task.”
Rushkoff sees undemanding tasks on the rise, as well. “We’ve seen the transition from an employee economy, to a gig economy, to a long distance gig economy on Amazon Mechanical Turk or something like that, where you’re not a human,” he said. “Human beings are used to do the tasks that are actually just too boring for computers to do. Find the number in this picture. It’s not, ‘Oh, let’s get people to write a screenplay together.’ It’s the lowest-level stuff.”
The solution? Rushkoff thinks that we’ve reached a point where “work is really just a way of justifying letting people have what’s already in abundance. If we can’t find enough work for people, then we have to destroy stuff. Having people do useless work is not the answer.
“We’ll have to start developing a market of more creative products. Where people are making video games for each other, and entertainment products, and other things. If people are making this stuff, it’s a more efficient marketplace.”
Banavar agrees that humans are essential to the process here. And, he believes, this has implications for thinking about how humans contribute in the age of machines, in general.
According to Banavar, there’s been a “big misconception that if something is simple for a person, it should be simple for a computer.”
“It’s almost the opposite,” he said. “The hard things are easier, the easy things are hard. You can ask a computer to solve a differential equation, it’s much easier than teaching a computer why cars have to always touch the ground.”
We know that IBM and other large companies are employing workers to do data cleaning and labeling but have not been able to find exact figures about the scale of this type of work. Keep up with TechRepublic as we work to uncover more information about these jobs and share stories about our new digital economy.
The 3 big takeaways for TechRepublic readers
- Machine learning, which allows computers to learn from examples without being explicitly programmed for each task, will require massive amounts of information to be labeled and cleaned up by humans before it can be processed.
- At a time when many blue-collar jobs will disappear, “data labeling” is a job that low-skilled workers will be able to perform.
- Humans still have an important role in teaching machines how to understand information.