Nearly two years ago, Seattle Sports Sciences, a company that provides data to soccer club executives, coaches, trainers and players to improve training, made a hard turn into AI. It began developing a system that tracks ball physics and player movements from video feeds. To build it, the company needed to label millions of video frames to teach computer algorithms what to look for. It started out by hiring a small team to sit in front of computer screens, identifying players and balls on each frame. But it quickly realized that it needed a software platform to scale. Soon, its expensive data science team was spending most of its time building a platform to handle massive amounts of data.
These are heady days when every CEO can see – or at least sense – opportunities for machine-learning systems to transform their business. Nearly every company has processes suited for machine learning, which is really just a way of teaching computers to recognize patterns and make decisions based on those patterns, often faster and more accurately than humans. Is that a dog on the road in front of me? Apply the brakes. Is that a tumor on that X-ray? Alert the doctor. Is that a weed in the field? Spray it with herbicide.
What only insiders generally know is that data scientists, once hired, spend more time building and maintaining the tools for AI systems than they do building the systems themselves. A recent survey of 500 companies by the firm Algorithmia found that expensive teams spend less than a quarter of their time training and iterating machine-learning models, which is their primary job function.
Now, though, new tools are emerging to ease the entry into this era of technological innovation. Unified platforms that bring together the work of collecting, labeling and feeding data into supervised learning models, or that help build the models themselves, promise to standardize workflows in the way that Salesforce and HubSpot have for managing customer relationships. Some of these platforms automate complex tasks using integrated machine-learning algorithms, making the work easier still. This frees up data scientists to spend time building the actual structures they were hired to create and puts AI within reach of even small- and medium-sized companies, like Seattle Sports Sciences.
Frustrated that its data science team was spinning its wheels, Seattle Sports Sciences’ AI architect John Milton finally found a commercial solution that did the job. “I wish I had realized that we needed those tools,” said Milton. He hadn’t factored the infrastructure into the original budget, and having to go back to senior management to ask for it wasn’t a pleasant experience for anyone.
A PEEK INTO THE TOOLBOX
The AI giants, Google, Amazon, Microsoft, and Apple, among others, have steadily released tools to the public, many of them free, including vast libraries of code that engineers can assemble into deep-learning models. Facebook’s powerful object-recognition tool, Detectron, has become one of the most widely adopted open-source projects since its release in 2018. But using those tools can still be a challenge because they don’t necessarily work together. This means data science teams have to build connections between the tools to get them to do the job a company needs.
The newest generation of tools addresses this pain point. New platforms now allow engineers to plug in components without worrying about the connections.
For example, Determined AI and Paperspace sell platforms for managing the machine-learning workflow. Determined AI’s platform includes automated elements to help data scientists find the best architecture for neural networks, while Paperspace comes with access to dedicated GPUs in the cloud.
“If companies don’t have access to a unified platform, they’re saying, ‘Here’s this open-source thing that does hyperparameter tuning. Here’s this other thing that does distributed training,’ and they are literally gluing them all together,” said Evan Sparks, cofounder of Determined AI. “The way they’re doing it is really with duct tape.”
Labelbox is a training data platform, or TDP, for managing the labeling of data so that data science teams can work efficiently with annotation teams across the globe. (The author of this article is the company’s co-founder.) It gives companies the ability to track their data, spot and fix bias in it, and optimize the quality of their training data before feeding it into their machine-learning models.
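To make the idea of spotting bias in training data concrete, here is a minimal sketch of one simple check: measuring how often each label appears and flagging the rare ones. It is an illustration in plain Python, not Labelbox’s API or method; the function names, the 10 percent threshold and the sample frame labels are all assumptions made up for this example.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the dataset as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def underrepresented(labels, min_share=0.10):
    """List labels whose share falls below min_share.

    The default threshold is an arbitrary illustrative value, not a standard.
    """
    shares = label_shares(labels)
    return sorted(label for label, share in shares.items() if share < min_share)


# Hypothetical annotations for 20 video frames (made-up data for illustration).
frame_labels = ["player"] * 15 + ["ball"] * 4 + ["referee"] * 1

print(label_shares(frame_labels))
print(underrepresented(frame_labels))
```

A model trained on a set like this would see very few examples of the rarest label, so a team might decide to collect and label more of those frames before training. Real platforms run far richer checks, but the underlying question is the same: does the labeled data reflect what the model will encounter?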
It’s the solution that Seattle Sports Sciences uses. John Deere uses the platform to label images of individual plants, so that smart tractors can spot weeds and deliver herbicide precisely, saving money and sparing the environment unnecessary chemicals.
Meanwhile, companies no longer need to hire experienced researchers to write machine-learning algorithms, the steam engines of today. They can find them for free or license them from companies that have solved similar problems before.
Algorithmia, which helps companies deploy, serve and scale their machine-learning models, operates an algorithm marketplace so data science teams don’t duplicate other people’s effort by building their own. Users can search through the 7,000 different algorithms on the company’s platform and license one – or upload their own.
Companies can even buy complete off-the-shelf deep learning models ready for implementation.
Fritz.ai, for example, offers a number of pre-trained models that can detect objects in videos or transfer artwork styles from one image to another — all of which run locally on mobile devices. The company’s premium services include creating custom models and more automation features for managing and tweaking models.
And while companies can use a TDP to label training data, they can also find pre-labeled datasets, many for free, that are general enough to solve many problems.
Soon, companies will even offer machine learning as a service: Customers will simply upload data and an objective, then access a trained model through an API.
PICK YOUR TOOLS, AND BUDGET ACCORDINGLY
In the late 18th century, Maudslay’s lathe led to standardized screw threads and, in turn, to interchangeable parts, which spread the industrial revolution far and wide. Machine-learning tools will do the same for AI. As a result of these advances, companies can already implement machine learning with fewer data scientists and with less senior data science teams. That’s important given the looming talent crunch in machine learning: According to a 2019 Dun & Bradstreet report, 40 percent of respondents from Forbes Global 2000 organizations say they are adding more AI-related jobs. And the number of AI-related job listings on the recruitment portal Indeed.com jumped 29 percent from May 2018 to May 2019. Most of that demand is for supervised-learning engineers.
But C-suite executives need to understand the need for those tools and budget accordingly. As Seattle Sports Sciences learned, it’s better to familiarize yourself with the full machine-learning workflow and identify necessary tooling before embarking on a project.
That tooling can be expensive, whether the decision is to build or to buy. As is often the case with key business infrastructure, there are hidden costs to building. Buying a solution might look more expensive upfront, but it is often cheaper in the long run.
Once you’ve identified the necessary infrastructure, survey the market to see what solutions are out there and build the cost of that infrastructure into your budget. Don’t fall for a hard sell. The industry is young, both in terms of the time that it’s been around and the age of its entrepreneurs. The ones who are in it out of passion are idealistic and mission-driven. They believe they are democratizing an incredibly powerful new technology.
The AI tooling industry is facing more than enough demand. If you sense someone is chasing dollars, be wary. The serious players are eager to share their knowledge and help guide business leaders toward success. Successes benefit everyone.
originally posted on hbr.org by Manu Sharma