Google’s AutoML And BigQuery ML: The Rise Of One-Click Hyperscale Machine Learning

Two of the greatest obstacles to getting started with today’s deep learning systems have been the lack of truly “point and click” interfaces for creating new models and the immense complexity of scaling machine learning workflows to the production volumes businesses actually work with. While there are tools that make creating new models more straightforward than writing reams of code, few permit the creation of truly production-grade systems built upon transfer learning from some of the world’s largest training datasets, bleeding-edge algorithms, and automated tuning workflows. Similarly, massive, scalable, robust machine learning pipelines can be built with many different tools today, but the process is far from a point-and-click experience or a few lines of code. Google’s announcements over the last few years offer a glimpse into how the cloud is transforming the AI workflow experience.

Google’s approach to enabling one-click creation of state-of-the-art deep learning models can be seen in its AutoML series of products. Today Google’s AutoML range includes imagery, video, text, translation, and even numeric data. Each product leverages transfer learning to allow customers to build their models directly on top of Google’s massive training and algorithmic investments. In some cases, transfer learning can allow users to build new recognition models with just a few dozen examples.

The importance of transfer learning in jump-starting new models with minimal training data cannot be overstated. One of the most costly and difficult aspects of deep learning, and the greatest obstacle to successful implementations, is the creation and careful curation of the massive archives of positive examples and counterexamples required to train new models. AutoML allows developers to effectively outsource that process to Google, building on top of models trained from its enormous training data investments.
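The jump-start that transfer learning provides can be sketched in a few lines. The snippet below is a conceptual illustration only, not AutoML’s actual machinery: a fixed random projection stands in for a pretrained feature extractor, and only a tiny classifier “head” is trained on a few dozen labeled examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained feature extractor (in practice, a deep network
# trained on an enormous dataset); here, a fixed random projection.
W_pretrained = rng.normal(size=(64, 8))

def extract_features(x):
    # Frozen "pretrained" layer: W_pretrained is never updated.
    return np.maximum(x @ W_pretrained, 0.0)  # ReLU features

# A few dozen labeled examples -- all the new task provides.
X = rng.normal(size=(40, 64))
true_w = rng.normal(size=8)
y = (extract_features(X) @ true_w > 0).astype(float)

# Train only a tiny logistic-regression "head" on the frozen features.
feats = extract_features(X)
w = np.zeros(8)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-feats @ w))      # predicted probabilities
    w -= 0.1 * feats.T @ (p - y) / len(y)     # gradient descent step

accuracy = ((feats @ w > 0) == (y == 1)).mean()
print(f"training accuracy with 40 examples: {accuracy:.2f}")
```

Because the heavy lifting lives in the frozen extractor, only the small head needs data, which is why a handful of examples can suffice.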

Yet what makes AutoML so powerful is not just transfer learning. It is the automated workflow for constructing, tuning, and optimizing the resulting models. AutoML is not merely a pre-built model ready to be built upon in TensorFlow with reams of code. It is a truly end-to-end automated creation pipeline, accepting a set of training data as input and outputting a final production-grade model without any further developer intervention. Developers do not need any understanding of deep learning concepts, nor do they need to write a single line of code. In fact, line-of-business users could even conceivably begin using these tools to build the predictive, categorical, and filtering models they need.

Similarly, scaling machine learning workflows, both training and execution, is increasingly difficult as the data sizes companies wish to analyze continue to grow. Most large companies have invested heavily in loading their data into large warehouses like BigQuery and have highly skilled analytics staff capable of writing complex SQL queries. The problem is that there has historically been a massive gap between the analytic capabilities available natively in these warehouse platforms and the kinds of complex leading-edge machine learning tools companies wish to apply to them. Building these machine learning workflows requires specialized skill sets and technical architectures that are typically in preciously short supply at most companies.

Much as AutoML has begun to bring point-and-click simplicity to transfer learning and automated model construction, BigQuery has begun addressing the warehouse analytics gap through BigQuery ML. Today BigQuery ML offers linear, binary logistic, and multiclass logistic regression, along with k-means clustering. Most importantly, utilizing these models requires nothing more than SQL. No external tools, no data export, no specialization with machine learning toolkits. Just the same SQL skills corporate data analysts are already familiar with.
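In practice, training a BigQuery ML model really is just a SQL statement. The sketch below uses hypothetical table and column names (`mydataset.transactions`, a `churned` label, and a few feature columns) purely for illustration:

```sql
-- Hypothetical dataset: train a binary logistic regression in place.
CREATE OR REPLACE MODEL `mydataset.churn_model`
OPTIONS(model_type = 'logistic_reg',
        input_label_cols = ['churned']) AS
SELECT churned, tenure_months, monthly_spend, support_tickets
FROM `mydataset.transactions`;
```

Everything an analyst needs, from feature selection to model training, stays inside the familiar SELECT statement.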

The ability to perform machine learning in-place offers numerous benefits, from ease of use and scalability to legal compliance in regulated industries that place stringent constraints on moving data outside designated warehouses.

Perhaps where BigQuery ML shines brightest is in its scalability. The ability to run models over entire live datasets, rather than over the traditional small, stale extracts, makes it possible for companies to perform ad hoc at-scale machine learning as part of their routine day-to-day business operations, rather than through special-purpose dedicated external pipelines.
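Scoring that live data is likewise a single query. Continuing the hypothetical churn example above, BigQuery ML’s ML.PREDICT function applies a trained model directly to whatever rows the inner query selects, with no export step:

```sql
-- Score the full live table in place; no data leaves the warehouse.
SELECT *
FROM ML.PREDICT(MODEL `mydataset.churn_model`,
                (SELECT tenure_months, monthly_spend, support_tickets
                 FROM `mydataset.transactions`));
```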

Putting this all together, as deep learning advances beyond the research lab, cloud companies are steadily lowering the barriers to access to advanced machine learning technologies. The newest generation of point-and-click model creation tools like AutoML and warehouse-scale analytics like BigQuery ML offer a glimpse of a future in which machine learning becomes increasingly democratized.

In the end, the cloud is no longer just a place where AI experts go to pioneer the future or scarce deep learning engineers build exotic solutions. It is increasingly the place AI gets done, and increasingly those AI solutions are going to be built by ordinary users harnessing tools like AutoML and BigQuery ML.

originally posted on Forbes.com by Kalev Leetaru