In part I of this blog series we discussed best practices and patterns for efficiently deploying a machine learning model for inference with Google Cloud Dataflow. Among other techniques, it demonstrated efficient batching of inputs and the use of shared.py to make efficient use of a model.
In this post, we walk through the use of the RunInference API from tfx-bsl, a utility transform from TensorFlow Extended (TFX), which abstracts away the manual implementation of the patterns described in part I. You can use RunInference to simplify your pipelines and reduce technical debt when building production inference pipelines in batch or stream mode.
The following four patterns are covered:
- Using RunInference to make ML prediction calls.
- Post-processing RunInference results. Making predictions is often the first part of a multistep flow in a larger business process. Here we will process the results into a form that can be used downstream.
- Attaching a key. Along with the data passed to the model, there is often a need for an identifier (for example, an IoT device ID or a customer identifier) that is used later in the process even if the model itself does not use it. We show how this can be accomplished.
- Inference with multiple models in the same pipeline. You may often need to run multiple models inside the same pipeline, either in parallel or as a sequence of predict-process-predict calls. We walk through a simple example.
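Stripped of the Beam machinery, the keyed-prediction idea in the third bullet can be sketched in plain Python. The `predict` stub and the device IDs below are purely illustrative; the post uses RunInference's keyed examples for the real thing:

```python
# Hypothetical stub standing in for a real model call.
def predict(x):
    return x * 5

# Each element carries a (key, feature) pair; the key is an
# identifier the model does not consume but downstream steps need.
records = [("device-001", 3), ("device-002", 7)]

# Run the model on the feature only, then re-pair the result with its key.
results = [(key, predict(x)) for key, x in records]
# results == [("device-001", 15), ("device-002", 35)]
```

The same shape carries over to Beam: the key travels alongside the example through the inference transform and comes out attached to the prediction.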
Creating a simple model
In order to illustrate these patterns, we'll use a simple toy model that lets us concentrate on the data engineering required for the input and output of the pipeline. This model will be trained to approximate multiplication by the number 5.
Please note the following code snippets can be run as cells inside a notebook environment.
Step 1 – Set up libraries and imports
%pip install tfx_bsl==0.29.0 --quiet
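Before building the TensorFlow version of the model, the underlying idea can be sketched in plain Python: "learning" to multiply by 5 is just fitting a single weight w in y = w * x by gradient descent on the squared error. This sketch is conceptual only and is not part of the pipeline code:

```python
# Conceptual sketch: fit y ~= w * x with one weight and plain
# gradient descent on mean squared error (no TensorFlow needed).
def fit_multiplier(xs, ys, lr=0.01, steps=200):
    w = 0.0
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

xs = [0, 1, 2, 3, 4]
ys = [5 * x for x in xs]  # training targets: exact multiples of 5
w = fit_multiplier(xs, ys)
# w converges to approximately 5.0
```

The Keras model trained in the following steps does the same thing with a learned layer, which is all we need to exercise the inference patterns.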