It took me a few attempts to get TensorFlow estimators into a serving host with a client I could use to query them. Now that it’s finally working, I thought I’d write up the steps for reproduction.

Assumptions and Prerequisites

The first assumption is that you have already trained your estimator (say a tf.estimator.DNNRegressor) and it is now in the variable estimator. You also have your list of feature columns, as is standard, in a variable feature_columns. The example here is basically the same as my trained estimator. You will also need the TensorFlow Serving source code locally. If you don’t have it, you can grab it and all its submodules by running git clone --recursive https://github.com/tensorflow/serving.git --depth=1. n.b. you will need the path to this later (mine is at ~/developer/serving). Finally, you will also need the gRPC tools installed, which you can do by running pip3 install grpcio-tools
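
For context, here is roughly the kind of setup those assumptions describe. This is only an illustrative sketch mirroring the movie-rating example used later in the post; your own model may well use embeddings of categorical columns and different hyperparameters.

import tensorflow as tf

# Illustrative feature columns matching the userId/movieId example later on
feature_columns = [
    tf.feature_column.numeric_column('userId', dtype=tf.int64),
    tf.feature_column.numeric_column('movieId', dtype=tf.int64),
]

# A trained estimator in the variable `estimator` is assumed from here on;
# hidden_units and model_dir here are placeholders
estimator = tf.estimator.DNNRegressor(
    feature_columns=feature_columns,
    hidden_units=[64, 32],
    model_dir='model')

# estimator.train(input_fn=my_training_input_fn, steps=1000)  # your own input_fn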

Exporting the estimator

The first step is to export your trained model with its graph, variables and input spec for serving. First the code, then an explanation:

feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)
export_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec)
estimator.export_savedmodel('exports', export_input_fn)

The feature_spec tells TensorFlow how to deserialize the incoming TFRecord in a request. Generating it from feature_columns means it reflects exactly the feature columns we set up for training our estimator. Next, export_input_fn defines the entry point for an incoming request to your model; it is mainly about deserializing the inputs and adding the corresponding tensors to your graph to be executed. Finally we export all of this to a local 'exports' folder.
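
If you are curious what the generated spec contains, printing it for the two integer columns from the assumed setup above gives something along these lines (the exact repr may vary with your TensorFlow version):

feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)
print(feature_spec)
# {'userId': FixedLenFeature(shape=(1,), dtype=tf.int64, default_value=None),
#  'movieId': FixedLenFeature(shape=(1,), dtype=tf.int64, default_value=None)}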

Hosting the TF Serving Server

I set up a Dockerfile to copy in the export outputs and serve the model. I also found that the recommended Dockerfile had a lot of dependencies I didn’t end up requiring. My final Dockerfile was:

FROM ubuntu:16.04

RUN apt-get update && apt-get install -y \
        curl

RUN echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list
RUN curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -

RUN apt-get update && apt-get install -y \
        tensorflow-model-server \
        && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

EXPOSE 8500

ADD exports /model

ENTRYPOINT [ "tensorflow_model_server", "--model_base_path=/model" ]

It uses the tensorflow-model-server apt package rather than building from source, and copies in the export outputs so we have a self-contained host for our currently trained model. We can then build and run our Docker host with:

docker build -t <your-image-name> .
docker run --rm -d -p 8500:8500 <your-image-name>
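
To sanity-check that the container is up before writing any client code, a quick connection test against the published port is enough. This is just a convenience check I’m adding here, not part of the serving setup itself:

import socket

# Try to open a TCP connection to the port published with `docker run -p 8500:8500`
with socket.create_connection(('localhost', 8500), timeout=2):
    print('tensorflow_model_server is accepting connections on port 8500')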

Creating a Python client

Our Python client needs to query the server via gRPC, and we will need a few proto files from the serving repository to generate some code for us. Using the gRPC tools we mentioned in the prerequisites, run this command to generate the client libraries from the TF Serving source (n.b. my current directory is alongside my Python estimator source):

SERVING_SRC_ROOT="path to cloned tf serving"
python3 -m grpc_tools.protoc -I"${SERVING_SRC_ROOT}" -I"${SERVING_SRC_ROOT}/tensorflow" --python_out=. --grpc_python_out=. "${SERVING_SRC_ROOT}"/tensorflow_serving/apis/*.proto

This will generate the required client code into a tensorflow_serving folder alongside your estimator code. Next is the Python client code:

import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2

def _int_feature(value):
    # Wrap a single integer as an int64 feature for a tf.train.Example
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

# Connect to the gRPC port we published from the Docker container
channel = implementations.insecure_channel('localhost', 8500)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

Above we create the prediction service client stub connecting to localhost:8500, as we set up in our Dockerfile above. We also define a helper function to serialize our values into a TFRecord feature to pass to the host. And now the request (my DNNRegressor came from a movie recommendation tutorial, so it takes two integers, ‘userId’ and ‘movieId’, and produces a predicted rating):

def make_request(userId: int, movieId: int):
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'default'
    request.model_spec.signature_name = 'serving_default'

    feature_dict = {
        'userId': _int_feature(userId),
        'movieId': _int_feature(movieId)
    }

    # Serialize a tf.train.Example exactly as the parsing serving input receiver expects
    example = tf.train.Example(features=tf.train.Features(feature=feature_dict))
    serialized = example.SerializeToString()

    request.inputs['inputs'].CopyFrom(
        tf.contrib.util.make_tensor_proto(serialized, shape=[1]))

    # Make the call with a 5 second timeout
    result_future = stub.Predict.future(request, 5.0)
    prediction = result_future.result()

    predicted_rating = prediction.outputs["outputs"].float_val[0]
    # 'ratings' is the pandas DataFrame the estimator was trained from
    actual_rating = ratings[(ratings['userId'] == userId) & (ratings['movieId'] == movieId)]['rating'].iloc[0]
    print(f'Predicted value: {predicted_rating} vs actual {actual_rating}')

make_request(0, 0)

First we set up the request and define the model name and signature for it. You can find your model name in the docker logs; mine looked like “Approving load for servable version {name: default version: 1509419744}”, hence the name ‘default’. You can find all the information about the signature_name using the saved_model_cli. Next we construct and serialize our feature to pass to the host and make the request. Finally we just compare the prediction with our training data from a pandas DataFrame.
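
If you would rather stay in Python than reach for saved_model_cli, the same signature information can be read back from the export directory. A minimal sketch, assuming the timestamped folder named after the version in the log line above:

import tensorflow as tf

export_dir = 'exports/1509419744'  # the timestamped folder created by export_savedmodel

with tf.Session(graph=tf.Graph()) as sess:
    # Load the SavedModel with the 'serve' tag and list its signature defs
    meta_graph = tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], export_dir)
    for name, signature in meta_graph.signature_def.items():
        print(name, '->', list(signature.inputs.keys()), list(signature.outputs.keys()))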

Summary

TensorFlow Serving still has a way to go until it’s a 1, 2, 3 step setup, but once done this could be automated in a CI system. My client was also responding in 6.3ms running on my MacBook Pro, so performance is not an issue for my circumstances. The next step will be to get this running with a custom estimator by setting export_outputs appropriately.
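
For that custom estimator follow-up, here is a minimal sketch of what setting export_outputs in a model_fn usually looks like. This is an assumption about the next step rather than something this post covers, and names like 'rating' and the tiny network are mine:

import tensorflow as tf

def model_fn(features, labels, mode):
    # Same illustrative integer columns as earlier in the post
    columns = [
        tf.feature_column.numeric_column('userId', dtype=tf.int64),
        tf.feature_column.numeric_column('movieId', dtype=tf.int64),
    ]
    net = tf.feature_column.input_layer(features, columns)
    rating = tf.layers.dense(net, 1)

    predictions = {'rating': rating}
    # 'serving_default' matches the signature_name the client used above
    export_outputs = {
        'serving_default': tf.estimator.export.PredictOutput(predictions)
    }

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=predictions,
                                          export_outputs=export_outputs)

    loss = tf.losses.mean_squared_error(labels, tf.squeeze(rating, axis=1))
    train_op = tf.train.AdamOptimizer().minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op,
                                      predictions=predictions,
                                      export_outputs=export_outputs)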


If you want to see the full code I used, it is at https://github.com/damienpontifex/fastai-course/tree/master/deeplearning1/lesson4 from when I reimplemented fast.ai lesson 4.