
Creating your own Bert embedding service with TorchServe

Luis Duarte
4 min read · Dec 12, 2020


Let’s picture this: you have found an amazing dataset, managed to clean it and process it, and finally used machine learning to develop a model able to classify new data. You have achieved good results and you are ready to put that model into production, so you start researching and find out about TensorFlow Serving, but oops, your model was trained with PyTorch and it won’t work that way.

Fortunately, TorchServe exists, so we can put that model into production and use it in our brand new app, but the lack of documentation can turn a simple process into something quite difficult.
This is what happened to me with my Bert model, so I will use this post to explain the steps involved and how to create a word embedding service.

First Steps

To use TorchServe we need to archive our model, and for that we will create a Handler that performs all the required data transformations. An example file is displayed here, but due to my requirements I needed to make some changes to it.

I will start by breaking down what we added. The initialize() function checks the system specification and loads all files used for loading the model. preprocess() is responsible for the data-processing step of our pipeline, which in this case is tokenization. In inference() we apply the model to our pre-processed data and perform mean pooling on the results. This step is necessary because the model outputs Bert embeddings for individual words, not sentence embeddings. Finally, postprocess() checks whether the embeddings were obtained successfully and transforms the response into a JSON object for simplified use.
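A simplified sketch of such a handler looks like the following; the exact class name, file layout and details here are assumptions based on my setup, so adapt them to your own model files:

import torch
from transformers import BertModel, BertTokenizer
from ts.torch_handler.base_handler import BaseHandler

class BertHandler(BaseHandler):
    def initialize(self, context):
        # Check the system specification and load the files shipped in the .mar archive
        model_dir = context.system_properties.get("model_dir")
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.tokenizer = BertTokenizer.from_pretrained(model_dir)
        self.model = BertModel.from_pretrained(model_dir).to(self.device)
        self.model.eval()
        self.initialized = True

    def preprocess(self, requests):
        # Decode the UTF-8 body of each request and tokenize the sentences
        sentences = [req.get("data") or req.get("body") for req in requests]
        sentences = [s.decode("utf-8") if isinstance(s, (bytes, bytearray)) else s
                     for s in sentences]
        encoded = self.tokenizer(sentences, padding=True, truncation=True,
                                 return_tensors="pt")
        return {key: tensor.to(self.device) for key, tensor in encoded.items()}

    def inference(self, inputs):
        # Run the model and mean-pool the token embeddings into a single
        # sentence embedding, ignoring padding tokens via the attention mask
        with torch.no_grad():
            token_embeddings = self.model(**inputs)[0]
        mask = inputs["attention_mask"].unsqueeze(-1).float()
        summed = (token_embeddings * mask).sum(dim=1)
        counts = mask.sum(dim=1).clamp(min=1e-9)
        return summed / counts

    def postprocess(self, embeddings):
        # Return one JSON-serialisable list of floats per input sentence
        return [embedding.cpu().tolist() for embedding in embeddings]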

After defining our handler, it is time to create the model archive with the command:

torch-model-archiver --model-name "bert" --serialized-file ./bert_model/pytorch_model.bin --extra-files "./bert_model/config.json, ./bert_model/vocab.txt" --handler "./BertHandler.py"

Here the serialized file is our .bin PyTorch model and the extra files are the additional files needed to run the model; in this case, the config.json and the vocabulary file for our tokenizer. After running the command, bert.mar will appear in the directory, and it then needs to be placed in our model-store directory.
We can also create a config.properties file that specifies general settings, such as the addresses to bind to and the number of workers we want. I used the following configuration:

default_workers_per_model=3
default_response_timeout=300
unregister_model_timeout=300
inference_address=http://0.0.0.0:8443
management_address=http://0.0.0.0:8444
metrics_address=http://0.0.0.0:8445

Starting the service

We can then start our service locally using the following command:

torchserve --start --model-store model-store --models bert=bert.mar

Here model-store is the name of the folder where our bert.mar archive is and models gives the name and file of the model we will use. After running the command we will see several logs indicating the endpoints to be used and that the model was loaded. TorchServe provides several endpoints, which can be found here, but for now we will use the endpoint <URL>:8443/predictions/<model_name> to obtain the embeddings.

After a few seconds, depending on the size of the model, the service will start and we can begin making requests. In this case, I will use a simple script with a POST request to the specified endpoint. Because of how my handler is written, all text sent to the service needs to be UTF-8 encoded, but you can change that by modifying the preprocess function in the Handler.
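Something along these lines works; the payload and the response shape depend on the handler sketched above, so adjust them as needed:

import requests

sentence = "TorchServe makes deploying PyTorch models easier."
response = requests.post(
    "http://localhost:8443/predictions/bert",
    data=sentence.encode("utf-8"),
    headers={"Content-Type": "text/plain"},
)
embedding = response.json()
print(len(embedding))  # dimensionality of the sentence embedding (768 for Bert base)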

Putting TorchServe in Docker

As we can see, everything is working fine. But running locally is not the best solution, so we will use Docker to create a more accessible service.
To achieve this we need to create an image and run it in a container. To create a Docker image we need to define a specific file (an example file is displayed here) with all the configuration. The Dockerfile I used looks roughly like the sketch below.
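In this sketch the base image tag, file locations and entrypoint name are assumptions, so adapt them to your own project layout:

FROM pytorch/torchserve:latest

USER root
# The handler relies on the transformers library
RUN pip install --no-cache-dir transformers

# Ship the archived model, the TorchServe configuration and the entrypoint script
COPY model-store /home/model-server/model-store
COPY config.properties /home/model-server/config.properties
COPY entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chmod +x /usr/local/bin/entrypoint.sh

USER model-server
# Ports configured in config.properties: inference, management and metrics
EXPOSE 8443 8444 8445
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]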

Note that the entrypoint file can be found here. The last steps are building the image and running it with:

docker build --tag torchserve:1.0 .
docker run --publish 8443:8443 --publish 8444:8444 --publish 8445:8445 --detach --name bb torchserve:1.0
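To check that the container is answering, a quick request against the published inference port should return an embedding; the payload format follows the handler sketched earlier:

curl -X POST http://localhost:8443/predictions/bert -d "TorchServe is running in Docker"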

After that, if we make a request we can see that it is running successfully. A lot of small details are probably missing, and in no way is this a must-follow guide, but the lack of TorchServe documentation turned a simple task into something that took more time than was needed. All the code used is available here, and I hope this post helps someone solve their problems with TorchServe.


Luis Duarte

Machine Learning Engineer at Altice Labs, former NLP researcher at CISUC. Loves creating bots and systems capable of doing boring uninspiring chores.