Demo Notebook to trace Sentence Transformers model

Download notebook

This notebook provides a walkthrough for tracing Sentence Transformers models in TorchScript and ONNX format. After tracing a model, users can register it to OpenSearch and generate embeddings.

Remember, tracing a model in TorchScript or ONNX format is just two different options; you don't need to trace the model both ways. This notebook demonstrates both only for illustration.

Step 0: Import packages and set up client

Step 1: Save model in TorchScript format

Step 2: Register the saved TorchScript model in OpenSearch

[The following steps are optional; they show registering the model in both formats and comparing the two embedding outputs.]

Step 3: Save model in ONNX format

Step 4: Register the saved ONNX model in OpenSearch

Step 5: Generate Sentence Embedding with registered models

Step 0: Import packages and set up client

Install the required packages for opensearch_py_ml.sentence_transformer_model. Install opensearch-py and opensearch-py-ml through PyPI.

[1]:
#!pip install opensearch-py opensearch-py-ml

# import os
# import sys
# sys.path.append(os.path.abspath(os.path.join('../../..')))
[2]:
import warnings
warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings("ignore", message="Unverified HTTPS request")
warnings.filterwarnings("ignore", message="TracerWarning: torch.tensor")
warnings.filterwarnings("ignore", message="using SSL with verify_certs=False is insecure.")

import opensearch_py_ml as oml
from opensearchpy import OpenSearch
from opensearch_py_ml.ml_models import SentenceTransformerModel
# import mlcommon to later register the model to OpenSearch Cluster
from opensearch_py_ml.ml_commons import MLCommonClient
/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
[3]:
CLUSTER_URL = 'https://localhost:9200'
[4]:
def get_os_client(cluster_url = CLUSTER_URL,
                  username='admin',
                  password='< admin password >'):
    '''
    Get OpenSearch client
    :param cluster_url: cluster URL like https://ml-te-netwo-1s12ba42br23v-ff1736fa7db98ff2.elb.us-west-2.amazonaws.com:443
    :return: OpenSearch client
    '''
    client = OpenSearch(
        hosts=[cluster_url],
        http_auth=(username, password),
        verify_certs=False
    )
    return client
[5]:
client = get_os_client()

# Connect to ml_common client with OpenSearch client
ml_client = MLCommonClient(client)
/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/opensearchpy/connection/http_urllib3.py:199: UserWarning: Connecting to https://localhost:9200 using SSL with verify_certs=False is insecure.
  warnings.warn(
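Optionally, you can verify the connection before moving on. client.info() is a standard opensearch-py call that returns cluster metadata:

# Quick connectivity check: prints the cluster name and the OpenSearch version.
info = client.info()
print(info["cluster_name"], info["version"]["number"])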

Step 1: Save model in TorchScript format

The opensearch-py-ml plugin provides the save_as_pt method, which traces a model in TorchScript format and saves it as a zip file in your filesystem.

Detailed documentation: https://opensearch-project.github.io/opensearch-py-ml/reference/api/sentence_transformer.save_as_pt.html#opensearch_py_ml.ml_models.SentenceTransformerModel.save_as_pt

Users need to provide a model id from Sentence Transformers (for example: sentence-transformers/msmarco-distilbert-base-tas-b). This model id is a Hugging Face model id. Example: https://huggingface.co/sentence-transformers/msmarco-distilbert-base-tas-b

save_as_pt will download the model to the filesystem and then trace it with the given input strings.

For more guidance on dummy input strings, please check this url: https://huggingface.co/docs/transformers/torchscript#dummy-inputs-and-standard-lengths

After tracing the model (a .pt file will be generated), the save_as_pt method zips tokenizers.json and the TorchScript (.pt) file and saves the zip in the filesystem.

Users can then register that model to OpenSearch to generate embeddings.

[6]:
model_id = "sentence-transformers/msmarco-distilbert-base-tas-b"
folder_path = "sentence-transformer-torchscript/msmarco-distilbert-base-tas-b"
[7]:
pre_trained_model = SentenceTransformerModel(model_id=model_id, folder_path=folder_path, overwrite=True)
model_path = pre_trained_model.save_as_pt(model_id=model_id, sentences=["for example providing a small sentence", "we can add multiple sentences"])
/home/linuxbrew/.linuxbrew/opt/python@3.8/lib/python3.8/site-packages/transformers/models/distilbert/modeling_distilbert.py:223: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask, torch.tensor(torch.finfo(scores.dtype).min)
model file is saved to  sentence-transformer-torchscript/msmarco-distilbert-base-tas-b/msmarco-distilbert-base-tas-b.pt
zip file is saved to  sentence-transformer-torchscript/msmarco-distilbert-base-tas-b/msmarco-distilbert-base-tas-b.zip
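Optionally, you can peek inside the generated zip to confirm its contents. A minimal sketch using Python's zipfile module, with the zip path taken from the output above:

import zipfile

# List the archive contents; expect the traced .pt file plus the tokenizer file.
with zipfile.ZipFile("sentence-transformer-torchscript/msmarco-distilbert-base-tas-b/msmarco-distilbert-base-tas-b.zip") as zf:
    print(zf.namelist())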

Step 2: Register the saved TorchScript model in OpenSearch

In the last step we saved a sentence transformer model in TorchScript format. Now we will register that model in the OpenSearch cluster. To do that we can use the register_model method of the opensearch-py-ml plugin.

To register the model, we need the zip file we just saved in the last step and a model config file. You can use the make_model_config_json method to automatically generate the model config file and save it as ml-commons_model_config.json in the model folder, or you can create the json file yourself.

An example of model config file content:

{
    "name": "sentence-transformers/msmarco-distilbert-base-tas-b",
    "version": "1.0.0",
    "description": "This is a port of the DistilBert TAS-B Model to sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and is optimized for the task of semantic search.",
    "model_format": "TORCH_SCRIPT",
    "model_config": {
        "model_type": "distilbert",
        "embedding_dimension": 768,
        "framework_type": "sentence_transformers"
    }
}

In either approach, you have to set model_format to TORCH_SCRIPT so that the internal system will look for the corresponding .pt file in the zip file.
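
If you create the file yourself, the following is a minimal sketch that writes the example config above to the model folder used in this notebook; adjust the path and field values for your own model:

import json

model_config = {
    "name": "sentence-transformers/msmarco-distilbert-base-tas-b",
    "version": "1.0.0",
    "description": "DistilBert TAS-B model ported to sentence-transformers (768-dimensional embeddings).",
    "model_format": "TORCH_SCRIPT",
    "model_config": {
        "model_type": "distilbert",
        "embedding_dimension": 768,
        "framework_type": "sentence_transformers",
    },
}

# Save the config file; this path can then be passed to register_model.
with open("sentence-transformer-torchscript/msmarco-distilbert-base-tas-b/ml-commons_model_config.json", "w") as f:
    json.dump(model_config, f, indent=4)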

Please refer to this doc: https://github.com/opensearch-project/ml-commons/blob/2.x/docs/model_serving_framework/text_embedding_model_examples.md

Documentation for the method: https://opensearch-project.github.io/opensearch-py-ml/reference/api/ml_commons_register_api.html#opensearch_py_ml.ml_commons.MLCommonClient.register_model

Related demo notebook about ml-commons plugin integration: https://opensearch-project.github.io/opensearch-py-ml/examples/demo_ml_commons_integration.html

[8]:
model_config_path_torch = pre_trained_model.make_model_config_json(model_format='TORCH_SCRIPT')
ml-commons_model_config.json file is saved at :  sentence-transformer-torchscript/msmarco-distilbert-base-tas-b/ml-commons_model_config.json
[9]:
ml_client.register_model(model_path, model_config_path_torch, isVerbose=True)
Total number of chunks 27
Sha1 value of the model file:  b397ae99ef3c27ba2ea080428ba695ba732da90a9367e77383b55ec0b191903e
Model meta data was created successfully. Model Id:  4djw4okB2Ly7dmqcT7Xp
uploading chunk 1 of 27
Model id: {'status': 'Uploaded'}
...
uploading chunk 27 of 27
Model id: {'status': 'Uploaded'}
Model registered successfully
Task ID: 4tjw4okB2Ly7dmqcerVn
Model deployed successfully
[9]:
'4djw4okB2Ly7dmqcT7Xp'
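
Optionally, you can confirm the model state before using it. A minimal sketch, assuming the get_model_info helper available on MLCommonClient in your opensearch-py-ml version; the model id is the one returned above:

# Fetch the model metadata; model_state should read DEPLOYED once loading finishes.
model_info = ml_client.get_model_info("4djw4okB2Ly7dmqcT7Xp")
print(model_info["model_state"])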

Step 3: Save model in ONNX format

The opensearch-py-ml plugin provides the save_as_onnx method, which traces a model in ONNX format and saves it as a zip file in your filesystem.

Detailed documentation: https://opensearch-project.github.io/opensearch-py-ml/reference/api/sentence_transformer.save_as_onnx.html#opensearch_py_ml.ml_models.SentenceTransformerModel.save_as_onnx

Users need to provide a model id from Sentence Transformers (for example: sentence-transformers/msmarco-distilbert-base-tas-b). save_as_onnx will download the model to the filesystem and then trace it.

After tracing the model (a .onnx file will be generated), the save_as_onnx method zips tokenizers.json and the ONNX (.onnx) file and saves the zip in the filesystem.

Users can then register that model to OpenSearch to generate embeddings.

[11]:
model_id = "sentence-transformers/msmarco-distilbert-base-tas-b"
folder_path = "sentence-transformer-onnx/msmarco-distilbert-base-tas-b"
[12]:
pre_trained_model = SentenceTransformerModel(model_id=model_id, folder_path=folder_path, overwrite=True)
model_path_onnx = pre_trained_model.save_as_onnx(model_id=model_id)
ONNX opset version set to: 15
Loading pipeline (model: sentence-transformers/msmarco-distilbert-base-tas-b, tokenizer: sentence-transformers/msmarco-distilbert-base-tas-b)
Creating folder sentence-transformer-onnx/msmarco-distilbert-base-tas-b/onnx
Using framework PyTorch: 1.13.1+cu117
Found input input_ids with shape: {0: 'batch', 1: 'sequence'}
Found input attention_mask with shape: {0: 'batch', 1: 'sequence'}
Found output output_0 with shape: {0: 'batch', 1: 'sequence'}
Ensuring inputs are in correct order
head_mask is not present in the generated input list.
Generated inputs order: ['input_ids', 'attention_mask']
zip file is saved to  sentence-transformer-onnx/msmarco-distilbert-base-tas-b/msmarco-distilbert-base-tas-b.zip
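
Before registering it, you can optionally run a local sanity check on the traced ONNX file. A minimal sketch, assuming onnxruntime and transformers are installed; the .onnx path is an assumption based on the "Creating folder .../onnx" log line above, so adjust it to the file save_as_onnx actually produced:

import onnxruntime as ort
from transformers import AutoTokenizer

# Assumed location of the traced file; check your folder_path/onnx directory.
onnx_path = "sentence-transformer-onnx/msmarco-distilbert-base-tas-b/onnx/msmarco-distilbert-base-tas-b.onnx"

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/msmarco-distilbert-base-tas-b")
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

# The traced graph takes input_ids and attention_mask, per the log above.
encoded = tokenizer(["a quick smoke test"], return_tensors="np")
outputs = session.run(None, {"input_ids": encoded["input_ids"], "attention_mask": encoded["attention_mask"]})
print(outputs[0].shape)  # token-level output: (batch, sequence, 768)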

Step 4: Register the saved ONNX model in OpenSearch

In the last step we saved a sentence transformer model in ONNX format. Now we will register that model in the OpenSearch cluster. To do that we can use the register_model method of the opensearch-py-ml plugin.

To register the model, we need the zip file we just saved in the last step and a model config file. You can use the make_model_config_json method to automatically generate the model config file and save it as ml-commons_model_config.json in the model folder, or you can create the json file yourself. An example of model config file content:

{
    "name": "sentence-transformers/msmarco-distilbert-base-tas-b",
    "version": "1.0.0",
    "description": "This is a port of the DistilBert TAS-B Model to sentence-transformers model: It maps sentences & paragraphs to a 768 dimensional dense vector space and is optimized for the task of semantic search.",
    "model_format": "ONNX",
    "model_config": {
        "model_type": "distilbert",
        "embedding_dimension": 768,
        "framework_type": "sentence_transformers",
        "pooling_mode": "cls",
        "normalize_result": "false"
    }
}

In either approach, you have to set model_format to ONNX so that the internal system will look for the corresponding .onnx file in the zip file.

Please refer to this doc: https://github.com/opensearch-project/ml-commons/blob/2.x/docs/model_serving_framework/text_embedding_model_examples.md

Documentation for the method: https://opensearch-project.github.io/opensearch-py-ml/reference/api/ml_commons_register_api.html#opensearch_py_ml.ml_commons.MLCommonClient.register_model

Related demo notebook about ml-commons plugin integration: https://opensearch-project.github.io/opensearch-py-ml/examples/demo_ml_commons_integration.html

[15]:
model_config_path_onnx = pre_trained_model.make_model_config_json(model_format='ONNX')
ml-commons_model_config.json file is saved at :  sentence-transformer-onnx/msmarco-distilbert-base-tas-b/ml-commons_model_config.json
[16]:
ml_client.register_model(model_path_onnx, model_config_path_onnx, isVerbose=True)
Total number of chunks 27
Sha1 value of the model file:  81c950d07eaa21705dd94cec0f127efec42844cd1995502452764777460517d4
Model meta data was created successfully. Model Id:  49jz4okB2Ly7dmqcNrWD
uploading chunk 1 of 27
Model id: {'status': 'Uploaded'}
...
uploading chunk 27 of 27
Model id: {'status': 'Uploaded'}
Model registered successfully
Task ID: 5Njz4okB2Ly7dmqcW7XA
Model deployed successfully
[16]:
'49jz4okB2Ly7dmqcNrWD'

Step 5: Generate Sentence Embedding with registered models

Now that these models are loaded in memory, we can generate embeddings for sentences. We can provide a list of sentences and get back a list of embeddings, one per sentence.

[17]:
# Now using these models we can generate sentence embeddings.

import numpy as np

input_sentences = ["first sentence", "second sentence"]

# Generated embedding from torchScript

embedding_output_torch = ml_client.generate_embedding("4djw4okB2Ly7dmqcT7Xp", input_sentences)

# Just taking embedding for the first sentence
data_torch = embedding_output_torch["inference_results"][0]["output"][0]["data"]

# Generated embedding from onnx

embedding_output_onnx = ml_client.generate_embedding("49jz4okB2Ly7dmqcNrWD", input_sentences)

# Just taking embedding for the first sentence
data_onnx = embedding_output_onnx["inference_results"][0]["output"][0]["data"]

# Now we can check that there's no significant difference between the two outputs.
# assert_allclose raises an AssertionError if the arrays differ beyond the given
# tolerances and returns None when they match, so printing None indicates success.

print(np.testing.assert_allclose(data_torch, data_onnx, rtol=1e-03, atol=1e-05))
None
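
Optionally, once you are done comparing the outputs, you can undeploy and delete the registered models to free cluster resources. A minimal sketch, assuming the undeploy_model and delete_model helpers of MLCommonClient in your opensearch-py-ml version; the model ids are the ones returned in Steps 2 and 4:

# Clean up both registered models (TorchScript and ONNX).
for registered_model_id in ["4djw4okB2Ly7dmqcT7Xp", "49jz4okB2Ly7dmqcNrWD"]:
    ml_client.undeploy_model(registered_model_id)  # unload the model from memory
    ml_client.delete_model(registered_model_id)    # remove the model chunks and metadata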