SQuARE Model Management API (0.3.0)
API reference for model management.
Response samples
- 200
[
  {
    "identifier": "string",
    "model_type": "string",
    "model_name": "string",
    "disable_gpu": true,
    "batch_size": 0,
    "max_input": 0,
    "model_class": "string",
    "return_plaintext_arrays": true
  }
]
Get-Model-Health
Check the health of a worker (worker: an inference model container). Returns result [list]: the health of the specified worker.
path Parameters
identifier (required): string (Identifier)
hf_username (required): string (Hf Username)
Responses
Response samples
- 200
- 422
[
  {
    "identifier": "string",
    "is_alive": true
  }
]
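As a sketch of how a client might call this endpoint: the base URL and route layout below are assumptions for illustration (only the response shape is taken from the sample above), so substitute the address and paths of your own deployment.

```python
import json
from urllib.parse import quote

# Assumed base URL; replace with the address of your SQuARE deployment.
BASE_URL = "https://square.ukp-lab.de/api/models"

def health_url(identifier: str, hf_username: str) -> str:
    # Hypothetical route: both values are path parameters, so URL-encode them.
    return f"{BASE_URL}/{quote(hf_username)}/{quote(identifier)}/health"

def all_alive(body: str) -> bool:
    # The endpoint returns a list of worker health entries (see sample above).
    return all(worker["is_alive"] for worker in json.loads(body))

url = health_url("bert-base-uncased", "some-hf-user")
sample = '[{"identifier": "bert-base-uncased", "is_alive": true}]'
print(url)
print(all_alive(sample))  # True for the sample above
```

A 422 response indicates a validation error, e.g. a malformed path parameter.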
Get-Model-Health
Check the health of a worker (worker: an inference model container). Returns result [list]: the health of the specified worker.
path Parameters
identifier (required): string (Identifier)
query Parameters
hf_username: string (Hf Username)
Responses
Response samples
- 200
- 422
[
  {
    "identifier": "string",
    "is_alive": true
  }
]
Deploy-Model
Deploy a new model to the platform.
Request Body schema: application/json
identifier: string (Identifier), Default: "". The name given by the user through which the model can be accessed after deployment.
model_name: string (Model Name), Default: "". The name of the model on the HF, AdapterHub, or sentence-transformers platform.
model_type: string (Model Type), Default: "". One of transformer, adapter, onnx, or sentence-transformer.
disable_gpu: boolean (Disable Gpu), Default: true. Whether to disable the GPU and run inference on CPU.
batch_size: integer (Batch Size), Default: "". Input batch size.
max_input: integer (Max Input), Default: "". Maximum input length.
transformers_cache: string (Transformers Cache), Default: "../.cache". Path under which models are cached.
onnx_use_quantized: boolean (Onnx Use Quantized), Default: false. Whether the quantized ONNX model should be used for inference.
is_encoder_decoder: boolean (Is Encoder Decoder), Default: false. Whether the ONNX model is an encoder-decoder model.
hf_token: string (Hf Token). HuggingFace API token with write access to the UKP-SQuARE repository, used for ONNX model export.
adapter_id: string (Adapter Id). Adapter id; required if the model to deploy is an adapter model.
custom_onnx_config: string (Custom Onnx Config). Custom input mappings to use for ONNX model export (if this field is None, we try to infer an OnnxConfig).
model_class: string (Model Class), Default: "". See square_model_inference.inference.transformer.CLASS_MAPPING for valid names and the corresponding classes.
return_plaintext_arrays: boolean (Return Plaintext Arrays), Default: false. Whether to return output arrays as plaintext instead of encoded.
preloaded_adapters: boolean (Preloaded Adapters), Default: true. Whether to preload adapters.
Responses
Request samples
- Payload
{
  "identifier": "",
  "model_name": "",
  "model_type": "",
  "disable_gpu": true,
  "batch_size": "",
  "max_input": "",
  "transformers_cache": "../.cache",
  "onnx_use_quantized": false,
  "is_encoder_decoder": false,
  "hf_token": "string",
  "adapter_id": "string",
  "custom_onnx_config": "string",
  "model_class": "",
  "return_plaintext_arrays": false,
  "preloaded_adapters": true
}
Response samples
- 200
- 422
{
  "message": "string",
  "task_id": "string"
}
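To illustrate the request body, here is a minimal sketch that assembles a deploy payload. The field names and defaults come from the schema above; the helper function and the example model values are hypothetical.

```python
import json

def deploy_payload(identifier: str, model_name: str, model_type: str, **overrides) -> dict:
    """Assemble a Deploy-Model request body; unspecified fields keep the schema defaults."""
    body = {
        "identifier": identifier,    # name under which the model is reachable after deployment
        "model_name": model_name,    # model name on HF, AdapterHub, or sentence-transformers
        "model_type": model_type,    # transformer, adapter, onnx, or sentence-transformer
        "disable_gpu": True,         # schema default: CPU inference
        "transformers_cache": "../.cache",
        "onnx_use_quantized": False,
        "is_encoder_decoder": False,
        "return_plaintext_arrays": False,
        "preloaded_adapters": True,
    }
    body.update(overrides)
    return body

payload = deploy_payload("my-qa-model", "distilbert-base-uncased", "transformer",
                         batch_size=32, max_input=512)
print(json.dumps(payload, indent=2))
```

The 200 response carries a task_id, suggesting deployment runs asynchronously; poll or check the model's health afterwards to confirm the worker is up.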
Update Model
Update the model parameters.
path Parameters
identifier (required): string (Identifier)
hf_username (required): string (Hf Username)
Request Body schema: application/json
disable_gpu: boolean (Disable Gpu)
batch_size: integer (Batch Size)
max_input: integer (Max Input)
return_plaintext_arrays: boolean (Return Plaintext Arrays)
Responses
Request samples
- Payload
{
  "disable_gpu": true,
  "batch_size": 0,
  "max_input": 0,
  "return_plaintext_arrays": true
}
Response samples
- 200
- 422
null
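Since all four body fields are optional, a client can send only the parameters it wants to change. A minimal sketch of building such a partial update; the helper and its validation are hypothetical, while the allowed field names come from the schema above.

```python
import json

# Fields the Update Model request body accepts, per the schema above.
UPDATABLE_FIELDS = {"disable_gpu", "batch_size", "max_input", "return_plaintext_arrays"}

def update_payload(**fields) -> str:
    """Serialize a partial update, rejecting fields the schema does not list."""
    unknown = set(fields) - UPDATABLE_FIELDS
    if unknown:
        raise ValueError(f"not updatable: {sorted(unknown)}")
    return json.dumps(fields)

print(update_payload(batch_size=16, disable_gpu=False))
```

A successful update returns a 200 with a null body; a 422 indicates the body failed validation.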
Update Model
Update the model parameters.
path Parameters
identifier (required): string (Identifier)
query Parameters
hf_username: string (Hf Username)
Request Body schema: application/json
disable_gpu: boolean (Disable Gpu)
batch_size: integer (Batch Size)
max_input: integer (Max Input)
return_plaintext_arrays: boolean (Return Plaintext Arrays)
Responses
Request samples
- Payload
{
  "disable_gpu": true,
  "batch_size": 0,
  "max_input": 0,
  "return_plaintext_arrays": true
}
Response samples
- 200
- 422
null