Hi,

In my previous job, I once spent hours debugging internal data transformations, only to figure out that the data received from an external API was faulty. This issue would not have appeared with schema validation. My mistake was that I trusted the incoming data and didn’t check it for consistency. To learn from that mistake and save time in the future, I will set up a small example of JSON validation via Pydantic. FastAPI relies heavily on Pydantic and uses it to validate the incoming request and the outgoing response, but not every project uses FastAPI.

I will simulate a simple inference service based on my previous work. The service receives OCR results as a request and responds with a prediction. The full code can also be accessed via Colab.

Data Model

As a first step, I define the data model of our endpoint. The PredictionRequest expects a requestId, an ocr_body, and a model configuration.

The model configuration uses default values taken from the service configuration; in this example, a plain dictionary called model_config holds these defaults. Still, there are cases where you want to change the configuration with your request, for example when you want multiple predictions or a different probability threshold.

The ocr_body is defined as a list of OCR objects, and we expect at least one entry in it. An OCR entry itself consists of a value and the region of that value in the document.

from pydantic import BaseModel, conlist


# service-wide defaults for the model configuration
model_config = {"n_pred": 1,
                "threshold": 0.5}


class ModelConfig(BaseModel):
    n_pred: int = model_config["n_pred"]
    threshold: float = model_config["threshold"]


class Region(BaseModel):
    left: float
    top: float
    height: float
    width: float
    page: int


class OCR(BaseModel):
    value: str
    region: Region


class PredictionRequest(BaseModel):
    requestId: str
    ocr_body: conlist(OCR, min_length=1)  # at least one OCR entry is required
    configuration: ModelConfig = ModelConfig()
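
Since the whole point is schema validation, it can also help to look at the JSON schema that Pydantic derives from these models. A quick check, assuming Pydantic v2 (which provides model_json_schema()):

import json

# print the JSON schema generated from the PredictionRequest model
print(json.dumps(PredictionRequest.model_json_schema(), indent=2))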

You can check the data model in the following way:

ocr_body = [{"value": "test_ocr",
             "region": {"left": 0,
                        "top": 0,
                        "height": 0.0,
                        "width": 0.2,
                        "page": 1}
            },
           ]

request_body = {"requestId": "test", "ocr_body": ocr_body}

assert PredictionRequest(**request_body)

For the configuration field, the default values will be used.
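
If the request carries its own configuration, those values take precedence over the defaults, and the conlist constraint rejects an empty ocr_body. A small sketch of both cases, reusing the ocr_body from above (the variable names are mine):

from pydantic import ValidationError

# override the default threshold and number of predictions for this request
custom_body = {"requestId": "test",
               "ocr_body": ocr_body,
               "configuration": {"n_pred": 3, "threshold": 0.8}}

parsed = PredictionRequest(**custom_body)
assert parsed.configuration.threshold == 0.8

# an empty ocr_body violates the min_length=1 constraint
try:
    PredictionRequest(requestId="test", ocr_body=[])
except ValidationError as err:
    print(err)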

Optional Fields

Sometimes we want optional fields in the JSON. Let’s assume that we don’t want default values for the configuration, but we also don’t want the configuration to be required in the schema; the configuration should simply be optional. There is a solution proposed by mubtasimfuad: the code defines a decorator for the Pydantic BaseModel.

from typing import Optional
import inspect

from pydantic import BaseModel, create_model


def optional(*fields):
    def dec(cls):
        fields_dict = {}
        for field in fields:
            field_info = cls.__annotations__.get(field)
            if field_info is not None:
                # make the field optional with a default of None
                fields_dict[field] = (Optional[field_info], None)
        # inherit from the original model so the remaining fields stay required
        OptionalModel = create_model(cls.__name__, __base__=cls, **fields_dict)
        OptionalModel.__module__ = cls.__module__

        return OptionalModel

    # used without arguments directly on a class: make all fields optional
    if fields and inspect.isclass(fields[0]) and issubclass(fields[0], BaseModel):
        cls = fields[0]
        fields = cls.__annotations__
        return dec(cls)

    return dec

The decorator can then be applied to our request model:

class Config(BaseModel):
    n_pred: int
    threshold: float

@optional("configuration")
class PredictionRequestv2(BaseModel):
    requestId: str
    configuration: Config

This allows the following JSON body without any default values:

request_body = {"requestId": "test"}

assert PredictionRequestv2(**request_body)
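
As a quick check of what the decorator changes, the following sketch (my own, using the classes above) confirms that an omitted configuration simply stays None, while a provided one is still validated against Config:

# without a configuration, the field stays None
request = PredictionRequestv2(requestId="test")
assert request.configuration is None

# a provided configuration is still parsed and validated against Config
request = PredictionRequestv2(requestId="test",
                              configuration={"n_pred": 2, "threshold": 0.7})
assert request.configuration.threshold == 0.7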