Hi,
In my previous job, I spent hours debugging internal data transformations, only to figure out that the data received from an external API was faulty. This issue would not have appeared with schema validation. My mistake was that I trusted the incoming data and didn't check it for consistency. To learn from that mistake and save time in the future, I set up a small example of JSON validation with Pydantic. FastAPI relies heavily on Pydantic, and I use it there to validate incoming requests and outgoing responses. However, not every project uses FastAPI.
I will simulate a simple inference service based on my previous work. The service receives an OCR result as a request and responds with a prediction. The complete code can be accessed via Colab.
Data Model
As a first step, I define the data model of our endpoint. The PredictionRequest expects a requestId, an ocr_body, and a model configuration.
The model configuration uses default values based on the service configuration. In this example, I use a dictionary for the model_config. Still, there can be cases where you want to change the configuration with your request, for example when you want multiple predictions or want to change the probability threshold for your predictions.
The ocr_body is defined as a list of OCR objects, and we expect at least one entry. The OCR itself consists of a value and the region of that value in the document.
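A minimal sketch of the data model, assuming Pydantic v1 (the exact Region fields and the default configuration values are placeholders):

from typing import Dict, List
from pydantic import BaseModel, Field

class Region(BaseModel):
    # Position of the value in the document (field names are an assumption)
    top: int
    left: int
    width: int
    height: int

class OCR(BaseModel):
    value: str
    region: Region

class PredictionRequest(BaseModel):
    requestId: str
    # at least one OCR entry is required
    ocr_body: List[OCR] = Field(..., min_items=1)
    # defaults based on the service configuration (values are placeholders)
    model_config: Dict[str, float] = {"n_predictions": 1, "probability_threshold": 0.5}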
You can check the data model in the following way:
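For example, with made-up values (the concrete requestId and region coordinates are arbitrary):

from pydantic import ValidationError

request = PredictionRequest(
    requestId="abc-123",
    ocr_body=[
        {"value": "Hello World",
         "region": {"top": 0, "left": 0, "width": 100, "height": 20}},
    ],
)
print(request.json(indent=2))

# Inconsistent data is rejected immediately:
try:
    PredictionRequest(requestId="abc-124", ocr_body=[])
except ValidationError as err:
    print(err)  # ocr_body: ensure this value has at least 1 items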
Since the configuration field was not provided, the default values are used.
Optional Fields
Sometimes, we want optional fields in the JSON. Let's assume that we don't want to use default values for the configuration, but we also don't want to have the configuration explicitly in the schema. We want the configuration to be optional. There is a solution proposed by mubtasimfuad: the code defines a decorator for the Pydantic BaseModel.
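The commonly shared Pydantic v1 variant of this decorator looks roughly like this (a sketch, not necessarily the exact original code):

import inspect
from pydantic import BaseModel

def optional(*fields):
    """Make the given fields (or, with no arguments, all fields)
    of a Pydantic v1 model optional."""
    def dec(cls):
        for field in fields:
            cls.__fields__[field].required = False
        return cls

    # Used as @optional without arguments: make every field optional
    if fields and inspect.isclass(fields[0]) and issubclass(fields[0], BaseModel):
        cls = fields[0]
        fields = cls.__fields__
        return dec(cls)
    return dec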
The decorator can then be applied to the model, which can be used in the same way as before:
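For instance, making only the configuration optional (a sketch based on the models above):

@optional("model_config")
class PredictionRequest(BaseModel):
    requestId: str
    ocr_body: List[OCR] = Field(..., min_items=1)
    model_config: Dict[str, float]

# model_config may now be omitted; it is simply None instead of a default dict
request = PredictionRequest(
    requestId="abc-125",
    ocr_body=[{"value": "Hello World",
               "region": {"top": 0, "left": 0, "width": 100, "height": 20}}],
)
print(request.model_config)  # None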
This allows the following JSON schema without any default values:
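Roughly, the generated schema should look like this (abbreviated; the exact titles and definitions depend on the models above):

print(PredictionRequest.schema_json(indent=2))

# Abbreviated, expected shape of the output:
# {
#   "title": "PredictionRequest",
#   "type": "object",
#   "properties": {
#     "requestId": {"title": "Requestid", "type": "string"},
#     "ocr_body": {"title": "Ocr Body", "minItems": 1, "type": "array",
#                  "items": {"$ref": "#/definitions/OCR"}},
#     "model_config": {"title": "Model Config", "type": "object",
#                      "additionalProperties": {"type": "number"}}
#   },
#   "required": ["requestId", "ocr_body"],
#   "definitions": { ... }
# }

Note that model_config no longer appears in the required list and carries no default value.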