Introduction
In the world of Python, tools and libraries keep appearing to make your life as a developer much, much easier. Two that often come into play when dealing with data validation and class structures are Pydantic and Python dataclasses. Both serve similar purposes but have distinct features and use cases. Feel free to pick your poison.
Seems similar?
Pydantic looks quite similar to dataclasses, mostly because both rely on type annotations for the data being processed. In both cases we declare the type of each field:
from dataclasses import dataclass

from pydantic import BaseModel


@dataclass
class Foo:
    name: str
    fur_color: str


class PydanticFoo(BaseModel):
    name: str
    fur_color: str
It looks the same, doesn't it?
The key difference is that the @dataclass decorator does not bind the type to the field at runtime. The annotation is basically a hint, not validated strictly as in Pydantic's case. In Pydantic, declaring a field's type means the input is validated and, where possible, coerced to that type.
In this example, Pydantic shines by automatically validating input data against the declared types, whereas dataclasses require manual validation.
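A minimal sketch of this difference, using hypothetical classes (Pydantic v2 assumed): the dataclass silently keeps a string, while Pydantic coerces it to the declared int:

```python
from dataclasses import dataclass

from pydantic import BaseModel


@dataclass
class PlainCat:
    age: int  # annotation only -- nothing enforces it


class PydanticCat(BaseModel):
    age: int  # validated and coerced on instantiation


# The dataclass keeps whatever we pass in
plain = PlainCat(age="3")
print(type(plain.age))  # <class 'str'>

# Pydantic validates and coerces the numeric string to int
pyd = PydanticCat(age="3")
print(type(pyd.age))  # <class 'int'>
```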
An example of the manual validation approach in dataclasses is shown below:
@dataclass
class Foo:
    name: str
    surname: str
    city: str
    country: str

    def __post_init__(self):
        # AcceptedCountries is assumed to be a check defined elsewhere
        if not AcceptedCountries(self.country):
            raise ValueError("Must be a valid country to proceed")
As you can see, Python dataclasses use the dunder method __post_init__ to enforce validation. There we can implement type coercion by hand, guard our classes against incorrect arguments, or simply constrain the range of accepted inputs to get the desired outcome, which in the case of dataclasses is object creation.
To summarize the difference in type coercion: Pydantic does it out of the box, while dataclasses require the developer to implement it explicitly.
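As a sketch of what "explicitly" means here, manual coercion in __post_init__ might look like this (hypothetical Bar class):

```python
from dataclasses import dataclass


@dataclass
class Bar:
    count: int

    def __post_init__(self):
        # Coerce by hand, since @dataclass will not do it for us;
        # int() raises ValueError for inputs it cannot convert
        if not isinstance(self.count, int):
            self.count = int(self.count)


print(Bar(count="7").count)  # 7
```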
Validators
Ahh... the almighty validators, among the most common things a backend developer will ever encounter in code. Here too, the approach and the outcome of the two options differ considerably.
Pydantic offers an approach based on decorator syntax:
from pydantic import BaseModel, field_validator


class Foo(BaseModel):
    name: str
    surname: str
    country: str
    postcode: str

    @field_validator("postcode")
    @classmethod
    def postcode_is_valid(cls, value):
        # PLPostcode is assumed to be a postcode check defined elsewhere
        if not PLPostcode(value):
            raise ValueError("Must be a valid PL Postcode.")
        return value
So as we see above, we register a validator for the postcode field by providing a class method that validates the value. If the provided value does not match the required pattern, instantiation fails with a ValidationError wrapping our ValueError. Simple, isn't it?
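To make that concrete, here is a self-contained sketch with a hypothetical regex check standing in for PLPostcode (Pydantic v2's field_validator assumed); a failing value surfaces as a ValidationError:

```python
import re

from pydantic import BaseModel, ValidationError, field_validator


class Address(BaseModel):
    postcode: str

    @field_validator("postcode")
    @classmethod
    def postcode_is_valid(cls, value: str) -> str:
        # Hypothetical pattern: Polish postcodes look like "12-345"
        if not re.fullmatch(r"\d{2}-\d{3}", value):
            raise ValueError("Must be a valid PL Postcode.")
        return value


print(Address(postcode="00-950"))  # passes validation

try:
    Address(postcode="not-a-postcode")
except ValidationError as exc:
    print(exc.error_count(), "validation error")
```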
We can also attach additional validation logic directly to the type declarations:
from pydantic import BaseModel, Field


class Foo(BaseModel):
    positive: int = Field(gt=0)
    non_negative: int = Field(ge=0)
    negative: int = Field(lt=0)
    non_positive: int = Field(le=0)
    even: int = Field(multiple_of=2)
These constraints can be described as in the list below:

- gt - greater than
- lt - less than
- ge - greater than or equal to
- le - less than or equal to
- multiple_of - a multiple of the given number
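A short sketch of these constraints in action (Pydantic v2 assumed); all violated constraints are reported together in a single ValidationError:

```python
from pydantic import BaseModel, Field, ValidationError


class Numbers(BaseModel):
    positive: int = Field(gt=0)
    even: int = Field(multiple_of=2)


ok = Numbers(positive=3, even=4)  # satisfies both constraints

try:
    Numbers(positive=-1, even=3)  # violates gt=0 and multiple_of=2
except ValidationError as exc:
    print(exc.error_count())  # both violations are collected
```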
For comparison, I'll present the approach Python dataclasses would use to achieve the same result:
@dataclass
class Foo:
    positive: int
    non_negative: int
    negative: int
    non_positive: int
    even: int

    def __post_init__(self):
        self.validate_positive()
        self.validate_non_negative()
        self.validate_negative()
        self.validate_non_positive()
        self.validate_even()

    def validate_positive(self):
        if self.positive <= 0:
            raise ValueError("Value of positive must be greater than 0")

    def validate_non_negative(self):
        if self.non_negative < 0:
            raise ValueError("Value of non_negative must not be less than 0")

    def validate_negative(self):
        if self.negative >= 0:
            raise ValueError("Value of negative must be less than 0")

    def validate_non_positive(self):
        if self.non_positive > 0:
            raise ValueError("Value of non_positive must not be greater than 0")

    def validate_even(self):
        if self.even % 2 != 0:
            raise ValueError("Value of even must be a multiple of 2")
Now you can see how much boilerplate code Pydantic removes. Some hardheads may ask: what about performance? Rest assured, Pydantic's validation logic does not need a lot of memory to do its job.
But wouldn't it be ideal to combine the two: the simplicity of dataclasses with the robustness of Pydantic's validation and type coercion? Is it actually possible? The answer is: YES!
Pydantic dataclasses
Since Python 3.7, Pydantic has served us developers yet another flavour of validation: pydantic.dataclasses.
from datetime import datetime
from typing import Optional

from pydantic.dataclasses import dataclass


@dataclass
class Foo:
    id: int
    name: str = 'John Doe'
    signup_ts: Optional[datetime] = None
This approach combines two important factors: the ease of use of the @dataclass decorator and automatic validation and type coercion for its fields.
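A quick sketch of that coercion with a hypothetical User dataclass: string inputs are converted to the declared field types, just as with BaseModel:

```python
from datetime import datetime
from typing import Optional

from pydantic.dataclasses import dataclass


@dataclass
class User:
    id: int
    signup_ts: Optional[datetime] = None


# Strings are validated and coerced to the declared types
u = User(id="42", signup_ts="2023-01-01T12:00:00")
print(type(u.id))        # <class 'int'>
print(u.signup_ts.year)  # 2023
```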
But it shows its biggest edge when paired with initialization hooks alongside the dunder method __post_init__ - check this out!
from typing import Any, Dict

from pydantic import model_validator
from pydantic.dataclasses import dataclass


@dataclass
class Foo:
    a: int
    b: int
    c: int


@dataclass
class Baz:
    d: Foo

    @model_validator(mode='before')
    @classmethod
    def pre_root(cls, values: Dict[str, Any]) -> Dict[str, Any]:
        print(f'First: {values}')
        return values

    @model_validator(mode='after')
    def post_root(self) -> 'Baz':
        print(f'Third: {self}')
        return self

    def __post_init__(self):
        print(f'Second: {self.d}')
As you can see, __post_init__ in pydantic.dataclasses is executed between these validators. The full sequence is listed below:
1. model_validator(mode='before')
2. field_validator(mode='before')
3. field_validator(mode='after') - inner validators, e.g. validation for types like int, str, ...
4. __post_init__
5. model_validator(mode='after')
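The ordering can be checked with a compact, self-contained sketch (hypothetical Inner/Outer classes, Pydantic v2 assumed) that records each hook as it fires:

```python
from pydantic import model_validator
from pydantic.dataclasses import dataclass

order = []  # records the hooks in the order they run


@dataclass
class Inner:
    a: int


@dataclass
class Outer:
    d: Inner

    @model_validator(mode='before')
    @classmethod
    def pre_root(cls, values):
        order.append('model before')
        return values

    def __post_init__(self):
        order.append('post_init')

    @model_validator(mode='after')
    def post_root(self) -> 'Outer':
        order.append('model after')
        return self


Outer(d={'a': 1})  # the nested dict is coerced into Inner
print(order)
```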
So basically you can validate the model, sanitize fields, and do setup work for @property methods. Voilà! The more needs you have, the more possible outcomes follow from that.
Summary
So basically, we are at the end of this tour of creative approaches to validation patterns in Python. Neither approach is wrong, but trying to be concise, I will finalize my thoughts in two points:
- Pydantic is a mighty workhorse, offering robust validation, data sanitization, and type coercion nearly implicitly. The drawbacks are slightly increased memory consumption and a learning curve that can be overwhelming for fresh Pythonistas.
- Python dataclasses are a neat, quick, and affordable approach for building model patterns, validating, and maintaining the source of truth as it should be. The drawbacks are the necessity of tweaking their structures to your needs and writing your own validators or validation methods.
Happy Pythoning fellas!