API Reference
Validator
Validator class for validating DataFrames against a defined schema.
Source code in src/dataguard/validator/validator.py
config_from_mapping
classmethod
config_from_mapping(
config: Mapping[str, str | Sequence | Mapping],
collect_exceptions: bool = True,
logger: logging.Logger = logger,
) -> Validator
Creates a Validator instance from a configuration mapping.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | Mapping[str, str \| Sequence \| Mapping] | Configuration mapping for the DataFrame schema. | required |
| collect_exceptions | bool | Whether to collect exceptions during schema creation. Defaults to True. | True |
| logger | logging.Logger | Logger instance for logging. Defaults to the module logger. | logger |
Examples:
The command is either a user-defined function or a string that maps to a function that will be used to validate the DataFrame.
The following commands are available:
'is_equal_to',
'is_equal_to_or_both_missing',
'is_greater_than_or_equal_to',
'is_greater_than',
'is_less_than_or_equal_to',
'is_less_than',
'is_not_equal_to',
'is_not_equal_to_and_not_both_missing',
'is_unique',
'is_duplicated',
'is_in',
'is_null',
'is_not_null'
>>> config_input = {
...     "name": "example_schema",
...     "columns": [
...         {
...             "id": "column1",
...             "data_type": "integer",
...             "nullable": False,
...             "unique": True,
...             "required": True,
...             "checks": [
...                 {
...                     "command": "is_equal_to",
...                     "subject": ["column2"],
...                 }
...             ],
...         },
...     ],
...     "ids": ["column1"],
...     "metadata": {"description": "Example DataFrame schema"},
...     "checks": [
...         {
...             "name": "example_check",
...             "error_level": "warning",
...             "error_msg": "This is an example check",
...             "command": "is_in",
...             "subject": ["column1"],
...             "arg_values": [1, 2],
...         }
...     ],
... }
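The idea that a command is "a string that maps to a function" can be illustrated with a small lookup table. The sketch below is not dataguard's implementation: `COMMANDS` and `failing_rows` are hypothetical names, only a handful of the listed commands are covered, and it assumes that `subject` names the column to test and `arg_values` is passed as the second argument of the check.

```python
import operator
from typing import Any, Callable

# Hypothetical registry: a few of the documented command names mapped to
# plain Python predicates. dataguard's own mapping may look different.
COMMANDS: dict[str, Callable[[Any, Any], bool]] = {
    "is_equal_to": operator.eq,
    "is_not_equal_to": operator.ne,
    "is_greater_than": operator.gt,
    "is_less_than": operator.lt,
    "is_in": lambda value, allowed: value in allowed,
}


def failing_rows(data: dict[str, list], check: dict[str, Any]) -> list[int]:
    """Return the row indices where the check's command evaluates to False."""
    command = check["command"]
    # A user-defined callable is used as-is; a string is looked up by name.
    predicate = command if callable(command) else COMMANDS[command]
    column = data[check["subject"][0]]
    return [
        row
        for row, value in enumerate(column)
        if not predicate(value, check["arg_values"])
    ]


data = {"column1": [1, 2, 3]}
check = {"command": "is_in", "subject": ["column1"], "arg_values": [1, 2]}
print(failing_rows(data, check))  # [2]
```

Accepting either a callable or a name keeps configurations serialisable (strings) while still allowing custom checks in code.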
Returns:
| Name | Type | Description |
|---|---|---|
| Validator | Validator | An instance of the Validator class with the schema created from the provided configuration mapping. |
Source code in src/dataguard/validator/validator.py
validate
validate(
dataframe: Mapping[str, list] | pl.DataFrame,
lazy_validation: bool = True,
collect_exceptions: bool = True,
logger: logging.Logger = logger,
) -> None
Validates a DataFrame against the defined schema.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataframe | Mapping[str, list] \| pl.DataFrame | The input data as a mapping or a Polars DataFrame. | required |
| lazy_validation | bool | Whether to perform lazy validation. Defaults to True. | True |
| collect_exceptions | bool | Whether to collect exceptions during validation. Defaults to True. | True |
| logger | logging.Logger | Logger instance for logging. Defaults to the module logger. | logger |
Raises:
| Type | Description |
|---|---|
| Exception | If an error occurs during validation and collect_exceptions is False. |
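The difference between collecting and raising can be sketched without the library. The example below is an assumption-laden illustration of the documented `collect_exceptions` flag, not dataguard's code: `run_steps`, `ok`, and `broken` are hypothetical names, and returning the collected exceptions as a list is a simplification (`validate` itself returns None).

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def run_steps(
    steps: list[Callable[[], None]],
    collect_exceptions: bool = True,
) -> list[Exception]:
    """Run each step; collect failures or re-raise the first one."""
    collected: list[Exception] = []
    for step in steps:
        try:
            step()
        except Exception as exc:
            if not collect_exceptions:
                raise  # propagate immediately, as in the "Raises" entry
            logger.warning("step failed: %s", exc)
            collected.append(exc)
    return collected


def ok() -> None:
    pass


def broken() -> None:
    raise ValueError("bad value")


errors = run_steps([ok, broken, ok])
print([type(e).__name__ for e in errors])  # ['ValueError']
```

With `collect_exceptions=False` the same call would stop at `broken` and propagate the `ValueError` to the caller.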
Source code in src/dataguard/validator/validator.py
Enums used in the validation library. These enums define various constants used throughout the library for error levels, validation types, and check cases.
config_input must use the following naming conventions.
CheckCases
ErrorLevel
ValidationType
Bases: Enum
Enum representing different validation types for DataFrame columns.
Source code in src/dataguard/core/utils/enums.py
ErrorCollector
cached
ErrorCollector class for collecting errors during validation.
Source code in src/dataguard/error_report/error_collector.py
add_error_report
Adds an error report to the collector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
|  | ErrorReportSchema | The error report to add. | required |
Returns:
| Type | Description |
|---|---|
| None | None |
Source code in src/dataguard/error_report/error_collector.py
add_unknown_exception
Adds an unknown exception to the collector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
|  | ExceptionSchema | The exception to add. | required |
Returns:
| Type | Description |
|---|---|
| None | None |
Source code in src/dataguard/error_report/error_collector.py
clear_errors
get_errors
Returns the collected errors and exceptions.
Returns:
| Name | Type | Description |
|---|---|---|
| ErrorCollectorSchema | ErrorCollectorSchema | A schema containing the collected errors and exceptions. |
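The four methods above can be pictured with a stdlib-only stand-in. This is a rough sketch, not the real class: `ErrorCollectorSketch` and `CollectedErrors` are hypothetical names, reports are plain dicts instead of schema objects, and the use of `functools.cache` to share one instance is only a guess at what the `cached` label means.

```python
from dataclasses import dataclass, field
from functools import cache


@dataclass
class CollectedErrors:
    """Hypothetical stand-in for ErrorCollectorSchema."""

    error_reports: list = field(default_factory=list)
    exceptions: list = field(default_factory=list)


@cache  # assumption: every ErrorCollectorSketch() call yields one shared instance
class ErrorCollectorSketch:
    def __init__(self) -> None:
        self._errors = CollectedErrors()

    def add_error_report(self, report: dict) -> None:
        self._errors.error_reports.append(report)

    def add_unknown_exception(self, exception: dict) -> None:
        self._errors.exceptions.append(exception)

    def get_errors(self) -> CollectedErrors:
        return self._errors

    def clear_errors(self) -> None:
        # Start over with empty lists.
        self._errors = CollectedErrors()


collector = ErrorCollectorSketch()
collector.add_error_report({"name": "example_schema", "total_errors": 1})
same = ErrorCollectorSketch()
print(same is collector)  # True
print(len(same.get_errors().error_reports))  # 1
collector.clear_errors()
print(len(same.get_errors().error_reports))  # 0
```

A shared instance lets separate validation steps append to one collection, which is also why an explicit `clear_errors` is useful between runs.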
Source code in src/dataguard/error_report/error_collector.py
BasicExceptionSchema
Bases: BaseModel
Basic schema for exceptions.
Attributes:
| Name | Type | Description |
|---|---|---|
| type | str | Type of the error. |
| message | str | Message describing the error. |
Source code in src/dataguard/error_report/error_schemas.py
DFErrorSchema
Bases: ErrorSchema
Schema for errors that occur during DataFrame validation.
Attributes:
| Name | Type | Description |
|---|---|---|
| column_names | list[str] \| str | Names of the columns where the error occurred. |
| row_ids | list[int] | IDs of the rows where the error occurred. |
| idx_columns | list[str] | Index columns used for identifying errors. |
| level | str | Level of the error, e.g., 'error', 'warning'. |
| message | str | Message describing the error. |
| title | str | Title of the error. |
Source code in src/dataguard/error_report/error_schemas.py
ErrorCollectorSchema
Bases: BaseModel
Schema for collecting errors and exceptions during validation.
Attributes:
| Name | Type | Description |
|---|---|---|
| error_reports | list[ErrorReportSchema] | List of error reports. |
| exceptions | list[ExceptionSchema] | List of exceptions that occurred during validation. |
Source code in src/dataguard/error_report/error_schemas.py
ErrorReportSchema
Bases: BaseModel
Schema for error reports generated during validation.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Name of the error report. |
| errors | list[ErrorSchema] | List of errors found in the DataFrame. |
| total_errors | int | Total number of errors in the report. |
| id | int | Unique identifier for the error report. |
Source code in src/dataguard/error_report/error_schemas.py
ErrorSchema
Bases: BasicExceptionSchema
Schema for errors that occur during validation.
Attributes:
| Name | Type | Description |
|---|---|---|
| type | str | Type of the error. |
| message | str | Message describing the error. |
| title | str | Title of the error. |
| traceback | str | Traceback of the error. |
Source code in src/dataguard/error_report/error_schemas.py
ExceptionSchema
Bases: BasicExceptionSchema
Schema for unknown exceptions that occur during validation.
Attributes:
| Name | Type | Description |
|---|---|---|
| type | str | Type of the error. |
| message | str | Message describing the error. |
| level | ErrorLevel | Level of the error. |
| traceback | str | Traceback of the error. |
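How the schemas in this section relate to each other may be easier to see in code. The sketch below mirrors the documented base classes and attribute names with stdlib dataclasses; the real classes derive from BaseModel, so this is only a structural stand-in, and all field values are made up for illustration.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class BasicExceptionSchema:
    type: str
    message: str


@dataclass
class ErrorSchema(BasicExceptionSchema):
    title: str
    traceback: str


@dataclass
class DFErrorSchema(ErrorSchema):
    # Inherits type, message, title and traceback from ErrorSchema.
    column_names: list[str] | str
    row_ids: list[int]
    idx_columns: list[str]
    level: str


@dataclass
class ErrorReportSchema:
    name: str
    errors: list[ErrorSchema]
    total_errors: int
    id: int


# Illustrative values only; real reports are produced by the library.
error = DFErrorSchema(
    type="example_type",
    message="This is an example check",
    title="example_check",
    traceback="",
    column_names=["column1"],
    row_ids=[2],
    idx_columns=["column1"],
    level="warning",
)
report = ErrorReportSchema(
    name="example_schema", errors=[error], total_errors=1, id=0
)
print(isinstance(error, BasicExceptionSchema))  # True
print(asdict(report)["errors"][0]["row_ids"])  # [2]
```

Because DFErrorSchema is an ErrorSchema, it fits directly into the `errors` list of a report.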