EVL Data Quality Microservice

The EVL Data Quality microservice enables quick, easy-to-use, and cost-effective validation of datasets. Useful in situations where complex automated testing tools may be too heavy and expensive. Good candidates for the EVL Data Quality tool are ETL projects, migrations, or quick quality checks of production data.

Configuration via Excel or CSV files with pre-configuration option based on metadata
Automatic data type and null values validation
Setting of other validation criteria for e.g. number intervals, string lengths
Possibility to add complex validation functions for entity relations
Separating “wrong” data and logging of rejection reasons
Setting conditions for breaking the job flow based on percentages, number of rows …
Fast implementation and rapid change management
Low implementation and operating costs

EVL Microservices are built on top of the core EVL software and retain its flexibility, robustness, high productivity, and ability to read data from various sources; including csv files, databases–Oracle, Teradata, SQL Server, etc–and Hadoop streaming data like Kafka.

EVL Validation white paper. Function guide and examples.

Download

EVL Data Quality Functions

Data	Function	Description
String	String length	Min/max string length
Any	Null value check	Check nullability of a field
String	Code Page	Identifying characters with wrong code page
Date	Date Interval	Setting Min and Max date interval
Date	Date Format	Identifying non-standard date and time format
Number	Number Interval	Setting Min and Max interval for integers, floats and decimal
Specific	Entity relation check	Relations checking between 2 or more attributes
Specific	Validation function	Calling validation functions for complex conditions

EVL Data Quality project

An EVL Data Quality project consists of following steps:

unzipping EVL distribution and defining a few variables and paths
filling-in an Excel or CSV file defining source type (e.g. csv, Oracle, …), table or file name and field names and validations functions to be applied
automatic generation of EVL jobs for each entity
running EVL jobs in a batch or individually
displaying rejected files containing wrong records and logs

Example

Following example shows an implementation of a simple validation

Set variables:

# Source and target data directories DATA_SOURCE_DIR="/some/path/source" DATA_ANON_DIR="/some/path/validation"

Validation definition for files TEST1 and TEST2:

Src	Entity	Attribute	Datatype	Null	Min	Max	Validation condition	Description
csv	TEST1	ID	int	No	0	5000		Setting number interval
csv	TEST1	ACC	int	No			ID > 500 && ACC > 3000	Attribute relation
csv	TEST1	RC	string				check_rc(RC)	Calling custom function
csv	TEST1	Sex	string		1	1	substr(RC,2,2)>50 && Sex==”F”	Test of sex validity based on RC
ORCL	TEST2	ID	string	No				String length check
ORCL	TEST2	Postcode	int		5	5		Postcode must be 5 digits
ORCL	TEST2	Text	string				Codepage: ISO-8859-2	Checking characters codepage

Run:

# generating evl jobs from the config file evl run/generate_jobs.evl # running the validation job for an entity “TEST1” evl run/validate.test1.evl # running the validation job for an entity “TEST2” evl run/validate.test2.evl # Example of custom function rc_check(): # stol(replace(RC,’/’,’’)) % 11 == 0