First official version released, after more than a year of design and development and a first industry implementation at T-Mobile CZ.
New features:
- ‘Lookup tables’ – lookups loaded into memory and used in mappings.
- ‘Checksum functions’ – standard checksum functions for strings: ‘md5’, ‘sha224’, ‘sha256’, ‘sha384’, ‘sha512’.
- HDFS support
- Spark code generation – Parquet and Impala integration
- Job Manager
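As a rough illustration of the lookup and checksum features, here is a minimal Python sketch of the idea, not the tool's own syntax. The function names, the lookup contents, the field names, and the UTF-8 encoding are all assumptions made for the example. Only the algorithm names come from the list above.

```python
import hashlib

# Algorithm names as listed in the release notes.
SUPPORTED = ("md5", "sha224", "sha256", "sha384", "sha512")


def checksum(value: str, algorithm: str = "md5") -> str:
    """Return the hex digest of a string (UTF-8 encoding is an assumption)."""
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return hashlib.new(algorithm, value.encode("utf-8")).hexdigest()


# Hypothetical lookup table, loaded into memory once as a plain dict.
COUNTRY_NAMES = {"CZ": "Czech Republic", "SK": "Slovakia"}


def map_record(record: dict) -> dict:
    """Hypothetical mapping: resolve a code via the lookup and hash a field."""
    return {
        "country": COUNTRY_NAMES.get(record["country_code"], "unknown"),
        "customer_hash": checksum(record["customer_id"], "sha256"),
    }
```

Keeping the lookup in memory means each record is resolved with a single dictionary access rather than a separate join step.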
New components:
- ‘Aggreg’ – aggregate groups of records.
- ‘Cat’ – concatenate several input flows into a single output flow.
- ‘Comp’ – use a custom component, which is actually another job.
- ‘Cut’ – omit input fields according to the output data definition.
- ‘Filter’ – a simple one- or two-way switch. For more complex cases, use ‘Map’.
- ‘Join’ – join two input flows by the key. Catch left/right or even unmatched records.
- ‘Map’ – transform input fields and write them into output fields.
- ‘Read’ – read file(s) into the output flow, decompressing if needed.
- ‘Sort’ – sort, deduplicate, or check the sort order; the output is always sorted by the key.
- ‘Tee’ – replicate one input flow to several output flows.
- ‘Trash’ – like /dev/null.
- ‘Write’ – write the flow into a file, compressing if needed.
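To make the ‘Join’ behaviour more concrete, the following Python sketch shows one possible reading of it: two flows are joined by a key, and records without a partner on either side are caught separately. This is an assumption about the semantics, not the component's actual implementation. The function name, the record shape, and the rule that left-side values win on a field-name clash are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def join_flows(
    left: Iterable[Record], right: Iterable[Record], key: str
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Join two flows by key; return (matched, left_only, right_only)."""
    # Index the right flow by key so each left record is matched in one lookup.
    right_index: Dict[str, List[Record]] = {}
    for rec in right:
        right_index.setdefault(rec[key], []).append(rec)

    matched: List[Record] = []
    left_only: List[Record] = []
    seen_keys = set()
    for rec in left:
        partners = right_index.get(rec[key])
        if partners is None:
            left_only.append(rec)  # no partner in the right flow
            continue
        seen_keys.add(rec[key])
        for partner in partners:
            # Assumption: on a field-name clash, the left value wins.
            matched.append({**partner, **rec})

    # Right records whose key never appeared in the left flow.
    right_only = [
        rec
        for k, recs in right_index.items()
        if k not in seen_keys
        for rec in recs
    ]
    return matched, left_only, right_only
```

A call such as `join_flows(customers, orders, "id")` would then yield three lists that could feed three separate output flows.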
New commands:
- ‘Mkdir’
- ‘Mv’