Enrich applications are compositions of the following components:
- Pipeline: A program that composes modules in a predefined way and covers all aspects, including ingestion, computation, and rendering of the output.
- Transform: A module that manipulates dataframes. It can modify existing dataframes or add new ones as needed.
- App: A web application module that extends the default platform to provide new experiences and implement data access policies and preferences. An App in the Enrich context maps to a Django app. It can be standalone or built on top of a renderer.
- Task: A pipeline that mainly interacts with the system and performs maintenance tasks such as backup.
- Services: A simple application that is designed to run continuously.
- Asset: A reusable library built as a Python package. Asset paths can be specified in the pipeline, and the assets are loaded before execution.
- Commands: Remote execution commands meant for maintenance and data interrogation by end users. These commands are logged, and their output is captured.
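To make the relationship between pipelines and transforms concrete, here is a minimal sketch in plain Python. It is not the Enrich SDK: the `Transform` and `Pipeline` classes, the `ingest` and `summarize` functions, and the use of lists of dicts in place of dataframes are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A "frame" is a plain list of row dicts standing in for a dataframe (assumption).
Frame = List[dict]
State = Dict[str, Frame]


@dataclass
class Transform:
    """Hypothetical transform: a named step that changes or adds frames."""
    name: str
    func: Callable[[State], State]


@dataclass
class Pipeline:
    """Hypothetical pipeline: runs its transforms in a predefined order."""
    name: str
    transforms: List[Transform] = field(default_factory=list)

    def run(self, state: State) -> State:
        for transform in self.transforms:
            # Each step receives a shallow copy of the state and returns the next state.
            state = transform.func(dict(state))
        return state


def ingest(state: State) -> State:
    # Ingestion step: adds a new frame to the state.
    state["orders"] = [{"id": 1, "amount": 40}, {"id": 2, "amount": 60}]
    return state


def summarize(state: State) -> State:
    # Computation step: derives a second frame from the first.
    total = sum(row["amount"] for row in state["orders"])
    state["summary"] = [{"orders": len(state["orders"]), "total": total}]
    return state


pipeline = Pipeline(
    "daily_orders",
    [Transform("ingest", ingest), Transform("summarize", summarize)],
)
result = pipeline.run({})
```

In this sketch the pipeline only fixes the order of execution; all data manipulation lives in the transforms, which mirrors the division of responsibilities described in the list above.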
- Idempotency and execution isolation
The pipeline is designed to support idempotency, assuming that the transforms themselves are idempotent. The pipeline provides a top-level namespace to work with. By and large, pipelines from multiple applications can be executed safely in parallel, provided there are no correctness issues at the transform level. The transform base class provides features, such as data versioning, that enable this isolation.
A comprehensive log and metadata are kept for every run of the pipeline. This enables rapid and accurate debugging of correctness problems in the data. Code versions, file metadata, and environment information are logged.
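One common way to get idempotent, isolated runs is to write each run's output to a location keyed by namespace and run identifier, overwriting rather than appending, and to store run metadata next to it. The sketch below shows that general idea only; `run_transform`, the directory layout, and the metadata fields are hypothetical and do not describe how Enrich implements versioning.

```python
import hashlib
import json
import platform
import tempfile
from pathlib import Path


def run_transform(root: Path, namespace: str, run_id: str, rows: list) -> Path:
    """Write output under <root>/<namespace>/<run_id>/ (hypothetical layout)."""
    outdir = root / namespace / run_id
    outdir.mkdir(parents=True, exist_ok=True)

    # Deterministic computation: the same input rows always give the same output.
    output = sorted(rows, key=lambda r: r["id"])
    payload = json.dumps(output, sort_keys=True)

    # Overwrite instead of appending, so repeating a run leaves the same result.
    (outdir / "output.json").write_text(payload)

    # Run metadata kept alongside the data to help with later debugging.
    metadata = {
        "namespace": namespace,
        "run_id": run_id,
        "python": platform.python_version(),
        "output_sha256": hashlib.sha256(payload.encode()).hexdigest(),
    }
    (outdir / "metadata.json").write_text(json.dumps(metadata, sort_keys=True))
    return outdir


root = Path(tempfile.mkdtemp())
rows = [{"id": 2, "v": "b"}, {"id": 1, "v": "a"}]

first = run_transform(root, "app_a", "run-001", rows)
before = (first / "output.json").read_text()

# Repeating the same run rewrites the same files with the same content.
second = run_transform(root, "app_a", "run-001", rows)
after = (second / "output.json").read_text()

# A different namespace writes elsewhere, so the two applications do not collide.
other = run_transform(root, "app_b", "run-001", rows)
```

Because the output path depends only on the namespace and run identifier, two applications can run side by side without touching each other's files, as long as each transform is itself deterministic.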
- Highlights and annotations
Transforms can capture and surface important information to the consumer of the pipeline. By default, this information is included in the execution notification messages.
Each dataset computed by a transform can also be associated with a detailed description and notes. This information is surfaced as well, so that the user knows which dataset is being looked at and understands all of its critical aspects.
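As a rough illustration of how highlights and dataset annotations could be collected during a run and folded into a notification message, consider the sketch below. The `RunReport` class, its method names, and the message layout are hypothetical and are not taken from the Enrich API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RunReport:
    """Hypothetical collector for highlights and dataset annotations."""
    highlights: List[str] = field(default_factory=list)
    datasets: Dict[str, dict] = field(default_factory=dict)

    def highlight(self, message: str) -> None:
        # Record something the consumer of the pipeline should notice.
        self.highlights.append(message)

    def describe(self, dataset: str, description: str, notes: List[str]) -> None:
        # Attach a description and notes to a computed dataset.
        self.datasets[dataset] = {"description": description, "notes": list(notes)}

    def notification(self) -> str:
        # Build the text that would go into an execution notification.
        lines = ["Highlights:"]
        lines += [f"  * {h}" for h in self.highlights]
        lines.append("Datasets:")
        for name, info in sorted(self.datasets.items()):
            lines.append(f"  {name}: {info['description']}")
            lines += [f"    note: {n}" for n in info["notes"]]
        return "\n".join(lines)


report = RunReport()
report.highlight("3 customers had negative balances")
report.describe(
    "balances",
    "Daily account balance per customer",
    ["Excludes accounts closed before the run date"],
)
message = report.notification()
```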
- Dry-run mode
The engine supports a dry-run mode and a full execution mode. Dry-run mode is useful for checking that all transforms and skins are configured correctly before anything is executed.
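The difference between the two modes can be sketched as a runner that always validates its configuration but only executes the steps when `dry_run` is false. This is an assumption-laden illustration: `run_pipeline`, `REQUIRED_KEYS`, and the step dictionaries are hypothetical and do not reflect the actual Enrich engine or its configuration format.

```python
from typing import Dict, List

# Assumed minimal configuration for a step (hypothetical).
REQUIRED_KEYS = {"name", "func"}


def run_pipeline(steps: List[Dict], dry_run: bool = False) -> Dict:
    """Validate every step; execute them only when dry_run is False."""
    errors = []
    for i, step in enumerate(steps):
        missing = REQUIRED_KEYS - set(step)
        if missing:
            errors.append(f"step {i}: missing {sorted(missing)}")
        elif not callable(step["func"]):
            errors.append(f"step {i} ({step['name']}): func is not callable")

    if errors:
        return {"status": "invalid", "errors": errors, "executed": []}
    if dry_run:
        # Configuration is valid; stop before running anything.
        return {"status": "validated", "errors": [], "executed": []}

    state: Dict = {}
    executed = []
    for step in steps:
        state = step["func"](state)
        executed.append(step["name"])
    return {"status": "completed", "errors": [], "executed": executed, "state": state}


def load(state: Dict) -> Dict:
    return {**state, "rows": [1, 2, 3]}


def total(state: Dict) -> Dict:
    return {**state, "total": sum(state["rows"])}


steps = [{"name": "load", "func": load}, {"name": "total", "func": total}]

checked = run_pipeline(steps, dry_run=True)
full = run_pipeline(steps)
broken = run_pipeline([{"name": "load"}], dry_run=True)
```

Here a dry run reports configuration problems, such as the missing `func` in `broken`, without any step having side effects.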
The pipeline loads assets of all applications deployed. This enables cross-application reuse of code, datasets, and knowledge.
When to Use Enrich
The Enrich Platform suits a wide range of data science use cases, but it is not a good fit if:
- You are using extremely large datasets (100s of TB)
- Application integration complexity is high
- Third-party application integration or visualization is the focus