Node
enrichsdk.core.node
→
Core classes that must be used to build transforms
Integration(*args, **kwargs)
→
Model(*args, **kwargs)
→
Bases: Transform
Transforms that are models.
No additional function interfaces exist yet. Model management functions will be added in the future.
Source code in enrichsdk/core/node.py
Node(*args, **kwargs)
→
This is the base class for all modules including transforms, models, skins etc.
This provides a number of infrastructural elements that enable the module to be plugged into the application infrastructure.
This is for understanding only. Subclassing is done from derived classes such as Source, SearchSkin etc.
Source code in enrichsdk/core/node.py
args = {}
instance-attribute
→
dict: Arguments passed to the transformation from the configuration file
args_schema = {}
instance-attribute
→
dict: JSON Schema to validate the input args::
See: https://python-jsonschema.readthedocs.io/en/latest/ https://json-schema.org/
    self.args_schema = {
        "type": "object",
        "properties": {
            "name": {
                "type": "string",
                "description": "Name of the dataset"
            },
            "count": {
                "type": "number",
                "description": "Threshold"
            }
            ...
        },
        "required": ["name", "count"]
    }
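To make the role of `args_schema` concrete, here is a minimal sketch of checking args against such a schema. The `check_args` helper and `JSON_TYPES` table are hypothetical and cover only the `required` and `type` keywords; this is not how the SDK validates, and a full JSON Schema validator such as the jsonschema package linked above handles far more.

```python
# Hypothetical helper: checks only the "required" and "type" keywords.
# A real JSON Schema validator covers many more keywords than this.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "object": dict,
}

args_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Name of the dataset"},
        "count": {"type": "number", "description": "Threshold"},
    },
    "required": ["name", "count"],
}


def check_args(args, schema):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Every key listed under "required" must be present
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required arg: {key}")
    # Every present key must match its declared JSON type
    for key, spec in schema.get("properties", {}).items():
        if key not in args:
            continue
        expected = JSON_TYPES.get(spec.get("type"))
        value = args[key]
        # bool is a subclass of int in Python, but it is not a JSON number
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            problems.append(f"arg {key} should be of type {spec['type']}")
    return problems
```

Calling `check_args({"name": "sales"}, args_schema)` reports the missing `count` arg, while a complete and well-typed dict yields an empty list.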
author = 'Builtin'
instance-attribute
→
str: Author of this module.
This is overridden by author specified in the manifest.json
config = kwargs.pop('config')
instance-attribute
→
object: Configuration object
This is passed to the module during instantiation by the enrich compute engine
data_version_map = {}
instance-attribute
→
dict: Versions of the preprocessed data
.. warning::
.. deprecated:: 2.0
It has the form::
    {
        'raw': 1.0,
        'preprocessed': 1.3
    }
datasets = []
instance-attribute
→
Input and output datasets handled by this transform
debug = False
instance-attribute
→
bool: Debug mode operation at the transform level
dependencies = {}
instance-attribute
→
Specifies the list of frames and transformations that this transform depends on. This value has to be overridden explicitly by the subclass. Otherwise the transform will throw a validation error. For example::
    {
        'article': 'Core',
        'store': ['EnrichedStore', 'EnrichedTransform']
        ...
    }
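One way to read such a dependencies dict is as a set of preconditions on frames. The sketch below is an assumption about the semantics, not the platform's actual check: it treats every listed transform as required, and `frame_history` is a hypothetical stand-in for whatever the state tracks about which transforms have touched a frame.

```python
def unmet_dependencies(dependencies, frame_history):
    """List (frame, transform) pairs that have not been satisfied yet.

    dependencies:  frame name -> transform name, or list of transform names
    frame_history: frame name -> names of transforms that already touched it
                   (hypothetical stand-in for what the state records)
    """
    unmet = []
    for frame, required in dependencies.items():
        # Accept both the single-name and the list form
        if isinstance(required, str):
            required = [required]
        seen = frame_history.get(frame, [])
        for transform in required:
            if transform not in seen:
                unmet.append((frame, transform))
    return unmet


dependencies = {
    "article": "Core",
    "store": ["EnrichedStore", "EnrichedTransform"],
}
frame_history = {"article": ["Core"], "store": ["EnrichedStore"]}
pending = unmet_dependencies(dependencies, frame_history)
```

Here `pending` contains only the `store` / `EnrichedTransform` pair, since that is the one requirement not yet met.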
description = 'Unknown'
instance-attribute
→
str: Text description of the transform
This is overridden by description specified in the manifest.json
enable = True
instance-attribute
→
bool: Is this transform enabled?
A transform could be included but not enabled
files = ['manifest.json']
instance-attribute
→
list: Files in this module
fullname
property
→
str: One line summary of the transform
manifest = {}
instance-attribute
→
dict: Manifest json file loaded
minorversion = 1.0
instance-attribute
→
float: Minor version of this module
name = 'Unknown'
instance-attribute
→
str: Name of the transform. Required
This is overridden by name specified in the manifest.json
node_type = 'Unknown'
instance-attribute
→
str: Node type
This is one of Transform, Model, Skin, Integration
output_version = 1.0
instance-attribute
→
float: Version of the output of this module
outputs = {}
instance-attribute
→
dict: Outputs of this module
It has the form::
    {
        <pattern>: <description>,
        <pattern>: {
            'description': <text>
        }
    }
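Because an entry can be either a plain description string or a dict with a `description` key, code that consumes `outputs` may find it convenient to normalize both forms first. A minimal sketch, where `normalize_outputs` is a hypothetical helper and not part of the SDK:

```python
def normalize_outputs(outputs):
    """Convert every entry to the dict form {'description': <text>, ...}."""
    normalized = {}
    for pattern, detail in outputs.items():
        if isinstance(detail, str):
            # Short form: the value is the description itself
            detail = {"description": detail}
        # Copy so that the caller's dict is not mutated
        normalized[pattern] = dict(detail)
    return normalized


outputs = {
    "price": "Listed price",
    "title": {"description": "Article title"},
}
uniform = normalize_outputs(outputs)
```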
roles_current = 'default'
instance-attribute
→
str: Current role of this transform
This attribute was used to sequence the transforms in early versions of Enrich. Now it is deprecated.
.. warning::
.. deprecated:: 2.0
roles_supported = ['default']
instance-attribute
→
list: A transform can support multiple roles such as source, sink etc. This lists the roles.
This attribute was used to sequence the transforms in early versions of Enrich. Now it is deprecated.
.. warning::
.. deprecated:: 2.0
supported_extra_args = []
instance-attribute
→
Extra arguments that can be passed on the command line and the GUI to the module::
    self.supported_extra_args = [
        {
            "name": "jobid",
            "description": "Export JobID",
            "type": "str",
            "required": True,
            "default": "22832"
        }
    ]
tags = []
instance-attribute
→
list: Tags associated with this transform
test = False
instance-attribute
→
bool: Whether this transform is being operated in test mode
testdata = self.get_testdata
instance-attribute
→
dict: Test data to be used for testing module
The default structure is instantiated by self.get_testdata. Each of the elements (e.g., conf) can be specified by overriding the corresponding method (e.g., self.get_testdata_conf). Such functions are supported for conf, args, global, and data.
It has the form::
    {
        'data_root': os.path.join(os.environ['ENRICH_TEST'], self.name),
        'outputdir': os.path.join(os.environ['ENRICH_TEST'], self.name),
        'inputdir': os.path.join(os.environ['ENRICH_TEST']),
        'statedir': os.path.join(os.environ['ENRICH_TEST'], self.name, 'state'),
        'datasets': {
            ...
        },
        'global': {
            'args': {
                'rundate': '2020-01-01'
            }
        },
        'conf': {
            'args': {
                'threshold': 0.6
                ...
            }
        },
        'data': {
            'article': {
                'filename': 'test.csv',
                'transform': 'Core',
                'frametype': 'pandas',
                'stages': ['transform1'],
                'params': {
                    'sep': ','
                }
            }
        }
    }
When the module is tested, the SDK loads $ENRICH_DATA/test/temp-configname/Core/test.csv and makes it available to the module using the state interface.
If data in the frame is to be loaded from multiple files, you can specify the list and a merge function that gets the list of frames to merge::
    {
        ..
        'data': {
            'article': {
                'filename': {
                    'transform1': ['test.csv'],
                    'transform2': ['test.csv']
                },
                'frametype': 'pandas',
                'mergedf': self.test_merge_func,  # function
                'transform': 'transform1',
                'stages': ['transform2'],
                'params': {
                    'sep': ','
                }
            }
        }
    }
    ...
    def test_merge_func(self, dfs): ...
See the get_test_conf function to dynamically generate these configurations.
datasets is a dictionary with a specification for where the input datasets should be stored and obtained from::
    {
        "command": command,
        'params': {
            'enrich_data_dir': '/home/ubuntu/enrich/data',
            'backup_root': 'some-s3-path',
            'node': 'some hostname'
        },
        'available': [
            Dataset({
                'name': "inventory_dataset",
                ...
            }),
            ...
        ]
    }
version = 1.0
instance-attribute
→
float: Version of this module
add_marker(state, name=None, suffix='Completed')
→
Adds an object to the state to force ordering. It does not serve any other purpose.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
state | | | required |
Source code in enrichsdk/core/node.py
configure(conf)
→
Load the module and prepare to execute
Args: conf: Module configuration specified in config
Source code in enrichsdk/core/node.py
frame_get_overrides(state_detail)
→
Obtain any extra instructions passed by previous stage on how to process a frame
Returns: args (dict): Dictionary passed by previous transform
Source code in enrichsdk/core/node.py
get_arg_overrides(state)
→
Get any over-rides provided from other transforms.
Args: state (object): State object
Returns: overrides (list)
Source code in enrichsdk/core/node.py
get_cache_dir(name=None, subdir='.', what='raw', create=False, extra={})
→
Get a local directory for caching partial results.
.. warning::
Deprecated. Not to be used.
Args:
    name (str): Namespace
    version (str): Version of the cache
    what (str): Class of data being stored
    create (bool): Create the cache directory
    extra (dict): Extra parameters passed to get_file
Returns: path (str): Path of the cache directory
Source code in enrichsdk/core/node.py
get_credentials_by_name(name)
→
Looks up the siteconf in ENRICH_ETC and returns the appropriate entry.
This name should be specified in the credentials section of siteconf.
Source code in enrichsdk/core/node.py
get_default_metadata(state)
→
Get a reusable metadata dict with some standard fields.
Args: state (object): State object passed by the pipeline
Returns: metadata (dict): Dictionary with standalone:pipeline schema
Source code in enrichsdk/core/node.py
get_file(*args, **kwargs)
→
File path resolver. Inserts the transform and passes the request to the pipeline execution engine.
Args:
    abspath (bool): Generate absolute path (default: True)
    extra (dict): Extra args for resolving the path
Returns: path (str): Resolved path
Source code in enrichsdk/core/node.py
get_relative_path(path, what='enrich_data_dir')
→
Get the path relative to a predefined root
Args:
    path (str): Path of object
    what (str): What should the root of the relative path be?
Source code in enrichsdk/core/node.py
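The idea of resolving a path against a named root can be sketched with the standard library. Everything here is illustrative: `ROOTS` and `relative_to_root` are hypothetical names, the root value is borrowed from the `enrich_data_dir` example earlier on this page, and the real method's resolution logic is not shown in this reference.

```python
import posixpath

# Hypothetical table of named roots; the value mirrors the example
# 'enrich_data_dir' shown in the datasets specification above.
ROOTS = {"enrich_data_dir": "/home/ubuntu/enrich/data"}


def relative_to_root(path, what="enrich_data_dir", roots=ROOTS):
    """Return `path` expressed relative to the root named by `what`."""
    return posixpath.relpath(path, start=roots[what])


rel = relative_to_root("/home/ubuntu/enrich/data/test/Core/test.csv")
```

`posixpath` is used instead of `os.path` so the sketch behaves the same on every platform.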
get_supported_extra_args()
→
Get the command line arguments. This resolves any callables specified as default values.
Returns: args (list): List of arg specifications. Default values can be actual strings/numbers or callback functions
See supported_extra_args
Source code in enrichsdk/core/node.py
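A minimal sketch of what resolving callable defaults could look like. `resolve_extra_args` is a hypothetical helper, the `rundate` entry is invented for illustration, and the assumption that callbacks take no arguments is mine; the SDK's own handling may differ.

```python
import copy


def resolve_extra_args(specs):
    """Return copies of the arg specs with callable defaults evaluated."""
    resolved = []
    for spec in specs:
        spec = copy.copy(spec)  # leave the declared spec untouched
        default = spec.get("default")
        if callable(default):
            # Assumption: default callbacks take no arguments
            spec["default"] = default()
        resolved.append(spec)
    return resolved


supported_extra_args = [
    {
        "name": "jobid",
        "description": "Export JobID",
        "type": "str",
        "required": True,
        "default": "22832",
    },
    {
        # Hypothetical entry whose default is computed when requested
        "name": "rundate",
        "description": "Run date",
        "type": "str",
        "required": False,
        "default": lambda: "2020-01-01",
    },
]
resolved = resolve_extra_args(supported_extra_args)
```

Deferring the default to a callback is useful when the value depends on the time of the run rather than the time the module is loaded.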
get_test_conf()
→
Function to get test configurations.
This can be used to dynamically generate test configurations by overriding this function in the derived class.
get_testdata()
→
Generate test data structure
Returns:
testdata (dict) : Dictionary
Source code in enrichsdk/core/node.py
get_testdata_args()
→
get_testdata_conf()
→
Get testdata conf attribute
Returns:
Type | Description |
---|---|
dict | A dictionary of args and any other elements |
get_testdata_data()
→
Get 'data' element of test data. This specifies what dataframes have to be loaded to test this transform.
Returns:
Type | Description |
---|---|
dict | A specification of dataframes and loading instructions |
Example::
    return {
        'article': {
            'filename': 'test.csv',
            'transform': 'Core',
            'frametype': 'pandas',
            'stages': ['transform1'],
            'params': {
                'sep': ','
            }
        }
    }
Source code in enrichsdk/core/node.py
get_testdata_global()
→
Get testdata global attribute. Global args are pipeline-wide args.
Returns:
Type | Description |
---|---|
dict | A dictionary of global args |
Example::
    return {
        'args': {
            'rundate': '2020-01-01'
        }
    }
Source code in enrichsdk/core/node.py
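The per-element hooks (get_testdata_conf, get_testdata_global, and so on) follow a template-method pattern: a subclass overrides only the pieces it cares about and the overall structure is assembled from them. The sketch below uses a hypothetical `TestdataStub` class in place of the SDK base class, so it shows the pattern only and not the SDK's actual get_testdata implementation.

```python
class TestdataStub:
    """Hypothetical stand-in for the SDK base class; not enrichsdk's Node."""

    def get_testdata_global(self):
        return {"args": {}}

    def get_testdata_conf(self):
        return {"args": {}}

    def get_testdata(self):
        # Assemble the overall structure from the per-element hooks
        return {
            "global": self.get_testdata_global(),
            "conf": self.get_testdata_conf(),
        }


class MyTransform(TestdataStub):
    """Overrides only the hooks it needs."""

    def get_testdata_global(self):
        return {"args": {"rundate": "2020-01-01"}}

    def get_testdata_conf(self):
        return {"args": {"threshold": 0.6}}


testdata = MyTransform().get_testdata()
```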
get_versionmap(include_tags=False)
→
Export the versionmap
Args: include_tags (bool): Whether to include git tags
Returns: versionmap (list): List of dicts - one for each module
Source code in enrichsdk/core/node.py
has_tag(tags)
→
Does the module have any of the specified tags?
Returns:
Type | Description |
---|---|
bool | True if there is an overlapping tag otherwise False |
Source code in enrichsdk/core/node.py
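The "overlapping tag" test amounts to a set intersection. A minimal sketch with a hypothetical free function `has_any_tag`; accepting a single string as well as a list is an assumption of this sketch, not documented behaviour.

```python
def has_any_tag(module_tags, tags):
    """True if at least one of `tags` appears in `module_tags`."""
    if isinstance(tags, str):
        # Assumption: allow a single tag to be passed as a plain string
        tags = [tags]
    return not set(module_tags).isdisjoint(tags)


module_tags = ["daily", "finance"]
```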
initialize()
→
Called at the time of instantiating the module.
This can be used to open connections, setup state and other functions.
.. deprecated::
Use configure/other functions to complete the initialization.
Source code in enrichsdk/core/node.py
instantiable()
classmethod
→
is_enabled()
→
is_integration()
→
Test whether the module is an integration
Returns:
Type | Description |
---|---|
bool | |
.. warning::
.. deprecated:: 2.0 Will be dropped in future
is_model()
→
is_notification()
→
is_search()
→
is_sink()
→
is_skin()
→
Test whether the module is a skin
Returns:
Type | Description |
---|---|
bool | |
.. warning::
.. deprecated:: 2.0 Use enrichsdk.render library functions
is_transform()
→
load()
→
Run time loading of parameters.
The primary use of this function is for skins. They load the last run details for further processing.
note_access(filename, *args, **kwargs)
→
Note access to file to enable creation of lineage
Args:
    filename (str): File being accessed
    args (list): Extra args to be passed to the base class for use
Source code in enrichsdk/core/node.py
pass_args(state, name, detail)
→
preload_clean_args(args)
→
Clean the arguments before using them
Args: args: args to be resolved/cleaned/extended
Returns: list: Cleaned args
Source code in enrichsdk/core/node.py
preload_validate_conf(conf)
→
Check whether this configuration is valid.
Args: conf: Module configuration specified in the config file
Raises: Generic exception if conf is not a dictionary or if the version does not match
Source code in enrichsdk/core/node.py
preload_validate_self()
→
Syntactic check of supported_extra_args, outputs, and other future configurations
Source code in enrichsdk/core/node.py
process(state)
→
Execute the module function whatever it is.
The process function typically extracts the dataframes from the state, computes on it, and writes them back.
Every module implements this function except Skins.
Args: state: State object passed by the pipeline manager
Source code in enrichsdk/core/node.py
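The extract, compute, write-back shape of a process function can be sketched as follows. This is illustrative only: `ThresholdFilter` is a hypothetical module, `state` is a plain dict standing in for the SDK state object (whose interface is not described in this section), and frames are lists of row dicts rather than dataframes.

```python
class ThresholdFilter:
    """Hypothetical module; `state` is a plain dict, not the SDK state object."""

    def __init__(self, threshold):
        self.threshold = threshold

    def process(self, state):
        # 1. Extract the input frame from the state
        rows = state["article"]
        # 2. Compute on it: keep rows at or above the threshold
        kept = [row for row in rows if row["score"] >= self.threshold]
        # 3. Write the result back for downstream modules
        state["article"] = kept
        return state


state = {"article": [{"id": 1, "score": 0.9}, {"id": 2, "score": 0.3}]}
state = ThresholdFilter(threshold=0.6).process(state)
```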
ready(state)
→
Test whether the module is ready for processing.
This is typically not overridden. The platform implements a number of checks, including preconditions, before calling the module. The developer specifies the preconditions using the dependencies object (which can be overridden in the configuration).
This function is there for special cases.
Source code in enrichsdk/core/node.py
set_arg_overrides(state, name, detail)
→
Set any over-rides provided from other transforms.
Source code in enrichsdk/core/node.py
validate(what, state)
→
Validate various aspects of the transform state, configuration, and data. Do not override this function; override specific functions such as validate_args instead.
Args:
    what (str): What should be validated? args, conf, results etc.
    state (object): State to be checked
Returns:
Nothing
Raises:
Exception ("Validation error")
Source code in enrichsdk/core/node.py
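The advice to leave validate alone and override the specific validate_* functions suggests a dispatcher pattern. The sketch below shows one way such routing could work, using `getattr` on a hypothetical `ValidatingStub` class. How the real validate routes its calls is not shown in this reference, so treat the dispatch mechanism as an assumption.

```python
class ValidatingStub:
    """Hypothetical stand-in for the SDK base class; not enrichsdk's Node."""

    def validate(self, what, state):
        # Route to validate_args, validate_conf, ... based on `what`
        handler = getattr(self, f"validate_{what}", None)
        if handler is None:
            raise Exception("Validation error")
        handler(what, state)

    def validate_args(self, what, state):
        # Default: nothing to check
        pass


class CountTransform(ValidatingStub):
    """Overrides only the specific check, never validate() itself."""

    def __init__(self, args):
        self.args = args

    def validate_args(self, what, state):
        if self.args.get("count", 0) < 0:
            raise Exception("Validation error")


def is_valid(module, what, state=None):
    """Small usage helper: True if validation passes, False if it raises."""
    try:
        module.validate(what, state)
    except Exception:
        return False
    return True
```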
validate_args(what, state)
→
Validate user-specified args for both content and structure against args_schema.
Args:
    what (str): "args"
    state (object): State to be checked
Returns:
Nothing
Raises:
Exception ("Validation error")
Source code in enrichsdk/core/node.py
validate_conf(what, state)
→
Validate user-specified configuration for both content and structure.
Args:
    what (str): "conf"
    state (object): State to be checked
Returns:
Nothing
Raises:
Exception ("Validation error")
Source code in enrichsdk/core/node.py
validate_input(what, state)
→
Validate input data including dataframes in the state
Args:
    what (str): "args"
    state (object): State to be checked
Returns:
Nothing
Raises:
Exception ("Validation error")
Source code in enrichsdk/core/node.py
validate_results(what, state)
→
Post-processing validation check to make sure that the computation has happened correctly: the existence and values of the results.
Args:
    what (str): "results"
    state (object): State to be checked
Returns:
Nothing
Raises:
Exception ("Validation error")
Source code in enrichsdk/core/node.py
validate_testdata(what, state)
→
Validate the test data provided for structure and semantics. Test for the existence of files and the presence of appropriate load commands.
Args:
    what (str): "testdata"
    state (object): State to be checked (generally ignored)
Returns:
Nothing
Raises:
Exception ("Validation error")
Source code in enrichsdk/core/node.py
NodeMeta(name, bases, dct)
→
Meta class for all elements with schemas. This allows for registration, validation, and tracking of the schema implementors.
Source code in enrichsdk/core/node.py
Skin(*args, **kwargs)
→
Bases: Node
.. deprecated:: 2.0
Renders the data computed by the transforms and modules.
This is typically used by built-in capabilities such as Search. An alternative to implementing this is to use the usecase apps capabilities (explained elsewhere)
Source code in enrichsdk/core/node.py
load()
→
render()
→
This is a generic data rendering capability. Each implementation of a skin should provide this.
This generic rendering capability in turn calls the implementation-specific rendering function render_helper (see below).
Source code in enrichsdk/core/node.py
template_get_variables(widgetname)
→
Get the variables from a template
Args: widgetname (str): Name of the template
Returns: List of variables
Source code in enrichsdk/core/node.py
template_render(widgetname, context)
→
Render a specified template using the context
Args:
    widgetname (str): Name of the template
    context (dict): Key-value pairs
Returns: Rendered html template that can be embedded
Source code in enrichsdk/core/node.py
Transform(*args, **kwargs)
→
Bases: Node
This is the base class for all transforms.
Source code in enrichsdk/core/node.py
collapse_columns(details)
→
get_column_description(frame, column, outputs=None)
→
Look through the output definition to extract the description. The output description sometimes specifies a pattern that must be matched. Invalid patterns or patterns that don't match are ignored.
Args:
    frame (object): Name of the dataframe being processed
    column (object): Name of the column whose description is required
Returns: Description if found else empty string
Source code in enrichsdk/core/node.py
get_column_metadata(name, df)
→
Generate columns metadata to be associated with this dataframe in the state.
Args:
    name (str): Name of the dataframe being processed
    df (DataFrame): Dataframe to be documented
Returns: Dict with columns metadata
Source code in enrichsdk/core/node.py
get_column_params(name, df)
→
Generate columns metadata to be associated with this dataframe in the state. In params form.
Args:
    name (str): Name of the dataframe being processed
    df (DataFrame): Dataframe to be documented
Returns: List of dicts. Each dict has 'type' (columns) and columns metadata (a dict)
Source code in enrichsdk/core/node.py
get_column_security(frame, column, outputs=None)
→
Look through the output definition to extract the security attributes including privacy.
Args:
    frame (object): Name of the dataframe being processed
    column (object): Name of the column whose security attributes are required
Returns: dict with attribute. Could be multiple
Source code in enrichsdk/core/node.py
get_column_version(frame, column, outputs=None)
→
Look through the output definition to extract version attributes
Args:
    frame (object): Name of the dataframe being processed
    column (object): Name of the column whose version attributes are required
Returns: string (version)
Source code in enrichsdk/core/node.py
get_file_params(name, df, path, fshandle=None)
→
Generate output metadata for files (required for compliance)
Args:
    df (Dataframe): Pandas dataframe
    path (str): Path where the dataframe was dumped
    fshandle (obj): S3 filesystem handle (s3fs)
Returns: list: List of dicts
Source code in enrichsdk/core/node.py
lookup_column_output(frame, column, outputs=None)
→
Look through the output definition to extract the description. The output description sometimes specifies a pattern that must be matched. Invalid patterns or patterns that don't match are ignored.
Args:
    frame (str): Name of the dataframe being processed
    column (str): Name of the column whose description is required
    outputs (dict): A dictionary that has a set of patterns for each frame.
Returns: Description if found else empty string
Source code in enrichsdk/core/node.py
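The lookup described above (match a column against per-frame patterns, ignore invalid or non-matching patterns, fall back to an empty string) can be sketched as below. `lookup_description` is a hypothetical function, and treating patterns as Python regular expressions matched against the whole column name is an assumption; the reference does not say which pattern syntax the SDK uses.

```python
import re


def lookup_description(frame, column, outputs):
    """Return the description for `column` in `frame`, or '' if none matches."""
    for pattern, detail in outputs.get(frame, {}).items():
        try:
            # Assumption: patterns are regexes matched against the full name
            matched = re.fullmatch(pattern, column)
        except re.error:
            continue  # invalid pattern: ignore it
        if matched:
            # Entries may be a plain string or a dict with 'description'
            if isinstance(detail, dict):
                return detail.get("description", "")
            return detail
    return ""


outputs = {
    "article": {
        "[": "Broken pattern that is skipped",
        "price_.*": "Price columns",
        "title": {"description": "Article title"},
    }
}
```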