Non-Core
Modules outside the core that help with transform development and operations.
Dataset Management→
Please also see the Test Data Setup section for the command line interface used to work with the datasets.
Transforms have numerous data dependencies, and these dependencies grow over time and have to be tracked. Further, datasets are used in multiple places, including the feature marketplace and compliance modules.
Abstract dataset specification. This is useful for:
(a) handling dependent datasets (b) cleaning (c) checking (d) validating & sampling (e) discovery of data
There are two abstractions: Dataset and DatasetRegistry.
Arg(name, description, *args, **kwargs)
→
Dataset command argument
Source code in enrichsdk/datasets/discover.py
DataSource(params, *args, **kwargs)
→
Bases: object
Class for specifying a dataset. This is a base class meant to be derived and implemented.
Source code in enrichsdk/datasets/discover.py
alt_names = alt_names
instance-attribute
→
list: Alternative names for the dataset Example: "['txn_charges']"
description = params.get('description', '')
instance-attribute
→
str: Description Example: "Daily events for Alpha Sensor"
isfile = params.get('isfile', False)
instance-attribute
→
Specifies whether the dataset is simple (a file for each run) or a more complex directory hierarchy.
Example::
# A complex dataset
Dataset(params={
    "name": 'athena-availability',
    "isfile": False,
    ...
name = params['name']
instance-attribute
→
Unique name of the dataset, e.g., Events
paths = params.get('paths', {})
instance-attribute
→
dict: Paths to be resolved Example::
[
    {
        "name": "test",
        "nature": "local",
        "path": "%(data_root)s/shared/datasets/athena/v2/availability"
    }
]
registry = None
instance-attribute
→
object: Dataset manager object
resolve = kwargs.get('resolve', params.get('resolve', {}))
instance-attribute
→
Parameters to resolve paths
subsets = params.get('subsets', [])
instance-attribute
→
A dataset oftentimes has multiple components. Define them here. We can give a name to each.
Example::
{
    ...
    "subsets": [
        {
            "name": "v1culltable",
            "filename": "v1culltable.csv"
        },
        ...
    ]
}
version = params.get('version', 'v1')
instance-attribute
→
str: Version Example: "v1"
get_doodle_source(filename)
→
Name as it may be in doodle
Source code in enrichsdk/datasets/discover.py
get_match_spec()
→
get_path_by_name(name, full=False, resolve=None)
→
Return the path definition for a path with a given name
Source code in enrichsdk/datasets/discover.py
get_paths()
→
get_subset_description(name)
→
Get the description for a subset
Source code in enrichsdk/datasets/discover.py
get_subset_detail(name)
→
Return the names of subsets available
get_subset_match_spec(name, params={})
→
Get a serializable specification of the matching function for the subset of this dataset
Source code in enrichsdk/datasets/discover.py
get_subsets()
→
get_subsets_detail()
→
has_subsets()
→
in_subset(name, spec)
→
Return the names of subsets available
Source code in enrichsdk/datasets/discover.py
matches(names)
→
Check if the dataset has a particular name
Source code in enrichsdk/datasets/discover.py
set_registry(registry)
→
set_resolve(resolve)
→
Dataset(params, *args, **kwargs)
→
Bases: DataSource
Class for specifying a dataset. This is typically an input to the transforms.
Usage::
# A dataset that has one directory for each day. Within that
# there are at least two sub-datasets
Dataset(params={
    "name": 'athena-availability',
    "type": "file",
    "paths": [
        {
            "name": "test",
            "nature": "local",
            "path": "%(data_root)s/shared/datasets/athena/v2/availability",
        },
        {
            "name": "local",
            "nature": "local",
            "path": "%(enrich_data_dir)s/acme/Marketing/shared/datasets/athena/v2/availability",
        },
        {
            "name": "s3",
            "nature": "s3",
            "path": "%(backup_root)s/%(node)s/data/acme/Marketing/shared/datasets/athena/v2/availability",
        },
    ],
    "match": {
        "generate": "generate_datetime_daily",
        "compare": "compare_datetime_pattern_match_range",
        "pattern": "%Y-%m-%d",
    },
    "subsets": [
        {
            "name": "CatalogScore",
            "filename": "catalogscore.csv",
            "description": "Score of all products in the catalog"
        },
        {
            "name": "ProductWeight",
            "filename": "productweight.csv",
            "description": "Assortment weight for each product"
        }
    ]
})
Source code in enrichsdk/datasets/discover.py
backup
property
→
Read the path with the name 'backup'. Available for backward compatibility
local
property
→
Read the path with the name 'local'. Available for backward compatibility
match = params.get('match', {})
instance-attribute
→
dict: Generating and matching rules
Example::
{
    "generate": "generate_datetime_daily",
    "params": {},
    "match": "match_datetime_pattern_range",
    "pattern": "plpevents-%Y%m%d-%H%M%S",
}
test
property
→
Read the path with the name 'test'. Available for backward compatibility
compare(name, start, end=None)
→
Given a start and an end, generate the datetime objects corresponding to each run
Args:
    name (str): Directory name
    start (datetime): Start of the time range
    end (datetime): End of range. If None, default = start
Source code in enrichsdk/datasets/discover.py
compare_datetime_pattern_match_range(name, start, end)
→
Given a start and an end, generate the datetime objects corresponding to each run for each day, and check whether a named file/directory exists in that list.
Args:
    start (datetime): Start of the time range
    end (datetime): End of range. If None, default = start
Source code in enrichsdk/datasets/discover.py
generate(start, end=None, full=False, name='default', resolve=None)
→
Given a start and an end, generate the datetime objects corresponding to each run
Args:
    start (datetime): Start of the time range
    end (datetime): End of range. If None, default = start
    full (bool): Whether the full path is required or only the suffix. Default is False
    name (str): Name of the path specification. Optional if the full path is required
    resolve (dict): Additional path resolution parameters
Returns:
    list (dict): List of dictionaries (name, timestamp)
Source code in enrichsdk/datasets/discover.py
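As a rough illustration of generate(), the sketch below builds a small dataset following the Usage example above and enumerates its daily runs. The import path and resolve values are assumptions; adapt them to your deployment::

from datetime import datetime
from enrichsdk.datasets import Dataset  # import path is an assumption

dataset = Dataset(params={
    "name": "athena-availability",
    "type": "file",
    "paths": [
        {
            "name": "local",
            "nature": "local",
            "path": "%(data_root)s/shared/datasets/athena/v2/availability",
        },
    ],
    "match": {
        "generate": "generate_datetime_daily",
        "compare": "compare_datetime_pattern_match_range",
        "pattern": "%Y-%m-%d",
    },
})

# Enumerate the per-day runs between two dates
runs = dataset.generate(
    start=datetime(2020, 1, 1),
    end=datetime(2020, 1, 7),
    full=True,                       # return full paths rather than suffixes
    name="local",                    # which path specification to resolve
    resolve={"data_root": "/home/ubuntu/enrich/data"},  # illustrative value
)
for run in runs:
    print(run)  # each entry is a dict with name/timestamp details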
listdir(name, fshandle=None, resolve=None, detail=False)
→
List the dataset directory
Source code in enrichsdk/datasets/discover.py
match_content(fs, localdir, backupdir, dirname)
→
Check whether local path is replicated in the s3/blob store
Args:
    fs (object): s3fs handle
    localdir (str): Root local dir to check
    backupdir (str): Root remote dir in block store to check
    dirname (str): Path within the localdir
Source code in enrichsdk/datasets/discover.py
read_data(start_date, end_date, filename, readfunc, errors=True, name='default', params={}, resolve={})
→
Read a single dataset
Args:
    start_date (str): Starting date to scan
    end_date (str): Ending date of the scan
    filename (str): Filename within each day's directory. If None, readfunc will be called with the directory name
    readfunc (method): Callback
    errors (bool): What to do on failure. True = bailout
    name (str): Which subset to look at?
    params (dict): Params to the readfunc
    resolve (dict): Parameters for resolution of path
Returns:
    dataframe: Data read
    dict: Optional metadata on the read
Source code in enrichsdk/datasets/discover.py
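A hedged sketch of read_data, assuming the dataset built in the generate() sketch above and that the call returns a (dataframe, metadata) pair as described; the filename and parameters are illustrative::

import pandas as pd

# 'dataset' is the Dataset instance constructed in the generate() sketch above
df, metadata = dataset.read_data(
    start_date="2020-01-01",
    end_date="2020-01-07",
    filename="catalogscore.csv",    # file within each day's directory
    readfunc=pd.read_csv,           # callback used to read each file
    params={"dtype": str},          # passed through to the readfunc
    name="local",
    resolve={"data_root": "/home/ubuntu/enrich/data"},
)
print(df.shape, metadata)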
read_subset(start_date, end_date, subset, readfunc, errors=True, name='default', params={}, resolve={})
→
Read a particular subset of the dataset
Source code in enrichsdk/datasets/discover.py
sample(filename, safe=True, fd=None, nrows=10, encoding='utf-8')
→
Sample a file belonging to this dataset. Subclass and overload this function if the data is sensitive.
Args:
    filename (str): A file that belongs to this dataset
    safe (bool): Whether the file is trusted
    fd (object): File descriptor for s3/gcs/other non-filesystems
    nrows (int): Number of rows to sample
Source code in enrichsdk/datasets/discover.py
validate(params)
→
Validate dataset arguments. It should be a dictionary with at least three elements: name, match, and paths.
Args:
    params (dict): Parameters for the dataset
The params dict should have the following:
- name: string
- match: dictionary with a generate function (name or lambda), a compare function (name or lambda), and a pattern (string).
Source code in enrichsdk/datasets/discover.py
DatasetRegistry(*args, **kwargs)
→
Bases: object
Registry for datasets.
This provides a search and resolution interface.
There is a notion of a command - an scp/aws command line template. The registry allows enumeration of the commands and enables scripting.
Args:
    commands (list): List of command templates. Optional. If not specified, the system will use defaults.
    resolve (dict): Path resolution dictionary
Source code in enrichsdk/datasets/discover.py
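A rough sketch of registering and discovering datasets through the registry; the import path is an assumption and the available command templates depend on the deployment defaults::

from enrichsdk.datasets import Dataset, DatasetRegistry  # import path is an assumption

registry = DatasetRegistry(resolve={"data_root": "/home/ubuntu/enrich/data"})

# Register datasets so they can be discovered by name
registry.add_datasets([dataset])          # e.g., the Dataset built in the sketches above

# Look up a dataset and inspect the available command templates
found = registry.find("athena-availability")
commands = registry.get_commands(source_type="File")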
add_datasets(items)
→
Add to the registry. Each element in the item list should be a Dataset (or subclass).
Args: items (list): A list of datasets
Source code in enrichsdk/datasets/discover.py
find(names)
→
Find a dataset in the registry.
Args: names (str): Name of the dataset
Source code in enrichsdk/datasets/discover.py
get_command(name, source_type='File')
→
Get the command specification by specifying its name
Args: name (str): Name of the command template
Source code in enrichsdk/datasets/discover.py
get_commands(source_type='File')
→
Get the command specifications as a list of dicts
Source code in enrichsdk/datasets/discover.py
list()
→
set_params(params)
→
set_resolve(resolve)
→
Set the resolution parameters
Args: resolve (dict): Resolution parameters
DateArg(name, *args, **kwargs)
→
Doodle(cred, *args, **kwargs)
→
Bases: object
Source code in enrichsdk/datasets/doodle.py
access_server(path, params={}, data={}, method='get')
→
Get/Post from doodle server
Source code in enrichsdk/datasets/doodle.py
add_feature(catalog_id, source_id, details)
→
Update the source with latest feature information
Source code in enrichsdk/datasets/doodle.py
add_source(catalog_id, details)
→
Update the catalog with latest dataset information
Source code in enrichsdk/datasets/doodle.py
compute_source_paths(source, start, end)
→
Read the source data from start to end dates...
Source code in enrichsdk/datasets/doodle.py
find_catalog(name, version)
→
Find one catalog with a precise name and version
Source code in enrichsdk/datasets/doodle.py
get_catalog(catalog_id)
→
Get the details of one catalog
Source code in enrichsdk/datasets/doodle.py
get_feature(feature_id)
→
Get the details of one feature
Source code in enrichsdk/datasets/doodle.py
get_source(source_id)
→
Get the details of one source
Source code in enrichsdk/datasets/doodle.py
get_source_paths(start, end, name=None, version='v1', source_id=None)
→
Find all the source paths
Source code in enrichsdk/datasets/doodle.py
get_url()
→
Get base url
Source code in enrichsdk/datasets/doodle.py
list_catalogs(only_active=True, offset=0, limit=10, order_by=None)
→
List the available catalogs
Source code in enrichsdk/datasets/doodle.py
list_features(source_id=None, catalog_id=None, only_active=True, offset=0, limit=5000, order_by=None)
→
List available features for a source
Source code in enrichsdk/datasets/doodle.py
list_sources(catalog_id=None, only_active=True, offset=0, limit=1000, order_by=None)
→
List available sources for a catalog
Source code in enrichsdk/datasets/doodle.py
search_catalogs(only_active=True, name=None, version=None, offset=0, limit=10, query=None, modified_since=None, modified_before=None, order_by=None)
→
Search the catalogs
Source code in enrichsdk/datasets/doodle.py
search_features(only_active=True, name=None, version=None, catalog_id=None, source_id=None, offset=0, limit=10, query=None, modified_since=None, modified_before=None, order_by=None)
→
Search the features
Source code in enrichsdk/datasets/doodle.py
search_sources(only_active=True, name=None, version=None, catalog_id=None, offset=0, limit=10, query=None, modified_since=None, modified_before=None, order_by=None)
→
Search the sources
Source code in enrichsdk/datasets/doodle.py
update_feature(feature_id, details)
→
Update the feature with latest dataset information
Source code in enrichsdk/datasets/doodle.py
update_source(source_id, details)
→
Update the catalog with latest dataset information
Source code in enrichsdk/datasets/doodle.py
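A sketch of browsing the doodle catalog; the import path, the credential value, and the date format passed to get_source_paths are assumptions to be checked against enrichsdk/datasets/doodle.py::

from enrichsdk.datasets.doodle import Doodle  # import path is an assumption

cred = "doodle"   # credential name/dict as configured in your deployment (assumption)
doodle = Doodle(cred)

# Browse catalogs and locate source paths over a date range
catalogs = doodle.list_catalogs(only_active=True, limit=10)
paths = doodle.get_source_paths(
    start="2020-01-01",    # date format is an assumption
    end="2020-01-07",
    name="athena-availability",
    version="v1",
)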
Feature Engineering→
Modules to support feature engineering of objects (dictionaries).
There are two base classes: one at the individual feature level and one at the featureset level. A compute function iterates through these for all the objects.
FeatureExtractorBase(*args, **kwargs)
→
Extract a single feature
Example::
class SimpleFeatureExtractor(FeatureExtractorBase):

    def __init__(self, *args, **kwargs):
        self.name = "simple"

    def extract(self, name, data, key=None):
        if key is None:
            key = name
        value = jmespath.search(key, data)
        return [{
            'key': name,
            'value': value
        }]
Source code in enrichsdk/feature_compute/__init__.py
extract(name, data, key=None)
→
Given data and a name, generate some attributes. The return value should be a list of dictionaries
Args:
    name (str): Name of the feature
    data (dict): A dictionary
    key (str): Dictionary key, if the name is not the key
Returns:
    list: List of dictionaries. Each dict has a "key" and "value"
Source code in enrichsdk/feature_compute/__init__.py
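Continuing the SimpleFeatureExtractor example above, extracting one feature from a nested record might look like this (the record and the jmespath key are illustrative)::

# SimpleFeatureExtractor is the example class defined above
extractor = SimpleFeatureExtractor()

record = {"profile": {"age": 25}}
features = extractor.extract(name="age", data=record, key="profile.age")
# -> [{'key': 'age', 'value': 25}]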
FeaturesetExtractorBase
→
Compute a featureset - a collection of features. To be used in conjunction with FeatureCompute (outer) and FeatureExtractor (inner). We define and use multiple extractors::
class CustomerFeaturesetExtractor(FeaturesetExtractorBase):
    '''
    Customer timeseries featureset extractor
    '''

    def get_extractors(self):
        return {
            "simple": SimpleFeatureExtractor(),
        }

    def get_specs(self):
        specs = [
            {
                "keys": ['days'],
                "extractor": "simple"
            },
        ]
        return specs

    def one_record(self, data):
        allfeatures = super().one_record(data)
        return allfeatures

    def clean(self, df):
        df = df.fillna("")
        return df
clean(df)
→
Clean the collated dataframe/list/other.
Args: df (object): output of collate
Returns: object: A cleaned collated object
collate(features)
→
Combine the outputs of the extractors (each of which is a dictionary) into an object. It could be anything that the cleaner can handle.
Args: features (list): List of features extracted by one_record
Returns: object: Could be a combined dictionary/dataframe/other
Source code in enrichsdk/feature_compute/__init__.py
document(name, df)
→
Document the dataframe generated. The default is to capture schema, size, etc. Override to extend this documentation.
Args:
    df (object): Output of collate
    name (str): Name of the featureset extractor specification
Source code in enrichsdk/feature_compute/__init__.py
finalize(df, computed)
→
Take cleaned data and generate a final object such as a dataframe
Args:
    df (object): Output of collate
    computed (dict): Featureset extractor name -> collated/cleaned object
Returns: object: final data object
Source code in enrichsdk/feature_compute/__init__.py
get_extractors()
→
Returns a dictionary of extractors. This is overridden in the subclass. Sample::
return {
    "simple": SimpleFeatureExtractor(),
}
Returns: dict: Dictionary of name -> extractor class instance
Source code in enrichsdk/feature_compute/__init__.py
get_specs()
→
Returns a list of specifications. Each specification applies to one or more features. We specify a combination of keys in the input dictionary and a corresponding extractor. The keys could be a list or a dictionary.
For example::
[
    {
        "keys": ['age', 'sex'],
        "extractor": "simple",
    },
    {
        "keys": {
            'gender': 'sex',
            'old': 'age'
        },
        "extractor": "simple",
    }
]
Source code in enrichsdk/feature_compute/__init__.py
one_record(data)
→
Process one record at a time. Pass it through the extractors, collect the outputs and return
Args: data (dict): One record to process
Returns: list: A list of dictionaries with features from this record
Rough logic::
get specs
for each spec:
    find extractor
    find name and corresponding keys
    newfeatures = call extractor(name, keys) for one row in data
    collect new features
collapse
return one or more 'feature row(s)'
Source code in enrichsdk/feature_compute/__init__.py
compute_features(objects, extractors, read_object=None)
→
Compute the features
Args:
    objects (list): List of objects to process. Could be names
    extractors (dict): Name to extractor mapping
    read_object (method): Turn each object into a dict
Returns:
    dict: Name to dataframe mapping
Source code in enrichsdk/feature_compute/__init__.py
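A sketch of wiring the pieces together with compute_features, assuming the CustomerFeaturesetExtractor shown earlier and a simple JSON-file reader; the import path and file names are illustrative::

import json
from enrichsdk.feature_compute import compute_features  # import path is an assumption

def read_object(path):
    # Turn each object (here, a path to a JSON file) into a dict
    with open(path) as fd:
        return json.load(fd)

computed = compute_features(
    objects=["customers/cust-001.json", "customers/cust-002.json"],  # illustrative
    extractors={"customers": CustomerFeaturesetExtractor()},         # defined earlier
    read_object=read_object,
)
customers_df = computed["customers"]   # name -> dataframe mapping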
note(df, title)
→
Quick summary of a dataframe including shape, columns, sample, etc.
Args:
    df (dataframe): Input dataframe
    title (str): Title
Returns:
    str: A formatted text to be used for logging
Source code in enrichsdk/feature_compute/__init__.py
Feature Store→
download(backend, service=None, featuregroup_id=None, run_id=None, data=None, debug=False)
→
Download featurestore data from the server
Args:
    backend (object): Backend class enrichsdk.api.Backend
    service (object): Dictionary with name, path
    featuregroup_id (int): Id of the featuregroup to download
    run_id (int): Id of the run to download
    data (dict): Dictionary to be posted (if filename not specified)
Returns:
    dict: Response dictionary from the server
Source code in enrichsdk/featurestore/__init__.py
generate(backend, service=None, debug=False)
→
Generate sample specification files
Source code in enrichsdk/featurestore/__init__.py
post(backend, service=None, filename=None, data=None, debug=False)
→
Post featurestore to server
Args:
    backend (object): Backend class enrichsdk.api.Backend
    service (object): Dictionary with name, path
    filename (str): Path to the input file to be posted
    data (dict): Dictionary to be posted (if filename not specified)
Returns:
    dict: Response dictionary from the server
Source code in enrichsdk/featurestore/__init__.py
search(backend, service=None, debug=False, params={})
→
Search featurestore for specific featuregroups
Args:
    backend (object): Backend class enrichsdk.api.Backend
    service (object): Dictionary with name, path
    params (dict): Search criteria as key-value pairs
    debug (bool): Debug run or not
Returns:
    dict: Response dictionary from the server
Source code in enrichsdk/featurestore/__init__.py
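A hedged sketch of posting and searching featuregroups; the import paths, the Backend construction, and the search keys are assumptions that depend on how the backend is configured::

from enrichsdk.api import Backend                 # class referenced in the docstrings above
from enrichsdk.featurestore import post, search   # import path is an assumption

backend = Backend()   # constructor arguments are deployment-specific (assumption)

# Post a featuregroup specification file to the server
response = post(backend, filename="featuregroup_spec.json")

# Search for featuregroups matching some criteria (keys are illustrative)
matches = search(backend, params={"name": "customer_features"})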
Data Quality→
Module to set and implement expectations on dataframes. Note that this is in development and meant to be a preview.
As systems are getting automated and process more data every day, it is hard to keep track of the correctness of the entire system. We use expectations to make sure that decision modules, such as ML/statistics code, are being correctly fed.
An expectation is a set of structural rules that datasets should satisfy. It could be superficial such as names or order of columns. It could go deeper and specify the statistical attributes of the input.
This module helps implement such a specification. An example is shown below::
expectations = [
    {
        'expectation': 'table_columns_exist',
        'params': {
            'columns': ['Make', 'YEAR', 'Model', 'hello']
        }
    },
    {
        'expectation': 'table_columns_exist_with_position',
        'params': {
            'columns': {
                'Make': 1
            }
        }
    }
]
Each expectation has a name and arbitrary parameters. The first expectation shown above will, for example, check whether the specified columns exist in the dataframe being evaluated. The result will be as shown below::
{
    "expectation": "table_columns_exist_with_position",
    "passed": true,
    "description": "Check whether columns exist (in particular order)",
    "version": "v1",
    "timestamp": "2020-01-04T11:43:03+05:30",
    "frame": "v2scoring",
    "meta": {
        "node": "data.acmeinc.com",
        "pipeline": "AcmeLending",
        "transform": "ScoringModule",
        "runid": "scoring-20200104-114137",
        "start_time": "2020-01-04T11:41:37+05:30"
    }
}
This is stored as part of the state and dumped in the metadata for each run.
The expectations are used as follows::
expectationsfile = os.path.join(thisdir, "expectations.py")
checker = TransformExpectation(self,
                               mode='validation',
                               filename=expectationsfile)
result = checker.validate(df, selector=selector)
state.add_expectations(self, name, result)
Expectations
Base classes for expectation and manager, and builtin expectations.
ExpectationBase(*args, **kwargs)
→
Bases: object
Base class for expectations
Initialize the base class for expectations
Source code in enrichsdk/quality/base.py
match(config)
→
Check if the expectations matches
Args: config: Expectation specification
Returns: True if this class can handle expectation
Source code in enrichsdk/quality/base.py
validate_args(config)
→
Validate arguments passed to this expectation
Check if the expectation is correctly specified beyond the name
Args: config: Expectation specification
Returns: True if this class can handle this specification
Source code in enrichsdk/quality/base.py
ExpectationResultBase(*args, **kwargs)
→
Bases: object
Class to return one or more results from an expectation validation step.
Source code in enrichsdk/quality/base.py
add_result(expectation, description, passed, extra={})
→
Add an entry to the validation result object
Args:
    expectation: Name of the expectation
    description: One line summary of the check
    passed: True if the check has passed
    extra: Dictionary with extra context including reason
Source code in enrichsdk/quality/base.py
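As a rough sketch of how ExpectationBase and ExpectationResultBase fit together, a custom expectation might look like the following; the matching convention and import paths are assumptions, so check enrichsdk/quality before relying on them::

from enrichsdk.quality.base import ExpectationBase, ExpectationResultBase  # assumption

class TableNotEmptyExpectation(ExpectationBase):
    """Illustrative expectation: the dataframe must have at least one row"""

    def match(self, config):
        # Assume the spec names this check in its 'expectation' field
        return config.get("expectation") == "table_not_empty"

    def validate(self, df, config):
        result = ExpectationResultBase()
        result.add_result(
            expectation="table_not_empty",
            description="Check whether the table has at least one row",
            passed=len(df) > 0,
            extra={"rows": len(df)},
        )
        return result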
TableColumnsExistExpectations(*args, **kwargs)
→
Bases: ExpectationBase
Check whether table has required columns
Configuration has to specify a list of columns::
{
    'expectation': 'table_columns_exist',
    'params': {
        'columns': ['alpha', 'beta']
    }
}
Source code in enrichsdk/quality/expectations.py
generate(df)
→
validate(df, config)
→
Check if the specified columns are present in the dataframe
Source code in enrichsdk/quality/expectations.py
TableColumnsPositionExpectations(*args, **kwargs)
→
Bases: ExpectationBase
Check whether table has the right columns in right positions.
Configuration has to specify the columns and their corresponding positions in the test dataframe::
{
    'expectation': 'table_columns_exist_with_position',
    'params': {
        'columns': {
            'alpha': 0,
            'beta': 2
        }
    }
}
Source code in enrichsdk/quality/expectations.py
generate(df)
→
Not implemented yet.
Source code in enrichsdk/quality/expectations.py
validate(df, config)
→
Validate the names and positions of columns
Source code in enrichsdk/quality/expectations.py
Transforms
This class bridges the expectations library and the Enrich pipelines.
TransformExpectation(transform, mode=None, filename=None, expectations=None)
→
Bases: object
Class to provide a bridging interface between the lightweight expectations implemented here and the pipeline
Initialize the class
Args:
    transform: Transform using this expectation
    mode: Mode of operation (validation|generation)
    filename: Path to expectations file
    expectations: Explicitly provide expectations
Returns:
    Instantiated class
Source code in enrichsdk/quality/transforms.py
generate(df)
→
Render a specified template using the context
Args:
    df: Dataframe to be profiled
Returns:
    Rendered html template that can be embedded
Source code in enrichsdk/quality/transforms.py
load_expectations_file(filename)
→
Load expectations, whether specified as a json, pickle, or py file, into an instance variable.
Args: filename: Name of the expectations file
Returns: None:
Source code in enrichsdk/quality/transforms.py
validate(df, selector=None)
→
Run the loaded expectations
Args:
    df: Dataframe to be evaluated
    selector: Function to select the
Returns:
    result: A list of dictionaries; each has an evaluation result
Source code in enrichsdk/quality/transforms.py
Exceptions
Exceptions used by the quality module.
IncorrectImplementationExpectation
→
Bases: Exception
Expectation implementation class didn't return the right result
InvalidConfigurationExpectation
→
Bases: Exception
Expectation implementation class didn't return the right result
Notebook→
This is intended to be used in the notebook context. It provides a number of capabilities including:
- Obfuscated credentials
- Resource management
- Searching of notebooks
Services to allow enrichsdk to be used in notebooks. It provides:
- Security - access to obfuscated credentials
- Resource - limit resource usage
- Indexing - search notebooks
- Metadata - Generate metadata to be included in the output files
Notebook(data_root=None)
→
Bases: object
This class allows one to read and search in notebooks
Source code in enrichsdk/notebook/__init__.py
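A small sketch of the class in use; the import path and the directory/keyword values are illustrative::

from enrichsdk.notebook import Notebook  # import path is an assumption

nb = Notebook()

# Resolve an abstract path under the Enrich data root
path = nb.get_file("%(data_root)s/acme/Projects/commands", create_dir=True)

# Search a directory of notebooks for a keyword
hits = nb.search_notebooks("/home/ubuntu/enrich/notebooks", "athena")

# Metadata dict that can be embedded in output files
metadata = nb.get_metadata(filename="output.csv")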
get_file(path, create_dir=False)
→
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | | | required |
| create_dir | | | False |
Example→
%(data_root)/acme/Projects/commands
This is resolved into /home/ubuntu/enrich/data/notebooks/acme/Projects/commands
Returns:

| Type | Description |
| --- | --- |
| | Full path from abstract specification |
Source code in enrichsdk/notebook/__init__.py
get_metadata(filename=None, file_type=None)
→
Get reusable metadata dict with some standard fields.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| filename | | | None |
| file_type | | | None |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| metadata | | Dictionary with a number of fields |
Source code in enrichsdk/notebook/__init__.py
read_notebook(notebook_path, as_version=nbformat.NO_CONVERT)
→
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| notebook_path | | | required |
| as_version | | | nbformat.NO_CONVERT |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| | NotebookNode | Dictionary representation of the file |
Source code in enrichsdk/notebook/__init__.py
save(file_name, metadata_path=None)
→
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| file_name | | | required |
| metadata_path | | | None |

Returns:

| Type | Description |
| --- | --- |
| | None. Saves the specified data file and metadata about the file into Enrich's data dir |
Source code in enrichsdk/notebook/__init__.py
search_notebooks(user_notebook_dir, keyword)
→
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| user_notebook_dir | | | required |
| keyword | | | required |

Returns:

| Type | Description |
| --- | --- |
| | List of path strings to notebooks that satisfy the search |
Source code in enrichsdk/notebook/__init__.py
search_within_notebook(notebook_path, keyword)
→
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| notebook_path | | | required |
| keyword | | | required |

Returns:

| Type | Description |
| --- | --- |
| | Boolean; True if the keyword is found in the ipynb file, False if not |
Source code in enrichsdk/notebook/__init__.py
set_resource_limits(params={})
→
Set resource limits for the notebook. This applies only to this notebook.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| params | | of available memory to be used. | {} |
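A sketch of capping resources for the current notebook; the key name inside params is hypothetical since the exact schema is not documented here::

# 'nb' is the Notebook instance from the earlier sketch.
# Hypothetical key: the docstring only says the params dict limits the share
# of available memory to be used; check enrichsdk/notebook/__init__.py for
# the exact key expected by set_resource_limits.
nb.set_resource_limits(params={"memory": 0.5})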