User-Defined Operators (UDO)
If you need functionality that the built-in operators (e.g. automl, what-if, and pivot chart) can't provide, you can create a user-defined operator (UDO). With UDOs, you can create new transformations, train models for future use, and make custom visualizations. See the UDO Showcase for examples.
Creating UDOs
A new operator can be created from the main menu, just as with a canvas. Click operator in the drop-down menu of the add new button in the top-left corner.
Using UDOs
Once an operator has been created, it will appear in the operators tab in the main menu.
To use an operator for your analysis, you must add it to a canvas. To do this, drag the operator onto the operators section of the canvas's side panel, just as you would add a dataset to a canvas. After adding a UDO to a canvas, it will appear in the list of operators in the canvas and can be used just like any built-in operator, as in the image below.
Editing and Testing UDOs
Before a UDO can perform useful tasks, you must implement its specification. This specification covers all aspects of the operator's functionality, including:
- Name, size, and description
- Custom parameters
- Python code to run upon execution
- Visualization logic (using Vega or Vega-Lite)
You can access a UDO's specification by entering the UDO editor. Before entering the UDO editor, select the UDO in the main menu. A panel will appear to the right. In this panel, as with canvases, you can add datasets and collaborators. The datasets specified here will become available for testing purposes in the UDO editor, and any collaborators you specify will be able to view the UDO's specification as well.
The UDO editor can be entered by clicking the edit button on the UDO side panel. An image of the UDO editor, with several numbered areas to be discussed next, is shown below.
Editor Menu
The menu bar at the top of the UDO editor exposes a few different functions. Each of these is numbered in the above image for reference.
Home Button (1)
Click the Home button to go back to the main menu.
Operator Name (2)
To change the name of the UDO, edit this field. The name displayed here is what will be shown in the canvas.
Import / Export (3)
UDOs can be saved as .zip files by clicking the export button, which saves all of the UDO's information in the exported file. Clicking import and selecting such a .zip file loads the contents of the file into the currently opened UDO. Alternatively, a UDO can be imported directly in the main menu by dragging and dropping a .zip file, just as with .csv files.
Importing is also useful for bringing in UDOs from Einblick's public UDO repository. To do this, find the UDO you want to import, then download its directory as a .zip file. The downloaded file can be imported directly into Einblick.
Operator Type and Model Type (4)
The type class of the operator can be set through these two menus (note that the output model type menu is unavailable for operators with type UDF). In most cases, the UDF operator type is used for operators which perform operations on individual rows (e.g. adding a new column based on the value of each row of another column), while the UDA operator type is used for operators which perform operations over entire columns at a time (e.g. clustering operators).
These types are described in further detail in the Leveraging Einblick's Progressive Engine section.
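The distinction between the two operator types can be illustrated outside of Einblick. The following plain-Python sketch is not UDO code: the sample rows and both function names are hypothetical, and z-score normalization is used only as an example of a computation that needs an entire column. It contrasts an operation that can be computed one row at a time with one that cannot.

```python
from statistics import mean, pstdev

# Hypothetical sample rows standing in for an input dataframe.
rows = [
    {"country": "A", "score": 7.0},
    {"country": "B", "score": 5.0},
    {"country": "C", "score": 3.0},
]


def add_percent_column(row):
    """Row-wise (UDF-style): the new value depends only on this row."""
    return {**row, "score_pct": row["score"] * 10}


def add_z_score_column(all_rows):
    """Column-wise (UDA-style): the new value depends on the entire column."""
    scores = [r["score"] for r in all_rows]
    mu, sigma = mean(scores), pstdev(scores)
    return [{**r, "score_z": (r["score"] - mu) / sigma} for r in all_rows]


row_wise = [add_percent_column(r) for r in rows]
column_wise = add_z_score_column(rows)
```

The row-wise function could be applied to any subset of rows independently, whereas the column-wise function needs the mean and standard deviation of every row before it can produce a single output value.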
Play Button (5)
Runs the operator for purposes of testing. See the Run Operator View section below.
Publish (6)
Clicking publish removes the draft status of the operator and makes the operator available for use in canvases. If the operator had previously been published, this changes all instances of the operator (e.g. in canvases) to use the latest specification.
Run Operator View
The right side of the UDO editor contains the input settings and the resulting output after running the UDO. The image below, with relevant sections numbered, shows what this looks like after updating the UDO specification and clicking the play button.
Dataset Input (1)
In the first part of the input section, you can specify datasets to use as inputs into the operator. The number and name of these inputs will correspond to the specified DataframeInputModels in the operator specification code tab (discussed in more detail later). The available datasets to use here will be those added to the datasets section of the UDO panel in the main menu.
Attribute and Parameter Inputs (2)
The second part of the input section allows you to choose attribute and parameter settings for testing your UDO. The inputs that appear here will correspond to the AttributeConfigGroupInputDescription and ValueInputDescription sections of the operator specification code, and will be visually controlled by the corresponding sections of the InputUI section. These will also be discussed in detail further down.
Visualization (3)
If you've specified code in the visualization tab, the generated Vega or Vega-Lite visualization will appear here. Otherwise, a tabular representation of the data will appear here.
Logs (4)
If there are any logs resulting from the execution of the operator (e.g. through print statements), or if the system encounters errors from running the operator, the corresponding information will appear here.
Editing UDO Code
The primary function of the UDO editor is to specify the code underlying UDOs. A UDO's code is divided among several code files, each displayed in a separate tab in the UDO editor and each governing a specific part of the UDO's behavior. Most UDOs need only a few of these tabs.
- operator specification (JSON): defines the inputs and parameters of the operator
- requirements (Python): a list of required Python packages
- model definition (Python): definition of the trained model for trained UDOs
- on_open (Python): code to run upon operator initialization
- on_batch (Python): code to run upon receiving a new batch of data
- on_close (Python): code to run upon finishing execution
- on_reset (Python): code to run upon execution resets
- visualization (Vega or Vega-Lite): the Vega or Vega-Lite specification
- filters (JSON): defines filters for customizing user interactions with visualizations
Each tab is explained in further detail below. Also note that each tab has a description and documentation link available to its top-right, as illustrated in the image below.
Operator Specification
This tab controls various properties of a UDO. The most important properties include:
- Visual properties, such as width and height
- Inputs and outputs, such as the name and number of input dataframes
- Attribute input menus, controlling how users select columns from input dataframes
- Custom parameter menus, allowing users to set custom values
There are two possible input methods for setting the operator specification. The first, which appears by default, is a form which allows you to specify most of the properties available to UDOs. The form view is displayed in the image below.
In the form view, you can specify the following properties:
- Basic properties:
  - Width (pixels)
  - Height (pixels)
  - Description (Markdown): a short description of the UDO, which will be available at the bottom of the operator
  - Auto-execute on change: whether to re-execute the operator automatically when any input dataframes or parameters change
  - Include output dataframe: whether to expose an output dataframe once the UDO is run
- Dataframe Inputs: how many input dataframes to expose, and their names
- Attribute Selection Inputs: operator input menus allowing the selection of attributes from input dataframes
  - Customizable properties include:
    - Name of the input dataframe to select attributes from
    - Number and type of selectable attributes
- Custom Value Inputs: operator input menus allowing the specification of custom parameters
  - Available input types include:
    - Text field
    - Numeric field
    - Numeric slider
    - Checkbox
    - Multiple-choice list
The second input method is through a JSON file which describes all of the above properties along with more advanced customizability. The code view of the operator specification tab is described in detail on the Operator Specification JSON page.
Requirements
In this tab, enter a list of Python packages to make available to the Python code tabs.
Package Whitelist
Among external packages, only the following are currently supported as requirements:
nltk
sklearn
scikit-learn
pandas
numpy
scipy
pycountry
reverse_geocoder
xgboost
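Before pasting a package list into the requirements tab, it can be checked locally against the whitelist above. The sketch below is a hypothetical helper, not part of Einblick; it assumes one package name per line with no version specifiers, which the source does not confirm.

```python
# Package names copied from the whitelist above.
SUPPORTED_PACKAGES = {
    "nltk", "sklearn", "scikit-learn", "pandas", "numpy",
    "scipy", "pycountry", "reverse_geocoder", "xgboost",
}


def unsupported_requirements(requirements_text):
    """Return the package names in requirements_text that are not whitelisted."""
    unsupported = []
    for line in requirements_text.splitlines():
        name = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if name and name not in SUPPORTED_PACKAGES:
            unsupported.append(name)
    return unsupported


# Hypothetical requirements list containing one unsupported package.
example = "pandas\nnumpy\ntensorflow\n"
print(unsupported_requirements(example))
```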
Model Definition
If specified, a UDO can output a trained model instead of a dataframe. A trained model, just as with the built-in automl operator, allows us to build models on prior data and predict values on new data. Trained UDOs can be useful if, for example, you want to use a custom machine learning model during your analysis.
The specification for the code of the model definition tab is available on the Model Definition page.
Trained Model UDO Usage
If the output model type of the UDO is set, running the UDO will return a trained model instead of a dataframe or visualization. This model will be returned in the form of an executor, as with the automl operator. This is shown in the image below.
To use an executor of the returned model, drag out the gray box that appears in the UDO element once it finishes running. The executor can then be used with other datasets.
Batch Events (on_open, on_batch, on_close, on_reset)
These four files describe the UDO's behavior upon the OPEN, BATCH, CLOSE, and RESET events with respect to Einblick's progressive engine. Unless you are working with large datasets, you will generally only need to fill out the on_batch tab. A quick summary of the four tabs is given below.
- on_open: code for initializing variables for the BATCH, CLOSE, and RESET events
- on_batch: code to run for each batch
  - for smaller datasets, only a single batch will be encountered, which allows us to ignore the other three tabs
- on_close: code specifying behavior after all batches have been run
- on_reset: code specifying behavior upon the RESET event
See Leveraging Einblick's Progressive Engine for more details on the various event types.
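The order of the four events can be pictured with a plain-Python simulation of a running mean computed over several batches. This is only a sketch under stated assumptions: the function signatures, the shared state dictionary, and the interpretation of RESET as "discard progress and start over" are all hypothetical, since in a real UDO the engine invokes each tab itself.

```python
# Plain-Python simulation of the four events; the signatures and the shared
# state dictionary are hypothetical and not Einblick's calling convention.

def on_open(state):
    """OPEN: initialize variables shared by the other events."""
    state["total"] = 0.0
    state["count"] = 0


def on_batch(state, batch):
    """BATCH: fold one batch into the running totals, return the estimate so far."""
    state["total"] += sum(batch)
    state["count"] += len(batch)
    return state["total"] / state["count"]


def on_close(state):
    """CLOSE: all batches have been seen, so return the final result."""
    return state["total"] / state["count"]


def on_reset(state):
    """RESET: discard progress so execution can start over (assumed meaning)."""
    state.clear()
    on_open(state)


state = {}
on_open(state)
estimates = [on_batch(state, b) for b in ([1.0, 2.0, 3.0], [4.0, 5.0], [9.0])]
final_mean = on_close(state)
```

With a single batch, the estimate returned by the BATCH step already equals the final result, which matches the note above that small datasets usually need only the on_batch tab.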
Accessing Settings From the Operator Menus
To access the input dataframe(s), you will need the df (dfs) keyword. To access any custom parameters or selected attributes in the operator's menus, you will need to use the attributes and params keywords. See Keywords for more details.
Visualization
If a valid Vega or Vega-Lite specification is provided in the visualization tab, the visual output of a UDO will be the corresponding visualization. The type of specification (Vega or Vega-Lite) is automatically inferred. To see what kinds of visualizations are possible and begin creating new visualizations, see the example specifications in the Vega Example Gallery and Vega-Lite Example Gallery.
Accessing Settings From the Operator Menus
To access any custom parameters or selected attributes in the operator's menus, you will need to use the attributes and params keywords. See Keywords for more details.
Visualization Data
The data output from a UDO (which may be modified by any provided Python code) will automatically be included in the data field of the specification. Therefore, in most cases, the data field can be left blank (or nearly blank in the case of Vega-Lite specifications) as below:
// Vega
"data": [],
// Vega-Lite
"data": { "values": [] }
If you want to provide additional, hard-coded data points, you may do so by inserting them appropriately into the data field. Any additional data points will be inserted after the automatically included data.
Auto-Completion and Documentation
To enable autocompletion and gain access to documentation in the editor, include a $schema value in your specification, dependent on whether you are using Vega or Vega-Lite:
// Vega
"$schema": "https://vega.github.io/schema/vega/v5.json",
// Vega-Lite
"$schema": "https://vega.github.io/schema/vega-lite/v4.json",
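As a rough illustration of how the data and $schema fields fit into a complete specification, the following Python sketch assembles a minimal Vega-Lite specification and prints it as JSON that could be pasted into the visualization tab. The bar mark, the encoding, and the field names category and value are illustrative assumptions, not taken from any particular UDO.

```python
import json

# A minimal Vega-Lite bar chart; the mark, the encoding, and the field names
# "category" and "value" are illustrative assumptions.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
    "data": {"values": []},  # left empty: the UDO's output data is included automatically
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "value", "type": "quantitative"},
    },
}

spec_text = json.dumps(spec, indent=2)
print(spec_text)
```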
Filters
The filters tab is used to specify filters for the visualization. For example, this allows selecting specific groups of data points just by selecting a single point belonging to that group.
To use a filter, add an entry to the filters array indicating the class of points to group together. For example, the following code groups together all points sharing the same value of x. If this were used in a scatterplot UDO, then upon clicking a point, all points sharing the same value of x as the selected point would also be selected.
{
  "filters": [
    "attributes['x'][0]"
  ]
}
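The grouping behavior described above can be mimicked in plain Python. This sketch only illustrates the effect of such a filter and is not how Einblick implements it; the sample points, the select_group function, and the stand-in attributes dictionary are hypothetical.

```python
# Hypothetical scatterplot points; in a UDO these would come from the output data.
points = [
    {"id": 0, "x": 1, "y": 10},
    {"id": 1, "x": 2, "y": 20},
    {"id": 2, "x": 1, "y": 30},
    {"id": 3, "x": 3, "y": 40},
]

# Stand-in for the attributes keyword: the attribute group named 'x' holds one column.
attributes = {"x": ["x"]}


def select_group(points, clicked, attributes):
    """Return every point sharing the clicked point's value in attributes['x'][0]."""
    column = attributes["x"][0]
    return [p for p in points if p[column] == clicked[column]]


# Clicking the first point also selects the other point with the same x value.
selected = select_group(points, points[0], attributes)
print([p["id"] for p in selected])
```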
Keywords (Dataframes, Attributes, Custom Parameters, Container Sizing)
To access a user's settings in a UDO (e.g. selected attributes, dataframe inputs), a few keywords are available for use in the Python code tabs (e.g. on_batch) and in the visualization tab. They are described below.
Dataframe Keywords (df, dfs[i]) [Python only]
As with the python operator, the dataframe inputs of a UDO can be accessed with the df or dfs keywords. df is used when there is only one input dataframe, while dfs[i] is used when there are multiple.
The following code block (in on_batch) returns a new dataframe, produced by adding a column of zeros to the input dataframe.
# on_batch
df["zeros"] = 0
return df
Similarly, the next code block multiplies the Score column from the second dataframe by 100, assigns the result to the Score column of the first dataframe, and returns the first dataframe.
# on_batch
dfs[0]["Score"] = dfs[1]["Score"] * 100
return dfs[0]
Attribute and Custom Parameter Keywords (attributes, params) [Python, Vega/Vega-Lite]
To retrieve the names of selected attributes, use the attributes keyword. This keyword refers to either a Python dictionary or a JavaScript object, with keys corresponding to the attribute names (e.g. attributes, features, target). Each of these keys will map to a list (or array) of selected attribute names.
For example, in the following image, the attributes Score, GDP per capita, and Social support of the input dataset have been selected.
The resulting attributes keyword object is as follows:
print(attributes)
# Result:
# {
#   'features': [
#     'GDP per capita',
#     'Social support',
#     'Score'
#   ]
# }
Similarly, the params keyword is available for any custom parameters. It is an object mapping parameter names to values. For example, given the following inputs:
the params object is:
print(params)
# Result:
# {
#   'n_clusters': 3,
#   'n_components': 3
# }
Both attributes and params are available in Python and Vega/Vega-Lite tabs.
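To show how the two keywords are typically read together, here is a minimal plain-Python sketch. Inside a UDO, attributes and params are provided by Einblick and the input would be the df dataframe; here they are replaced with stand-in dictionaries and a list of rows whose country labels and numeric values are made up for illustration.

```python
# Stand-ins for the keywords that Einblick provides inside a UDO.
attributes = {"features": ["GDP per capita", "Social support", "Score"]}
params = {"n_clusters": 3, "n_components": 3}

# Hypothetical input rows with made-up values; in on_batch this would be df.
rows = [
    {"Country": "A", "Score": 7.8, "GDP per capita": 1.3, "Social support": 1.6},
    {"Country": "B", "Score": 7.6, "GDP per capita": 1.4, "Social support": 1.5},
]

feature_names = attributes["features"]  # list of selected column names
n_clusters = params["n_clusters"]       # value of a custom parameter

# Keep only the selected attributes, in the order listed in attributes["features"].
feature_matrix = [[row[name] for name in feature_names] for row in rows]
print(n_clusters, feature_matrix)
```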
Container Sizing (Vega/Vega-Lite)
The keywords container_width and container_height are available in the visualization tab. They are useful for setting the visualization's size to fit the operator's dimensions.
Selections (Vega/Vega-Lite)
Adding the following line to a Vega or Vega-Lite specification will enable selections within the visualization, manipulated through clicking.
"selection": "SELECT_STORE",
The selection store itself will then be available under the name select_store (lowercase).