
Text Featurizer (NLP)

The text featurizer operator computes a vector representation (also known as an embedding) of words or sentences. It is particularly helpful in combination with the AutoML operator. When your data contains free-form text columns, the operator converts them into numeric vectors in which words or sentences with similar meanings are represented by similar vectors. These vectors can then serve as features for predictive tasks.
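To make "similar meaning, similar vectors" concrete, the sketch below compares vectors with cosine similarity, a common way to measure how close two embeddings are. The three-dimensional vectors are made up for illustration and are not output of the operator; real embeddings typically have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, invented for this example only.
embeddings = {
    "happy": [0.9, 0.8, 0.1],
    "joyful": [0.85, 0.75, 0.2],
    "invoice": [0.1, 0.2, 0.95],
}

sim_close = cosine_similarity(embeddings["happy"], embeddings["joyful"])
sim_far = cosine_similarity(embeddings["happy"], embeddings["invoice"])

# Words with related meanings should score higher than unrelated ones.
print(f"happy vs joyful:  {sim_close:.3f}")
print(f"happy vs invoice: {sim_far:.3f}")
```

With these toy values, "happy" and "joyful" score much closer to 1.0 than "happy" and "invoice", which is the property a downstream predictive model can exploit.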

To use the text featurizer operator, drag it onto the canvas and specify an input dataframe. Once the dataframe is in place, select the target column from which you would like to extract a numeric representation.

Example text featurizer operator

Once the embeddings have been computed, the operator returns an augmented dataframe: your original dataframe horizontally concatenated with the embedding columns. This new dataframe can then be used, for example, as the input to a prediction operator.
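The shape of the augmented dataframe can be sketched outside the tool with pandas. This is only an illustration of horizontal concatenation, not the operator's implementation: `toy_embed`, `featurize`, `EMBEDDING_DIM`, and the `review_emb_*` column names are hypothetical, and the hashed word-count "embedding" is a stand-in for whatever model the operator actually uses.

```python
import hashlib

import pandas as pd

EMBEDDING_DIM = 4  # hypothetical size; real embeddings are usually much larger


def toy_embed(text, dim=EMBEDDING_DIM):
    """Stand-in for a real embedding model: hashed bag-of-words counts."""
    vector = [0.0] * dim
    for token in text.lower().split():
        # md5 gives a bucket that is stable across runs (unlike built-in hash()).
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vector[int(digest, 16) % dim] += 1.0
    return vector


def featurize(df, text_column):
    """Return df with one extra numeric column per embedding dimension."""
    embeddings = pd.DataFrame(
        [toy_embed(text) for text in df[text_column]],
        columns=[f"{text_column}_emb_{i}" for i in range(EMBEDDING_DIM)],
        index=df.index,  # align rows with the original dataframe
    )
    # axis=1 concatenates horizontally: original columns first, embeddings after.
    return pd.concat([df, embeddings], axis=1)


reviews = pd.DataFrame(
    {
        "review": ["great product", "terrible support", "great support"],
        "rating": [5, 1, 4],
    }
)

augmented = featurize(reviews, "review")
print(augmented.columns.tolist())
```

The original columns are preserved and the embedding columns are appended to the right, so the result can be passed on as a feature table to a model that expects numeric inputs.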