# What-if

The What-if operator allows us to observe the behavior of custom variables under various scenarios we specify. By performing this What-if analysis, we can better understand what to expect from a range of possible decisions, and choose what works best for us.

Combined with other operators, the What-if operator opens up many new possibilities for gaining insight into a situation and making decisions.

### Terminology​

What-if analysis involves the specification of various hypothetical scenarios, each of which implies some impact on some quantity (or quantities) we are interested in. Therefore, to use the What-if operator, we will need two things:

• One or more independent variables, which define a specific possible scenario
• One or more dependent variables, which vary in accordance with the values of the independent variables

Independent variables, and the values they take on, represent the scenarios we are considering. For example, a bookstore owner looking to increase sales might consider a range of potential discounts; in this case, the discount amount is the independent variable, and each value it takes on corresponds to a different scenario. If that same bookstore owner wants to run separate discounts for fiction and nonfiction books, they now have two independent variables -- the fiction discount, and the nonfiction discount -- and each combination of values for the two variables defines a different overall scenario.

Dependent variables, as their name implies, are dependent on the scenario (i.e. independent variables) we are considering. They are the metrics or figures we are most interested in. For the bookstore owner, an important independent variable is likely profit; by running the optimal discount, they are likely to increase their profit. If they can determine which combination of discounts (i.e. combination of independent variables) leads to the highest profit (i.e. dependent variable), they can make the best possible decision available to them. If, for some reason, the bookstore owner must also sell a certain minimum number of books, then they now must consider two dependent variables: profit, and number of books sold.

### Usage: Bookshop Tutorial​

To help illustrate how to use the What-if operator, we will continue with our bookshop example, where we want to know what the best discount to offer is. We will work with the simplified book dataset shown below. Each row corresponds to a genre of books, and each genre has a type (fiction or nonfiction), a sales price, a cost for us to acquire the books within it, and an interest level. We can think of the interest level as the number of books in that genre we expect to sell in a given month; more interest implies more sales. To start, we'll need to introduce some independent and dependent variables, as we saw in the previous section. For this purpose, we will make use of the transformation operator.

First, let's add our independent variable: `discount`. We can achieve this by simply adding a new column using the expression transformation type. Independent variables, in order to be recognized as variables we want to substitute different values for, must be encased in dollar signs. The variable name is not too important; for our `discount` column, we will call our variable `\$disc\$`. Since we are directly setting the values of this variable, we don't need to use any formula to calculate it. Once the variable description is entered, click `apply`. Now that the transformation has been applied, there will be a new column in the output of our transformation operator; in this case, it is `discount`. To the top right, you can see that a new menu has popped up, called `what-if`. Since we used the What-if independent variable syntax in our expression, the transformation operator recognizes that this column represents a variable. Once we enter the menu, we will be able to define a range of values (i.e. scenarios) for this variable. In this case, since we are entering discount values, it makes sense to use a float range; let's consider discount values from 0 to 50%, in 5% increments. Next, we need to add our dependent variable: `profit`. To do so, we need some way to model how profit varies with respect to discounts; clearly, sales will increase as discounts get higher, but profit per book will correspondingly decrease. We need to find what discount balances these two factors as best as possible.

In practice, we may use a model derived from the AutoML operator, if, for example, we have book sales data from previous sales. However, for the purposes of demonstration, let's keep things simple by assuming that we have a simple formula for the amount of sales we expect for a given discount: the base interest, plus 5 books per percent of discount we offer. With this, we can easily calculate profit, add it as an expression in our transformation operator, and apply it: We now have an independent variable and a dependent variable in our transformation operator. What's left is to run this through the What-if operator, so that we can see how profit performs for various values of the discount we offer.

For the What-if operator, we need to specify a quantity we're interested in: in this case, we want to know what the total profit is, across all genres, given a certain discount. Therefore, we specify a collector in the operator menu, with `profit` and `sum` indicated. The output of the What-if gives us what we want. Here's a look: To get a clearer idea of what's going on, let's view this in a pivot chart: Now, things make sense. We expect around \$4000 in profit with no discount, and a maximum of around \$6000 profit with a roughly 15% discount. Since our margins aren't that wide, higher discounts quickly bring our profit down. At a 50% discount, where we are selling books for less than we paid for them (and quite a few of them, due to their low price), we are losing nearly \$9000!

Based on this graph, we can conclude that a 15% discount is probably best.

### Bookshop Tutorial: Extensions​

To extend this analysis, there are a number of directions we could go in. First, we could add more independent and dependent variables: for example, as mentioned before, we could add separate discounts for fiction and nonfiction books, or we could calculate the number of total book sales we make, based on the discount. Though this can yield more powerful and detailed analysis, visualization and interpretation could become difficult as the number of possible scenarios grows.

#### Using a Prediction Model​

As previously mentioned, we could also improve our model, if the number of sales we expect to make cannot easily be expressed in as simple of a form as we used earlier. If possible, using the AutoML operator could greatly improve our modeling capabilities.

Here is an example of how we could use the AutoML operator to find the best discount for the bookstore.

First, create a prediction model on past bookstore sales data, over various discounts. Pick the pipeline that seems best. Next, drag out the pipeline executor, and use the book data we have for our current book inventory as input. In this data, make sure to have the `discount` variable included. Note that the `profit` variable, which we can no longer directly calculate, is absent from the transformation operator: Now that we have a range of discounts from the transformation operator in our data, along with a model that can predict sales based on these discounts, we can calculate our expected profit and run What-if analysis: All that's left is to visualize the results: Based on this analysis, it seems that a discount of 20% will yield the most profit.