As data scientists, we are often asked to find the main factors that cause a particular outcome. From a business point of view, predicting that outcome is important. But it is even more important to determine the factors that influence the outcome, so that the business can control it by modifying those “causing” factors; this is what a “causal model” provides. A predictive model alone does not provide actionable insights; a causal model does.
Let us take a couple of industry examples. Say a mobile network telecom company wants to determine the factors that positively and negatively affect customer satisfaction so that it can minimize churn. It might have a huge amount of data on the technical aspects of its network and on each customer's experience, such as the number of dropped calls, call quality, and average data speed. This can be merged with the same subscriber’s service calls, the cause of each call, how many calls it took to fix a complaint, and satisfaction survey results. Very quickly, the number of influencing factors can exceed 100. With traditional machine learning modeling of this data, one can predict whether a certain subscriber is likely to leave for a competitor. But that does not tell the company how to reduce churn. To do that, one needs to know which factors are primarily driving churn, and that calls for causal analysis. A data scientist with deep domain knowledge can prune the factors down to a manageable number. Even then, determining the causal relationships manually can be overwhelming.
Another business example is a brand running a survey to determine what factors caused a customer to buy that brand. Based on the causal analysis, the brand owner can design new products and create advertisements and promotions that focus on the causal factors.
So, the question is: how do you create a causal model? There are a few mathematical tools, such as Bayesian causal inference and structural equation modeling (SEM), that can help data scientists in their pursuit of causal inference. However, the two appear to be very similar and, more importantly, they are not efficient enough for practical use. According to Judea Pearl, the godfather of causal analysis, in an answer to a question about the differences between these two methods:
“I would not dissuade people from using either causal Bayesian causal networks or structural equation models, because the difference between the two is so minute that it is not worth the dissuasion. The question is only what question you ask yourself when you construct the diagram. If you feel more comfortable asking: “What factors determine the value of this variable” then you construct a structural equation model. If on the other hand you prefer to ask: “If I intervene and wiggle this variable, would the probability of the other variable change?” then the outcome would be a causal Bayes network.”
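To make the "intervene and wiggle" idea concrete, here is a minimal simulation sketch. The data-generating process, variable names, and coefficients are all invented for illustration and do not come from any of the tools discussed here. A hidden common cause `z` drives both a factor `x` and an outcome `y`, while `x` itself has no effect on `y`. The observed association between `x` and `y` is strong, yet it disappears once `x` is set by intervention.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed (made-up) data-generating process:
# z is a hidden common cause; x has NO causal effect on y.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = 2.0 * z + rng.normal(size=n)


def slope(a, b):
    """Least-squares slope of b regressed on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)


# What a purely predictive view sees: x looks like a strong driver of y.
observational_slope = slope(x, y)

# Intervention: we set x ourselves, so it no longer depends on z.
x_do = rng.normal(size=n)
y_do = 2.0 * z + rng.normal(size=n)  # y is generated exactly as before
interventional_slope = slope(x_do, y_do)

print(f"observational slope:  {observational_slope:.2f}")
print(f"interventional slope: {interventional_slope:.2f}")
```

Under these assumptions the observational slope comes out close to 1 and the interventional slope close to 0. In other words, `x` would help predict `y`, but changing `x` would not change `y`.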
Among the existing tools and solutions from the technology leaders:
IBM SPSS offers the Amos package, which lets data scientists use structural equation modeling (SEM) to test hypotheses about complex variable relationships.
Microsoft Research is working on a library for causal analysis called DoWhy that uses a “…Bayesian graphical model framework where users can specify what they know, and more importantly, what they don’t know, about the data-generating process.”
Similarly, Google AI has created CausalImpact, an R package for causal inference using Bayesian structural time-series models.
The challenges with these robust mathematical tools are the following:
They need a priori knowledge about the relationships among the factors in order to reduce the overall computational complexity.
Computation can take hours even for a modest number of factors, which makes them impractical.
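One way to get a feel for the computational burden is to count how many candidate causal graphs exist for a given number of factors. The sketch below uses Robinson's recurrence for the number of labeled directed acyclic graphs. This is a standard combinatorial result and is not taken from any of the tools above; the function name `count_dags` is my own.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes, via Robinson's recurrence."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in (2, 3, 4, 5, 10):
    print(f"{n} factors -> {count_dags(n)} possible DAGs")
```

Three factors already allow 25 graphs and five factors allow 29,281. The count grows so quickly that exhaustive search is hopeless well before 100 factors, which is why prior knowledge or heuristics are used to shrink the search.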
Enter Inguo, a brand-new company with a brand-new technology. This Silicon Valley based startup was spun off from NEC, based on a technology developed in NEC's lab. They have taken the mantra that every first-year statistics student learns, ‘Correlation is not causation’, to its logical extreme. The highlights of their technology are:
The analysis is based on the data only and does not need a priori knowledge about the relationships among the variables. That means it can help both professional data scientists and business owners of the data, who may know very little about those relationships, to determine the causal relations among the variables.
One can find the top N (5, 10, 20…) factors that are causing a target variable in the data, either directly or indirectly. In the business examples mentioned earlier, the target variable for the telecom company might be ‘customer satisfaction’, and in the case of the brand owner it might be ‘customer purchased the brand’ or ‘recommendability of the brand’.
Based on the existing data, it can create a model to predict the target variable for new data, just as many other machine learning models can.
One can change the values of the top factors one at a time or in combination to see how that changes the value of the target variable. This “what if” analysis capability is a great tool for the business users to determine the optimal values of several factors for a certain business outcome.
Amazingly, it can ingest up to 200 columns and 20,000 rows for its analysis and produce a causal directed acyclic graph (DAG) as a visual output in seconds. However, at this time the data needs a little pre-processing before the tool will accept it.
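To illustrate what a “what if” analysis over a causal DAG looks like in principle, here is a toy sketch for the telecom example. It is not Inguo's method or API: the graph, the variable names, and the linear weights are all made up. The idea is simply that setting a factor to a new value propagates through its downstream variables to the target.

```python
# Hypothetical causal DAG with made-up linear weights.
# Each entry maps a variable to {parent: weight}; keys are in topological order.
STRUCTURE = {
    "data_speed": {},
    "dropped_calls": {},
    "support_calls": {"dropped_calls": 0.5},
    "satisfaction": {
        "data_speed": 0.25,
        "dropped_calls": -0.5,
        "support_calls": -1.0,
    },
}


def what_if(baseline, interventions=None):
    """Propagate values through the DAG, overriding any intervened variables."""
    interventions = interventions or {}
    values = {}
    for var, parents in STRUCTURE.items():
        if var in interventions:
            values[var] = interventions[var]  # set by us, ignore parents
        elif parents:
            values[var] = sum(w * values[p] for p, w in parents.items())
        else:
            values[var] = baseline[var]       # root variable: take observed value
    return values


baseline = {"data_speed": 20.0, "dropped_calls": 4.0}
before = what_if(baseline)
after = what_if(baseline, {"dropped_calls": 2.0})

print("satisfaction before:", before["satisfaction"])
print("satisfaction after halving dropped calls:", after["satisfaction"])
```

In this toy model, halving dropped calls improves satisfaction both directly and indirectly, because it also reduces the number of support calls.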
The whitepaper describing their method includes information about the heuristic algorithms, such as A*, that it uses to reduce the search space efficiently. That means the result generated by the Inguo tool might not be globally optimal, but it should be acceptable for industrial use, with potentially unprecedented impact on many business problems. I believe no other tool available today offers users such an easy and fast way to see the causal relationships in their own data. This category-defining product is the first of its kind.
I strongly encourage readers of this blog to find out more about this tool by watching the demo video and reading the whitepapers and case studies. Inguo is currently engaging with various businesses to provide insight into their varied data sets. Encountering these varied use cases will also improve the Inguo tool. Contact me if you want to learn more about causal analysis and how it can help your business get ROI from your data.