A way to compare the results of alternate models in order to give human executive oversight that prevents a single model’s biases from taking prominence
Healthcare・Visual Intelligence・Predictive Operations・Risk Management・Finance

Work In Progress

Our Elements guide is still in progress, and therefore lacks full visual and technical assets. We hope to release them by summer of 2020. Thanks for reading Lingua Franca!


A comparison is a simple methodology for creating visually interpretable groups of models. The crucial factor is disaggregation: different kinds of input data are each fed to their own model rather than contributing to a single über-model. For example, in a credit scoring context, many products attempt to place all possible inputs (demographics, past spending, likelihood of default, insurance estimates, etc.) into one giant predictive system that outputs a result such as an interest rate for the customer. This giant system may work in a purely automated situation, but it can produce various biases and unpredictable consequences[1]. Instead, consider disaggregating these data points and giving each its own model, whose results a human operator can visually inspect and compare. The method works even when the operator has no certainty about how each individual model operates: by comparing their outputs, the operator builds better intuitions about each one.
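To make the idea concrete, here is a minimal sketch of disaggregated scoring in the credit example. The three models, their feature groups, and the scoring rules are all hypothetical stand-ins (a real system would train each model from data); the point is only the shape: one model per input group, with the per-model scores presented side by side for human comparison rather than collapsed into a single rate.

```python
# Hypothetical per-group models (assumptions for illustration only).
def demographics_model(applicant):
    # Scores only demographic features.
    return 0.5 if applicant["region_risk"] == "low" else 0.8

def spending_model(applicant):
    # Scores only past-spending features.
    return min(1.0, applicant["monthly_spend"] / applicant["monthly_income"])

def default_model(applicant):
    # Scores only default-history features.
    return 0.9 if applicant["missed_payments"] > 2 else 0.2

MODELS = {
    "demographics": demographics_model,
    "spending": spending_model,
    "default_history": default_model,
}

def compare(applicant):
    """Return per-model scores for side-by-side human review,
    instead of one aggregated output."""
    return {name: round(model(applicant), 2) for name, model in MODELS.items()}

applicant = {
    "region_risk": "low",
    "monthly_spend": 1200,
    "monthly_income": 4000,
    "missed_payments": 0,
}
print(compare(applicant))
# The operator sees each group's score separately and can spot
# when one model disagrees with the others.
```

Because each score traces back to one human-interpretable group of inputs, a surprising result points the operator directly at the responsible model.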


Data scientists usually seek to increase the accuracy of AI models by giving them as much context as possible. This can make the system astonishingly accurate when measured by the data scientists' own internal metrics. However, each additional factor (i.e. column) added to the input data increases the number of non-linear correlations and possible causalities in the AI. In other words, the more detailed the input data, the more opaque the AI model.

Unfortunately, tools for disentanglement and factor analysis (often advocated to make sense of such opaque systems) only increase the complexity of the system overall by adding additional non-linearities[2]. Instead, we advocate for the use of comparisons to disaggregate rather than disentangle the AI system, which allows for human oversight and executive decision-making.


Any large, complex AI system can be easily converted into a comparison-based system. Primarily, this requires training multiple AIs on subsets of the input data rather than all of it. These subsets should be made as human-interpretable as possible, either by grouping related factors or by segmenting users into different categories. The models can then be trained individually and presented side by side in the user interface.
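The conversion step above can be sketched as follows. The feature groups, the toy "model" (a mean of normalised columns), and the sample rows are all assumptions for illustration; in practice each subset model would be whatever predictor the team already uses, just restricted to its column group.

```python
# Hypothetical human-interpretable column groups (an assumption;
# a real system would choose these with domain experts).
FEATURE_GROUPS = {
    "demographics": ["age", "region_code"],
    "spending": ["monthly_spend", "num_purchases"],
}

def train_subset_model(rows, columns):
    """Fit a trivial scorer on one column subset:
    score = mean of the selected columns, normalised by training maxima."""
    maxima = {c: max(r[c] for r in rows) or 1 for c in columns}
    def score(row):
        return sum(row[c] / maxima[c] for c in columns) / len(columns)
    return score

def train_all(rows):
    # One independently trained model per interpretable subset,
    # instead of a single model over every column at once.
    return {name: train_subset_model(rows, cols)
            for name, cols in FEATURE_GROUPS.items()}

rows = [
    {"age": 30, "region_code": 2, "monthly_spend": 800, "num_purchases": 12},
    {"age": 50, "region_code": 4, "monthly_spend": 1600, "num_purchases": 24},
]
models = train_all(rows)
scores = {name: round(m(rows[0]), 2) for name, m in models.items()}
print(scores)  # one score per group, ready to present side by side
```

Each model sees only its own columns, so the interface can label every score with the group of inputs that produced it.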


  1. Apple Card Investigated After Gender Discrimination Complaints by New York Times ↩︎

  2. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations by Locatello, et al. ↩︎