Confidence Scores: An Introduction
Like humans, algorithms, machine learning models, and AI systems sometimes make mistakes when predicting a value from an input data point. But, also like humans, most models can provide information about how reliable those predictions are.
When we, as humans, are asked how confident we are in a statement, whether in response to a question or as a statement of our own, we attach a relative qualification to it. For example, we might say “I’m confident that <statement>,” “I’m pretty sure that <statement>,” or even “Maybe it’s <statement>.” These qualifiers allow us, and the recipient of our statement, to make logical decisions about what to do next with the information provided.
In mathematics and data science, models can provide the same type of information: a score attached to each result that indicates how confident the model is that the returned value is the correct one. Confidence scores generally come either as a decimal number or as a set of expressions, and each form has pros and cons:
A decimal number between 0 and 1, which can be interpreted as a percentage of confidence.
- Pros: easily interpretable by a human
- Cons: ambiguous, with no clear threshold between high and low scores
A set of expressions, such as {“very low”, “low”, “medium”, “very high”}
- Pros: easily actionable and simple to understand
- Cons: no clear boundaries between levels, and no use in mathematical expressions or math-based queries
This score then allows us to make an informed decision about what to do with a returned set of data.
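To make the trade-off between the two forms concrete, here is a minimal Python sketch that converts a decimal confidence score into an expression label. The cut-off values are hypothetical and chosen only for illustration; choosing them is exactly the "where is the threshold?" question that a bare decimal leaves open.

```python
from bisect import bisect_right

# Hypothetical cut-offs, for illustration only: below 0.5 is "very low",
# [0.5, 0.75) is "low", [0.75, 0.9) is "medium", and 0.9 or above is "very high".
CUTOFFS = [0.5, 0.75, 0.9]
LABELS = ["very low", "low", "medium", "very high"]


def score_to_label(score: float) -> str:
    """Map a decimal confidence score in [0, 1] to an expression label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    # bisect_right counts how many cut-offs are <= score,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(CUTOFFS, score)]


for s in (0.3, 0.5, 0.8, 0.97):
    print(s, score_to_label(s))
```

The label is easier to act on, but the conversion discards information: with these cut-offs, 0.76 and 0.89 both become "medium".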
Leadspace Confidence Scores
Overview
With a high-level understanding of confidence scoring and what it means for data and decision-making, we can now look at how confidence scores apply to your data through Leadspace’s confidence scoring.
Leadspace uses the Company Match Confidence Level to express how confident we are in the match for any particular input. To tie it back to the introduction, this value represents how confident our model is that the returned result is the correct one, based on the input record provided to the model.
The Leadspace Confidence score is an expression that indicates, for any given LS Company Profile, the likelihood that the profile is the best match to the input record (Account, Lead, etc.).
For example, if the input record for Leadspace’s enrichment algorithm is Amazon US, headquartered in Seattle, WA, then the confidence level attached to the enriched record indicates how sure we are that the returned record is Amazon US, headquartered in Seattle, WA.
Methodology
To understand how Leadspace generates a confidence level, it helps to know how the bucket is calculated. For higher accuracy, the confidence level bucket is based on the agreement among three parameters:
- Leadspace AI model;
- Logic based on the string similarity between the input and output Company Name; and,
- Logic based on the string similarity between the input and output Company Website.
Depending on how many of these inputs are provided on an individual Company Record, the algorithm takes the available inputs into account and returns a matched record and confidence level according to the logic above.
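The methodology above does not give the exact formula, so the following Python sketch only illustrates the general idea under stated assumptions. It measures string similarity between the input and returned Company Name and Company Website with the standard library’s `difflib.SequenceMatcher`, then treats agreement between those signals and a model score as evidence for a higher bucket. The `model_score` input, the 0.9/0.75 cut-offs, and the agreement rules are all hypothetical and are not Leadspace’s actual logic.

```python
from difflib import SequenceMatcher
from typing import Optional
from urllib.parse import urlparse


def name_similarity(input_name: str, output_name: str) -> float:
    """Case-insensitive similarity ratio between two company names, in [0, 1]."""
    return SequenceMatcher(None, input_name.strip().lower(),
                           output_name.strip().lower()).ratio()


def normalize_domain(website: str) -> str:
    """Reduce a website to its bare host, e.g. 'https://www.amazon.com/' -> 'amazon.com'."""
    website = website.strip()
    if "://" not in website:
        website = "http://" + website  # urlparse needs a scheme to find the host
    host = urlparse(website).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def website_similarity(input_site: str, output_site: str) -> float:
    """Similarity ratio between the normalized hosts of two websites."""
    return SequenceMatcher(None, normalize_domain(input_site),
                           normalize_domain(output_site)).ratio()


def confidence_bucket(model_score: float, name_sim: float,
                      site_sim: Optional[float] = None) -> str:
    """Hypothetical rule: the more the available signals agree, the higher the bucket.

    site_sim is None when the input record had no website, so only the
    signals that are actually available are considered.
    """
    signals = [model_score, name_sim]
    if site_sim is not None:
        signals.append(site_sim)
    if all(s >= 0.9 for s in signals):
        return "High"
    if all(s >= 0.75 for s in signals):
        return "Medium"
    if 2 * sum(s >= 0.75 for s in signals) >= len(signals):
        return "Low"  # at least half of the available signals agree
    return "Very Low"


name_sim = name_similarity("Amazon", "amazon")
site_sim = website_similarity("amazon.com", "https://www.amazon.com/")
print(confidence_bucket(0.97, name_sim, site_sim))  # High
```

Passing `site_sim=None` mirrors the point above that the algorithm works with whichever inputs a record actually provides.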
What scoring levels are available?
For every record, Leadspace provides an expression-based confidence level for your enrichment. Why? We believe confidence levels should be 1) easy to understand and 2) easy to act on, and a numerical score is not the most effective option for your teams.
Leadspace Confidence Levels use a bucket scale with a series of simple expressions, ranging from “Very Low” to “High.”
| Confidence Level Bucket |
|---|
| High |
| Medium |
| Low |
| Very Low |
Interpreting these values is straightforward:
| Confidence Level | Meaning |
|---|---|
| High | Leadspace is over 95% confident that the company returned is the correct match to the input record. |
| Medium | Leadspace is 85% confident that the company returned is the correct match to the input record. |
| Low | Leadspace is 75% confident that the company returned is the correct match to the input record. |
| Very Low | Leadspace is 50% confident that the company returned is the correct match to the input record. |
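Reading the table as approximate match rates gives a quick way to estimate how many matches in a batch are likely correct. The Python sketch below assumes the stated percentages hold on your data (that is, the buckets are well calibrated) and uses 95% for High as a conservative lower bound, since the table says “over 95%”. The record counts are hypothetical.

```python
# Confidence per bucket, taken from the table above. High is "over 95%",
# so 0.95 is used as a conservative lower bound.
BUCKET_CONFIDENCE = {"High": 0.95, "Medium": 0.85, "Low": 0.75, "Very Low": 0.50}


def expected_correct(counts: dict) -> float:
    """Expected number of correct matches, assuming each bucket's stated
    confidence equals its actual match rate on your data."""
    return sum(BUCKET_CONFIDENCE[bucket] * n for bucket, n in counts.items())


def expected_precision(counts: dict) -> float:
    """Expected share of correct matches across all records in counts."""
    total = sum(counts.values())
    return expected_correct(counts) / total if total else 0.0


# Hypothetical batch of 1,000 enriched records.
batch = {"High": 600, "Medium": 250, "Low": 100, "Very Low": 50}
print(expected_correct(batch))  # 600*0.95 + 250*0.85 + 100*0.75 + 50*0.50
```

For this made-up batch the estimate is 882.5 correct matches (about 88%). Leaving out the 50 Very Low records raises the expected share to 857.5 of 950 (about 90%), which previews the threshold trade-off discussed below.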
What data does the Confidence Level apply to?
The Leadspace Confidence Score applies only to the company match. When enriching a person record, we return a company confidence bucket that indicates how confident Leadspace is that the person was matched to the correct company.
The Confidence Score reflects ONLY the accuracy of the Company Match, not of individual data points. It is therefore possible to have a Confidence Score of High but an incorrect Company Size value, if we matched the correct company but its Company Size data was wrong. (Unlikely, but possible.)
Can the Confidence Level be customized? What about thresholds?
Depending on your use case and how you rely on Confidence Level values in your systems, queries, or data health analyses, you may want to adjust which records Leadspace returns to you.
For example, you may not want to include “Very Low” confidence matches, because they are less likely to be accurate and therefore have low value in your systems.
Note the following effects on precision and recall if you adjust the default threshold:
- Raising the threshold lowers recall and improves precision. Why? Fewer false positives (incorrect matches) are returned, which increases precision, but more false negatives (correct matches that are filtered out) occur, which decreases recall.
- Decreasing the threshold will do the opposite.
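The trade-off can be checked on a small labeled sample. The Python sketch below uses a hypothetical evaluation set in which each record has a confidence bucket and a flag saying whether the returned company was actually correct (for example, after manual review). Records below the minimum bucket are dropped; a kept incorrect match counts as a false positive, and a dropped correct match counts as a false negative.

```python
# Bucket order from lowest to highest confidence.
BUCKET_ORDER = ["Very Low", "Low", "Medium", "High"]

# Hypothetical labeled sample: (confidence bucket, was the match correct?)
SAMPLE = [
    ("High", True), ("High", True), ("High", True),
    ("Medium", True), ("Medium", False),
    ("Low", True), ("Low", False),
    ("Very Low", True), ("Very Low", False), ("Very Low", False),
]


def precision_recall(sample, min_bucket):
    """Precision and recall when only records at or above min_bucket are kept."""
    threshold = BUCKET_ORDER.index(min_bucket)
    tp = fp = fn = 0
    for bucket, correct in sample:
        kept = BUCKET_ORDER.index(bucket) >= threshold
        if kept and correct:
            tp += 1  # correct match returned
        elif kept:
            fp += 1  # incorrect match returned
        elif correct:
            fn += 1  # correct match filtered out
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


for bucket in BUCKET_ORDER:
    p, r = precision_recall(SAMPLE, bucket)
    print(f"min={bucket:<8}  precision={p:.2f}  recall={r:.2f}")
```

On this made-up sample, raising the minimum bucket from Very Low to High moves precision from 0.60 to 1.00 while recall falls from 1.00 to 0.50, the pattern described above.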
Available Customization
By default, we enrich all records with a confidence level bucket, but customers can work with their Customer Success Manager to customize the following:
- Change the minimum threshold so that records in lower buckets (e.g., “Very Low”) do not enter your system.
- Note: These records will have an LS Enrichment Status of “Not Enriched”.
- Customize the bucket names
What will the field look like in my data?
The Leadspace confidence level field is a text value with the field name LS Matching Confidence Level:
| Object | Field Name | Field Type |
|---|---|---|
| Account/Contact/Lead | LS Matching Confidence Level | Text |
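Because LS Matching Confidence Level is a text field, a comparison such as “Medium or better” needs an explicit ordering, and comparing values case-insensitively is a safe default. The Python sketch below assumes the default bucket names; if your bucket names were customized, update the mapping. The record dictionaries and the helpers `bucket_rank` and `at_least` are hypothetical.

```python
from typing import Optional

# Default bucket names, lowest to highest. Update this mapping if your
# bucket names were customized.
BUCKET_RANK = {"very low": 0, "low": 1, "medium": 2, "high": 3}


def bucket_rank(value: Optional[str]) -> Optional[int]:
    """Rank of an LS Matching Confidence Level value, or None if empty/unknown."""
    if not value:
        return None
    return BUCKET_RANK.get(value.strip().lower())


def at_least(value: Optional[str], minimum: str) -> bool:
    """True if the text value is at or above the minimum bucket."""
    rank = bucket_rank(value)
    return rank is not None and rank >= BUCKET_RANK[minimum.lower()]


# Hypothetical exported Account records.
accounts = [
    {"Name": "Acme", "LS Matching Confidence Level": "High"},
    {"Name": "Globex", "LS Matching Confidence Level": "very low"},
    {"Name": "Initech", "LS Matching Confidence Level": "Medium"},
    {"Name": "Umbrella", "LS Matching Confidence Level": ""},
]

medium_or_better = [a["Name"] for a in accounts
                    if at_least(a["LS Matching Confidence Level"], "Medium")]
print(medium_or_better)  # ['Acme', 'Initech']
```

Empty or unrecognized values fail every threshold, a conservative choice that keeps them out of filtered segments instead of letting them pass silently.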
Use Cases
There are many use cases in which you can use the Leadspace Confidence Level to drive successful programs and to better understand and qualify the data in your systems. Depending on the use case, you may want to use confidence levels differently.
A few use cases are listed below:
- Data Workflows
- In MAPs/CRMs, remove Very Low confidence accounts from campaign segmentation or workflows by adding a match rule where LS Matching Confidence Level is not ‘very low’.
- Update Lead-to-Account Matching flows so that “Very Low” records are excluded from matching.
- Remove Very Low records from your Eloqua Database through a program canvas that segments users where LS Matching Confidence Level is ‘very low’.
- Strategic Processes
- In strategic processes that require high-precision data, such as annual sales territory planning, prioritize accounts with a High or Medium match confidence, then return to Low-confidence accounts once initial planning is complete.
- Lead Form Filtering
- When filtering high volumes of leads coming in through web forms and other channels, where partial input information lowers match accuracy, use the confidence level to screen out low-confidence matches.
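Several of these use cases come down to routing records by bucket. The Python sketch below, built on hypothetical account data, groups accounts the way the territory-planning example suggests: High and Medium in a first pass, Low in a later pass, and Very Low left out, as in the workflow examples. The `plan_passes` helper and the pass numbering are illustrative assumptions; in a MAP or CRM you would express the same rules as segment or match-rule criteria.

```python
from collections import defaultdict

# Which planning pass each default bucket goes to; None means "exclude".
# The grouping follows the use cases above; adjust it to your own process.
PASS_FOR_BUCKET = {"high": 1, "medium": 1, "low": 2, "very low": None}


def plan_passes(accounts):
    """Group account names into planning passes by LS Matching Confidence Level."""
    passes = defaultdict(list)
    excluded = []
    for account in accounts:
        level = (account.get("LS Matching Confidence Level") or "").strip().lower()
        stage = PASS_FOR_BUCKET.get(level)
        if stage is None:
            excluded.append(account["Name"])  # Very Low, empty, or unknown value
        else:
            passes[stage].append(account["Name"])
    return dict(passes), excluded


# Hypothetical accounts.
accounts = [
    {"Name": "Acme", "LS Matching Confidence Level": "High"},
    {"Name": "Globex", "LS Matching Confidence Level": "Low"},
    {"Name": "Initech", "LS Matching Confidence Level": "Medium"},
    {"Name": "Hooli", "LS Matching Confidence Level": "Very Low"},
]

passes, excluded = plan_passes(accounts)
print(passes)    # {1: ['Acme', 'Initech'], 2: ['Globex']}
print(excluded)  # ['Hooli']
```

Unknown or empty values land in the excluded list so they can be reviewed rather than planned against.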