Claims

Capture claims about entities from sources

What are claims?

It's normal to encounter competing claims about entities while conducting research. This is especially true of open-ended research that involves searching the internet and incorporating data from external sources.

Sometimes this is because sources contain incorrect or outdated information. Other times, they simply phrase things differently. Ordinary large language models cannot effectively integrate this information, factor in source publication dates, or judge sources' reputations for veracity and accuracy.

How claims are used

As part of our efforts to build expert-level systems for AI inference, researchers in HASH no longer capture entity data directly from sources. Instead, they capture "claims", which are judged in a standalone step before being converted to entity data.

This allows gathered information to be checked and verified, and helps reconcile competing as well as overlapping information.

Imagine stumbling across a series of documents that make subtly different claims. One claims that "Microsoft was founded in Albuquerque", while another reads "New Mexico". A third document (erroneously) claims "Microsoft was founded in Redmond", while a fourth (ambiguously) describes Microsoft as a "Washington company" (true in the sense that this is its current headquarters, but not reflective of its founding location).

An ordinary AI agent would choose between these claims somewhat opaquely. HASH captures all four claims, along with contextual metadata that helps the AI assess the probability of each being correct, both individually and in light of the others. Many sources making the same claim will generally lend it weight, but better-placed "original" sources of information may still override them.
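To make the idea concrete, here is a minimal sketch of how the four Microsoft claims might be recorded alongside their metadata and given a toy weighting. It is an illustration only and is not HASH's confidence-scoring process, which is private. The `Claim` fields, the `naive_support` function, the source identifiers, the `primary_weight` value, and the choice of which document counts as an "original" source are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """One assertion made by one source (hypothetical shape, not HASH's schema)."""
    subject: str                      # entity the claim is about
    value: str                        # what the source asserts
    source: str                       # hypothetical source identifier
    is_primary: bool = False          # assumed flag for a well-placed "original" source
    published: Optional[date] = None  # publication date, when known


def naive_support(claims, primary_weight=3.0):
    """Toy weighting: each source adds 1.0, a primary source adds primary_weight."""
    totals = defaultdict(float)
    for claim in claims:
        totals[claim.value] += primary_weight if claim.is_primary else 1.0
    return dict(totals)


# The four claims from the example above. Marking doc-1 as primary is an
# assumption made purely to show an "original" source outweighing others.
claims = [
    Claim("Microsoft", "founded in Albuquerque", "doc-1", is_primary=True),
    Claim("Microsoft", "founded in New Mexico", "doc-2"),
    Claim("Microsoft", "founded in Redmond", "doc-3"),
    Claim("Microsoft", "Washington company", "doc-4"),
]

support = naive_support(claims)
best_supported = max(support, key=support.get)
```

Every claim is kept in `claims` rather than discarded, so a later step can still see what was considered and why a particular value won. A real scoring step would also need to account for publication dates and for claims that overlap rather than compete, which this sketch ignores.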

Proprietary algorithm

In contrast to our normal principle of open-sourcing our technology, the exact process by which we generate "confidence" scores for data will remain private so that the mechanism cannot easily be abused. This is similar to how search engines like Google keep their ranking algorithms confidential in order to ensure their continued efficacy. The confidence scores themselves are always visible to users, however, and in the future we intend to allow individual users to "weight" the scoring process according to their own preferences.

You can view the individual claims observed as part of a research job by clicking the new "Claims" tab. In all cases, it's possible to see which claims were considered in arriving at the data ultimately populated by HASH's AI researchers, including any that were dismissed. In the case of Microsoft, our researcher combines the correct answers "Albuquerque" and "New Mexico" to produce the answer "Albuquerque, NM, USA", transforming the output to match our entity's schema (expected data type) in the process.
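The combining step in the Microsoft example can be sketched as follows. This is a hedged illustration under simple assumptions, not a description of how HASH's researchers actually merge claims: the lookup tables `CITY_TO_STATE` and `STATE_ABBREVIATIONS` and the function `merge_location_claims` are hypothetical, and the "City, ST, USA" output format is taken only from the single example above.

```python
from typing import List, Optional

# Hypothetical lookup tables for this example only; a real system would
# rely on a proper gazetteer rather than hard-coded values.
CITY_TO_STATE = {"Albuquerque": "New Mexico", "Redmond": "Washington"}
STATE_ABBREVIATIONS = {"New Mexico": "NM", "Washington": "WA"}


def merge_location_claims(accepted: List[str]) -> Optional[str]:
    """Combine compatible city- and state-level claims into 'City, ST, USA'.

    Returns None when the claims cannot be reconciled, for example when
    two different cities are named or the state does not contain the city.
    """
    cities = {value for value in accepted if value in CITY_TO_STATE}
    states = {value for value in accepted if value in STATE_ABBREVIATIONS}
    if len(cities) != 1:
        return None  # no city named, or conflicting cities
    city = next(iter(cities))
    state = CITY_TO_STATE[city]
    if any(claimed != state for claimed in states):
        return None  # a state-level claim contradicts the city
    return f"{city}, {STATE_ABBREVIATIONS[state]}, USA"


merged = merge_location_claims(["Albuquerque", "New Mexico"])
conflicting = merge_location_claims(["Redmond", "New Mexico"])
```

Here `merged` holds the combined, schema-shaped value, while `conflicting` is `None` because the two claims cannot both be true. Treating "New Mexico" as a less specific version of "Albuquerque", rather than as a rival answer, is what lets overlapping claims reinforce each other.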
