Data Day Texas
Originally launched in January 2011 as one of the first NoSQL / Big Data conferences, Data Day Texas each year highlights the latest tools, techniques, and projects in the data space, bringing speakers and attendees from around the world to enjoy the hospitality that is uniquely Austin. Since its inception, Data Day Texas has continually been the largest independent data-centric event held within 1000 miles of Texas.
Dr. Jans Aasman, CEO, to present:
Neuro-Symbolic Story Extraction from Natural Language
The majority of the Knowledge Graphs we are building for and with customers contain massive amounts of unstructured data, usually in the form of informal natural language. Think of doctor’s notes in a medical chart, agent/customer conversations in a call center, and maintenance records for an aircraft. Yes, of course we do advanced taxonomy-based entity extraction and relationship detection on these texts, but that doesn’t even come close to what we really need: true Natural Language Understanding (NLU) that turns text into an understandable story represented as usable data in a knowledge graph.
But what is the state of NLU? A recent article from MIT Technology Review concludes that AI still doesn’t have the common sense to understand human language. Yes, transformer models like ChatGPT and GPT-3 do an amazing job of writing prose that resembles human writing in ways that dazzle naive users and newspaper journalists. But that is only the first impression. On closer inspection, these models have many shortcomings: they don’t hold a logical and consistent context over many paragraphs; they don’t have a mental model, memory, or a sense of meaning; and they don’t really understand what their inputs and outputs mean. The author and cognitive scientist Douglas Hofstadter calls these transformer models “cluelessly clueless.”
So why do we even mention these models? Because they are incredibly useful in helping us write rules for normalizing and reducing informal natural language, and even rules that turn natural language into collections of reified triples that represent stories.
This presentation will cover several examples in domains where we extracted understandable, explainable, and queryable stories from unstructured text. The extraction pipeline relies on several technologies, but the two most important pillars are rules written by GPT-3 and then hand-edited by humans, and detailed domain ontologies. The extracted stories are represented in knowledge graphs and can be queried for deeper domain insight and predictive analytics.
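For readers unfamiliar with reified triples, here is a minimal Python sketch of the idea. It is an illustration only, not the pipeline described in the talk: the `ex:` identifiers, the example statements, the date, and the helper names `reify`, `match`, and `statements_about` are all hypothetical, and plain tuples stand in for a real graph store. Reification turns each statement into a node of its own, so that extra facts such as time or source can be attached to it and queried.

```python
# Illustrative sketch only; all "ex:" names and helpers are assumptions,
# not part of the talk or of any real product API.

def reify(stmt_id, subject, predicate, obj, qualifiers=None):
    """Describe one statement as a node (stmt_id) using the RDF
    reification vocabulary, plus optional qualifier triples."""
    triples = [
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", subject),
        (stmt_id, "rdf:predicate", predicate),
        (stmt_id, "rdf:object", obj),
    ]
    for key, value in (qualifiers or {}).items():
        triples.append((stmt_id, key, value))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def statements_about(triples, subject):
    """Return the ids of reified statements whose subject is `subject`."""
    return [t[0] for t in match(triples, p="rdf:subject", o=subject)]


# A two-statement "story" from a hypothetical doctor's note.
story = []
story += reify(
    "ex:stmt1", "ex:patient42", "ex:reported", "ex:ChestPain",
    {"ex:when": "2023-01-09", "ex:source": "ex:note7"},
)
story += reify(
    "ex:stmt2", "ex:drSmith", "ex:prescribed", "ex:Aspirin",
    {"ex:when": "2023-01-09", "ex:source": "ex:note7"},
)

# Query: which statements concern the patient, and when was each made?
for stmt in statements_about(story, "ex:patient42"):
    print(stmt, match(story, s=stmt, p="ex:when"))
```

Because each statement has its own identifier, provenance and timing live alongside the fact itself, which is what makes an extracted story explainable as well as queryable.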