Los Alamos National Laboratory built a Big Data solution combining Hadoop and the AllegroGraph semantic graph platform to identify people, their social networks, and their connectedness across cultural and linguistic backgrounds.
The goal was to build a scalable Social Network Analysis application for processing terabytes of social data, using a demonstration dataset of bibliographic metadata to resolve authors, co-authors, their associated publications, and shared affiliations.
• A Big Data problem that cannot be solved with Hadoop alone
• Disambiguation of people’s names across spelling variants, nicknames, misspellings, and abbreviations
• Semi-structured data
• Scale to terabytes of content spanning multiple repositories and forms
• Uncover relationships not discoverable by traditional name matching
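To illustrate why name matching alone falls short, here is a minimal Python sketch that combines fuzzy name similarity with shared co-author evidence. It is not the Los Alamos implementation and uses neither Hadoop nor AllegroGraph. The record layout, the function names, and both thresholds are assumptions chosen for illustration, and `difflib.SequenceMatcher` stands in for whatever matching technique the real system used.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " "
                      for ch in name.lower())
    return " ".join(cleaned.split())


def name_similarity(a: str, b: str) -> float:
    """Fuzzy string similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def jaccard(a: set, b: set) -> float:
    """Overlap of two sets: |intersection| / |union| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def likely_same_person(rec_a: dict, rec_b: dict,
                       name_threshold: float = 0.6,
                       overlap_threshold: float = 0.3) -> bool:
    """Hypothetical rule: names must be similar AND co-authors must overlap.

    Both thresholds are arbitrary illustration values, not tuned settings.
    """
    similar_name = name_similarity(rec_a["name"], rec_b["name"]) >= name_threshold
    shared_network = jaccard(rec_a["coauthors"], rec_b["coauthors"]) >= overlap_threshold
    return similar_name and shared_network


# Hypothetical bibliographic records: an author name plus a set of co-author names.
rec_a = {"name": "J. Smith", "coauthors": {"a. jones", "m. lee"}}
rec_b = {"name": "John Smith", "coauthors": {"a. jones", "m. lee", "k. patel"}}
rec_c = {"name": "John Smith", "coauthors": {"z. wu"}}

print(likely_same_person(rec_a, rec_b))  # similar name, overlapping co-authors
print(likely_same_person(rec_a, rec_c))  # similar name, no shared co-authors
```

In this sketch, "J. Smith" and "John Smith" are merged only when their co-author sets also overlap, so two similarly named authors with unrelated networks stay separate. A graph store makes this kind of relationship evidence queryable at scale, which a plain string comparison cannot provide.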
Los Alamos achieved 99% accuracy in identifying and disambiguating people across terabyte-scale data sets.