Creating Explainable AI With Rules

From Franz CEO Jans Aasman’s recent Forbes article:

There’s a fascinating dichotomy in artificial intelligence between statistics and rules, machine learning and expert systems. Newcomers to artificial intelligence (AI) regard machine learning as innately superior to brittle rules-based systems, while the history of this field reveals that both rules and probabilistic learning are integral components of AI.

This fact is perhaps nowhere truer than in establishing explainable AI, which is central to the long-term business value of AI front-office use cases.

Granted, simple machine learning can automate backend processes. However, applying the full extent of deep learning or complex neural networks — which are much more accurate than basic machine learning — to mission-critical decision-making and action requires explainability.

Using rules (and rules-based systems) to explicate machine learning results creates explainable AI. Many of the far-reaching applications of AI at the enterprise level — deploying it to combat financial crimes or to predict an individual’s immediate and long-term future in health care, for example — require explainable AI that’s fair, transparent and regulatory-compliant.

Rules can explain machine learning results for these purposes and others.

Read the full article at Forbes




Adding Properties to Triples in AllegroGraph

AllegroGraph provides two ways to add metadata to triples. The first is very similar to what typical property graph databases provide: we use the named graph of a triple to store metadata about that triple. The second approach is what we have termed triple attributes. An attribute is a key/value pair associated with an individual triple, and each triple can have any number of attributes. This approach, which is built into AllegroGraph’s storage layer, is especially handy for security and bookkeeping purposes. Most of this article discusses triple attributes, but first we quickly cover the named graph (i.e. fourth element or quad) approach.

1.0 The Named Graph for Properties

Semantic graph databases are actually defined by the W3C standard to store RDF as ‘Quads’ (Named Graph, Subject, Predicate, and Object). The ‘Triple Store’ terminology has stuck even though the industry has moved on to storing quads. We believe using the named graph approach to store metadata about triples is a richer model than the property graph database method.

The best way to understand this is to give an example. Below we see two statements about Bruce weighing 105 kilos. The triple portions (subject, predicate, object) are identical but the named graphs (fourth elements) differ. They are used to provide additional information about the triples. The graph values are S1 and S2. By looking at these graphs we see that

  • The author of the first triple (with graph S1) is Sophia and the author of the second (with graph S2) is Bruce (who is also the subject of the two triples).
  • Sophia is 100% certain about her statement while Bruce is only 10% certain about his.

Using the named graph we can do even more than a property graph database, as the value of a graph can itself be a node, and is the subject of various triples which specify the original triple’s author, date, and certainty. Additional triples tell us the ages of the authors and the fact that the authors are married.
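A minimal sketch of what these quads and their metadata might look like in N-Quads form (the URIs, predicate names, and the graph names S1 and S2 are illustrative, not taken from an actual repository):

     <http://ex#bruce> <http://ex#weighsKilos> "105" <http://ex#S1> .
     <http://ex#bruce> <http://ex#weighsKilos> "105" <http://ex#S2> .

     <http://ex#S1> <http://ex#author> <http://ex#sophia> .
     <http://ex#S1> <http://ex#certainty> "1.0" .
     <http://ex#S2> <http://ex#author> <http://ex#bruce> .
     <http://ex#S2> <http://ex#certainty> "0.1" .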

Here is the data displayed in Gruff, AllegroGraph’s associated triple store browser:

Using named graphs for a triple’s metadata is a powerful tool, but it does have limitations: (1) only one graph value can be associated with a triple, (2) it can be important that metadata is stored directly and physically with the triple (with named graphs, the actual metadata is usually stored in additional triples with the graph as the subject, as in the example above), and (3) named graphs have competing uses and may not be available for metadata.

2.0 The Triple Attributes approach

AllegroGraph uniquely offers a mechanism called triple attributes, where a collection of user-defined key/value pairs can be stored with each individual triple. The advantages of this approach are manifold, but the original use case was triple-level security for an intelligence agency.

By having triple attributes physically connected to the triples in the storage layer, we can provide a very powerful and flexible mechanism to protect triples at the lowest possible level of AllegroGraph’s architecture. Our first example below shows this use case in detail. Other use cases include adding weights or costs to triples for use in graph algorithms, or adding a recorded time or expiration time to a triple, which can be used to provide a time machine in AllegroGraph or to automatically clean up old data.

Example with Attributes:

      Subject – <http://dbpedia.org/resource/Arif_Babayev>
      Predicate – <http://dbpedia.org/property/placeOfDeath>
      Object – <http://dbpedia.org/resource/Baku>
      Named Graph – <http://ex#trans@@1142684573200001>
      Triple Attributes – {"securityLevel": "high", "department": "hr", "accessToken": ["E", "D"]}

This article provides an initial introduction to attributes and the associated concept of static filters, showing how they are set up and used. We start with a security example which also describes the basics of adding attributes to triples and filtering query results based on attribute values. Then we discuss other potential uses of attributes.

2.1 Triple Attribute Basics: a Security Example

One important purpose of attributes, when they were added as a feature, was to allow for very fine triple-level security, so that triples would be visible or invisible to users according to the attributes of the triples and the permissions associated with the query being posed by the user.

Note that users as such do not have attributes. Instead, attribute values are assigned when a query is posed. This is an important point: it is natural to assume that there is an attribute SECURITY-LEVEL, that a triple can have the attribute SECURITY-LEVEL=3, that USER1 can have the attribute SECURITY-LEVEL=2 and USER2 the attribute SECURITY-LEVEL=4, and that the system requires the user’s SECURITY-LEVEL to be greater than the triple’s SECURITY-LEVEL for the triple to be visible to that user. But that is not how attributes work. Triples can have a SECURITY-LEVEL attribute, but users do not have attributes. Instead, the filter is made part of the query.

Here is a simple example. We define attributes and static attribute filters using AGWebView. We have a repository named repo. Here is a portion of its AGWebView page:

The red arrow points to the commands of interest: Manage attribute definitions and Set static attribute filter. We click on Manage attribute definitions to define an attribute. We have filled in the attribute information (name security-level, the minimum and maximum number allowed per triple, the allowed values, and whether the values are ordered — yes in our case):

We click Save and the attribute is defined:

Then we define a filter (on the Set static attribute filter page):

We defined the filter (attribute-set> user.security-level triple.security-level) and clicked Save (the definition appears in both the Edit and the Current fields). The filter says that the “user” security level must be greater than the triple security level. We put “user” in quotes because the user security level is specified as part of the query, and has no direct connection to any specific user.

Here are some triples in an nqx file fr.nqx. The first triple has no attributes and the other three each have a security-level attribute value.

     <http://www.franz.com#emp0> <http://www.franz.com#position> "intern" .

     <http://www.franz.com#emp1> <http://www.franz.com#position> "worker" {"security-level": "2"} .

     <http://www.franz.com#emp2> <http://www.franz.com#position> "manager" {"security-level": "3"} .

     <http://www.franz.com#emp3> <http://www.franz.com#position> "boss" {"security-level": "4"} .

We load this file into a repository which has the security-level attribute defined as above and the static filter mentioned above also defined. (Triples with attributes can also be entered directly when using AGWebView with the Import RDF from a text area input command).

Once the triples are loaded, we click View triples in AGWebView and we see no triples:

This result is often surprising to users just beginning to work with attributes and filters, who may expect the first triple, abbreviated to [emp0 position intern], to be visible. But the system is doing what it is supposed to do: it will only show triples where the security-level of the user posing the query is greater than the security level of the triple. The user has no security level, so the comparison fails, even for triples that have no security-level attribute value. We describe below how to ensure you can see triples with no attributes.

So we need to specify an attribute value to the user posing the query. (As said above, users do not themselves have attribute values. But the attribute value of a user posing a query can be specified as part of the query.) “User” attributes are specified with a prefix like the following:

     prefix franzOption_userAttributes: <franz:%7B%22security-level%22%3A%223%22%7D>

so the query should be

     prefix franzOption_userAttributes: <franz:%7B%22security-level%22%3A%223%22%7D>

     select ?s ?p ?o { ?s ?p ?o . }

We will show the results below, but first, what are all the % signs and numbers doing there? Why isn’t the prefix just prefix franzOption_userAttributes: <franz:{"security-level":"3"}>? The issue is that {"security-level":"3"} won’t read correctly in that position. It must be URL encoded. One way is to go to https://www.urlencoder.org/ (there are other websites that do this as well), paste {"security-level":"3"} into the first box, and click Encode to get %7B%22security-level%22%3A%223%22%7D. We then paste that into the query, as shown above.
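If you prefer to do the encoding programmatically rather than through a website, Python's standard library produces the same string (a quick sketch, not part of the original setup):

     from urllib.parse import quote

     print(quote('{"security-level":"3"}', safe=''))
     # prints: %7B%22security-level%22%3A%223%22%7D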

When we try that query in AGWebView, we get one result:

If we encode {"security-level":"5"} to get the query

prefix franzOption_userAttributes: <franz:%7B%22security-level%22%3A%225%22%7D>
select ?s ?p ?o { ?s ?p ?o . }

we get three results:

     emp3    position                “boss”
     emp2    position                “manager”
     emp1    position                “worker”

since now the “user” security-level is greater than that of any triple with a security-level attribute. But what about the triple with subject emp0, the one with no attributes? It does not pass the filter, which requires that the user attribute be greater than the triple attribute; since the triple has no attribute value, the comparison fails.

Let us redefine the filter to:

(or (attribute-set> user.security-level triple.security-level)
    (empty triple.security-level))

Now a triple will pass the filter if either (1) the “user” security-level is greater than the triple security-level or (2) the triple does not have a security-level attribute. Now the query from above, where the user has the attribute security-level:"5", will show all the triples with a security-level less than 5 as well as those with no attributes at all. That happens to be all four triples defined so far.
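Reconstructed from the data loaded above (the actual AGWebView display will differ, and the row order may vary), the result now includes all four triples:

     emp3    position                "boss"
     emp2    position                "manager"
     emp1    position                "worker"
     emp0    position                "intern"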

The triple

     emp0    position                “intern”

will now appear as a result in any query where it satisfies the SPARQL select, regardless of the security-level of the “user”.

It would be useful if we could associate attributes with actual users. However, this is not as simple as it sounds. Attributes are features of repositories. If I have a repository REPO1, it can have a set of defined attributes and filters, while my REPO2 may know nothing about them: its triples may have no attributes at all, no attributes are defined, and (as a result) there are no filters. But users are not repository-linked objects. While a repository can be made read-only or unreadable for a user, users do not have finer-grained repository features. So providing users with attributes, which would only make sense on a per-repository basis, requires a complicated interface. That is not yet implemented (though we are considering how it can be done).

Instead, users can have specific prefixes associated with them, and that prefix can be included in any query made by the user.

But if all it takes to specify “user” attributes is to put the right line at the top of your SPARQL query, that does not seem to provide much security. There is a per-user permission, “Allow user attributes via SPARQL PREFIX franzOption_userAttributes”, which can restrict a user’s ability to specify “user” attributes in a query, but that is a rather blunt instrument. Instead, the model is that most users (outside of trusted administrators) are not actually allowed to pose SPARQL queries directly. Instead, an intermediary program takes the query a user requests and, having determined the status of the user and what attribute values should be given to the user, modifies the query with the appropriate franzOption_userAttributes prefix and then sends the query on to the server, after which it captures the results and sends them back to the requesting user. That intermediary program stores the prefix suitable for each user and thus associates “user” attributes with specific users.
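As a rough sketch of such an intermediary (the user table, attribute values, and function name here are hypothetical), the wrapper only needs to URL-encode the attributes it has decided the user should have and prepend the corresponding prefix before forwarding the query to the server:

import json
from urllib.parse import quote

# Hypothetical mapping from application users to the attribute values
# the intermediary has decided they should get
USER_ATTRIBUTES = {"alice": {"security-level": "3"},
                   "bob":   {"security-level": "5"}}

def add_user_attributes(username, sparql):
    """Prepend a franzOption_userAttributes prefix for this user's attributes."""
    attrs = json.dumps(USER_ATTRIBUTES[username], separators=(',', ':'))
    prefix = "prefix franzOption_userAttributes: <franz:%s>\n" % quote(attrs, safe='')
    return prefix + sparql

# add_user_attributes("alice", "select ?s ?p ?o { ?s ?p ?o . }") yields a query
# beginning with:
# prefix franzOption_userAttributes: <franz:%7B%22security-level%22%3A%223%22%7D>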

2.2 Using attributes as additional data

Although triple security is one powerful use of attributes, security is far from the only use. Just as the named graph can serve as additional data, so can attributes. SPARQL queries can use attribute values, just as static filters can use them to filter out triples before displaying them. Let us take a simple example: the attribute timeAdded. Every triple we add will have a timeAdded attribute value, a string whose contents are a datetime value such as "2017-09-11T11:15:52". We define the attribute:

Now let us define some triples:

     <http://www.franz.com#emp0> <http://www.franz.com#callRank> "2" {"timeAdded": "2019-01-12T10:12:45" } .
     <http://www.franz.com#emp0> <http://www.franz.com#callRank> "1" {"timeAdded": "2019-01-14T14:16:12" } .
     <http://www.franz.com#emp0> <http://www.franz.com#callRank> "3" {"timeAdded": "2019-01-11T11:15:52" } .
     <http://www.franz.com#emp1> <http://www.franz.com#callRank> "5" {"timeAdded": "2019-01-13T11:03:22" } .
     <http://www.franz.com#emp0> <http://www.franz.com#callRank> "2" {"timeAdded": "2019-01-13T09:03:22" } .

 

We have a call center with employees making calls. Each call has a ranking from 1 to 5, with 1 the lowest and 5 the highest. We have data on five calls, four from emp0 and one from emp1. Each triple has a timeAdded attribute with a string containing a dateTime value. We load these into an empty repository named at-test where the timeAdded attribute is defined as above:

 

SPARQL queries can use the attribute magic properties (see https://franz.com/agraph/support/documentation/current/triple-attributes.html#Querying-Attributes-using-SPARQL). We use the attributesNameValue magic property to see the subject, object, and attribute value:

     select ?s ?o ?value { 
       (?ta ?value) <http://franz.com/ns/allegrograph/6.2.0/attributesNameValue>    (?s ?p ?o) . 
     }

But we are really interested just in emp0 and we would like to see the results ordered by time, that is by the attribute value, so we restrict the query to emp0 as the subject and order the results:

     select ?o ?value { 
       (?ta ?value) <http://franz.com/ns/allegrograph/6.2.0/attributesNameValue>    (<http://www.franz.com#emp0> ?p ?o) . 
     }  order by ?value
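Given the five triples loaded above, the ordered output for emp0 should look roughly like this (?o is the call ranking, ?value the timeAdded attribute):

     "3"     "2019-01-11T11:15:52"
     "2"     "2019-01-12T10:12:45"
     "2"     "2019-01-13T09:03:22"
     "1"     "2019-01-14T14:16:12"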

These are the results for emp0, who is clearly having difficulties: the call rankings have been steadily falling over time.

Another example using timeAdded is employee salary data. In the Human Resources data, the salary of an employee is stored:

      emp0 hasSalary 50000

Now emp0 gets a raise to 55000. So we delete the triple above and add the triple

      emp0 hasSalary 55000

But that is not satisfactory because we have lost the salary history. If the boss asks “How much was emp0 paid initially?” we cannot answer. There are various solutions. We could define a salary change object, with predicates effectiveDate, previousSalary, newSalary, and so on:

     salaryChange017 forEmployee emp0
     salaryChange017 effectiveDate “2019-01-12T10:12:45”
     salaryChange017 oldSalary “50000”
     salaryChange017 newSalary “55000”

     emp0 hasSalaryChange salaryChange017

and that would work fine, but perhaps it is more setup and effort than is needed. Suppose we just have hasSalary triples each with a timeAdded attribute. Then the current salary is the latest one and the history is the ordered list. Here that idea is worked out:

<http://www.franz.com#emp0> <http://www.franz.com#hasSalary> "50000"^^<http://www.w3.org/2001/XMLSchema#integer> {"timeAdded": "2017-01-12T10:12:45" } .
<http://www.franz.com#emp0> <http://www.franz.com#hasSalary> "55000"^^<http://www.w3.org/2001/XMLSchema#integer> {"timeAdded": "2019-03-17T12:00:00" } .

What is the current salary? A simple SPARQL query tells us:

      select ?o ?value { 
       (?ta ?value) <http://franz.com/ns/allegrograph/6.2.0/attributesNameValue>  
                       (<http://www.franz.com#emp0> <http://www.franz.com#hasSalary> ?o) . 
        }  order by desc(?value) limit 1
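With the two hasSalary triples above, this query should return a single row, roughly:

     55000     "2019-03-17T12:00:00"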

 

The salary history is provided by the same query without the LIMIT:

     select ?o ?value { 
       (?ta ?value) <http://franz.com/ns/allegrograph/6.2.0/attributesNameValue>   
                      (<http://www.franz.com#emp0> <http://www.franz.com#hasSalary> ?o) . 
        }  order by desc(?value)
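Dropping the LIMIT returns the history in reverse chronological order, roughly:

     55000     "2019-03-17T12:00:00"
     50000     "2017-01-12T10:12:45"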

 

This method of storing salary data may not easily support more complex questions that could be answered readily with the salaryChange object approach mentioned above, but if you are not looking to ask those questions, you should not take on the extra work (and the risk of data errors) it requires.

You could use the graph of each triple for the timeAdded, and all the examples above would work with minor tweaks. But there are many competing uses for the named graph of a triple. Attributes are always available, and using them for one purpose does not restrict their use for other purposes.

 




Earth Day – Franz Inc. and Geoscience Experts Recognize the Growing Importance of Semantic Knowledge Graphs for Earth Science

Semantically Linking Earth Observation Data Makes it FAIR for the Global Community of Geoscientists

In celebration of Earth Day, Franz Inc., an early innovator in Artificial Intelligence (AI) and leading supplier of Semantic Graph Database technology for Knowledge Graphs, today recognized how AllegroGraph, its semantic knowledge graph technology, is playing an essential part in making data FAIR (Findable, Accessible, Interoperable and Reusable) for the geoscience community. Since the current understanding of earth science processes is largely based on earth observation and numerical model data, making this data FAIR for all geoscientists and technologists is critical to facilitate future knowledge discovery about planet Earth.

Collecting, storing, monitoring and analyzing data from the core of the Earth up to the atmosphere provides critical knowledge about the planet and how living things interact with it. Scientists and technologists gather information about Earth from a range of sources, including satellites; air-, ground- and ocean-based sensors; physical sample data; and more, all recorded at a variety of temporal and spatial resolutions and needing to be represented on the web for the global scientific community to access and use. AllegroGraph’s unique semantic graph capabilities allow diverse and complex data sources to be easily integrated, with full search and cross-dataset queries possible.

“Our most pressing global environmental challenges cannot be solved by a single organization,” said Dr. Annie Burgess, Lab Director, Earth Science Information Partners (ESIP). “Scientists require data collected across multiple disciplines, which are often managed by many different agencies and institutions. ESIP is a community of data and information technology professionals dedicated to ensuring those data are FAIR. To assist with that goal, the unique semantic graph capabilities of AllegroGraph are leveraged with the ESIP Community Ontology Repository, a community platform to manage and exchange terms and vocabularies that assists scientists to publish, discover and reuse data.”

“To address important marine research, there is a critical need for ocean observatories to share data in a way that is easy to discover, use and integrate,” said Carlos Rueda, Senior Software Engineer, Monterey Bay Aquarium Research Institute. “With this goal in mind, the Marine Metadata Interoperability Project developed the MMI Ontology Registry and Repository (ORR), which leverages AllegroGraph to provide powerful interoperable semantic services that make the content on the web interconnected in a meaningful way for both humans and machines to consume.”

“We are at an exciting stage where there is a critical mass of experts and organizations around the globe with similar goals, as well as the realization that we need knowledge-intensive applications,” said Dr. Lewis McGibbney, Data Scientist, Jet Propulsion Laboratory, California Institute of Technology, and Co-Chair of the NASA ESDSWG Search Relevance Working Group. “The semantic technology stack is a crucial piece for building intelligent apps for knowledge-intensive use cases within the geoscience area.”

“Semantic graph technology is particularly well-suited to address the complex data integration, data access and analysis challenges surrounding Earth data science,” said Dr. Jans Aasman, CEO of Franz Inc. “We are thrilled that leading geoscience organizations are tapping into the power of AllegroGraph to share Earth science ontologies and data. We look forward to continuing to work with the community and to help forward their important projects.”

A recent Gartner report explains the importance of using semantic technology to drive value out of data and included AllegroGraph as a graph database to consider for semantic technology solutions. “Unprecedented levels of data scale and distribution are making it almost impossible for organizations to effectively exploit their data assets. Data and analytics leaders must adopt a semantic approach to their enterprise data assets or face losing the battle for competitive advantage.” (Source: Gartner, How to Use Semantics to Drive the Business Value of Your Data, Guido De Simoni, November 27, 2018.) To view a summary of the report, go to https://www.gartner.com/doc/3894095/use-semantics-drive-business-value.

About ESIP

The Earth Science Information Partners (ESIP) is a community of innovative science, data and information technology practitioners. ESIP members catalyze connections across traditional institutional and domain boundaries to solve critical Earth science data stewardship, information technology and interoperability issues. Through this work, ESIP improves Earth science data management practices and makes Earth science data more discoverable, accessible and useful to researchers, policy makers and the public. Learn more at esipfed.org or follow @ESIPfed on Twitter.

About Monterey Bay Aquarium Research Institute

The research of the Monterey Bay Aquarium Research Institute (MBARI) encompasses the entire ocean, from the surface waters to the deep seafloor, and from the coastal zone to the open sea. The need to understand the ocean in all its complexity and variability drives MBARI’s research and development efforts.

About JPL

The Jet Propulsion Laboratory is a unique national research facility that carries out robotic space and Earth science missions. JPL helped open the Space Age by developing America’s first Earth-orbiting science satellite, creating the first successful interplanetary spacecraft, and sending robotic missions to study all the planets in the solar system as well as asteroids, comets and Earth’s moon. In addition to its missions, JPL developed and manages NASA’s Deep Space Network, a worldwide system of antennas that communicates with interplanetary spacecraft. JPL is a federally funded research and development center managed for NASA by Caltech. From the long history of leaders drawn from the university’s faculty to joint programs and appointments, JPL’s intellectual environment and identity are profoundly shaped by its role as part of Caltech.

About AllegroGraph

AllegroGraph is a database technology that enables businesses to extract sophisticated decision insights and predictive analytics from highly complex, distributed data that cannot be uncovered with conventional databases. Unlike traditional relational databases or other NoSQL databases, AllegroGraph employs semantic graph technologies that process data with contextual and conceptual intelligence. AllegroGraph is able to run queries of unprecedented complexity to support predictive analytics that help organizations make more informed, real-time decisions. AllegroGraph is utilized by dozens of the top F500 companies worldwide.

Semantic Knowledge Graphs are the Foundation for Artificial Intelligence

The foundation for Knowledge Graphs and AI lies in the facets of semantic technology provided by AllegroGraph. Semantic graph databases provide the core technology environment to enrich and contextualize the understanding of data. The ability to rapidly integrate new knowledge is the crux of the Knowledge Graph and depends entirely on semantic technologies.

About Franz Inc.

Franz Inc. is an early innovator in Artificial Intelligence (AI) and leading supplier of Semantic Graph Database technology with expert knowledge in developing and deploying Knowledge Graph solutions. The foundation for Knowledge Graphs and AI lies in the facets of semantic technology provided by AllegroGraph and Allegro CL. The ability to rapidly integrate new knowledge is the crux of the Knowledge Graph and Franz Inc. provides the key technologies and services to address your complex challenges. Franz Inc. is your Knowledge Graph technology partner.

All trademarks and registered trademarks in this document are the properties of their respective owners.




Webcast – Speech Recognition, Knowledge Graphs, and AI for Intelligent Customer Operations – April 3, 2019

Presenters – Burt Smith, N3 Results and Jans Aasman, Franz Inc.

In the typical sales organization, the contents of the actual chat or voice conversation between agent and customer are a black hole. In the modern Intelligent Customer Operations center (e.g. N3 Results – www.n3results.com), the interactions between agent and customer are a source of rich information that helps agents improve the quality of the interaction in real time, creates more sales, and provides far better analytics for management.

Join us for this webinar where we describe a real-world Intelligent Customer Operations center that uses graph-based technology for taxonomy-driven entity extraction, speech recognition, machine learning and predictive analytics to improve the quality of conversations, increase sales and improve business visibility.

View the recorded webinar.




Using JSON-LD in AllegroGraph – Python Example

The following is example #19 from our AllegroGraph Python Tutorial.

JSON-LD is described pretty well at https://json-ld.org/ and the specification can be found at https://json-ld.org/latest/json-ld/ .

The website https://json-ld.org/playground/ is also useful.

There are many reasons for working with JSON-LD. The major search engines such as Google require ecommerce companies to mark up their websites with a systematic description of their products and more and more companies use it as an easy serialization format to share data.

The benefit for your organization is that you can now combine your documents with graphs, graph search and graph algorithms. Normally when you store documents in a document store, you set up your documents in such a way that they are optimized for direct retrieval queries. Doing complex joins across multiple types of documents, or finding a shortest path through a mass of objects (or object types), is however very complicated. Storing JSON-LD objects in AllegroGraph gives you all the benefits of a document store, and you can also semantically link objects together, do complex joins and even graph search.

A second benefit is that, as an application developer, you do not have to learn the entire semantic technology stack, especially the part where developers have to create individual triples or edges. You can work with the JSON data serialization format that application developers usually prefer.

In the following you will first learn about JSON-LD as a syntax for semantic graphs. After that we will talk more about using JSON-LD with AllegroGraph as a document-graph-store.

Setup

You can use Python 2.6+ or Python 3.3+. There are small setup differences, which are noted. You do need agraph-python 101.0.1 or later.

Mimicking instructions in the Installation document, you should set up the virtualenv environment.

  1. Create an environment named jsonld:

python3 -m venv jsonld

or

python2 -m virtualenv jsonld

  2. Activate it:

Using the Bash shell:

source jsonld/bin/activate

Using the C shell:

source jsonld/bin/activate.csh

  3. Install agraph-python:

pip install agraph-python

And start python:

python
[various startup and copyright messages]
>>>

We assume you have an AllegroGraph 6.5.0 server running. We call ag_connect. Modify the host, port, user, and password values in your call to their correct values:

from franz.openrdf.connect import ag_connect
with ag_connect('repo', host='localhost', port='10035',
                user='test', password='xyzzy') as conn:
    print (conn.size())

If the script runs successfully a new repository named repo will be created.

JSON-LD setup

We next define some utility functions which are somewhat different from what we have used before, in order to work better with JSON-LD. createdb() creates and opens a new repository and opendb() opens an existing repo (modify the values of the host, port, user, and password arguments in the definitions if necessary). Both return repository connections which can be used to perform repository operations. showtriples() displays triples in a repository.

import os
import json, requests, copy

from franz.openrdf.sail.allegrographserver import AllegroGraphServer
from franz.openrdf.connect import ag_connect
from franz.openrdf.vocabulary.xmlschema import XMLSchema
from franz.openrdf.rio.rdfformat import RDFFormat

# Functions to create/open a repo and return a RepositoryConnection
# Modify the values of HOST, PORT, USER, and PASSWORD if necessary

def createdb(name):
    return ag_connect(name,host="localhost",port=10035,user="test",password="xyzzy",create=True,clear=True)

def opendb(name):
    return ag_connect(name,host="localhost",port=10035,user="test",password="xyzzy",create=False)

def showtriples(limit=100):
    statements = conn.getStatements(limit=limit)
    with statements:
        for statement in statements:
             print(statement)

Finally we call our createdb function to create a repository and return a RepositoryConnection to it:

conn=createdb('jsonplay')

Some Examples of Using JSON-LD

In the following we try things out with some JSON-LD objects that are defined in the JSON-LD playground (https://json-ld.org/playground/).

The first object we will create is an event dict. Although it is a Python dict, it is also valid JSON notation. (But note that not all Python dictionaries are valid JSON. For example, JSON uses null where Python would use None, and there is no magic to automatically handle that.) This object has one key called @context which specifies how to translate keys and values into predicates and objects. The following @context says that every time you see ical: it should be replaced by http://www.w3.org/2002/12/cal/ical#, xsd: by http://www.w3.org/2001/XMLSchema#, and that if you see ical:dtstart as a key then the value should be treated as an xsd:dateTime.

event = {
  "@context": {
    "ical": "http://www.w3.org/2002/12/cal/ical#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ical:dtstart": { "@type": "xsd:dateTime" }
      },
    "ical:summary": "Lady Gaga Concert",
    "ical:location": "New Orleans Arena, New Orleans, Louisiana, USA",
    "ical:dtstart": "2011-04-09T20:00:00Z"
}

Let us try it out (the subjects are blank nodes so you will see different values):

>>> conn.addData(event)
>>> showtriples()
(_:b197D2E01x1, <http://www.w3.org/2002/12/cal/ical#summary>, "Lady Gaga Concert")
(_:b197D2E01x1, <http://www.w3.org/2002/12/cal/ical#location>, "New Orleans Arena, New Orleans, Louisiana, USA")
(_:b197D2E01x1, <http://www.w3.org/2002/12/cal/ical#dtstart>, "2011-04-09T20:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>)

Adding an @id and @type to Objects

In the above we see that the JSON-LD was correctly translated into triples but there are two immediate problems: first each subject is a blank node, the use of which is problematic when linking across repositories; and second, the object does not have an RDF type. We solve these problems by adding an @id to provide an IRI as the subject and adding a @type for the object (those are at the lines just after the @context definition):

>>> event = {
  "@context": {
      "ical": "http://www.w3.org/2002/12/cal/ical#",
      "xsd": "http://www.w3.org/2001/XMLSchema#",
      "ical:dtstart": { "@type": "xsd:dateTime" }
        },
      "@id": "ical:event-1",
      "@type": "ical:Event",
      "ical:summary": "Lady Gaga Concert",
      "ical:location": "New Orleans Arena, New Orleans, Louisiana, USA",
      "ical:dtstart": "2011-04-09T20:00:00Z"
  }

We also create a test function to test our JSON-LD objects. It is more powerful than needed right now (here we just need conn.addData(event) and showtriples()), but test will be useful in most later examples. Note the allow_external_references=True argument to addData(). Again, it is not needed in this example, but later examples use external contexts and so this argument is required for those.

def test(object,json_ld_context=None,rdf_context=None,maxPrint=100,conn=conn):
    conn.clear()
    conn.addData(object, allow_external_references=True)
    showtriples(limit=maxPrint)
>>> test(event)
(<http://www.w3.org/2002/12/cal/ical#event-1>, <http://www.w3.org/2002/12/cal/ical#summary>, "Lady Gaga Concert")
(<http://www.w3.org/2002/12/cal/ical#event-1>, <http://www.w3.org/2002/12/cal/ical#location>, "New Orleans Arena, New Orleans, Louisiana, USA")
(<http://www.w3.org/2002/12/cal/ical#event-1>, <http://www.w3.org/2002/12/cal/ical#dtstart>, "2011-04-09T20:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
(<http://www.w3.org/2002/12/cal/ical#event-1>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://www.w3.org/2002/12/cal/ical#Event>)

Note in the above that we now have a proper subject and a type.

Referencing an External Context Via a URL

The next object we add to AllegroGraph is a person object. This time the @context is not specified as a JSON object but as a link to a context that is stored at http://schema.org/. Also, in the definition of the function test above we passed this parameter to addData: allow_external_references=True. Requiring that argument explicitly is a security feature. One should use external references only when the context at that URL is trusted (as it is in this case).

person = {
  "@context": "http://schema.org/",
  "@type": "Person",
  "@id": "foaf:person-1",
  "name": "Jane Doe",
  "jobTitle": "Professor",
  "telephone": "(425) 123-4567",
  "url": "http://www.janedoe.com"
}
>>> test(person)
(<http://xmlns.com/foaf/0.1/person-1>, <http://schema.org/name>, "Jane Doe")
(<http://xmlns.com/foaf/0.1/person-1>, <http://schema.org/jobTitle>, "Professor")
(<http://xmlns.com/foaf/0.1/person-1>, <http://schema.org/telephone>, "(425) 123-4567")
(<http://xmlns.com/foaf/0.1/person-1>, <http://schema.org/url>, <http://www.janedoe.com>)
(<http://xmlns.com/foaf/0.1/person-1>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://schema.org/Person>)

Improving Performance by Adding Lists

Adding one person at a time requires an interaction with the server for each person. It is much more efficient to add lists of objects all at once rather than one at a time. Note that addData will take a list of dicts and still do the right thing. So let us add 1000 persons at the same time, each person being a copy of the above person but with a different @id. (The example code is repeated below for ease of copying.)

>>> x = [copy.deepcopy(person) for i in range(1000)]
>>> len(x)
1000
>>> c = 0
>>> for el in x:
    el['@id']= "http://franz.com/person-" + str(c)
    c= c + 1
>>> test(x,maxPrint=10)
(<http://franz.com/person-0>, <http://schema.org/name>, "Jane Doe")
(<http://franz.com/person-0>, <http://schema.org/jobTitle>, "Professor")
(<http://franz.com/person-0>, <http://schema.org/telephone>, "(425) 123-4567")
(<http://franz.com/person-0>, <http://schema.org/url>, <http://www.janedoe.com>)
(<http://franz.com/person-0>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://schema.org/Person>)
(<http://franz.com/person-1>, <http://schema.org/name>, "Jane Doe")
(<http://franz.com/person-1>, <http://schema.org/jobTitle>, "Professor")
(<http://franz.com/person-1>, <http://schema.org/telephone>, "(425) 123-4567")
(<http://franz.com/person-1>, <http://schema.org/url>, <http://www.janedoe.com>)
(<http://franz.com/person-1>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://schema.org/Person>)
>>> conn.size()
5000
>>>
x = [copy.deepcopy(person) for i in range(1000)]
len(x)

c = 0
for el in x:
    el['@id']= "http://franz.com/person-" + str(c)
    c= c + 1

test(x,maxPrint=10)

conn.size()

Adding a Context Directly to an Object

You can download a context directly in Python, modify it and then add it to the object you want to store. As an illustration we load a person context from json-ld.org (actually a fragment of the schema.org context) and insert it in a person object. (We have broken and truncated some output lines for clarity and all the code executed is repeated below for ease of copying.)

>>> context=requests.get("https://json-ld.org/contexts/person.jsonld").json()['@context']
>>> context
{'Person': 'http://xmlns.com/foaf/0.1/Person',
 'xsd': 'http://www.w3.org/2001/XMLSchema#',
 'name': 'http://xmlns.com/foaf/0.1/name',
 'jobTitle': 'http://xmlns.com/foaf/0.1/title',
 'telephone': 'http://schema.org/telephone',
 'nickname': 'http://xmlns.com/foaf/0.1/nick',
 'affiliation': 'http://schema.org/affiliation',
 'depiction': {'@id': 'http://xmlns.com/foaf/0.1/depiction', '@type': '@id'},
 'image': {'@id': 'http://xmlns.com/foaf/0.1/img', '@type': '@id'},
 'born': {'@id': 'http://schema.org/birthDate', '@type': 'xsd:date'},
 ...}
>>> person = {
  "@context": context,
  "@type": "Person",
  "@id": "foaf:person-1",
  "name": "Jane Doe",
  "jobTitle": "Professor",
  "telephone": "(425) 123-4567",
}
>>> test(person)
(<http://xmlns.com/foaf/0.1/person-1>, <http://xmlns.com/foaf/0.1/name>, "Jane Doe")
(<http://xmlns.com/foaf/0.1/person-1>, <http://xmlns.com/foaf/0.1/title>, "Professor")
(<http://xmlns.com/foaf/0.1/person-1>, <http://schema.org/telephone>, "(425) 123-4567")
(<http://xmlns.com/foaf/0.1/person-1>,
 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>,
 <http://xmlns.com/foaf/0.1/Person>)
>>>
context=requests.get("https://json-ld.org/contexts/person.jsonld").json()['@context']
# The next produces lots of output, uncomment if desired
#context

person = {
  "@context": context,
  "@type": "Person",
  "@id": "foaf:person-1",
  "name": "Jane Doe",
  "jobTitle": "Professor",
  "telephone": "(425) 123-4567",
}
test(person)

Building a Graph of Objects

We start by forcing a key’s value to be stored as a resource. We saw above that we could specify the value of a key to be a date using the xsd:dateTime specification. We now do it again for foaf:birthdate. Then we create several linked objects and show the connections using Gruff.

context = { "foaf:child": {"@type":"@id"},
            "foaf:brotherOf": {"@type":"@id"},
            "foaf:birthdate": {"@type":"xsd:dateTime"}}

p1 = {
    "@context": context,
    "@type":"foaf:Person",
    "@id":"foaf:person-1",
    "foaf:birthdate": "1958-04-09T20:00:00Z",
    "foaf:child": ['foaf:person-2', 'foaf:person-3']
}

p2 = {
    "@context": context,
    "@type":"foaf:Person",
    "@id":"foaf:person-2",
    "foaf:brotherOf": "foaf:person-3",
    "foaf:birthdate": "1992-04-09T20:00:00Z",
}

p3 = {"@context": context,
    "@type":"foaf:Person",
    "@id":"foaf:person-3",
    "foaf:birthdate": "1994-04-09T20:00:00Z",
}

test([p1,p2,p3])
>>> test([p1,p2,p3])
(<http://xmlns.com/foaf/0.1/person-1>, <http://xmlns.com/foaf/0.1/birthdate>, "1958-04-09T20:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
(<http://xmlns.com/foaf/0.1/person-1>, <http://xmlns.com/foaf/0.1/child>, <http://xmlns.com/foaf/0.1/person-2>)
(<http://xmlns.com/foaf/0.1/person-1>, <http://xmlns.com/foaf/0.1/child>, <http://xmlns.com/foaf/0.1/person-3>)
(<http://xmlns.com/foaf/0.1/person-1>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://xmlns.com/foaf/0.1/Person>)
(<http://xmlns.com/foaf/0.1/person-2>, <http://xmlns.com/foaf/0.1/brotherOf>, <http://xmlns.com/foaf/0.1/person-3>)
(<http://xmlns.com/foaf/0.1/person-2>, <http://xmlns.com/foaf/0.1/birthdate>, "1992-04-09T20:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
(<http://xmlns.com/foaf/0.1/person-2>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://xmlns.com/foaf/0.1/Person>)
(<http://xmlns.com/foaf/0.1/person-3>, <http://xmlns.com/foaf/0.1/birthdate>, "1994-04-09T20:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
(<http://xmlns.com/foaf/0.1/person-3>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://xmlns.com/foaf/0.1/Person>)

The following shows the graph that we created in Gruff. Note that this is what JSON-LD is all about: connecting objects together.

img-person-graph

JSON-LD Keyword Directives can be Added at any Level

Here is an example from the wild. The URL https://www.ulta.com/antioxidant-facial-oil?productId=xlsImpprod18731241 goes to a web page advertising a facial oil. (We make no claims or recommendations about this product. We are simply showing how JSON-LD appears in many places.) Look at the source of the page and you’ll find a JSON-LD object similar to the following. Note that the @ directives can occur at any level. We added an @id key.

hippieoil = {"@context":"http://schema.org",
 "@type":"Product",
 "@id":"http://franz.com/hippieoil",
 "aggregateRating":
    {"@type":"AggregateRating",
     "ratingValue":4.6,
     "reviewCount":73},
     "description":"""Make peace with your inner hippie while hydrating & protecting against photoaging....Mad Hippie's preservative-free Antioxidant Facial Oil is truly the most natural way to moisturize.""",
     "brand":"Mad Hippie",
     "name":"Antioxidant Facial Oil",
     "image":"https://images.ulta.com/is/image/Ulta/2530018",
     "productID":"2530018",
     "offers":
        {"@type":"Offer",
         "availability":"http://schema.org/InStock",
         "price":"24.99",
         "priceCurrency":"USD"}}


test(hippieoil)

img-hippieoil

JSON-LD @graphs

One can put one or more JSON-LD objects in an RDF named graph. This means that the fourth element of each triple generated from a JSON-LD object will have the specified graph name. Let’s show this with an example.

context = {
        "name": "http://schema.org/name",
        "description": "http://schema.org/description",
        "image": {
            "@id": "http://schema.org/image", "@type": "@id" },
        "geo": "http://schema.org/geo",
        "latitude": {
            "@id": "http://schema.org/latitude", "@type": "xsd:float" },
        "longitude": {
            "@id": "http://schema.org/longitude",  "@type": "xsd:float" },
        "xsd": "http://www.w3.org/2001/XMLSchema#"
    }

place = {
    "@context": context,
    "@id": "http://franz.com/place1",
    "@graph": {
        "@id": "http://franz.com/place1",
        "@type": "http://franz.com/Place",
        "name": "The Empire State Building",
        "description": "The Empire State Building is a 102-story landmark in New York City.",
        "image": "http://www.civil.usherbrooke.ca/cours/gci215a/empire-state-building.jpg",
        "geo": {
               "latitude": "40.75",
               "longitude": "73.98" }
        }}

and here is the result:

>>> test(place, maxPrint=3)
(<http://franz.com/place1>, <http://schema.org/name>, "The Empire State Building", <http://franz.com/place1>)
(<http://franz.com/place1>, <http://schema.org/description>, "The Empire State Building is a 102-story landmark in New York City.", <http://franz.com/place1>)
(<http://franz.com/place1>, <http://schema.org/image>, <http://www.civil.usherbrooke.ca/cours/gci215a/empire-state-building.jpg>, <http://franz.com/place1>)
>>>

Note that the fourth element (graph) of each of the triples is <http://franz.com/place1>. If you don’t add the @id the triples will be put in the default graph.

Here is a slightly more complex example:

library = {
  "@context": {
    "dc": "http://purl.org/dc/elements/1.1/",
    "ex": "http://example.org/vocab#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex:contains": {
      "@type": "@id"
    }
  },
  "@id": "http://franz.com/mygraph1",
  "@graph": [
    {
      "@id": "http://example.org/library",
      "@type": "ex:Library",
      "ex:contains": "http://example.org/library/the-republic"
    },
    {
      "@id": "http://example.org/library/the-republic",
      "@type": "ex:Book",
      "dc:creator": "Plato",
      "dc:title": "The Republic",
      "ex:contains": "http://example.org/library/the-republic#introduction"
    },
    {
      "@id": "http://example.org/library/the-republic#introduction",
      "@type": "ex:Chapter",
      "dc:description": "An introductory chapter on The Republic.",
      "dc:title": "The Introduction"
    }
  ]
}

With the result:

>>> test(library, maxPrint=3)
(<http://example.org/library>, <http://example.org/vocab#contains>, <http://example.org/library/the-republic>, <http://franz.com/mygraph1>)
(<http://example.org/library>, <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>, <http://example.org/vocab#Library>, <http://franz.com/mygraph1>)
(<http://example.org/library/the-republic>, <http://purl.org/dc/elements/1.1/creator>, "Plato", <http://franz.com/mygraph1>)
>>>

img-library-graph

JSON-LD as a Document Store

So far we have treated JSON-LD as a syntax to create triples. Now let us look at the way we can start using AllegroGraph as a combination of a document store and graph database at the same time. And also keep in mind that we want to do it in such a way that you as a Python developer can add documents such as dictionaries and also retrieve values or documents as dictionaries.

Setup

The Python source file jsonld_tutorial_helper.py contains various definitions useful for the remainder of this example. Once it is downloaded, do the following (after adding the path to the filename):

conn=createdb("docugraph")
from jsonld_tutorial_helper import *
addNamespace(conn,"jsonldmeta","http://franz.com/ns/allegrograph/6.4/load-meta#")
addNamespace(conn,"ical","http://www.w3.org/2002/12/cal/ical#")

Let’s use our event structure again and see how we can store this JSON document in the store as a document. Note that the addData call includes the keyword: json_ld_store_source=True.

event = {
  "@context": {
    "@id": "ical:event1",
    "@type": "ical:Event",
    "ical": "http://www.w3.org/2002/12/cal/ical#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ical:dtstart": { "@type": "xsd:dateTime" }
      },
    "ical:summary": "Lady Gaga Concert",
    "ical:location":
    "New Orleans Arena, New Orleans, Louisiana, USA",
    "ical:dtstart": "2011-04-09T20:00:00Z"
}
>>> conn.addData(event, allow_external_references=True,json_ld_store_source=True)

The jsonld_tutorial_helper.py file defines the function store as a simple wrapper around addData that always saves the JSON source. For experimentation purposes it also has a parameter fresh to clear out the repository first.

>>> store(conn,event, fresh=True)

If we look at the triples in Gruff we see that the JSON source is stored as well, on the root (top-level @id) of the JSON object.

img-event-store-source

For the following part of the tutorial we want a little bit more data in our repository so please look at the helper file jsonld_tutorial_helper.py where you will see that at the end we have a dictionary named obs with about 9 diverse objects, mostly borrowed from the json-ld.org site: a person, an event, a place, a recipe, a group of persons, a product, and our hippieoil.

First let us store all the objects in a fresh repository. Then we check the size of the repo. Finally, we create a freetext index for the JSON sources.

>>> store(conn,[v for k,v in obs.items()], fresh=True)
>>> conn.size()
86
>>> conn.createFreeTextIndex("source",['<http://franz.com/ns/allegrograph/6.4/load-meta#source>'])
>>>

Retrieving values with SPARQL

To simply retrieve values in objects, but not the objects themselves, regular SPARQL queries will suffice. But because we want to make sure that Python developers only need to deal with regular Python structures such as lists and dictionaries, we created a simple wrapper around SPARQL (see the helper file). The name of the wrapper is runSparql.

Here is an example. Let us find all the roots (top-level @ids) of objects and their types. Some objects do not have roots, so None stands for a blank node.

>>> pprint(runSparql(conn,"select ?s ?type { ?s a ?type }"))
[{'s': 'cocktail1', 'type': 'Cocktail'},
 {'s': None, 'type': 'Individual'},
 {'s': None, 'type': 'Vehicle'},
 {'s': 'tesla', 'type': 'Offering'},
 {'s': 'place1', 'type': 'Place'},
 {'s': None, 'type': 'Offer'},
 {'s': None, 'type': 'AggregateRating'},
 {'s': 'hippieoil', 'type': 'Product'},
 {'s': 'person-3', 'type': 'Person'},
 {'s': 'person-2', 'type': 'Person'},
 {'s': 'person-1', 'type': 'Person'},
 {'s': 'person-1000', 'type': 'Person'},
 {'s': 'event1', 'type': 'Event'}]
>>>

We do not see the full URIs for ?s and ?type. You can see them by adding an appropriate format argument to runSparql, but the default is terse.

>>> pprint(runSparql(conn,"select ?s ?type { ?s a ?type } limit 2",format='ntriples'))
[{'s': '<http://franz.com/cocktail1>', 'type': '<http://franz.com/Cocktail>'},
 {'s': None, 'type': '<http://purl.org/goodrelations/v1#Individual>'}]
>>>

Retrieving a Dictionary or Object

retrieve is another function defined (in jsonld_tutorial_helper.py) for this tutorial. It is a wrapper around SPARQL to help extract objects. Here we see how we can use it. The sole purpose of retrieve is to retrieve the JSON-LD/dictionary based on a SPARQL pattern.

>>> retrieve(conn,"{?this a ical:Event}")
[{'@type': 'ical:Event', 'ical:location': 'New Orleans Arena, New Orleans, Louisiana, USA', 'ical:summary': 'Lady Gaga Concert', '@id': 'ical:event1', '@context': {'xsd': 'http://www.w3.org/2001/XMLSchema#', 'ical': 'http://www.w3.org/2002/12/cal/ical#', 'ical:dtstart': {'@type': 'xsd:dateTime'}}, 'ical:dtstart': '2011-04-09T20:00:00Z'}]
>>>

Ok, for a final fun (if you like expensive cars) example: Let us find a thing that is “fast and furious”, that is worth more than $80,000 and that we can pay for in cash:

>>> addNamespace(conn,"gr","http://purl.org/goodrelations/v1#")
>>> x = retrieve(conn, """{ ?this fti:match 'fast furious*';
                          gr:acceptedPaymentMethods gr:Cash ;
                          gr:hasPriceSpecification ?price .
                    ?price gr:hasCurrencyValue ?value ;
                           gr:hasCurrency "USD" .
                    filter ( ?value > 80000.0 ) }""")
>>> pprint(x)
[{'@context': {'foaf': 'http://xmlns.com/foaf/0.1/',
               'foaf:page': {'@type': '@id'},
               'gr': 'http://purl.org/goodrelations/v1#',
               'gr:acceptedPaymentMethods': {'@type': '@id'},
               'gr:hasBusinessFunction': {'@type': '@id'},
               'gr:hasCurrencyValue': {'@type': 'xsd:float'},
               'pto': 'http://www.productontology.org/id/',
               'xsd': 'http://www.w3.org/2001/XMLSchema#'},
  '@id': 'http://example.org/cars/for-sale#tesla',
  '@type': 'gr:Offering',
  'gr:acceptedPaymentMethods': 'gr:Cash',
  'gr:description': 'Need to sell fast and furiously',
  'gr:hasBusinessFunction': 'gr:Sell',
  'gr:hasPriceSpecification': {'gr:hasCurrency': 'USD',
                               'gr:hasCurrencyValue': '85000'},
  'gr:includes': {'@type': ['gr:Individual', 'pto:Vehicle'],
                  'foaf:page': 'http://www.teslamotors.com/roadster',
                  'gr:name': 'Tesla Roadster'},
  'gr:name': 'Used Tesla Roadster'}]
>>> x[0]['@id']
'http://example.org/cars/for-sale#tesla'



Gartner Identifies Top 10 Data and Analytics Technology Trends for 2019

According to Donald Feinberg, vice president and distinguished analyst at Gartner, the very challenge created by digital disruption — too much data — has also created an unprecedented opportunity. The vast amount of data, together with increasingly powerful processing capabilities enabled by the cloud, means it is now possible to train and execute algorithms at the large scale necessary to finally realize the full potential of AI.

“The size, complexity, distributed nature of data, speed of action and the continuous intelligence required by digital business means that rigid and centralized architectures and tools break down,” Mr. Feinberg said. “The continued survival of any business will depend upon an agile, data-centric architecture that responds to the constant rate of change.”

Gartner recommends that data and analytics leaders talk with senior business leaders about their critical business priorities and explore how the following top trends can enable them.

 

Trend No. 5: Graph

Graph analytics is a set of analytic techniques that allows for the exploration of relationships between entities of interest such as organizations, people and transactions.

The application of graph processing and graph DBMSs will grow at 100 percent annually through 2022 to continuously accelerate data preparation and enable more complex and adaptive data science.

Graph data stores can efficiently model, explore and query data with complex interrelationships across data silos, but the need for specialized skills has limited their adoption to date, according to Gartner.

Graph analytics will grow in the next few years due to the need to ask complex questions across complex data, which is not always practical or even possible at scale using SQL queries.

https://www.gartner.com/en/newsroom/press-releases/2019-02-18-gartner-identifies-top-10-data-and-analytics-technolo




What is the Answer to AI Model Risk Management?

Algorithm-XLab – March 2019

Franz CEO Dr. Jans Aasman Explains how to manage AI Modelling Risks.

AI model risk management has moved to the forefront of contemporary concerns for statistical Artificial Intelligence, perhaps even displacing the notion of ethics in this regard because of the immediate, undesirable repercussions of tenuous machine learning and deep learning models.

AI model risk management requires taking steps to ensure that the models used in artificial intelligence applications produce results that are unbiased, equitable, and repeatable.

The objective is to ensure that given the same inputs, they produce the same outputs.

If organizations cannot prove how they got the results of AI risk models, or have results that are discriminatory, they are subject to regulatory scrutiny and penalties.

Strict regulations throughout the financial services industry in the United States and Europe require governing, validating, re-validating, and demonstrating the transparency of models for financial products.

There’s a growing cry for these standards in other heavily regulated industries such as healthcare, while the burgeoning Fair, Accountable, Transparent movement typifies the horizontal demand to account for machine learning models’ results.

AI model risk management is particularly critical in finance.

Financial organizations must be able to demonstrate how they derived the offering of any financial product or service for specific customers.

When deploying AI risk models for these purposes, they must ensure they can explain (to customers and regulators) the results that determined those offers.

Read the full article at Algorithm-XLab.




New!!! AllegroGraph v6.5 – Multi-model Semantic Graph and Document Database

Download – AllegroGraph v6.5 and Gruff v7.3 

AllegroGraph – Documentation

Gruff – Documentation

Adding JSON/JSON-LD Documents to a Graph Database

Traditional document databases (e.g. MongoDB) have excelled at storing documents at scale, but are not designed for linking data to other documents in the same database or in different databases. AllegroGraph 6.5 delivers the unique power to define many different types of documents that can all point to each other using standards-based semantic linking and then run SPARQL queries, conduct graph searches, execute complex joins and even apply Prolog AI rules directly on a diverse sea of objects.
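
To make the linking concrete, here is a minimal sketch of two JSON-LD documents, one of which references the other through a shared @id; the vocabulary, identifiers and property names (Person, Organization, worksFor) are invented for illustration rather than taken from any particular schema:

    [
      {
        "@context": { "@vocab": "http://example.org/" },
        "@id": "http://example.org/person/bruce",
        "@type": "Person",
        "name": "Bruce",
        "worksFor": { "@id": "http://example.org/org/acme" }
      },
      {
        "@context": { "@vocab": "http://example.org/" },
        "@id": "http://example.org/org/acme",
        "@type": "Organization",
        "name": "Acme Corporation"
      }
    ]

Because both documents use the shared identifier http://example.org/org/acme, loading them produces linked graph nodes rather than two isolated documents.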

AllegroGraph 6.5 provides free text indexes of JSON documents for retrieval of information about entities, similar to document databases. But unlike document databases, which only link data objects within documents in a single database, AllegroGraph 6.5 moves the needle forward in data analytics by semantically linking data objects across multiple JSON document stores, RDF databases and CSV files. Users can run a single SPARQL query that results in a combination of structured data and unstructured information inside documents and CSV files. AllegroGraph 6.5 also enables retrieval of entire documents.
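
As a rough sketch of what such a query can look like, the following SPARQL combines a free-text hit inside the indexed documents with a structured graph pattern. The entity and property names are illustrative, and the fti: prefix refers to AllegroGraph's free-text index magic properties; see the free-text indexing documentation for the exact setup your repository needs:

    PREFIX fti: <http://franz.com/ns/allegrograph/2.2/textindex/>
    PREFIX ex:  <http://example.org/>

    SELECT ?person ?orgName WHERE {
      ?person fti:match 'Bruce' .    # unstructured: free-text hit in the JSON documents
      ?person ex:worksFor ?org .     # structured: follow a semantic link to another document
      ?org    ex:name ?orgName .
    }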

There are many reasons for working with JSON-LD. The big search engines push ecommerce companies to mark up their webpages with a systematic, JSON-LD-based description of their products, and more and more companies also use JSON-LD as an easy serialization format for sharing data.
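
For instance, the product markup that search engines expect is itself just a small JSON-LD document; the values below are invented, but the @context and @type follow the public schema.org vocabulary:

    {
      "@context": "https://schema.org",
      "@type": "Product",
      "name": "Trail Running Shoe",
      "sku": "TRS-1042",
      "offers": {
        "@type": "Offer",
        "priceCurrency": "USD",
        "price": "89.99",
        "availability": "https://schema.org/InStock"
      }
    }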

A direct benefit for companies using AllegroGraph is that they can now combine their documents with graphs, graph search and graph algorithms. Normally, when you store documents in a document database, you organize them so that they are optimized for certain direct retrieval queries. Performing complex joins across multiple types of documents, or computing a shortest path through a mass of objects of different types, is too complicated. Storing JSON-LD objects in AllegroGraph gives users all the benefits of a document database AND the ability to semantically link objects together, run complex joins, and perform graph search queries.
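
As a small illustration of the kind of query that is awkward in a plain document store, the following SPARQL 1.1 property-path query follows an arbitrary-length chain of links between documents. The ex:references property is a stand-in for whatever linking property your documents actually use:

    PREFIX ex: <http://example.org/>

    # Find every document reachable from ex:doc-123 by following one or
    # more ex:references links, regardless of the documents' types.
    SELECT DISTINCT ?doc WHERE {
      ex:doc-123 ex:references+ ?doc .
    }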

Another key benefit for companies is that your application developers don't have to learn the entire semantic technology stack, especially the part where they would otherwise have to create individual RDF triples or edges. Application developers love to work with JSON as a serialization format for objects. In JavaScript, the JSON format is syntactically identical to the code for creating JavaScript objects, and in Python the most important data structure, the dictionary, is nearly identical to JSON.
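
The sketch below shows how small that gap is in practice: an ordinary Python dictionary becomes a JSON-LD document simply by carrying @context, @id and @type keys, and the serialized string is what gets loaded using the JSON-LD support listed in the features below (the vocabulary and identifiers are again invented for illustration):

    import json

    # An ordinary Python dictionary; the @context/@id/@type keys are what
    # turn it into a JSON-LD document.
    customer = {
        "@context": {"@vocab": "http://example.org/"},
        "@id": "http://example.org/customer/42",
        "@type": "Customer",
        "name": "Bruce",
        "accountManager": {"@id": "http://example.org/employee/7"},
    }

    # Serialize to a JSON-LD string, ready to be loaded into AllegroGraph
    # through the JSON-LD loading support described in the feature list.
    print(json.dumps(customer, indent=2))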

Key AllegroGraph v6.5 Features:

  • Support for loading JSON-LD and also some non-RDF data files, that is, files which are not already organized into triples or quads. See the Loading non-RDF data section in the Data Loading document for more information on loading non-RDF data files. Loading JSON-LD files is described along with other RDF formats in the Data Loading document. The section Supported RDF formats lists all supported RDF formats.

 

  • Support for two-phase commits (2PC), which allows AllegroGraph to participate in distributed transactions comprising a number of AllegroGraph and non-AllegroGraph databases (e.g. MongoDB, Solr, etc.), ensuring that the work of a transaction is either committed on all participants or rolled back on all participants. Two-phase commit is described in the Two-phase commit document.

 

  • An event scheduler: Users can schedule events in the future. The event specifies a script to run. It can run once or repeatedly on a regular schedule. See the Event Scheduler document for more information.

 

  • AllegroGraph is 100 percent ACID, supporting Transactions: Commit, Rollback, and Checkpointing
  • Full and Fast Recoverability
  • Multi-Master Replication
  • Triple Attributes – Quads/triples can now have attributes, which can provide fine-grained access control.
  • Data Science – Anaconda, R Studio
  • 3D and multi-dimensional geospatial functionality
  • SPARQL v1.1 Support for Geospatial, Temporal, Social Networking Analytics, and Heterogeneous Federations
  • Cloudera, Solr, and MongoDB integration
  • JavaScript stored procedures
  • RDF4J Friendly, Java Connection Pooling
  • Graphical Query Builder for SPARQL and Prolog – Gruff
  • SHACL (Beta) and SPIN Support (SPARQL Inferencing Notation)
  • AGWebView – Visual Graph Search, Query Interface, and DB Management
  • Transactional Duplicate triple/quad deletion and suppression
  • Advanced Auditing Support
  • Dynamic RDFS++ Reasoning and OWL2 RL Materializer
  • AGLoad with Parallel loader optimized for both traditional spinning media and SSDs.

Numerous other optimizations, features, and enhancements.  Read the release notes – https://franz.com/agraph/support/documentation/current/release-notes.html

 





Why Is JSON-LD Important To Businesses?

Forbes – February 2019

Although you may not have heard of JavaScript Object Notation Linked Data (JSON-LD), it is already affecting your business. Search engine giant Google has mentioned JSON-LD as a preferred means of adding structured data to webpages to make them considerably easier to parse for more accurate search engine results. The Google use case is indicative of the larger capacity for JSON-LD to increase web traffic for sites and better guide users to the results they want.

Expectations are high for JSON-LD, and with good reason. It effectively delivers the many benefits of JSON, a lightweight data interchange format, into the linked data world. Linked data is the technological approach supporting the World Wide Web and one of the most effective means of sharing data ever devised.

In addition, a growing number of enterprise knowledge graphs fully exploit the potential of JSON-LD, as it enables organizations to readily access data stored in document formats as well as a variety of semi-structured and unstructured data. By using this technology to link internal and external data, knowledge graphs exemplify the linked data approach underpinning the growing adoption of JSON-LD — and the demonstrable, recurring business value that linked data consistently provides.

Read the full article at Forbes.




JSON-LD: A Method of Encoding Linked Data That Adds Meaning to JSON Objects

Hosting Advice – February 2019

Franz CEO Dr. Jans Aasman Explains JSON-LD: A Method of Encoding Linked Data That Adds Meaning to JSON Objects.

JSON-LD, a method of presenting structured Schema.org data to search engines and other parties, helps organize and connect information online. As Dr. Jans Aasman, CEO of Franz Inc., told us, the data-interchange format has far-reaching implications, from standardizing the ecommerce and healthcare industries to building knowledge graphs. With technologies like AllegroGraph helping to convert complex data into insights, JSON-LD is being put to use in a number of ways.

Read the full article at Hosting Advice.