The main purpose of data is to deliver useful information that can be used to make important decisions. In the early days of computing, such information could be queried directly from structured data stores such as databases. If the information queried for was not available in those data stores, the system simply could not respond to the user's query. Today, given the massive amount of data on the web and the exponentially growing number of web users, it is hard to predict and store exactly what these users will query for. Simply stated, we cannot manually hard-code the responses to every expected question. So what is the solution? The best answer is to extract knowledge from data, then refine and recompose it into a knowledge graph, which we can use to answer queries. However, knowledge extraction has proven to be a non-trivial problem. Hence, we use a statistical relational learning methodology that considers past experience and learns new relationships or similarities between facts in order to extract such knowledge. The probabilistic soft logic (PSL) framework is one such statistical relational learning framework used for building knowledge graphs.
A knowledge graph is a massive network of facts composed of entities and the connections between them. A knowledge graph can be built for a particular domain or span several domains that reference one another. I have already discussed how to construct a knowledge graph from extracted facts in my previous blog post. However, here's a quick recap of the knowledge graph pipeline.
First, we go through free text and various other sources to extract facts. These facts are stored in a knowledge base along with the confidences of their extractors. However, knowledge bases are filled with redundant information, which needs to be weeded out. Hence, we take measures to refine and fine-tune the facts that are passed on to the knowledge graph in order to create a more accurate, relevant knowledge graph [5]. An important part of refining knowledge graphs is link prediction: the process of predicting a link between two entities in the knowledge graph. Then, based on the refined knowledge graph, it is easy to fetch results for user queries based on neighboring relationships, newly discovered links, and their probability of being true.
As mentioned earlier, we perform link prediction to refine the knowledge graph. To understand link prediction further, let’s consider the following examples.
Example 1
This example is motivated by Pujara et al. [6]. Here, the system attempts to identify facts about the country Kyrgyzstan as it extracts information from various sources. Among the candidate facts extracted about Kyrgyzstan are that it is located in Asia and that it is located in the US.
Although some of the extracted facts are true, it is unlikely that all of them are. Therefore, we use link prediction to predict the most likely true facts by considering certain ontological constraints. For example, we already know that the US is in the North American continent and that Asia is another continent. In this case, it is obvious that a place cannot be located on both continents. Hence, only one of the two location facts can be true.
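Schematically, using a hypothetical locatedIn predicate, this constraint can be written as the following rule:

$$\text{locatedIn}(P, \text{Asia}) \rightarrow \lnot\,\text{locatedIn}(P, \text{US})$$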
The above is a simple logical implication stating that if a place P is located in the Asian continent, then it cannot be located in the US, and vice versa. Hence, if we already know that Kyrgyzstan is located in Asia, another fact that we are uncertain of, which says Kyrgyzstan is located in the US, can be deemed false.
Example 2
Consider another domain where we have the social networking information of some people. For example, imagine that we want to know whether two users, Phoebe and Joey, are actually friends. This can be thought of as a simplified version of how Facebook suggests friends. If these two users are not connected as friends on Facebook, we have no direct data from which to infer this. However, if we already know that Phoebe is a friend of another user called Rachel, and that Rachel is also a friend of Joey, then it is safe to assume that Phoebe and Joey could possibly be friends as well. Now, let's try mapping this indirect relationship as a logical rule enforcing its constraints, as shown below.
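$$\text{Friends}(\text{Phoebe}, \text{Rachel}) \wedge \text{Friends}(\text{Rachel}, \text{Joey}) \rightarrow \text{Friends}(\text{Phoebe}, \text{Joey})$$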
But hold on...
In real-world scenarios, we have many users. If we try to discover new relationships among all these users and apply the above or similar rules to all of those relationships, we may end up with numerous newly inferred relationships. Based on the rule's logical implication alone, we cannot say that all the relationships we discovered are definitely true or false. Hence, we need a way to probabilistically infer how certain we are that a newly inferred relationship is true, based on the facts that we already know to be true. And this is where we use PSL.
At a high level, PSL is a framework that can jointly reason about facts based on related domains and observed evidence [1, 2]. PSL observes the facts that are already known to be true and computes a probability for each newly inferred fact. To do so, the user first defines, within the PSL program, a set of general logical rules that imply new relationships, such as ontological constraints, similarity, or friendship. The PSL program then probabilistically infers how confident it is that a newly identified relationship is true. For this, PSL takes the set of first-order logic rules provided by the user and grounds real-world entities onto these rules.
A PSL rule takes a general format composed of variables, as shown below.
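$$\text{Friends}(A, B) \wedge \text{Friends}(B, C) \rightarrow \text{Friends}(A, C)$$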
A, B, and C are variables, and in our scenario, they represent three different Facebook users. By substituting actual users into this rule, we can come up with new relationships that were not explicitly identified before. As we substitute Phoebe, Rachel, and Joey into this rule, we get the following rule, the same one indicated previously.
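$$\text{Friends}(\text{Phoebe}, \text{Rachel}) \wedge \text{Friends}(\text{Rachel}, \text{Joey}) \rightarrow \text{Friends}(\text{Phoebe}, \text{Joey})$$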
PSL differs from its predecessors in that it probabilistically quantifies how certain we can be of each instance of a newly inferred relationship, based on the knowledge we already have from existing information and links.
But, first, let’s take a brief glance at PSL’s background.
Most of the motivating concepts for PSL stem from Markov Logic Networks (MLNs) [3]. In addition to the facts themselves, MLNs also consider ontological constraints, such as domain or inverse constraints [4]. Nevertheless, MLNs are limited to boolean truth values for logical relationships, which we term 'predicates'. The disadvantage of this is that we cannot render a confidence or probability for a newly inferred fact. Hence, PSL opts for a soft truth value rather than a definitive 1 or 0 and uses multiple extractors to come up with confidences for newly rendered relationships. Based on these confidences (which are probabilities), we can construct a knowledge graph from existing and highly probable, newly inferred facts. Since the knowledge graph is a massive web of information, it becomes easy to retrieve related or neighboring data that are highly probable truths when responding to user queries.
In the following section, we'll go through a high-level overview of how new or missing links can be inferred, along with a probability of how confident we are that the new fact holds true.
Now, let us formally reiterate a sample first-order logic rule in PSL (note that we need to specify these rules as input, along with the facts/triples that we already know to be true, in order to run the PSL program).
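$$w : \text{Friends}(A, B) \wedge \text{Friends}(B, C) \rightarrow \text{Friends}(A, C)$$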
where Friends is a predicate, A, B, and C are variables, and w is the weight of the rule.
This is a basic rule of transitivity, which states that if A is a friend of B and B is a friend of C, then A and C are most likely friends as well. The rule is a template onto which constants (real-world entities) are mapped through a process known as grounding. Hence, if we take a set of constants within the domain, namely Phoebe, Rachel, and Joey, we get the ground atoms Friends(Phoebe, Rachel), Friends(Rachel, Joey), and Friends(Phoebe, Joey). This gives us the following ground rule: a single instance of the rule, which bears a weight of w.
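$$w : \text{Friends}(\text{Phoebe}, \text{Rachel}) \wedge \text{Friends}(\text{Rachel}, \text{Joey}) \rightarrow \text{Friends}(\text{Phoebe}, \text{Joey})$$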
Each of these ground atoms takes a soft truth value in the range [0, 1]. These observed soft truth values are partly provided by users and partly taken from the extractors that identify and extract these facts. Thus, for illustration, let's assume that the ground atoms take the following soft truth values.
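$$I(\text{Friends}(\text{Phoebe}, \text{Rachel})) = 0.3,\quad I(\text{Friends}(\text{Rachel}, \text{Joey})) = 0.8,\quad I(\text{Friends}(\text{Phoebe}, \text{Joey})) = 0.7$$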
Then, we need to verify if the above grounding satisfies the rule.
Verifying whether the grounding satisfies the rule
This is done by computing the distance to satisfaction for each ground rule. In order to get the distance to satisfaction, d, we treat the ground rule as a formula over the ground atoms in the rule, using the Lukasiewicz t-norms and co-norm.
Lukasiewicz t-norms and co-norm
The Lukasiewicz t-norms provide a relaxation of the following logical connectives in a rule:
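$$p \,\tilde{\wedge}\, q = \max\{0,\; p + q - 1\}$$

$$p \,\tilde{\vee}\, q = \min\{1,\; p + q\}$$

$$\tilde{\lnot}\, p = 1 - p$$

Here, p and q are soft truth values in [0, 1].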
The tilde on top of a logical connective indicates its Lukasiewicz relaxation.
The following section explicates how we transform the ground rule and apply the Lukasiewicz t-norms to compute and verify whether the grounding satisfies the rule. To do so, we first check whether the grounding satisfies the rule (step 1). If it does not, we compute its distance to satisfaction (step 2).
1. Rule Satisfaction
To verify whether a rule is satisfied by its grounding, based on the atoms in the head and the body, we need to check whether the overall soft truth value of the head is greater than or equal to that of the body, i.e.,
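$$I(\text{body}) \leq I(\text{head})$$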
However, all we have at the moment are individual soft truth values for each atom in the head and the body. Hence, we apply the Lukasiewicz t-norms to the atoms of the body and the head separately, obtaining a single soft truth value each: I(body) and I(head).
Since the head consists of only a single ground atom, whose soft truth value we assumed to be 0.7, we are left with applying the Lukasiewicz t-norm only to the ground atoms of the body.
Since the atoms in the body are conjuncted, we apply:
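$$I(\text{body}) = \max\{0,\; I(a_1) + I(a_2) - 1\}$$

where a₁ and a₂ are the ground atoms in the body.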
This yields a value of 0.1 for the body, as shown in the following calculation.
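$$I(\text{body}) = \max\{0,\; 0.3 + 0.8 - 1\} = \max\{0,\; 0.1\} = 0.1$$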
Now, we have the following inferences for the body and the head of the ground rule.
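$$I(\text{body}) = 0.1,\qquad I(\text{head}) = 0.7$$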
It is obvious that this grounding satisfies the rule as the soft truth value of the head is greater than that of the body. In this case, the distance to satisfaction will be 0.
When the grounding does not satisfy the rule, i.e., when the inferred value for the head is not greater than or equal to that of the body, we move on to compute the distance to satisfaction, which will be a value greater than 0.
2. Computing the distance to satisfaction
A simple way to compute the distance to satisfaction, d, is using the following formula.
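$$d = 1 - I_{gr}$$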
where I_gr is the overall soft truth value of the ground rule. To figure out this overall soft truth value, we first need to transform the ground rule into its disjunctive normal form.
Any formula written as an implication, body → head, can be rewritten as a disjunctive clause, and thus we can transform our ground rule into one. In the disjunctive form, the atoms of the body are negated. Based on this, we get the following disjunctive clause for our ground rule.
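$$\tilde{\lnot}\,\text{Friends}(\text{Phoebe}, \text{Rachel}) \;\tilde{\vee}\; \tilde{\lnot}\,\text{Friends}(\text{Rachel}, \text{Joey}) \;\tilde{\vee}\; \text{Friends}(\text{Phoebe}, \text{Joey})$$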
In this case, the negated ground atoms and our ground head atom will gain the soft truth values shown below.
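$$I(\tilde{\lnot}\,\text{Friends}(\text{Phoebe}, \text{Rachel})) = 1 - 0.3 = 0.7$$

$$I(\tilde{\lnot}\,\text{Friends}(\text{Rachel}, \text{Joey})) = 1 - 0.8 = 0.2$$

$$I(\text{Friends}(\text{Phoebe}, \text{Joey})) = 0.7$$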
Then, we apply the disjunction norm of the Lukasiewicz t-norms to compute Igr. As such,
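$$I_{gr} = \min\{1,\; 0.7 + 0.2 + 0.7\} = \min\{1,\; 1.6\} = 1,\qquad d = 1 - 1 = 0$$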
This leaves us with a distance to satisfaction of 0, meaning that the grounding satisfies the rule, which coincides with our earlier check of the head's soft truth value against the body's. Now, let's try a different scenario, assuming new soft truth values for the ground atoms.
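$$I(\text{Friends}(\text{Phoebe}, \text{Rachel})) = 0.7,\quad I(\text{Friends}(\text{Rachel}, \text{Joey})) = 0.8,\quad I(\text{Friends}(\text{Phoebe}, \text{Joey})) = 0.4$$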
This will give us the set of negated ground atoms as,
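$$I(\tilde{\lnot}\,\text{Friends}(\text{Phoebe}, \text{Rachel})) = 0.3,\qquad I(\tilde{\lnot}\,\text{Friends}(\text{Rachel}, \text{Joey})) = 0.2$$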
Considering these values, I_gr = min{1, 0.3 + 0.2 + 0.4} = 0.9.
As such, the distance to satisfaction is d = 1 − 0.9 = 0.1, indicating that the grounding does not fully satisfy the rule.
Equivalently, we can verify the rule's satisfaction using the head and body atoms as in step (1), now with the new values of the ground atoms. This renders I(body) as max{0, 0.7 + 0.8 − 1}, which is 0.5. Checking this against I(head), which has a soft truth value of 0.4, shows that the rule is not satisfied. Hence, we can check whether a grounding satisfies the rule either way.
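To make the above computations concrete, here is a minimal Python sketch of the Lukasiewicz relaxations and the distance to satisfaction (the function names are my own, and this is an illustration rather than the actual PSL implementation):

```python
def luk_and(p, q):
    # Relaxed (Lukasiewicz) conjunction: max{0, p + q - 1}
    return max(0.0, p + q - 1.0)

def luk_or(*values):
    # Relaxed (Lukasiewicz) disjunction: min{1, sum of the values}
    return min(1.0, sum(values))

def luk_not(p):
    # Relaxed (Lukasiewicz) negation: 1 - p
    return 1.0 - p

def distance_to_satisfaction(body, head):
    # Rewrite body -> head as a disjunction of the negated body
    # atoms and the head, then take d = 1 - I_gr.
    i_gr = luk_or(*[luk_not(a) for a in body], head)
    return 1.0 - i_gr

# First scenario: body atoms 0.3 and 0.8, head 0.7 -> satisfied.
print(distance_to_satisfaction([0.3, 0.8], 0.7))  # 0.0

# Second scenario: body atoms 0.7 and 0.8, head 0.4 -> not satisfied.
print(distance_to_satisfaction([0.7, 0.8], 0.4))  # ~0.1
```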
Let’s try to frame the complete picture of the above computations now.
Assume that we have a PSL program, P, containing n rules, with I being an interpretation (an assignment of soft truth values to the ground atoms). If Rᵏ is a single rule in the PSL program, we proceed with the grounding process and end up with a set of groundings of all the instances of Rᵏ. As discussed previously, we then compute the distance to satisfaction for each ground instance, and, as a result, obtain a vector of the distances to satisfaction of all the ground instances, Vᵏ. This vector represents only a single rule from the PSL program. Hence, we take the weighted rules and stack the distance-to-satisfaction vectors of all the rules in the PSL program. This leaves us with the vector V(I) = [w₁V₁(I)…wₙVₙ(I)]ᵀ. Given this V(I), if ζ is an arbitrary distance metric that we will use later in the inference phase, then we can define a probability density over interpretations as shown below.
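$$f(I) = \frac{1}{Z}\,\exp\left[-\zeta(V(I))\right]$$

where Z is a normalization constant.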
Here, if we resolve to an L1-norm distance, we can derive a log-linear representation that we encounter often in statistical relational learning models. This takes the form shown below.
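$$f(I) = \frac{1}{Z}\,\exp\left[-\sum_{k=1}^{n} w_k \sum_{r \in G_k} d_r(I)\right]$$

where Gₖ is the set of ground instances of rule Rᵏ.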
On the other hand, we could resolve to the squared L2-norm distance, which penalizes unsatisfied rules more heavily by making the density a faster-growing function of the distance to satisfaction, as shown below.
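$$f(I) = \frac{1}{Z}\,\exp\left[-\sum_{k=1}^{n} w_k \sum_{r \in G_k} d_r(I)^2\right]$$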
Following this, we move on to inferring the soft truth values for the set of triples with unknown truth values, i.e., the newly inferred information or missing links.
Inferring soft truth values for new data based on previously available evidence
Inferring the most probable values for the triples, given the set of observed triples, is based on Most Probable Explanation (MPE) inference, which corresponds to Maximum A Posteriori (MAP) inference. This inference is used to figure out the most likely soft truth values for the ground atoms that were previously unobserved. As discussed earlier, we have a set of known triples that we use as our observations, grounded onto the rules in PSL. Let us name the set of triples with observed soft truth values x, and let the set of triples with unknown soft truth values be y. This brings us to our intended task of finding the most probable soft truth value assignment Iₘₐₚ(y) given the observed values, Iₓ, which can be performed through the inference shown in the equation below.
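$$I_{map}(y) = \arg\max_{I(y)} \; P\left(I(y) \mid I_x\right)$$

Since f(I) is log-linear in the weighted distances to satisfaction, maximizing this posterior amounts to minimizing the weighted distances to satisfaction over the unknown truth values.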
This MAP inference results in the most probable soft truth value assignments for the unknown triples, y, based on the evidence given by the observed triples, x. Ultimately, we get the probabilities for newly inferred facts in the form of the soft truth values in Iₘₐₚ(y). With these probabilities, we can decide how far we consider these facts to hold true. And this is, in spirit, how auto-complete search results, YouTube playlist mixes, and Facebook friend suggestions use probabilities to display related results.
You can run the sample use-case of the knowledge graph identification scenario that uses the PSL (adhering to the above computations) to identify soft truth values for previously unseen triples here.
This article has focused on elucidating Probabilistic Soft Logic (PSL), a joint probabilistic reasoning framework, in detail. The overall intent of PSL is to have a way of evaluating how likely it is that an inferred, previously unseen fact holds true in a given domain. Hence, the general assumption is that the higher the inferred soft truth value, the higher the possibility of the fact being true; conversely, the lower the inferred soft truth value, the less likely the fact is to hold true, meaning that there is not enough evidence or related facts to support it. These probabilities can support the decision-making process of identifying new information (facts) based on already known facts.
However, since this inference and its underlying assumption do not involve human judgment, it is hard to say conclusively whether a fact is accurate or inaccurate based solely on the newly inferred soft truth values. My current work in this area examines how to correlate the soft truth values with human-evaluated data in order to clearly label facts as true or false and make our decisions correspond more closely with actual human evaluations. I'll be posting more details in my following posts.
[1] Brocheler, M., Mihalkova, L., & Getoor, L. (2012). Probabilistic similarity logic. arXiv preprint arXiv:1203.3469.
[2] Kimmig, A., Bach, S. H., Broecheler, M., Huang, B., & Getoor, L. (2012, February). A short introduction to probabilistic soft logic. In NIPS Workshop on Probabilistic Programming: Foundations and Applications (Vol. 1, p. 3).
[3] Richardson, M., & Domingos, P. (2006). Markov logic networks. Machine Learning, 62(1–2), 107–136.
[4] Bach, S. H., Broecheler, M., Huang, B., & Getoor, L. (2015). Hinge-loss Markov random fields and probabilistic soft logic. arXiv preprint arXiv:1505.04406.
[5] Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web, 8(3), 489–508.
[6] Pujara, J., Miao, H., Getoor, L., & Cohen, W. (2013, October). Knowledge graph identification. In International Semantic Web Conference (pp. 542–557). Springer, Berlin, Heidelberg.