We have been invited to participate in a very stimulating workshop at the Lorentz Center https://www.lorentzcenter.nl/lc/web/2017/878/info.php3?wsid=878&venue=Snellius
to discuss information and knowledge perspectives in language, how they are represented and used on the web, and their impact on individuals and groups in society. We want to understand what the key research questions are that need answering to enable perspective-aware dissemination and understanding of facts. For example: how can we make sure that people are able to distinguish biased news from true facts? Or that opinions are clearly distinguishable from facts? How can we improve individuals' capacity for critically analysing a post? And more things like that…
We are now preparing to attend the workshop, which starts on Tuesday, and one idea proposed by the organisers is to try to identify five stars for the Perspective Web, in analogy with Tim Berners-Lee's 5-star rating scale for open data.
The following notes derive from a discussion we had on this topic, and we are going to share them with the other participants.
A first problem is: what are we rating?
1) Perspective stars depend on communication pragmatics and semiotic type (“linguistic games”).
We would use different criteria to rate a Facebook post than we would to rate a journal article.
Another question is: What is the objective of the star-scale?
2) Each star-level must be given a semantics.
Stars may make an assessment about truth, authoritativeness, trust, beauty, goodness, or a mix of all of these.
There is a problem: who is implementing the stars?
Let us imagine a game where a group of people decides to define rules for telling stories based on actual events. The rules would enforce something like: state clearly whether this is your opinion or a report, provide references that support what you say, mention who talked about the same fact in the past, and whether they agreed or not with you, etc. The participants also agree on a rating scale for judging the stories according to how well they comply with the rules. Each participant will then tell a story, and the rest of the group (reviewers) will assign a score to the story based on the rating scale and the shared rules.
If we move this example to the web, and in particular to social networks, we may define a 5-star rating scale to be assigned to individual posts.
If we had a rating scale for e.g. Facebook posts, people reading others’ posts would be influenced by the rating of those posts when judging their content (similarly to what happens with ratings for restaurants, hotels, etc.). In turn, they would be influenced when writing their own future posts, in an attempt to obtain high ratings.
Assuming that the rating works well, the result would be a virtuous circle, as people would train themselves to develop better critical-analysis capabilities.
However, people are usually not good at in-depth critical analysis. In particular, post rating would often result in biased and uninformed judgements, e.g. based on prejudice, or on emotions that have nothing to do with critical thinking. For this reason, post rating may be a perfect job for an AI using clear quality criteria, associated with a 5-star rating scale.
3) Perspective star rating of content should be done by an AI, based on formal and transparent criteria.
Going towards the Micro-level:
An automated perspective star rating depends on many variables. Solving the problem requires a deep analysis of it, as well as rich and articulated knowledge.
From a lay user perspective, a 5-star rating will be perceived as bad-to-good, fake-to-true, untrusted-to-trusted, etc., or all of these things together, depending on what they are reading. They will adapt the rating to the type of content. An AI has to be able to do the same: to assign a number of stars that distils a customised set of criteria depending on the type of content, as suggested in 1).
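As a toy illustration of this type-dependent rating (all criterion names, content types and weights below are our own invented examples, not a proposed standard), the same checklist could be weighted differently per semiotic type:

```python
# Hypothetical sketch: the criteria behind the stars depend on the type
# of content (point 1). Criterion names and weights are invented for
# illustration only.
CRITERIA_BY_TYPE = {
    "facebook_post":   {"opinion_marked": 0.4, "sources_referenced": 0.2,
                        "no_contradictions": 0.4},
    "journal_article": {"opinion_marked": 0.1, "sources_referenced": 0.5,
                        "no_contradictions": 0.4},
}

def stars(content_type: str, checks: dict) -> float:
    """Weighted score in [0, 1], rescaled onto a 1-to-5 star value."""
    weights = CRITERIA_BY_TYPE[content_type]
    score = sum(w for criterion, w in weights.items() if checks.get(criterion))
    return round(1 + 4 * score, 1)

# The same satisfied criteria yield different stars for different types:
print(stars("facebook_post",   {"opinion_marked": True}))
print(stars("journal_article", {"opinion_marked": True}))
```

The point of the sketch is only that an unmarked opinion should cost a Facebook post more stars than it costs a journal article, because the two belong to different linguistic games.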
To serve this purpose, an AI should be able to assess things such as (a non-comprehensive list):
– the content is about some facts
– whether those facts can be checked (i.e. whether they are public or private events)
– other (ranked) sources reporting the same facts are referenced
– whether the content expresses an opinion (and what that opinion is)
– whether opinions and facts can be distinguished (and which are which)
– whether the expressed opinions are based on inferences from the facts or from other background knowledge
– whether the other sources’ content provides proof for those inferences
– whether contradictions emerge between the analysed content and the referenced sources (and possibly which they are)
– whether the content implements a specific rhetorical structure, e.g. metaphor, irony, etc.
– whether the content belongs to an argumentation line
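The checklist above can be sketched as a minimal data structure mapped onto stars. This is a deliberately naive illustration (the criterion names, and the idea of one base star plus roughly one star per satisfied criterion, are our assumptions, not a worked-out proposal):

```python
from dataclasses import dataclass

# Hypothetical checklist: each field is one boolean assessment an AI (or a
# reviewer) would make about a piece of content. Names are illustrative.
@dataclass
class Assessment:
    facts_identified: bool       # the content is about checkable facts
    sources_referenced: bool     # other (ranked) sources are referenced
    opinion_marked: bool         # opinions are distinguishable from facts
    inferences_supported: bool   # opinions follow from facts / background knowledge
    no_contradictions: bool      # no contradictions with referenced sources

def star_rating(a: Assessment) -> int:
    """Map the checklist onto a 1-to-5 star scale: one base star, plus
    roughly one star per satisfied criterion, capped at five."""
    satisfied = sum([a.facts_identified, a.sources_referenced,
                     a.opinion_marked, a.inferences_supported,
                     a.no_contradictions])
    return min(5, 1 + satisfied * 4 // 5)

print(star_rating(Assessment(True, True, True, False, True)))
```

A real system would of course need graded (not boolean) judgements and the frame-level analysis discussed below, but the sketch shows the shape of the mapping from transparent criteria to stars.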
4) Perspective star rating computation requires representation of, and reasoning on, frame systems
The role of Frames:
Facts can be communicated in many ways, each reflecting different perspectives. A story containing some facts is built by combining frames belonging to the facts with frames that express the perspectives.
How can we distinguish perspective frames from those strictly belonging to the fact?
What are the aspects to consider in the star-rating computation?
We need to study how frames can be combined. We may need to develop a frame algebra, associated with a frame logic, and possibly a distributional representation (e.g. embedding), as well as algorithmic solutions covering several dimensions:
– communication pragmatics
– provenance data
– background knowledge: of an individual or shared in an interactional context
– intensional relations between frames: e.g. from lexical resources
– extensional relations between frames: discovered from data
– the “mode” of a frame on a specific occurrence (stratification): factual, opinion, emotion, irony, argumentation, etc.
– affordances: possible emerging relations with other content, deriving from opportunities for action opened up by the analysed content, e.g. it provokes a reaction in an argumentation interaction
– network propagation of star ratings, e.g. PageRank
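The last dimension, network propagation, can be sketched as a PageRank-like fixed point over a citation graph: each post's final score blends its own AI-assigned rating with the scores of the sources it references. The graph shape, the damping value and the blending rule below are all assumptions for illustration:

```python
# Illustrative sketch of star-rating propagation over a citation graph.
# own_rating: {node: 1..5 stars from the content-level analysis}
# links:      {node: list of referenced nodes}

def propagate(own_rating, links, damping=0.85, iterations=50):
    score = dict(own_rating)
    for _ in range(iterations):
        new = {}
        for node, refs in links.items():
            # average score of the referenced sources (or own score if none)
            ref_avg = (sum(score[r] for r in refs) / len(refs)
                       if refs else score[node])
            # keep (1 - damping) of the node's own rating, inherit the rest
            new[node] = (1 - damping) * own_rating[node] + damping * ref_avg
        score = new
    return score

ratings = {"post": 3, "src1": 5, "src2": 4}
links = {"post": ["src1", "src2"], "src1": [], "src2": []}
print(propagate(ratings, links))
```

In this toy example a mediocre post that references two well-rated sources is pulled upwards; a post citing poorly rated sources would be pulled down, which is the intuition behind propagating perspective stars through the network.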
As final remarks and useful references:
– there is a keynote that Aldo gave a few years ago that discusses related topics, and is still relevant in terms of open questions. It can be found at https://www.slideshare.net/gangemi/isemantics-keynote
– we have developed a large Linked Data resource of frame systems, named Framester
– FRED is our prototype implementing frame-based machine reading
– Sentilo is our prototype implementing frame-based sentiment analysis
Valentina and Aldo