Neurofuzzy Similarity

The NEFUSI project

NEuroFUzzy approach for semantic SImilarity assessment

NEFUSI Repository

1st Deliverable of the project

2nd Deliverable of the project

3rd Deliverable of the project

What is NEFUSI about?

The NEFUSI project (which stands for "NEuroFUzzy approach for semantic SImilarity assessment") aims to design and develop a hybrid neurofuzzy technique for semantic textual similarity that intelligently combines neural networks and fuzzy logic.

The field of semantic similarity attracts much attention in the research community because it is one of the fundamental challenges whose solution can advance many fields and academic disciplines, including search and discovery. If a computer can automatically determine the degree of similarity between different pieces of textual information regardless of their wording, search and discovery mechanisms can improve considerably [Martinez-Gil et al., 2021]. Areas related to search and discovery, such as question answering, information retrieval, and query expansion, could therefore benefit greatly from any progress in this regard. The following figure shows a prototypical system of this kind:

We aim to take it a step further and create a novel neurofuzzy technique that can automatically and accurately evaluate the degree of semantic similarity between pieces of textual information. To do this, we suggest using a concurrent fuzzy inference neural network (FINN) technique capable of coupling state-of-the-art neural models with state-of-the-art fuzzy models.

In fact, we intend to benefit from the excellent capabilities of the latest neural models based on the concept of transformers to work with text and at the same time from the possibilities that fuzzy logic offers to aggregate and decode numerical values in a personalized way [de Campos Souza, 2020].

This approach is projected to produce highly accurate results because it combines the computing capacity of neural networks with the capability of information fusion provided by fuzzy logic as explained in [Skrjanc et al., 2019].
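
To make the coupling concrete, the following is a minimal sketch (not the project's implementation) of how fuzzy inference could aggregate the similarity scores produced by two neural models. It assumes each model outputs a score in [0, 1]. The triangular membership functions, the rule consequents in `CONSEQUENTS`, and all function names are hypothetical choices made for illustration; in a trained neurofuzzy system these parameters would be learned from data rather than fixed by hand.

```python
from itertools import product
from typing import Dict

# Hypothetical fuzzy sets over [0, 1], each a triangle (left foot, peak, right foot).
LABELS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# Hypothetical zero-order Sugeno consequents: one crisp output per rule.
CONSEQUENTS = {
    ("low", "low"): 0.0,
    ("low", "medium"): 0.2,
    ("low", "high"): 0.5,
    ("medium", "low"): 0.2,
    ("medium", "medium"): 0.5,
    ("medium", "high"): 0.8,
    ("high", "low"): 0.5,
    ("high", "medium"): 0.8,
    ("high", "high"): 1.0,
}


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership degree of x in a triangle with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(score: float) -> Dict[str, float]:
    """Map a crisp score (clamped to [0, 1]) to a membership degree per label."""
    score = min(1.0, max(0.0, score))
    return {label: triangular(score, *abc) for label, abc in LABELS.items()}


def fuzzy_aggregate(score_a: float, score_b: float) -> float:
    """Combine two neural similarity scores with a small fuzzy rule base."""
    mu_a, mu_b = fuzzify(score_a), fuzzify(score_b)
    numerator = denominator = 0.0
    for label_a, label_b in product(LABELS, repeat=2):
        # Rule firing strength: fuzzy AND (minimum) of the two antecedents.
        strength = min(mu_a[label_a], mu_b[label_b])
        numerator += strength * CONSEQUENTS[(label_a, label_b)]
        denominator += strength
    # Defuzzify as the firing-strength-weighted average of the consequents.
    return numerator / denominator if denominator else 0.0
```

For example, `fuzzy_aggregate(0.9, 0.3)` blends the rules fired by a mostly high score and a low-to-medium score into a single value between 0 and 1.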

Challenges

The most significant technical challenge is two-fold: the lack of related work and the reconciliation of two techniques of a different nature.

  • Concerning the first one, we have to deal with a neurofuzzy system, which, to the best of our knowledge, has never been used in the field of semantic similarity. A simple glance at the digital libraries of Springer, Elsevier, IEEE, or ACM shows that there is no published work covering this type of system. The DBLP online catalog shows several hundred hits for the term neurofuzzy, none of which are related to technical work in the field of Information Retrieval, the Web, or Databases. As this is pioneering work, we will have few references to rely on.
  • Concerning the second one, as for techniques based on neurofuzzy hybridization, no work has yet been done in the field of search and discovery. However, our previous experience in designing solutions based on fuzzy logic leads us to believe that combining the human-like reasoning of fuzzy logic with the learning and connectionist structure of neural networks would yield quite good results.

Neurofuzzy Models

Like most learning systems, our approach also needs a data set to train the system.

In principle, we are working with the following datasets:

  • MC-30 Miller, G., & Charles, W. (1991). Contextual correlates of semantic similarity. Language and Cognitive Processes, 6, 1-28.
  • RG-65 Rubenstein, H., & Goodenough, J. B. (1965). Contextual correlates of synonymy. Communications of the ACM, 8, 627-633.
  • GeReSiD Ballatore, A., Bertolotto, M., & Wilson, D. C. (2014). An evaluative baseline for geo-semantic relatedness and similarity. GeoInformatica, 18(4), 747-767.

We also plan to use domain-specific datasets (legal sector, biomedical sector, etc.).
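
Benchmarks of this kind pair each item with a human similarity judgment, and a common way to score a system is the correlation between its predictions and those judgments. The sketch below computes the Pearson correlation coefficient in plain Python. The choice of Pearson correlation as the metric is an assumption, since the text does not specify one, and the function name is hypothetical.

```python
import math
from typing import Sequence


def pearson(predicted: Sequence[float], human: Sequence[float]) -> float:
    """Pearson correlation between predicted scores and human judgments."""
    n = len(predicted)
    if n != len(human) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_p = sum(predicted) / n
    mean_h = sum(human) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel out).
    cov = sum((p - mean_p) * (h - mean_h) for p, h in zip(predicted, human))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_h = sum((h - mean_h) ** 2 for h in human)
    if var_p == 0 or var_h == 0:
        raise ValueError("correlation is undefined for constant sequences")
    return cov / math.sqrt(var_p * var_h)
```

A value close to 1 means the predicted scores rise and fall together with the human ratings.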

Implementation and other resources

  • The project is a research and development effort in a scientific setting. The aim is to create an improved model that surpasses, or at least matches, the state of the art. It follows an academic life cycle: formulating the hypothesis, designing the solution (in our case a prototype), experimenting, evaluating against existing proposals, and publishing the results and the associated software artifact.
  • We are currently working on the first implementation of a model for neurofuzzy similarity. The code is under development and the first versions will be available soon through a GitHub repository.

Applications

Automatically determining the degree of semantic similarity between two textual expressions has become increasingly important in recent years. Its relevance to many areas of modern computer science, together with recent advances in neural computation, has led to steadily better solutions.

Neurofuzzy similarity can be applied in a variety of settings. Below we list some of them, organized by field of application.

Data Integration

In the field of data integration, semantic similarity measures play a fundamental role because they make it possible to handle information from heterogeneous sources effectively and efficiently. For this reason, the successful development of new measures based on neurofuzzy models could have a great impact in this area.

Query Expansion

Query expansion techniques automatically extend the information entered by the user in order to cover a larger number of resources. In this way, it is possible to find desired information that, for one reason or another, was not expressed using the same wording as the original query.
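
As an illustration only, the sketch below expands a query with vocabulary terms whose similarity to any query term reaches a threshold. The function `expand_query`, its parameters, and the lookup table `TOY_SCORES` are hypothetical; the scores are placeholders, not results from any model, and in practice `similarity` would be a trained semantic similarity measure.

```python
from typing import Callable, List, Sequence

# Placeholder scores for illustration only (not measured values).
TOY_SCORES = {
    frozenset(("car", "automobile")): 0.95,
    frozenset(("car", "vehicle")): 0.85,
    frozenset(("car", "banana")): 0.05,
}


def toy_similarity(a: str, b: str) -> float:
    """Stand-in for a real semantic similarity measure."""
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


def expand_query(
    query_terms: Sequence[str],
    vocabulary: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.8,
    max_new: int = 3,
) -> List[str]:
    """Return the query terms plus the most similar vocabulary terms."""
    expanded = list(query_terms)
    candidates = []
    for candidate in vocabulary:
        if candidate in expanded:
            continue
        # Score a candidate by its best match against any query term.
        best = max((similarity(t, candidate) for t in query_terms), default=0.0)
        if best >= threshold:
            candidates.append((best, candidate))
    # Highest similarity first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    expanded.extend(term for _, term in candidates[:max_new])
    return expanded
```

With the placeholder table, `expand_query(["car"], ["automobile", "banana", "vehicle"], toy_similarity)` keeps "car" and adds "automobile" and "vehicle", while "banana" falls below the threshold.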

Document Sanitization

Many documents must go through a sanitization process before being published, as they contain information and references that may be sensitive. These processes are currently done manually. However, semantic similarity measures can help automate the process by suggesting more generic replacements for sensitive information.

References

These are the core publications of NEFUSI:

  1. Jorge Martinez-Gil, Riad Mokadem, Josef Küng, Abdelkader Hameurlain: A Novel Neurofuzzy Approach for Semantic Similarity Measurement. DaWaK 2021: 192-203
  2. Jorge Martinez-Gil, Jose Manuel Chaves-Gonzalez: Sustainable Semantic Similarity. Journal of Intelligent and Fuzzy Systems: Accepted for Publication

Further references used above:

  1. Igor Skrjanc, José Antonio Iglesias, Araceli Sanchis, Daniel F. Leite, Edwin Lughofer, Fernando A. C. Gomide: Evolving fuzzy and neuro-fuzzy approaches in clustering, regression, identification, and classification: A Survey. Inf. Sci. 490: 344-368 (2019)
  2. Paulo Vitor de Campos Souza: Fuzzy neural networks and neuro-fuzzy networks: A review the main techniques and applications used in the literature. Appl. Soft Comput. 92: 106275 (2020)

Acknowledgements

The development of NEFUSI is funded through the NGI Zero Discovery project by the NLnet Foundation and the European Commission. Project number: 2021-04-069