Chatbot project

R&D in the field of artificial intelligence is one of Mediterra's main goals. Our team consists of professionals who are interested in both science and development. We have over 10 years of experience in big data, machine learning, middleware, systems, web applications, high-performance network servers, distributed data processing systems, and algorithms.
 
For the last 2 years we have been working on chatbot technology, which has recently become a hyped trend thanks to interest from large corporations such as Facebook. We ran experiments to find the best way to implement a chatbot, focusing on open-source tools written in popular languages (such as Python) combined with our own experience and understanding of AI.

Natural language processing (NLP) originated in the middle of the 20th century as a narrow field of research combining artificial intelligence (AI) and linguistics. Today it is a broad field, with ongoing research at academic institutions as well as private companies; Google, Microsoft, and IBM are just a few well-known names.

The main goal of NLP is to build a computer-based system that can communicate with a human in his or her native language, such as English, Russian, or any other. Achieving this goal requires several tasks, which can be roughly grouped into three steps: text decomposition (information extraction), reasoning, and text generation. This sequence is referred to as the NLP pipeline [1] and usually consists of five subtasks, listed at the end of this article; subtasks 1-3 correspond to the text decomposition phase.

 RESEARCH IN THE AREA OF NATURAL LANGUAGE PROCESSING AND CHATBOT PRODUCT

The current state of NLP

Subtasks 1 and 2, and partly subtask 3, are now largely automated. Speech recognition has made significant progress over the last several years, and optical character recognition (OCR) became a mainstream technology more than a decade ago. Open-source tools such as UIMA, GATE, and NLTK handle most of the work of lexical and semantic analysis, and commercial tools for these tasks are also available. While part-of-speech tagging was a difficult problem just 10-20 years ago, the hard problems have since moved further down the NLP pipeline.

Research in modern NLP is mainly concerned with knowledge representation, knowledge inference (reasoning), and, to some extent, natural language generation. The modern approach to knowledge representation and inference combines linguistic rules with statistical models built over annotated text corpora.
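
As a rough illustration of combining linguistic rules with statistics from an annotated corpus, the Python sketch below tags parts of speech in two layers: a most-frequent-tag lexicon learned from a tiny hand-annotated corpus, and hand-written suffix rules for words the corpus never saw. The corpus, the Penn Treebank-style tag names, and the rules are illustrative assumptions, not a description of any particular tool.

```python
from collections import Counter, defaultdict

# Toy annotated corpus (hypothetical): sentences of (word, tag) pairs
# using Penn Treebank-style tags.
CORPUS = [
    [("the", "DT"), ("bot", "NN"), ("answers", "VBZ"), ("questions", "NNS")],
    [("users", "NNS"), ("ask", "VBP"), ("the", "DT"), ("bot", "NN")],
    [("the", "DT"), ("answers", "NNS"), ("were", "VBD"), ("short", "JJ")],
    [("she", "PRP"), ("answers", "VBZ"), ("quickly", "RB")],
]

# Hand-written linguistic rules for unseen words: (suffix, tag),
# checked in order.
SUFFIX_RULES = [("ing", "VBG"), ("ly", "RB"), ("ed", "VBD"), ("s", "NNS")]


def train(corpus):
    """Statistical layer: keep the most frequent tag for each word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag_word(word, lexicon):
    """Use corpus statistics when available, otherwise fall back to rules."""
    lower = word.lower()
    if lower in lexicon:
        return lexicon[lower]
    if word[:1].isupper():
        return "NNP"  # capitalised unknown word: guess a proper noun
    for suffix, tag in SUFFIX_RULES:
        if lower.endswith(suffix):
            return tag
    return "NN"  # default guess for anything else


def tag(sentence, lexicon):
    """Tag a whitespace-separated sentence (punctuation is not handled)."""
    return [(word, tag_word(word, lexicon)) for word in sentence.split()]


lexicon = train(CORPUS)
print(tag("the bot answers Mediterra users politely", lexicon))
```

The statistical layer ignores context, so "answers" is always tagged VBZ even in "the answers"; sequence models such as hidden Markov models or conditional random fields, trained on much larger corpora, are the usual remedy.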

 

Applications of NLP include [5]:

  •  Word processing and publishing (machine translation, localisation, internationalisation, spell checking, controlled language) [5, 9]
  •  Information management (information retrieval, document categorisation, data extraction, document generation) [5]
  •  Niche markets [6]: online reviews, medical transcription, news summarisation, legal search and discovery, etc.
  •  Educational applications [7]
  •  Question answering (simple) and chatterbots (complex) [8, 9]

A chatbot uses the entire NLP pipeline described above and is one of the most complex applications of NLP.


Commercial chatbot research strategy

A large knowledge base annotated by humans is important for creating chatbots more powerful than those offered by online services such as pandorabots.com or personalityforge.com. Modern state-of-the-art NLP techniques use a hybrid approach that combines linguistic methods with statistical models, including machine learning and statistical inference.
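
The hybrid idea can be sketched in a few lines of Python: hand-written patterns, in the spirit of the AIML rules used by services like pandorabots.com, handle phrasings that were anticipated in advance, and a bag-of-words cosine-similarity search over a small human-annotated question/answer base handles the rest. The rules, the knowledge-base entries, and the 0.3 similarity threshold are hypothetical values chosen for illustration.

```python
import math
import re
from collections import Counter

# Linguistic layer: hand-written (pattern, reply template) rules,
# in the spirit of AIML categories. Hypothetical examples.
RULES = [
    (re.compile(r"^(hi|hello|hey)\b", re.IGNORECASE),
     "Hello! How can I help you?"),
    (re.compile(r"\bmy name is (\w+)", re.IGNORECASE),
     "Nice to meet you, {0}."),
]

# Statistical layer: a tiny human-annotated knowledge base of
# (question, answer) pairs. Hypothetical entries.
KNOWLEDGE_BASE = [
    ("what languages do you use", "We mostly use Python."),
    ("how long have you worked on chatbots", "For about two years."),
    ("what is natural language processing",
     "A field combining AI and linguistics."),
]


def bag_of_words(text):
    """Lower-cased word counts; word order is ignored."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def respond(message, threshold=0.3):
    # 1. Precise hand-written rules take priority.
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            return template.format(*match.groups())
    # 2. Otherwise retrieve the most similar annotated question.
    query = bag_of_words(message)
    score, answer = max(
        (cosine(query, bag_of_words(question)), reply)
        for question, reply in KNOWLEDGE_BASE
    )
    return answer if score >= threshold else "Sorry, I don't know that yet."


print(respond("Hello there"))
print(respond("Which languages do you use?"))
```

Rules give precise control but do not generalise; the statistical fallback tolerates rephrasing ("Which languages do you use?" still matches the first entry) but only as well as the knowledge base covers the domain, which is why the size and annotation quality of that base matter. Production systems typically replace raw word counts with TF-IDF weights or learned embeddings.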

 

 

  1. Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. Journal of the American Medical Informatics Association : JAMIA. 2011;18(5):544-551. doi:10.1136/amiajnl-2011-000464.
  2. Preeti, BK Sidhu. Natural language processing. International Journal of Computer Technology & Applications. 2013;4(5):751-758.
  3. http://wiki.opencog.org/w/NLP
  4. Cambria E, White B. Jumping NLP curves: a review of natural language processing research [review article]. IEEE Computational Intelligence Magazine. 2014;9(2):48-57.
  5. Church KW, Rau LF. Commercial applications of natural language processing. Communications of the ACM. 1995;38(11):71-79.
  6. Eisner J. NLP tasks and applications. Introduction to NLP. Johns Hopkins University, 2014.
  7. http://www.cs.jhu.edu/~jason/465/PDFSlides/lect36-tasks.pdf
  8. Gelbukh, Alexander (ed.). Special issue: Natural Language Processing and its Applications, vol. 46. Proceedings of 11th International Conference on Intelligent Text Processing and Computational Linguistics, 2010 [http://www.cicling.org/2010/Vol46.pdf]
  9. http://research.google.com/pubs/NaturalLanguageProcessing.html
  10. http://research.microsoft.com/en-us/groups/nlp/

NLP pipeline subtasks

  1. Conversion of information that contains natural language (for example, speech or scanned documents) into machine-readable text
  2. Morphological and lexical analysis, parsing
  3. Semantic analysis [2, 3], information extraction
  4. Knowledge representation, reasoning
  5. Natural language generation, similar to steps 1-3 but in reverse order
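
To show how the subtasks hand data to one another, here is a deliberately tiny Python pipeline that answers questions such as "Is Paris in Europe?". It assumes the input is already text (so subtask 1, speech recognition or OCR, is skipped), understands only two hard-coded question shapes, and reasons over a handful of illustrative facts; every function name is hypothetical.

```python
import re

# Subtask 4 (knowledge representation): illustrative facts stored as
# (subject, relation, object) triples.
FACTS = [
    ("Paris", "in", "France"),
    ("France", "in", "Europe"),
    ("Lyon", "in", "France"),
]


def lexical_analysis(text):
    """Subtask 2: split raw text into word tokens."""
    return re.findall(r"[A-Za-z]+", text)


def semantic_analysis(tokens):
    """Subtask 3: map tokens onto a structured query, or None.

    Only 'is X in Y' and 'where is X' are understood in this sketch.
    """
    words = [t.lower() for t in tokens]
    if len(words) == 4 and words[0] == "is" and words[2] == "in":
        return ("check", tokens[1], tokens[3])
    if len(words) == 3 and words[:2] == ["where", "is"]:
        return ("locate", tokens[2], None)
    return None


def containers(entity):
    """Subtask 4 (reasoning): follow 'in' facts transitively."""
    found, frontier = [], [entity]
    while frontier:
        current = frontier.pop()
        for subj, rel, obj in FACTS:
            if subj == current and rel == "in" and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


def reason(query):
    """A bool for 'check' queries, a list of places for 'locate' queries."""
    kind, entity, target = query
    places = containers(entity)
    return target in places if kind == "check" else places


def generate(query, answer):
    """Subtask 5: turn the structured answer back into a sentence."""
    kind, entity, target = query
    if kind == "check":
        if answer:
            return f"Yes, {entity} is in {target}."
        return f"No, {entity} is not in {target}."
    if answer:
        return f"{entity} is in {', '.join(answer)}."
    return f"I don't know where {entity} is."


def pipeline(text):
    query = semantic_analysis(lexical_analysis(text))
    if query is None:
        return "Sorry, I did not understand the question."
    return generate(query, reason(query))


print(pipeline("Is Paris in Europe?"))
print(pipeline("Where is Lyon?"))
```

Each stage is trivial here, but the division of labour mirrors the list above: lexical analysis produces tokens, semantic analysis turns them into a structured query, reasoning derives facts that were never stated (Paris is in Europe because Paris is in France and France is in Europe), and generation turns the result back into a sentence.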

Other important aspects of NLP, especially those related to knowledge representation, reasoning, and language generation, are:

  • The ability to pass as human (recall the Turing test), or at least to minimize the difference an interlocutor perceives between the computer-based system and a human.
  • Having its own identity, authenticity, and individuality.