“True conversational AI is a voice assistant that can engage in human-like dialogue, capturing context and providing intelligent responses. Such AI models must be massive and highly complex,” writes Sid Sharma in ‘What Is Conversational AI?’. This lecture attempts to demystify conversational AI by covering its building blocks, including but not limited to Automatic Speech Recognition, Natural Language Processing and Understanding, Text-to-Speech Synthesis, and Intent Extraction and Identification. We use NVIDIA’s Jarvis, an application framework for multimodal conversational AI services that delivers real-time performance on GPUs, to perform sophisticated conversational AI tasks. At the end of the lecture, we present a Question/Answering demo powered by NVIDIA’s Jarvis.
The lecture above shows how to install Jarvis on your machine: first install Docker, then CUDA, then register with NGC (NVIDIA GPU Cloud), and finally set up Jarvis itself. Once installed, Jarvis can be used from a Jupyter notebook.
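Because the setup steps depend on each other, it can help to confirm the prerequisites are in place before setting up Jarvis. The following is a minimal sketch, not part of the lecture, that only checks whether the relevant command-line tools are on PATH. The executable names are assumptions: docker for Docker, nvidia-smi as a rough proxy for a working NVIDIA driver, and ngc for the NGC command-line client.

```python
import shutil
from typing import Callable, Dict, Iterable, Optional

# Assumed executable names (not taken from the lecture):
#   docker     -> Docker engine
#   nvidia-smi -> NVIDIA driver (a rough proxy for a working GPU/CUDA setup)
#   ngc        -> NGC command-line client, used after registering with NGC
PREREQUISITES = ("docker", "nvidia-smi", "ngc")


def check_prerequisites(
    tools: Iterable[str] = PREREQUISITES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, bool]:
    """Map each tool name to True if it resolves to an executable on PATH."""
    return {tool: which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool:12} {'found' if found else 'MISSING'}")
```

Finding these tools does not guarantee that Jarvis will run; it only catches a missing step early. The which parameter is injectable so the check can be exercised without touching the real system.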
Ahmad Bazzi then shows how to work with the most essential components of Jarvis: ASR (Automatic Speech Recognition), NLP (Natural Language Processing) and Core NLP, and finally TTS (Text-to-Speech) synthesis.
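These components chain together in a voice assistant: speech is transcribed by ASR, the transcript is interpreted and answered by NLP, and the reply is spoken back by TTS. The sketch below illustrates only that data flow. The VoicePipeline class and the fake_* stages are hypothetical stand-ins that run offline. They do not call Jarvis, whose services are accessed through its own client API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Chains the three stages: speech in -> text -> reply text -> speech out."""
    asr: Callable[[bytes], str]  # Automatic Speech Recognition: audio -> transcript
    nlp: Callable[[str], str]    # NLP/NLU: transcript -> reply text
    tts: Callable[[str], bytes]  # Text-to-Speech: reply text -> audio

    def respond(self, audio: bytes) -> bytes:
        transcript = self.asr(audio)
        reply = self.nlp(transcript)
        return self.tts(reply)


# Toy stand-ins so the sketch runs offline; they are NOT Jarvis services.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the "audio" is already UTF-8 text


def fake_nlp(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend that encoding text is synthesis


pipeline = VoicePipeline(asr=fake_asr, nlp=fake_nlp, tts=fake_tts)
print(pipeline.respond(b"hello"))  # b'You said: hello'
```

Keeping each stage behind a plain callable makes it possible to swap a toy stage for a real service client without changing the pipeline itself.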
Finally, the tutorial presents a Question/Answering demo built on Jarvis. It draws its answers from Wikipedia articles, which are fetched with the wikipedia Python package.
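A retrieval-based QA demo of this kind has two steps: fetch a passage relevant to the question, then extract an answer from it. The sketch below illustrates both steps under stated assumptions. fetch_wikipedia_context uses the wikipedia package's search and page functions and needs network access. best_sentence is a simple keyword-overlap heuristic that stands in for a neural QA model and is not the method used in the Jarvis demo. The stopword list and all function names are illustrative.

```python
import re
from typing import List, Set

# Small illustrative stopword list (an assumption, not from the demo).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "on",
             "what", "who", "when", "where", "which", "how", "did", "does"}


def fetch_wikipedia_context(query: str) -> str:
    """Return the full text of the top Wikipedia search hit for `query`.

    Needs network access and `pip install wikipedia`. It is imported lazily so
    the rest of this sketch runs offline. Ambiguous titles may raise
    wikipedia.exceptions.DisambiguationError.
    """
    import wikipedia

    titles = wikipedia.search(query)
    if not titles:
        return ""
    return wikipedia.page(titles[0], auto_suggest=False).content


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(text: str) -> List[str]:
    """Naive split on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def best_sentence(question: str, context: str) -> str:
    """Pick the context sentence sharing the most non-stopword tokens with the question.

    A keyword-overlap heuristic standing in for a neural QA model.
    """
    keywords = _tokens(question) - STOPWORDS
    return max(split_sentences(context),
               key=lambda s: len(keywords & _tokens(s)),
               default="")


context = "Paris is the capital of France. Berlin is the capital of Germany."
print(best_sentence("What is the capital of Germany?", context))
# Berlin is the capital of Germany.
```

For a live query, the two steps combine as best_sentence(question, fetch_wikipedia_context(topic)). A neural QA model would instead return a precise answer span rather than a whole sentence.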