Conversational Insights Glossary
Conversational AI is a subset of artificial intelligence that utilizes machine learning, deep learning, natural language processing, natural language understanding, and automated speech recognition to mimic the conversational capabilities of a human. The objective is to help machines facilitate intuitive and contextual conversations with humans and help them achieve their goals.
The generally accepted definition of conversational insights is deriving insights from how potential customers converse in order to shape marketing strategy. However, we are changing the approach and defining conversational insights as humans using conversation and language as an interface to derive meaningful insights from any data storage system. In our view, conversational insights are the next obvious evolution in the field of business intelligence.
Natural Language Processing (NLP) is a field of computer science that applies linguistic and statistical algorithms to text to extract meaning in a way that approximates how humans understand language.
Automated Speech Recognition (ASR) is a technology that converts human speech into a sequence of text readable by machines. ASR is an essential part of any system using voice as an input and language as an interface. Its applications range from voice assistants on phones, such as Siri or Cortana, to voice-activated smart home devices.
A Conversational Insights Assistant is an intent-based program that communicates with end users using language as an interface. Its primary purpose is to ingest data and derive the right insights from it, depending on the questions the user asks in natural language. It also ensures that answers are displayed in a visual format when required, instead of overloading the end user with a barrage of metrics.
Domain-specific vocabulary understanding refers to machines analyzing and understanding the meaning of words, phrases, or terminology related to a specific domain, used in different combinations. For example, an AI system with domain-specific vocabulary would be able to understand doxycycline, doxy, dox, or another relevant usage of that word as an antibiotic prescribed to pets.
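As a minimal sketch of the doxycycline example, the snippet below maps several surface forms of a term to one canonical name. The alias table and the function name `normalize_terms` are hypothetical; a production system would more likely learn such variants from domain data than hard-code them.

```python
# Hypothetical alias table: every surface form maps to one canonical term.
DOMAIN_ALIASES = {
    "doxycycline": "doxycycline",
    "doxy": "doxycycline",
    "dox": "doxycycline",
}

def normalize_terms(question: str) -> list[str]:
    """Return the canonical domain terms mentioned in a question."""
    found = []
    for token in question.lower().split():
        word = token.strip(".,?!")  # drop trailing punctuation
        canonical = DOMAIN_ALIASES.get(word)
        if canonical and canonical not in found:
            found.append(canonical)
    return found

print(normalize_terms("How much doxy was prescribed last month?"))
```

A lookup table like this only covers variants someone thought to list, which is why the definition above stresses understanding terms "in different combinations".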
A framework is a skeleton that provides some basic building blocks and generic functionality for building chatbots (like ML/NLP or a Dialog Builder) but requires additional user-written code or other third-party services to match the functionality of an actual platform. Frameworks are often composed of piecemeal components from different vendors.
Deep learning is a subset of machine learning in which the system mimics the way humans acquire knowledge. It plays a vital role in data science, particularly in predictive modeling, and has years of research behind it. Its applications include, but are not limited to, computer vision, signal processing, medical diagnosis, and self-driving cars. Modern computing power has driven the rise of deep learning systems.
The number of attributes or features in a data set is called its dimension. If a data set has more than a hundred attributes, it is generally referred to as a high-dimensional data set, and calculations on such data sets are often difficult.
An attribute is a property describing an observation, such as height, weight, or size. For simpler reference, we can picture attributes as column headers or column names in a spreadsheet.
Clustering is the activity of dividing or grouping data points based on similar traits. Data points in a cluster are similar to each other and different from data points in other clusters. It is an unsupervised learning method used to draw inferences without labeled data.
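To make the grouping idea concrete, here is a small sketch that uses k-means, one common clustering technique chosen here for illustration, on one-dimensional data. The function `kmeans_1d`, the sample order values, and the starting centroids are all assumptions for the example.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Tiny 1-D k-means: assign each point to its nearest centroid, then recompute."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids

order_values = [10, 12, 11, 95, 100, 102]  # illustrative data, no labels attached
clusters, centroids = kmeans_1d(order_values, centroids=[0, 50])
print(clusters)
print(centroids)
```

No labels are supplied anywhere: the two groups emerge only from how close the values are to each other.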
The act of predicting outside the range of the observed data is called extrapolation. Machine learning algorithms tend to become unreliable when asked to predict outside the range of their training data.
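The following sketch illustrates why extrapolation is risky. It fits a straight line to data that actually follows y = x², then compares the prediction error inside and outside the training range. The data and the helper names `fit_line` and `prediction_error` are made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

train_x = [1, 2, 3, 4, 5]            # training range is 1..5
train_y = [x ** 2 for x in train_x]  # the true relationship is y = x^2
slope, intercept = fit_line(train_x, train_y)

def prediction_error(x):
    """Absolute gap between the fitted line and the true value at x."""
    return abs((slope * x + intercept) - x ** 2)

print("error inside range (x=3):", prediction_error(3))
print("error outside range (x=10):", prediction_error(10))
```

The line is a reasonable approximation between 1 and 5, but the error grows quickly once x leaves that range.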
Statistical analysis deals with collecting and analyzing data to unearth hidden trends and patterns. It aims to remove bias through numerical analysis of the data. Interpreting research data, developing statistical models, and planning surveys are some key application areas of statistical analysis.
Entity analysis is the process of examining information related to an entity by using NLP. For example, the analysis can be based on sentiment analysis, in which the text exchanged between a human and a conversational platform is analyzed to determine whether the conversation is positive or negative.
A knowledge base is a database used to store structured or unstructured data that can be used for knowledge sharing. AI uses knowledge bases not just as a means to store data but also to train itself and to solve new problems using data from previous experience.
Machine learning refers to the process in which algorithms learn and improve from experience without being explicitly programmed. It emphasizes developing AI that can access data and learn for itself from previous data.
Natural-language understanding (NLU) is a part of natural-language processing in artificial intelligence that deals with machine reading comprehension. Here, artificial intelligence uses computer software to interpret text and any type of unstructured data.
Context is the act of humanizing AI’s approach to consuming, understanding, and responding to specific content. Although it is often regarded as a tough project, training AI to have contextual understanding helps businesses reap great benefits, especially in voice AI. For example, context training is what enables an AI system to understand that the term cover means insurance coverage in a question related to insurance.
A data glossary is a collection of all terms that define the data’s key characteristics, organized in a way that is easy to search. It gives context to the AI system and helps in organizing knowledge. A data glossary serves the same purpose for all the data assets in an organization. It contains business terms, phrases, and concepts that help define the data.
Homophones are similar-sounding words that add considerably to the complexity of conversational AI solutions. In voice-driven AI solutions, the words ears and years sound alike but have different meanings. In a sentence like “How many years did it take for the business to break even?”, the AI system understands that the word is years and not ears, just as humans do, by understanding the context of the whole sentence.
Synonyms are terms or phrases that have a similar meaning. For example, quantity, amount, and volume are synonymous. A good conversational insights system needs to understand that synonyms can be used instead of the original word and must decide the meaning based on the context. Hence, AI systems for conversational insights are also trained on synonyms.
Contraction is the habit of using a trimmed-down version of a word while typing. For example, using yr for year or ur for your has become common practice in text-driven communication. The same habit repeats itself when talking to or “texting” a machine with questions. Today’s AI is trained on multiple utterances of the same question to understand these contractions. Contractions add to the complexity of training AI models.
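A rough sketch of contraction handling follows, using a hand-written lookup table rather than the utterance-based training the text describes. The `CONTRACTIONS` table and `expand_contractions` function are illustrative assumptions.

```python
import re

# Hypothetical lookup table; a trained system would cover far more variants.
CONTRACTIONS = {"yr": "year", "yrs": "years", "ur": "your"}

def expand_contractions(text: str) -> str:
    """Replace each known contraction with its full form, leaving other words as-is."""
    def replace_word(match: re.Match) -> str:
        word = match.group(0)
        return CONTRACTIONS.get(word.lower(), word)
    # Match whole alphabetic words so "your" is never touched by the "ur" rule.
    return re.sub(r"[A-Za-z]+", replace_word, text)

print(expand_contractions("What was ur revenue last yr?"))
```

Matching whole words matters here: a plain substring replacement of "ur" would corrupt words such as "your" or "hour".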
Abbreviations are shortened forms of a word or a phrase. For example, UK stands for the United Kingdom, and ISRO stands for the Indian Space Research Organisation. Just as with contractions, the system needs to be trained on abbreviations and to understand them contextually.
A boxplot is a type of chart, drawn vertically or horizontally, that displays the distribution of data using a five-number summary: the minimum, the lower quartile, the median, the upper quartile, and the maximum. The “whiskers”, or the lines extending out from the box, display the variability outside the upper and lower quartiles. Outliers are displayed as individual dots.
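The numbers behind a boxplot can be sketched as below. This assumes the common convention that points more than 1.5 times the interquartile range beyond the quartiles count as outliers; the source does not state which rule is used. The function name `box_plot_summary` and the sample data are illustrative.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Compute the box, whisker ends, and outliers for a boxplot."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: fences sit 1.5 * IQR beyond the quartiles.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "whisker_low": inside[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside[-1],
        "outliers": outliers,
    }

summary = box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

In this sample, the value 30 falls beyond the upper fence, so it would be drawn as an individual dot rather than stretching the whisker.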
Frequency distribution is a visual representation in graphical format or tabular format to display the number of occurrences or frequency of an item in a data set. Election results, exam scores, and a sports team’s performance are simple examples of a dataset where frequency distribution is employed.
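In tabular form, a frequency distribution is simply a count per distinct item. A minimal sketch with made-up exam grades:

```python
from collections import Counter

exam_grades = ["B", "A", "C", "B", "B", "A"]  # illustrative data
frequency = Counter(exam_grades)

# Print the distribution as a small table, one row per grade.
for grade, count in sorted(frequency.items()):
    print(grade, count)
```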
A histogram visualizes the distribution of data over a continuous interval or a certain period. Each bar in a histogram represents the tabulated frequency at each interval, or bin. Histograms help give an estimate of where values are concentrated, what the extremes are, and whether there are any gaps or unusual values. They are also useful for giving a rough view of the probability distribution.
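The binning step can be sketched as follows, assuming equal-width bins and integer values at or above a chosen starting point. The function name `histogram`, the bin width, and the scores are assumptions for illustration.

```python
def histogram(values, bin_width, start):
    """Count values per half-open bin [left, left + bin_width).

    Assumes integer values that are all >= start.
    """
    last = start + ((max(values) - start) // bin_width) * bin_width
    # Create every bin up front so that empty bins show up as gaps.
    counts = {left: 0 for left in range(start, last + 1, bin_width)}
    for v in values:
        left = start + ((v - start) // bin_width) * bin_width
        counts[left] += 1
    return counts

scores = [52, 58, 61, 67, 69, 73, 75, 78, 91]
bins = histogram(scores, bin_width=10, start=50)
for left, count in bins.items():
    print(f"{left}-{left + 10}: {'#' * count}")
```

Keeping the empty 80-90 bin in the output is deliberate: it is what lets a reader spot the gap between the main group of scores and the single high value.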
The act of predicting or estimating a future event or occurrence using statistical methodologies based on past data, trends, behaviors, and more is called forecasting. In the world of data analytics, forecasting plays a major role, and in many scenarios makes or breaks the success of a business. Some common examples of forecasting include but are not limited to forecasting market fluctuations, forecasting payment behaviors, etc.
Metadata prefixed by the pound (#) sign is called a hashtag. It is used to identify specific topics. Hashtags are widely used on social media to make a topic trend or to share knowledge about it. In the context of conversational insights, hashtags are used to group a set of questions related to a specific topic so that they can be revisited later.
The role of pronouns in conversational insights is similar to their role in language. The system is trained to identify the context of a question based on the pronoun, which makes questioning easier for users. For example, “Who closed the most no. of sales in January 2020?” is a question for which the system will fetch results. A follow-up question of “How about March 2020?” will return the salesperson who closed the most no. of sales in March 2020, because the system is able to understand and relate to the context of the previous question.
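One way to picture the follow-up behaviour is as carrying the previous question's intent forward and swapping only the part that changed. The sketch below is a deliberately simplified, rule-based stand-in for a trained system; the `Query` type, the intent name `top_salesperson`, and the keyword matching are all hypothetical.

```python
import re
from dataclasses import dataclass, replace
from typing import Optional

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

@dataclass(frozen=True)
class Query:
    intent: str
    period: Optional[str]

def parse_period(question: str) -> Optional[str]:
    """Extract a 'Month YYYY' period if the question contains one."""
    match = re.search(rf"\b({MONTHS}) \d{{4}}\b", question)
    return match.group(0) if match else None

def interpret(question: str, previous: Optional[Query]) -> Query:
    period = parse_period(question)
    text = question.lower()
    if "most" in text and "sales" in text:
        return Query(intent="top_salesperson", period=period)
    if previous is not None and period is not None:
        # Follow-up: reuse the earlier intent and change only the period.
        return replace(previous, period=period)
    raise ValueError("cannot interpret the question without context")

first = interpret("Who closed the most no. of sales in January 2020?", None)
follow_up = interpret("How about March 2020?", first)
print(first)
print(follow_up)
```

The follow-up question carries no intent of its own, so without the stored `first` query it could not be answered at all.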
Contributing factors are any dimensions that contribute more than a specific percentage of the result. For example, when searching for the contributing factors to revenue, if California contributed more than 50% of the revenue, then California is a contributing factor to that specific result.
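The threshold rule can be sketched in a few lines. The revenue figures are invented for illustration, and the 50% threshold simply mirrors the California example.

```python
def contributing_factors(totals, threshold=0.5):
    """Return each member whose share of the grand total exceeds the threshold."""
    grand_total = sum(totals.values())
    return {
        name: value / grand_total
        for name, value in totals.items()
        if value / grand_total > threshold
    }

revenue_by_state = {"California": 600, "Texas": 250, "Ohio": 150}  # illustrative figures
factors = contributing_factors(revenue_by_state)
print(factors)
```

Lowering the threshold widens the set of contributing factors, so the chosen percentage directly shapes what the system reports.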
Peers are attributes that have shown similar properties to another attribute. For example, if more than 50 million dollars in donations have come from the midwestern states, and the southern states have donated close to 50 million dollars, then they can be considered peers within the same dimension, i.e., region.
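A possible way to express "similar properties" is a relative tolerance around a reference value. The sketch below assumes a 10% tolerance, which is an arbitrary choice for illustration and is not stated in the definition; the donation figures and the function name `find_peers` are likewise made up.

```python
def find_peers(values, target, tolerance=0.1):
    """Return members whose value lies within a relative tolerance of the target's value."""
    reference = values[target]
    return [
        name
        for name, value in values.items()
        if name != target and abs(value - reference) <= tolerance * reference
    ]

# Donations in millions of dollars, grouped by the "region" dimension.
donations_by_region = {"Midwest": 52, "South": 49, "West": 20}
peers = find_peers(donations_by_region, "Midwest")
print(peers)
```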