We take a view of words in the twenty-first century, when the competition to humans won’t be other humans but humans and machines working together, and when the good, the bad, and the ugly could therefore multiply a million times over. This article is a humble effort to study the future of words.
If you are a language lover, linguist, NLP scientist, data scientist, AI practitioner, or technologist dealing with media, this article is meant for you. It’s a long article, divided into 3 parts:
- A quick history of the evolution of words
- 4 milestone developments in the standardization of words and the contextual relationships between them in the last 150 years: concepts that NLP lovers swear by, and the Big Data mining and understanding opportunities beyond currently explored contexts.
- The progress of AGI and massive cloud-powered learning systems: the massive explosion in words and relationships, and its impact on societal culture and businesses in the coming years.
A quick history of the evolution of words
The birth of words
These words were around “thou, I, not, that, we, to give, who, this, what, man/male, ye, old, mother, to hear, hand, fire, to pull, black, to flow, bark, ashes, to spit, worm.” Most of these words are common in some way or other to all Asiatic and European languages. Why is that so?
Humans evolving from animals had, in their basic DNA, understood the concepts of threat and affinity, resources (food) and association (human, animal, and things), and basic differences (gender, age, skin, size) even before they understood words. The childhood of words was human observation of natural phenomena and basic psychology triggered by primal emotions. In this childhood of words, the non-critiquing but supremely observant human mind deciphered the way the world works physically, chemically, and logically; it just didn’t have the words to describe it. The understanding of language (and you may debate this) preceded the understanding of words, because human language was both wordless and worded, though the languages we came to know later became worded.
The birth of script
Once humans had words, even if they were not consistently understood, for communication or barter trade, writing evolved from pictures to syllabaries and eventually to alphabetic scripts. The oldest script is probably cuneiform, recorded in Mesopotamia around 3400 BC. Brahmi, the predecessor of Sanskrit, started around 2500 BC, and Sanskrit started growing around 1500 BC. The earliest mention of Latin was around 700 BC.
The beginning of scripts is important as it rule-codes the symbols of everyday association of its era with a construct sequence that defines the words and phrases with specific meaning to the learnings and interpretations of the era. The point to remember is that the learning and interpretations around common human needs preceded the birth of script and scripts were a response to cultural manifestation of the times. Scripts evolved, civilizations evolved, and the vice versa relationship influenced the next wave of words. An important part in this evolution was the element of trade, as it helped people across tribes to intermingle and in some way coin common words, and common ways to script them.
Civilizational growth of languages, cross-language dialog, and internationalization
Depending on the part of Eurasia, the use of iron started between 1200 BC and 600 BC, and its power transformed each adopting area within a few hundred years. By the 5th century BC, colonization and the spread of empires as we know them today had started. This also meant faster dialog across languages, which accelerated the interchange of words and influences like never before. So were born language morphology and the presence of root words, the beginnings of suffixes and prefixes, tone and stress, and parts of speech. In the Nirukta, written in the 6th or 5th century BCE, the Sanskrit grammarian Yāska defined four main categories of words: noun, verb, preverb, and particle/preposition. Approximately another 200 years later, Plato identified verbs and nouns in sentences, and Aristotle added conjunctions as a broader category including pronouns, prepositions, and articles.
While the basis of parts of speech was evolving, a wave of political and cultural imperialism only accelerated the assimilation of culture, rituals, religions, and concepts of god and morality into words, as interpreted or sanctioned by the powers that be: first the spread of Greek and its influence on Latin in Europe, peaking with the march of Alexander and the Roman exploits; then the spread of Buddhist thought across Asia, spearheaded by Ashoka (whose reign itself had an Indo-Greek influence); the spread of Confucianism through the efforts of Emperor Wu Di of the Han dynasty; and the spread of Christianity through the efforts of the Roman Emperor Constantine I.
Across eras, new words would be invented, but old words stayed popular until the new, hammered and repeated enough times, became popular and old in turn (because every new thing needed a context that is old but a justification that is new, and often the hammerer of the repetition was a moneyed political or religious entity). The language of a land, and its interactions with the languages of the lands it dealt with, influenced the expansion of words. Human needs, human greed, human rituals, and human society always gave shape to words, smartly absorbing or expanding local dialects and slowly merging and morphing into new languages. An important part of influencing others is evangelism, be it mercantile, religious, or political. So, while Christianity expanded in Europe, Buddhism expanded through Southeast Asia, China, and the Far East, Hinduism spread in South Asia, and then Islam expanded in the Middle East and North Africa. All these influences were helping words enter their teenage years as humanity moved away from the old world into the dark Middle Ages. (There is a brilliant video that shows the spread of religions in the world: see here. The influence of religion nearly always influences language.)
Conformance through consistency and grammar and discordance through spread and distortion is a natural phenomenon
As languages grew, consistent grammar also evolved, defining spellings, word accents, and with them vowels and consonants, stress points, and usage guidelines. Panini’s treatise on grammar, the ‘Ashtadhyayi’; the Tolkāppiyam, an extensive work of Tamil grammar scholarship; Arcadius’s work on Greek grammar; Priscian’s Institutes of Grammar (the standard Latin study book of the Middle Ages); and Sibawayh’s Al-Kitāb, defining Arabic grammar and linguistics, are among the popular grammar works that influenced subsequent generations. The author’s contention is that the marquee text of an era (the one most referred to or most used, or a systematic study made in a stable era of peace) often became an amazing work that influenced grammar.
The first known example of this was Chancery Standard in fifteenth-century England, which enforced a way of writing English on all clerks preparing the king’s documents. When unofficial standards evolve and get state sanction, the spread is fast, and interpretations are more consistent.
If conformance comes with standardization, discordance comes from circumstances, needs, or survival instincts. Throughout history, until the beginning of the twentieth century, many groups of people, often spread over a few thousand square miles, lived in relative isolation, without regular contact with most other people, because of geographical factors and sometimes because of political and religious needs. People in these cultures would consistently identify symbols, totems, things, and feelings that had elements of the language of the larger nation or continent they were part of, but rule-coded in a different way, style, and script, all to make the dialect a culturally cohesive force, a symbol of morality, affinity, and unity. If some of the future rulers in these areas became powerful, they would give shape to a language that stayed and prospered for many more centuries. Hundreds of languages in Europe and South Asia are a product of this phenomenon. Where no significant rulers emerged from a dialect, the dialect stayed more regional and nuanced. In the evolution of words, this was another lesson, the advent of the youth of words. The common words (common because of context, not spelling) were global, the evolution of popular languages was tied to the trade, political, and religious evolution of the land, and dialects and micro-variations were tied to the need for sub-identities. Diversity came from this intercourse.
The rise and rise of English
Beginning in the fifteenth century, modern English started becoming popular: starting with Geoffrey Chaucer’s work, shining with Shakespeare’s aura, and expanding with the rise of the British Empire, while the discovery of the New World popularized English across all continents. English in some ways expanded many languages and influenced most languages, and eventually got influenced and enriched by them all. Each time a language rises in popularity, the values associated with the culture it is part of (ethos) and the culture it wants to be part of (cultural imperialism, globalization, or survival, depending on the subsequent interplay) come into play.
English brought with it the basis of a constitution, a certain educational system, the march of missionaries (though that was more because of Christian evangelism, and it happened with the Portuguese, Spanish, and French conquests too, just as Islamic evangelism spread across the Middle East, Central Asia, and North Africa), the mercantile system, and the basis of capitalism, and with it the basis of socialism, though the making of communism and its growth in other parts of the world was also progressive power play and a civilizational survival mechanism.
English expanded even further into baby geos (US, UK, Canada, India, Australia) and influenced culture further. We all know what Hollywood did! As a language expands, its culture expands, and as a culture expands, it needs to learn from other cultures, sometimes pretending to and sometimes happily embracing two-way learning. So English learned words from all languages. The examples below cover less than 1% of that learning. English also learned words from other European languages, often in places where its colonies formerly had other European rulers.
Guru, Moksha, Avatar, Curry, Dharma, Jungle, Jute, Khaki, Chutney, Cheetah, Nirvana, Pandit, Yoga, Pyjama, Thug, Dekko, Bangle
Allah, Quran, Masjid, Namaz, Haji, Muezzin, Hazz, Acha, Bura, Shawl, Jungle, Dacoit, Loot, Nahin, Shukriya, Mehenga, Masala, Jumma, Raat
Allowance, Apostrophe, Aviation, Ballet, Brunette, Chauffeur, Cliché, Elite, Fiancé, Heritage, Hotel, Insult, Kilogram, Magnificent, Nocturnal, Poetic, Premiere, Sabotage, Soup, Technique, Variety, Zest
Adios, Armada, Breeze, Cafeteria, Cigar, Embargo, Hola, Incognito, Loco, Siesta, Tomato, Vigilante
Dinghy, Adda, Jute, Bose, Naxalite, Juggernaut, Rasgulla
Typhoon, Tsunami, Karaoke, Sake, Manga, Anime, Emoji, Origami, Sushi, Tofu, Judo, Sumo, Ninja, Zen
Brainwash, Chi, Chop Chop, Dim Sum, Gung-Ho, Kanji, Lychee, Paper Tiger, Ramen, Soy, Tycoon, Wushu, Yen
Alfalfa, Assassin, Babul, Bazaar, Baazigar, Bombast, Bulbul, Candy, Caviar, Chess, Chinar, Daftar, Divan, Inamdar, Gul, Jasmine, Kusti, Mirza, Pasha, Path, Rank, Sepoy, Shahi, Van, Zamindar
Key Milestones in the Language Understanding in the Last 150 years
While the author loves and respects all languages, the discussion in this part focuses on the English language, but the fundamentals we discuss here are language agnostic.
The birth of the Modern Dictionary
While work began on the Oxford dictionary in 1857, it started being published in unbound fascicles in 1884. 1928 was an epoch-making year, when the Oxford dictionary was published in 10 bound volumes. With time, other famous dictionaries started on their success path: Merriam-Webster, Macmillan, Collins, and Cambridge, amongst others. The advent of the Internet expanded these to many more players, including Dictionary.com, Wiktionary, Google dictionary, and Urban Dictionary. Most dictionaries have some of these traits in common, and even in other languages the schema of a dictionary follows these traits.
What it means for writers or linguists
What it means for AI programmers or NLP scientists?
Spelling is the beginning of comprehension, language consistency, and word identity. Spellings establish the connection between letters and sounds and in the process help us understand a language’s script better.
The simplest and most effective program is a spell checker. It corrects spelling errors in input text, and we need a spell checker for each language.
Most spell checkers would look at one or multiple paradigms.
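One such paradigm, dictionary lookup with approximate string matching, can be sketched in a few lines. This is a minimal illustration only, not how production spell checkers work: the vocabulary is a hypothetical toy list, the function names are made up for this example, and the similarity cutoff of 0.75 is an assumed value that would need tuning per language.

```python
import difflib
import re

# Hypothetical toy vocabulary; a real checker would load a full word list per language.
VOCABULARY = [
    "language", "spelling", "pronunciation", "dictionary",
    "grammar", "sentence", "the", "of", "is", "a",
]

def correct_word(word, vocabulary=VOCABULARY, cutoff=0.75):
    """Return the closest vocabulary word, or the word unchanged if nothing is close."""
    lower = word.lower()
    if lower in vocabulary:
        return word  # already a known word, keep the original casing
    matches = difflib.get_close_matches(lower, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word

def correct_text(text):
    """Apply correct_word to every run of ASCII letters, leaving punctuation alone."""
    return re.sub(r"[A-Za-z]+", lambda m: correct_word(m.group(0)), text)

print(correct_text("The gramar of a sentense"))
```

The sketch ignores context entirely, so it cannot tell "their" from "there"; context-aware paradigms need language models on top of this kind of lookup.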
Language and communication need to be unambiguous and poetic. Right pronunciation means the use of the right sounds. The right sound helps in better reception of your words. It establishes the speaker’s aura and confidence in language use.
Text to speech and phonetics are popular use cases. These become the basis for speech synthesizers and the backbone of conversational language bots like Siri, Alexa, Echo, and Cortana. The voices behind text-to-speech engines have mastered pronunciation for style, impact, and culture fit.
Part of Speech
Part of speech (POS) helps us understand sentences by breaking down the meaning of each word, establishing the grammar, and clarifying the context of the sentence. POS helps in ease of communication and in understanding individual sentences.
There are 11 common POS in English: noun, pronoun, verb, adverb, adjective, preposition, conjunction, determiner, numeral, article, and interjection.
POS tagging helps machines understand sentence formation and use it for finding search intent, AI writing, and machine learning training readiness, besides being the backbone of information retrieval, information extraction, machine translation, question answering, and speech synthesis and recognition. Effective POS tagging helps in the most effective contextual phrase-finding and parameter building for any meaningful computational activity. You can find facts in a sentence, find the intended meaning of words or phrases, and understand sentences intuitively, like humans, but just a million times faster.
Understanding the rules that made language is important in understanding any word or phrase usage.
Repeated usage of words forms collocations. Also, popular mentions of words help define titles of books, movies, and poems, song lyrics, slogans, trademarks, and ad jingles. Anybody interested in the creative side or the trending side of words, such as linguists or statisticians, would find word usage patterns interesting. Creativity happens most when words used in one context are smartly used in another. This insight helps in paraphrasing, generating new creative constructs, and learning across subject corpora.
The speech-to-writing and writing-to-speech relationships are often not the same. New problems arise when a word can be spelled in multiple ways, just as when a word can mean different things in different contexts. In such a scenario, finding the right word and its meaning, especially in non-text communication and in translation across multiple languages, is not easy.
In the age of WhatsApp, words are not always spelled in any fixed way. As more languages are written in the English script for instant communication, the same words get different spellings. This is also the case with translation between two languages. Also, in Optical Character Recognition, where we digitize text from images, we need to understand alternate spellings, context, and social meanings.
Translation of a word or a phrase from one language to another started with human compilations of a given language’s words and phrases translated into target languages. The advent of consumer software in the late eighties accelerated the demand for these further. The problem, though, is that when translating sentences or speeches, words take various forms, can turn idiomatic or redundant, and can have different levels of abstraction. Humans add a lot of value in understanding the culture, subject, and context of the content being translated.
While basic translation dictionaries started as a human effort, NLP has extended machine translation into a super-efficient science. Machine translation uses pretrained word vectors (words and their context, meaning, and frequency relative to each other) along with context vectors, understands sentiments (sentiment analysis), understands classification (questions, or words and phrases as colloquialisms or named entities), and optimizes linguistic variations (multiple candidate translations) to effectively paraphrase and eventually construct a sentence that reads and means the same as, or better than, what an expert human translator would produce.
In dictionaries, these end-of-sentence delimiters are as simple as a period (.), question mark (?), or exclamation mark (!). We intuitively understand that the sentence ends here. In the same way, an empty line at the end of a paragraph, or a new line that starts with an indent, tells us that a new paragraph starts. These are simple and intuitive to the reader or listener.
Sentence breaking and sentence understanding are not easy tasks for computers. With the advent of HTML and its hundreds of tags, writing itself got optimized for layout and consumption. Computationally, this brought nested tags, injected keywords, and an interplay of phrases and sentences, with some tags having no separators. Reading this text the way humans read it is a challenge. Sentence breaking and understanding is an essential part of NLP, and significant noise in NLP algorithms comes from unwarranted separators or the lack thereof.
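To make the separator problem concrete, here is a minimal sketch of sentence breaking on HTML input using only the Python standard library. The set of block-level tags and the split rule (a period, question mark, or exclamation mark followed by whitespace) are simplifying assumptions made for this illustration; the class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumed subset of tags that act as implicit sentence or paragraph separators.
BLOCK_TAGS = {"p", "div", "li", "br", "h1", "h2", "h3", "h4", "h5", "h6", "td", "tr"}

class TextExtractor(HTMLParser):
    """Collects visible text and inserts a newline wherever a block tag opens or closes."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside <script>/<style>, whose content is not prose

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def split_sentences(html):
    """Strip tags, then split each text block after ., ? or ! followed by whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    sentences = []
    for block in "".join(parser.chunks).split("\n"):
        block = " ".join(block.split())  # collapse runs of whitespace
        if block:
            sentences.extend(s for s in re.split(r"(?<=[.!?])\s+", block) if s)
    return sentences

page = "<h1>Words</h1><p>Scripts evolved. <b>Did</b> trade help? Yes!</p><script>var x = 1;</script>"
print(split_sentences(page))
```

Note how the heading has no end-of-sentence punctuation at all: only the tag boundary tells the program that "Words" is its own unit. The naive split rule also breaks on abbreviations such as "Dr.", which is exactly the kind of unwarranted separator that adds noise.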
From words to affinity words: The origin of Thesaurus and its many variants
While dictionaries evolved each decade beginning in the 1880s, the next linguistic need was to create a reverse dictionary, or thesaurus. A thesaurus is a list of words, each representing an idea or context, together with a list of words that can represent that idea. In the nineteenth century, Peter Mark Roget put a book of synonyms and a dictionary together and published Roget’s Thesaurus of English Words and Phrases in 1852. The book would go on to reach 40 million copies, and eventually an industry spawned, with each major dictionary publisher having a thesaurus and dedicated sites covering the ever-growing list of synonyms.
Let’s see from a linguistic and computational perspective how Thesaurus would impact the ever-onward evolution of language.
What it means for writers or linguists
What it means for AI programmers or NLP scientists?
A synonym is a word having the same or nearly the same meaning as another word in a language. For example, tranquillity, calm, and stillness are synonyms of peace. Synonyms help us understand the sentiment and tone of each word. Knowledge of synonyms gives an edge in writing and communication: the writer or speaker can pick the precise, contextually relevant word from the list of synonyms. This helps in removing repetition, adding imagery, and becoming precise and yet authentic.
Synonyms are the key to phrase disambiguation, paraphrasing, creative construct, and language translation. And there are a few challenges and opportunities.
A word of opposite meaning. If synonyms are similar meaning words, antonyms are opposite meaning words, which means list of antonyms of a word is also like a list of synonyms of another word, often the antonym of the word. Antonyms help add context and better situational understanding of a topic.
NLP needs to distinguish between synonyms and antonyms, especially when they are used as phrases. For example, can “do not be angry” be a synonym of “peace”, just like “lack of war”? In traditional constructs, anger and war are contextual antonyms of peace, but in modern constructs and paraphrasing, phrases like peace, lack of war, do not be angry, and meditate are contextual or intent synonyms.
As NLP systems understand contextual antonyms and synonyms better, they can use text mining and information extraction techniques, along with classification and segregation models, to build domain dictionaries and knowledge bases.
Another use case of contextual antonyms is in understanding sentiments: tying them to things or services can help classify user reviews and calculate Net Promoter Score better. Further, conversational self-help interfaces can tokenize the “thing or service” along with the “feature/function” and the “sentiment” to suggest solutions.
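As a rough illustration of tying a sentiment to a thing and its feature, the sketch below tags a review with tiny hand-made lexicons and flips the polarity of a sentiment word when a negator precedes it, so that "not good" behaves as a contextual antonym of "good". All the word lists and the function name are hypothetical; real systems learn such lexicons from data rather than hard-coding them.

```python
import re

# Hypothetical toy lexicons, for illustration only.
PRODUCTS = {"laptop", "phone", "headphones"}
FEATURES = {"battery", "screen", "keyboard", "sound"}
POSITIVE = {"great", "good", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}
NEGATORS = {"not", "never", "no"}

def analyze_review(text):
    """Tag a review with the products, features, and overall sentiment it mentions."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = (token in POSITIVE) - (token in NEGATIVE)  # +1, -1, or 0
        # A negator directly before a sentiment word flips its polarity.
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        sentiment = "positive"
    elif score < 0:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    return {
        "products": sorted(set(tokens) & PRODUCTS),
        "features": sorted(set(tokens) & FEATURES),
        "sentiment": sentiment,
    }

print(analyze_review("The laptop screen is not good and the battery is terrible"))
```

A self-help interface could use the resulting triple (product, feature, sentiment) as a lookup key into a knowledge base of known fixes. The sketch scores the whole review at once, so it cannot say which feature the complaint is about when several are mentioned.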
Besides being a list of synonyms and antonyms, the thesaurus evolved beyond its basic definition as a list of ideas and the words representing them. The evolution included:
- Category / subcategory models, where a hierarchical relationship between one word and another can be established, e.g. Gadgets and Electronics on one side and laptop and music-box on the other. And we all know that this classification model eventually became an important factor in e-commerce.
- Equivalent terms, where two words like song and rhythm could be close to each other and help in many creative takes. As the paraphrasing industry and creative work evolved, equivalent synonyms would become important.
- Associative relationships, where two words or phrases may be related but not through hierarchy or equivalence, for example peace and meditation. Some of these relationships would become important in establishing the concept hierarchy or visual hierarchy that would start the growth of the visual thesaurus and the layer relationships that would define the photo-editing and visual stock world. For details, see visual synonyms.
The concept of thesaurus would expand to various industries in various ways and with time would grow Song Thesaurus, Clinician's Thesaurus, Art and Architecture Thesaurus, Legal Thesaurus, Religious Thesaurus, Cultural Thesaurus, Scientific Thesaurus and so on and so forth. There are hundreds of examples floating around.
The advent of keywords and metadata
As language grew, as digitization multiplied many times over, and as the world became more globalized, increasingly a small universal village, there was a need to reflect on how we look at words. MIT’s Stuart McIntosh and David Griffel, in 1967, proposed the need for a meta language for better digitization for information storage and retrieval. The concept of meta existed for 2000+ years, and people used reference sets, call them cards or tags, to find records, books, or labels, but scientific use of metadata spread more from the 1970s.
Let’s see how keywords and metadata influenced both writing and NLP research.
What it means for writers or linguists
What it means for AI programmers or NLP scientists?
Information access and discoverability
The use of metadata started with library management systems, in organization of records, and expanded to all types of digital records, be it files or database records, and slowly helping in digital identity management. The use of metadata started with text but extended to images, videos, news, and shopping cards.
Any academic information or scientific information needs consistent metadata, and so does creative text. With time metadata started including collocations and phrases and started being used in context (in content) and as companion text with the files, articles, or records.
The use of metadata helped in information retrieval and with the advent of search engines, in performing and extending searches, and eventually becoming the essence of e-commerce and e-services.
NLP allows machines rather than humans to suggest keywords and thereby extend discoverability. When machines generate 2x keywords and humans select from the range, reinforcement learning also happens. Similarly, when keywords are used to access and consume information (read beyond a threshold time), the data goes into reinforcement training and search rankings.
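A minimal sketch of machine-suggested keywords follows. It ranks words by plain frequency after dropping a small, hypothetical stopword list, and returns twice as many candidates as requested so that a human can pick from the range. Frequency ranking is an assumption chosen for brevity; it stands in for whatever keyword model a real system would use, and the function name is made up.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list.
STOPWORDS = {
    "the", "and", "for", "with", "that", "this", "are", "was",
    "from", "into", "its", "has", "have", "not", "but", "can",
}

def suggest_keywords(text, wanted=3, oversample=2):
    """Return wanted * oversample candidate keywords, most frequent first."""
    tokens = [
        t for t in re.findall(r"[a-z]+", text.lower())
        if len(t) > 2 and t not in STOPWORDS
    ]
    counts = Counter(tokens)
    # Sort by descending count, then alphabetically so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[: wanted * oversample]]

text = "Metadata helps search. Metadata helps retrieval. Search engines use metadata."
print(suggest_keywords(text, wanted=3))
```

The keywords a human editor keeps or rejects from such a list are the kind of signal that could later feed back into training.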
In large enterprises and in governments, information searching, and content reuse is important. Organized metadata and keywords helped in organizing content for reuse, intra-department sharing, and for better supply chain management.
Metadata also helps in enterprise level use/reuse of assets, scale up community efforts, build, and expand large powerful knowledge bases. Metadata also helped in better communication, human to human shared through news and web, or computer to computer.
Regulatory compliance demand and data management best practices
As more understanding of metadata and keywords dawned, writers became more aware of regulation, safety, identity, and IP and copyright. With time, plagiarism became easy to detect, and so paraphrasing became both a science and an art, as well as a software aid.
Writing not to be caught cheating meant being exposed to multiple influences. Globalization accelerated the trend. This awareness, and the ease of finding information across cultures, lands, and languages from the comfort of one’s sofa, allowed a lot more pollination of words from other languages and contexts into the source language. Imagery of each place and each culture slowly started filtering into each language, as much as the availability of global goods and services everywhere did.
Metadata could be tied to software and include file type, file size, date of creation and modification, author name, and access privileges. Metadata could also be added to images by hardware like cameras and phones. These metadata help in maintaining historical records, in change management and access control, and in ensuring security best practices. The use of metadata also helps in data security, better management of data identity, and easing the entire data management process. Metadata helped in better internal organization of large-scale content, better rights management for IP protection, and better archival practices. With time, metadata would extend to GDPR compliance.
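The file-level metadata mentioned above can be read with the Python standard library, as in the hedged sketch below. The function name and the chosen fields are illustrative. Creation date and author name are left out on purpose, because the operating system does not expose them uniformly: creation time is platform dependent, and author names usually live inside the file format itself.

```python
import os
from datetime import datetime, timezone

def file_metadata(path):
    """Collect basic file-system metadata for one file."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "extension": os.path.splitext(path)[1].lower(),  # a rough proxy for file type
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        "permissions": oct(info.st_mode & 0o777),  # access bits; their meaning varies by platform
    }

print(file_metadata(__file__) if "__file__" in globals() else "run from a script file")
```

Records like these, stored alongside the content, are what make change tracking and access audits possible without opening the files themselves.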
Expanding one-to-many context
One of the most important uses of metadata and keywords is to expand the one-to-many contexts of a text (or, for that matter, a news item, image, or video) beyond the content in the text. Metadata thus becomes an expanded basket of words and phrases that can be used to search for the record or to serve the article that users want to read, based on a search context. Consciousness of search readiness and user consumption patterns influences many writing paradigms.
For NLP scientists, opportunities abound in multi-context metadata and keywords. A few instances:
Metadata, keywords and language creation and expansion
Google realized early that the Internet would be bigger than all the libraries in the world combined. With precise AdWords and AdSense programs, it gamified keywords and metadata to be systematically exploited for search engine optimization (SEO) and search engine marketing (SEM). On one side, users would look at the demand for keywords and identify information gaps, or optimize existing information to include the metadata or keywords that would bring the answers. With time, metadata became more about keywords in web pages, and keyword stuffing (a comma-separated series of keywords used at the end or in the middle of an article) became less popular as Google penalized it.
For writers, the consciousness of keywords became a habit of writing in context. Information architecture changed, and the nature of headings, synopses, snippets, and associated hyperlinks changed based on the understanding of keywords and metadata.
The more relevant keywords were, the more relevant they would become in organic searches or in the People Also Ask answer bot. With SEM, Google ran bidding wars on keywords and used them to place advertisements on high-demand or long-tail keywords. The many changes in search processes, followed by an army of more than 5 million full-time web content creators, ensured that the evolution of language on the web encapsulated a lot of keywords.
If Google would extend search with keywords, shopping sites led by Amazon.com extended keywords for product or service discovery for sale or rent needs. And some of these sites built engaging community with recommendation services and they extended the use of keywords with special creative constructs to factor world class review and recommendation processes.
Hashtags and the birth of a new digital currency
Let’s talk about hashtags, the fourth significant global milestone in the definition and expansion of words. Hashtags also have a big correlation with images, and a significant part of global data consists of images and videos.
A hashtag is a social media phenomenon that started in 2007 and has now extended far beyond social media into the realms of creativity, branding, marketing, and design. A hashtag is a unique set of characters without spaces, preceded by the # sign. Let’s see how something as simple as a hashtag becomes so powerful, and in the process let’s explore the impact of hashtags on language formation, NLP, and AI.
Type of hashtag
What it means for writers or linguists
What it means for AI programmers or NLP scientists?
All common words of all common languages written in English or in native language are hashtag candidates. For example, the following hashtags mean happiness: #bonheur #Glück #しあわせ #आनंद #আনংদ #ارتباط #kebahagiaan #幸福 #ہپنس #felicidad #felicidade #felicitas #సుఖము, #ಸುಖ, #радость
With the use of hashtags, people started using multiple languages. One common trend was to use an English hashtag for popularity and then add many more native-language hashtags. This is interesting because it broadens the contextual interpretation of words and hashtags across languages and allows the formation of a multi-language concept corpus, a deep bed for NLP research.
Hashtag phrases in social media evolved beyond collocations in language (popular sets of 2 or 3 words that come together). Social media created its own collocations. Check these few famous samples: #picoftheday, #followforfollow, #dogsofinstagram, #adoptdontshop, #moodygrams, #trustedseller, #healthyliving, #powerlifting, #flashbackfriday, #crueltyfree, #makeupbyme, #roamtheplanet, #childhoodunplugged, #runningmotivation, #worksucks, #wearthisnext, and #sheisthebest.
The making of a new vocabulary or focused verbiage meant opening a new market for creativity and the expansion of words and phrases. By finding the words that phrases contain, finding hashtag frequency, and finding the affinity hashtags that come together with each hashtag, a world of social NLP kickstarts. Imagine now you have:
There are thousands of possibilities to find and categorize phrases or n-grams on variety of context, parameters, purpose, and filters. Let’s say creative playground is infinite, and so is the NLP playground.
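Hashtag frequency, affinity (co-occurrence) counts, and a "contains" search can be sketched as follows. The function names and sample posts are invented for illustration. One assumption to flag: the pattern relies on Python's Unicode-aware `\w`, which covers many scripts but may cut a tag short in scripts that rely on combining marks.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")

def hashtag_stats(posts):
    """Count how often each hashtag appears and how often each pair shares a post."""
    frequency = Counter()
    affinity = Counter()
    for post in posts:
        # Lowercase and de-duplicate within a post; sort so each pair has one canonical order.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(post)})
        frequency.update(tags)
        affinity.update(combinations(tags, 2))
    return frequency, affinity

def tags_containing(frequency, word):
    """Find hashtags that contain a given word, e.g. 'photo' inside 'photooftheday'."""
    return sorted(tag for tag in frequency if word.lower() in tag)

posts = [
    "Sunrise over the bridge #NYC #travel",
    "#nyc #Travel #photooftheday",
    "Packing again #travel",
]
frequency, affinity = hashtag_stats(posts)
print(frequency.most_common(2))
print(affinity.most_common(1))
```

Pairs with high affinity counts are natural candidates for the related-hashtag clusters discussed in this section, and the same counters can be filtered by language, time window, or account type.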
Hashtag simplified the inclusion of names (of people, places, things, products) into the global social vocabulary.
In some ways, there is branding and imagery in related hashtags. See these hashtags for New York and feel its branding: #NewYorkCity #NYC #BigApple #TheCityThatNeverSleeps #EmpireState #NewYorkState #IloveNY #NewYorker #NewYorkLife #NewYorkLove #NewYorkProud #NewYorkCityscape #NewYorkSkyline #NewYorkSummer #NewYorkWinter #CentralPark #TimesSquare #Broadway #5thAvenue #Soho #WallStreet #LowerEastSide.
Try another example. Taj Mahal = #tajmahal #india #agra #monument #symboloflove #peace #beautiful #architecture #history #heritage #travel #bucketlist #mustsee #photooftheday
For NLP experts, hashtags open a way to mine many new narratives.
Memes, emojis and emoticons
Memes and emojis as hashtags (mostly emojis, but memes could be full of emojis) help in language personalization. The common language that we speak and the language we write for most centuries were different. Hashtags are now combining these into a new language, spoken and written, visual and fun, purposeful and yet trendy.
Emojis help bring the exact feeling, be it excitement or grief, irony or fun, to faceless communication. Emojis rally group behavior and are a hit in instant communication, especially in the virtual world with global groups. Emojis are a good substitute for body language in text communication.
Sample these emojis: 😂 ❤️ 🤣 👍 😭 🙏 😘 🥰 😍 😊 🙈 🙉 🙊 💥 💫 💦 🍇 🍈 🍉 🍊 🍋 🍌 🍍 🚣 🗾 🏔️ ⛰️ 🌋 🗻 🏕️ (without any words they are conveying the meaning).
Emoticons are keyboard characters that represent emotions. For example, :) is a happy face while :( is a sad face.
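To make the emoticon and emoji distinction concrete, here is a small sketch of pulling both out of a message. The `EMOTICONS` lookup is a hypothetical two-entry lexicon. Treating every character in Unicode category "So" (Symbol, other) as an emoji is only a rough heuristic: it also catches symbols such as ©, so a real system would use a dedicated emoji list.

```python
import unicodedata

# Hypothetical, tiny lookup for illustration; a real lexicon would be far larger.
EMOTICONS = {":)": "happy", ":(": "sad"}

def find_emoticons(text):
    """Return the label of each known emoticon among whitespace-separated tokens."""
    return [EMOTICONS[tok] for tok in text.split() if tok in EMOTICONS]

def find_emojis(text):
    """Rough heuristic: keep characters in Unicode category 'So' (Symbol, other)."""
    return [ch for ch in text if unicodedata.category(ch) == "So"]

print(find_emoticons("great show :) but a long queue :("))
print(find_emojis("so funny 😂 thanks 🙏"))
```

The extracted labels and characters could then feed a sentiment or intent model as extra features alongside the words.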
For the most part, written language was either feelingless or too scholarly. Emojis help democratize this language and allow everyone to express their feelings with minimal effort and amazing impact.
Emojis are character pictorials and can represent a human, an animal, nature, a thing, or a service. For NLP research, emojis can be used as a:
For the most part, the evolution of language followed simple rules: an entire sentence (often an entire article) was in one language or another. With the infusion of popular culture, we started seeing English words in phrases of other languages. In some ways, Hinglish (Hindi+English, where Hindi is written in English letters and Hindi and English words are mixed freely) is the unofficial conversational language of insta messaging in India.
Sample these popular Indian movie titles or ad lines: ‘Ishq In Paris’, ‘Carry On Munna Bhai’, ‘jaane bhi do friends’, ‘pyaar mein no sorry’, ‘SIP sahi hain’, ‘Yeh hi hai right choice baby’, ‘Hungry Kya’, ‘Hum mein hai hero’, or ‘Life ho to aisi’. The trend moves into songs, lyrics, and daily conversations.
Any creative research that leads to book titles, movie titles, poem lines, song lyrics, advertisements, captions, hero text, trademarks, or creative text in visual art stands to gain by understanding the power of bilingual text, especially hashtags.
As an NLP researcher, you may want to research:
Trending hashtags are hashtags that get a disproportionately high number of mentions per hour compared to other hashtags. They often reflect a pile-on effect around an important event or piece of news, or troll-like behavior around an event concerning a celebrity, justice, politics, religion, entertainment, or sports.
Trending hashtags can also be caused by a well-structured marketing campaign in which many influencers or micro-influencers simultaneously talk about the same thing (brand, service, or opinion).
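One way to read "disproportionately high mentions per hour compared to other hashtags" is as a ratio against the typical hashtag in the same hour. The sketch below takes that reading. The function name, the threshold `ratio=5.0`, and the sample counts are illustrative assumptions, and production systems would normally also compare each hashtag with its own history.

```python
from statistics import median

def trending_hashtags(mentions_per_hour, ratio=5.0):
    """Flag hashtags whose hourly mentions are at least `ratio` times the median.

    mentions_per_hour maps hashtag -> mentions in the most recent hour.
    """
    if not mentions_per_hour:
        return []
    typical = median(mentions_per_hour.values())
    floor = max(typical, 1)  # avoid dividing by zero when most tags are silent
    hot = [tag for tag, n in mentions_per_hour.items() if n / floor >= ratio]
    # Most-mentioned first.
    return sorted(hot, key=mentions_per_hour.get, reverse=True)

# Hypothetical counts for one hour, for illustration only.
counts = {
    "#monday": 40,
    "#coffee": 55,
    "#travel": 60,
    "#finalmatch": 900,
    "#newalbum": 420,
}
print(trending_hashtags(counts))
```

A spike flagged this way says nothing about its cause. Telling an organic pile-on from a coordinated campaign would need further signals, such as how many distinct accounts are posting.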
As an NLP researcher, you may want to use:
Bringing language, creativity, big data, and NLP together for the future of human-machine communication expansion
Through the previous sections of the article, you saw how language, especially English expanded (and many lessons listed here are common to most other languages that have reasonable volume and economies built around them). Let’s see how the interactions of various elements of language, and the growth of business in various areas of digitization, impact the broader development of language, creativity, and NLP-led AI.
Aspect of language
What it means for writers or linguists
What it means for AI programmers or NLP scientists
Sounds across languages come from the interplay of words, vowels, and consonants, with rising or falling pronunciation, which in turn affects how a word is spoken: the part of the mouth used, lip action, tongue play, and jaw motion.
Sounds needed to be spoken and heard, and so common words evolved that gave shape to messages. The Sanskrit word Om (with a long humming of m) is a word whose sound seems like a gong. It moves from open lips to closed lips and concentrates the extension of the sound in the mind, thereby bringing a calm, serene, connected feeling. Was this accidental? No! Linguists of an era in which memorization and concentration were important for the continuation and storage of information, especially oral information, mastered this art.
The art of using sounds to find or create words in the right context is now slowly getting lost. Sound-optimized words survive in advertising, jingle creation, and hobbyist poetry, but progress on sounds and rhymes for repetition, recall, and happy understanding is not what it could have been.
Defining a sound and then carrying it into another language brings in much more spelling variation, which widens the range of lemmas and complicates word stemming, the heart of tokenization and keyword discovery. Audio signal analysis and pattern recognition are the basis of sound recognition, and sound recognition at scale depends on data processing, feature extraction, and classification, all of which are thriving fields for NLP practitioners. Sound event detection and the classification of all sounds in a context is a thriving AI area.
When sound recognition is done well, it opens up text to speech, conversational interface responses, detection of unique situations such as diseases (patient and sound), threats (drilling, motion, trespass), meditation and stress management (calm, nature sounds), psychology (interview, negotiation, body language measures), and noise pollution (machine sounds, industrial sounds).
Some unique use cases for sound and NLP include:
Visual synonym is a less understood concept. Let’s explain this. When we think of something, we think in layers; we create a scene in the mind. The scene may have only nature, or human beings, an element of colour, emotions, feelings, abstracts, confusion. Thoughts are layered and, like sea waves, they change fast, and yet the subconscious remembers the layers and shapes our dreams and our relationships with objects, emotions, colours, words, and voices. In this context, visual synonyms are like the names of the multiple layers of possible things that complete a scene: if you name them together, you can make a scene, or if you are a design artist working on a design, you can make layers.
The closest things to visual synonyms are the image tag affinity suggestions you get on stock image websites (Adobe Stock, Shutterstock, Pixabay, etc.) or in Google or Bing Images. Essentially, you search for a concept and you get affinity words or phrases that other designers are using in similar creative work.
Now, with the advent of creative art from text-to-image models such as DALL-E 2, Midjourney, and Google’s Imagen, an unprecedented level of photorealism can be created with words. Yes, we did say words. The problem is that these words are still either visual tokens, art types or art styles, or art metadata (size, colour); they are not an absolute scene definition that can run to hundreds of words to describe and then make the precise art. But the models will evolve, and they will need to understand visual synonyms, visual feeling, scene definition, art rendering, and many similar factors. Believe me, this would be a goldmine for NLP and AI-for-text researchers.
Take this further. Currently, image and video stock systems discover images based on keyword affinity, search metrics, and consumption patterns. What if a smarter visual tagging system could tie the creation of the next generation of visual synonyms (keywords, feelings, context, scene), held in the layers of the photo-editing software, to the text and intent tokens from the web articles and social posts that use the finished work? The cycle would generate 100x more power in visual parametrization. This gold mine has not only gold but also platinum, diamonds, and even rarer, pricier relationships. Go dig, and understand the relationship between language and thought like never before.
Each language has a limited set of words but unlimited potential for names, be it names of people, things, brands, products, or services. Some of these names come from different languages. Ideally names are universal across languages, though sometimes different languages have different meanings or words for those names.
Named Entity Recognition is an essential part of machine learning and AI. Using a combination of logic, name detection, and performance parameters, an unlimited range of AI applications is possible. For example, sample these:
With the advent of the modern virtual-real-virtual world of communication, where the 3Ss (Search, Social, and Shopping) talk together with the 3Cs (Content, Community, and Context), a new language is born; call it conversational language. It has many ways to differentiate itself.
The scripts of YouTube videos, discussions in IM, conversational blogs, community-building processes, discussion boards, conversational message scripts, software-aided workflows, software-IoT talk, wikis, Slack talk: slowly, things and communication around us are becoming conversational, and we need to participate in and accelerate the process as writers and communicators and as NLP experts and AI geniuses.
NLP in conversational language has its own fun, as NER, tokenization, and conversational template identification go together. Frequent scenarios and contexts get a response from preselected strings, or from strings that are contextually the same but differently worded; call them templates. Identifying templates and separating small talk from substance and context is both a science and an art in NLP.
Attributing dialog to the right speaker in instant messaging is a challenge, and in a trolling discussion or a misinformation thread, finding the source is a further challenge for NLP, tied to NER and conversational language understanding. Similarly, finding the context, making it more useful, and using it for better recommendation systems is a continual endeavour.
Better training outcomes and better user participation in training and community development scenarios need an understanding of conversational language, and so does better digital marketing performance.
Conversational agents for self-help, presales, knowledge management, and machine concierge desks, and conversational ways of interacting with IoT, are all emerging technologies.
As communication evolves, and as more people interact across different cultures, there are more and more instances of multiple language words and sounds in the same sentence or same monologue/dialog.
Intense social media data mining requires identifying tokens across languages, or converting or translating them on the fly to a common language, for better context prediction and recommendation. Multilingual corpus development, part-of-speech taggers for bilingual text, sentiment analysis for hybrid languages, and conversation context finders for different scripts and languages are all areas that need far more investment than we have put in.
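A first step toward identifying tokens across languages is to tag each token with its writing script. The sketch below does this with Python's standard `unicodedata` module, taking the first word of each letter's Unicode name as the script. That shortcut and the function names are illustrative assumptions. Script tagging alone cannot separate Romanized Hindi from English, since both are written in Latin letters; that would need a lexicon or a trained language identifier.

```python
import unicodedata

def script_of(token):
    """Guess a token's script from the Unicode names of its letters."""
    scripts = set()
    for ch in token:
        if ch.isalpha():
            # e.g. "DEVANAGARI LETTER YA" -> "DEVANAGARI"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    if not scripts:
        return "NONE"   # digits, punctuation, emoji
    if len(scripts) > 1:
        return "MIXED"  # letters from more than one script in one token
    return scripts.pop()

def tag_scripts(sentence):
    """Pair each whitespace-separated token with its guessed script."""
    return [(tok, script_of(tok)) for tok in sentence.split()]

print(tag_scripts("यह right है"))
print(tag_scripts("pyaar mein no sorry"))
```

Tokens tagged this way can be routed to per-script tokenizers or translation steps before context prediction.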
Collaboration and Creation
For most of history, language evolution and word evolution were passive. Originally, travellers and traders going from one land to another carried back information about culture and learnings, and then scholars, writers, and poets learned from the best of each other, right into the second half of the twentieth century.
The communities for structured creation that we have come to know in the last 20 years have accelerated collaboration beyond what happened in the previous 5,000 years. So, it won’t be surprising if collaboration and creation grow 20-50x again in the next 10 years. And this would happen everywhere and in new forms of communication: GIPHY, animated GIFs, emojis, text brushes, and fonts, but even more in meta-readiness, co-art creation, “co-edit of text, video, AR and VR”, collaborative knowledge building ….
Creativity would always be the next frontier to shoot for. NLP and AI experts have a unique role in defining creative communication and expanding:
While a greater study of language evolution, and of the many possibilities it holds for linguists, writers, creative people, and NLP and AI experts, is an endless field, a happy immersion in these possibilities is an invitation to creative fascination and a technical understanding of a world where language is becoming more and more democratic. The speaker of a language is no longer only a trained, language-aware human being but could be a third- or fourth-language speaker, or a machine. And the interactions and discussions, the speed of search and social, the ever-demanding needs of marketers, and the never-satiated desire of the creative world will allow this pollination of language to become a nonstop laboratory of infinite possibilities.
I hope that, through this article, a part of you starts looking at the evolution of language and its accelerated growth differently, through business, technology, societal-impact, and creative lenses.
About The Author
Pawan Nayar is Founder and Chief Thought Officer at Cretorial Media Services.
Pawan is a polygamist and living a life with four love stories. Writing is his first love, creativity second, NLP third. (And Shalini, Pawan’s wife, is his zeroth love). Pawan believes everybody has a right to be creative and a duty to be.
Global technologies and businesses have helped people be far more visually creative than text/message creative. Search, social and shopping have divided trillions of daily queries according to their business lines. Against that backdrop, Pawan is busy building text-based creativity at scale with the goal of helping each person become a more creative writer, reader, searcher, connector, and doer. Many more things to discover when you connect.