Deep Diving AI on Steroids: Redefining Survey & Analysis


By Dr Rais Hussin

Artificial intelligence (AI), with its array of software replicating human skills and cognitive abilities, has, as expected, made an indelible mark on scientific research. Known for its ability to sift through vast amounts of structured and unstructured data quickly and accurately, uncovering intricate patterns invisible to the human eye, AI delivers on its promise of increased efficiency by redirecting a significant share of researchers’ time from mundane work to deeper analysis, synthesis, conceptualisation and connecting the dots.

Even more, AI expands possibilities beyond what one could previously imagine: researchers in the social sciences are now far better equipped to understand the complex web of associations between ideas, beliefs, attitudes, behaviours, demographics and other human contexts.

However, just as in other fields of AI application, extra caution is needed to ensure reliability, transparency and ethical alignment in the use of the technology. The compelling synergy between AI and survey research again highlights the broader symbiotic relationship between humans and machines, which, when appropriately utilised, compensates for each side’s distinct “blind spots” for more profound societal benefit.

Indeed, AI is still far from being realised in its strong form, defined as an intelligent algorithm capable of solving a wide range of intellectual problems at least on par with the human mind. However, even in its weak form (an intelligent algorithm that imitates the human mind in solving specific, highly specialised problems), AI has already revolutionised the world of survey creation, administration and analysis.

Recent advancements in large language models (LLMs), a branch of AI dedicated to natural language processing (NLP) tasks, have significantly improved the ability to ask the right questions to the right people in the right way. LLMs like GPT-2, T5 or GPT-3 function as complex conditional distributions over natural language (estimating the probability of each word given the words that precede it), generating synthetic text nearly indistinguishable from human-written text.
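The idea of a language model as a conditional distribution can be illustrated with a toy bigram model. This is a deliberately simplified sketch with an invented mini-corpus; real LLMs such as GPT-3 condition on thousands of preceding tokens via learned neural weights, not counted frequencies:

```python
from collections import Counter, defaultdict

# Invented toy corpus standing in for training data.
corpus = "the survey asks questions the survey collects answers".split()

# Count bigram frequencies: how often word B follows word A.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_distribution(prev):
    """P(next word | previous word), estimated from co-occurrence counts."""
    counts = follows[prev]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}
```

After “the”, this toy model assigns probability 1.0 to “survey”; after “survey”, probability is split evenly between “asks” and “collects”. Sampling from such conditional distributions, word after word, is what generates synthetic text.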

Survey instrument creation and enhancement

Today, many software-as-a-service (SaaS) platforms can auto-generate questionnaire items from a few keyword prompts related to the phenomenon of interest or the survey objectives. This provides a quick head start in survey instrument creation, subject to further validation by human researchers, with or without AI aid.

Furthermore, intelligent algorithms can tweak the structure of survey questions, significantly improving their conciseness, readability and clarity while eliminating potential biases and distortions (for example, leading or double-barrelled questions), in the service of greater research objectivity and improved response rates.
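As a rough illustration of the kind of check such algorithms automate, a naive keyword heuristic for double-barrelled or leading wording might look like the sketch below. The rules and phrases are invented for illustration; production tools rely on trained language models rather than hand-written rules:

```python
def flag_question_issues(question: str) -> list[str]:
    """Naive heuristics for two common survey-wording problems."""
    issues = []
    text = question.lower()
    # Double-barrelled: asks about two things joined by "and"/"or".
    if " and " in text or " or " in text:
        issues.append("possibly double-barrelled")
    # Leading: presupposes an answer with a loaded opener.
    leading_phrases = ("don't you agree", "isn't it true", "wouldn't you say")
    if any(p in text for p in leading_phrases):
        issues.append("possibly leading")
    return issues
```

For example, “Do you like the price and the quality?” would be flagged as possibly double-barrelled, since it asks the respondent to rate two distinct things with one answer.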

Given the availability of historical survey data, machine learning (ML) algorithms can further fine-tune questionnaire design by predicting which items are likely to elicit more accurate and meaningful responses from a specific target audience.

Other recently developed advanced ML algorithms, for example the post-least absolute shrinkage and selection operator (post-Lasso) method, can select the most relevant items from a massive set of candidate indicators to identify the causal effect of one unobserved variable on another.
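The post-Lasso idea is, in essence, two stages: Lasso to select the relevant indicators, then ordinary least squares refit on the survivors to remove the shrinkage bias Lasso introduces. A minimal sketch, using the simplifying assumption of orthonormal predictors and noiseless data (under which the Lasso solution reduces to soft-thresholding the OLS coefficients); real applications use iterative solvers on arbitrary designs:

```python
import numpy as np

rng = np.random.default_rng(0)
# Orthonormal design: Lasso then has a closed-form (soft-threshold) solution.
X, _ = np.linalg.qr(rng.standard_normal((50, 5)))
true_beta = np.array([3.0, 0.0, 2.0, 0.0, 0.0])
y = X @ true_beta  # noiseless, for a deterministic illustration

# Stage 1 (Lasso): soft-threshold the OLS coefficients to select predictors.
b_ols = X.T @ y                      # OLS under orthonormal columns
lam = 0.5                            # penalty level (illustrative)
b_lasso = np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)
selected = np.flatnonzero(b_lasso)   # indices of surviving predictors

# Stage 2 (post-Lasso): refit plain OLS on only the selected predictors,
# recovering unshrunk coefficient estimates.
b_post, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
```

Here the two truly relevant indicators (indices 0 and 2) survive selection, and the refit recovers their coefficients without the Lasso’s downward shrinkage.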

Insights beyond measurable variables

AI can extract meta-meaning from survey response patterns. Electronically conducted surveys can combine these insights with other subtle but salient signals, such as the tiniest differences in the time respondents spend on different question groups. These inputs enable the clustering of respondents into distinctive categories, unveiling unexpected behavioural insights: accidental discoveries that feed future hypothesis development and surpass the mere measurement of the variables of interest.
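Clustering respondents on such behavioural features, say, each person’s average answer value and average time per question, can be sketched with a bare-bones two-cluster k-means. The feature names and data below are invented, and a real pipeline would use a full ML library rather than this minimal loop:

```python
def kmeans_two(points, iters=10):
    """Minimal 2-means over (x, y) feature pairs, deterministic initialisation."""
    dist = lambda a, b: (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    # Initialise centres with the first point and the point farthest from it.
    c1 = points[0]
    c2 = max(points, key=lambda p: dist(p, c1))
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            # bool index: True (1) when p is closer to c2, else False (0).
            groups[dist(p, c2) < dist(p, c1)].append(p)
        mean = lambda g: (sum(p[0] for p in g) / len(g),
                          sum(p[1] for p in g) / len(g))
        c1, c2 = mean(groups[0]), mean(groups[1])
    return groups

# Invented features: (mean answer on a 1-7 scale, mean seconds per question).
respondents = [(2.1, 4.0), (1.9, 5.0), (2.0, 4.5),    # fast, low scorers
               (6.0, 12.0), (5.8, 11.0), (6.2, 13.0)]  # slow, high scorers
fast, slow = kmeans_two(respondents)
```

The algorithm separates the quick low scorers from the deliberate high scorers, two behavioural categories a researcher might then investigate as a source of new hypotheses.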

We can also imagine AI-assisted analysis of extensive historical survey data combined across various questionnaires (measuring different concepts, or the same concepts with varying question inventories) to pinpoint patterns, correlations and more effective question structures, potentially leading researchers to new hypotheses and new, better ways of asking questions for enhanced insights.

Dynamic survey administration

Electronic surveys offer an opportunity to harness AI’s profound ability to understand respondents’ behavioural patterns beyond human grasp, improving response rates by dynamically adjusting the set of questions asked and their sequence. At the same time, researchers can maintain the comparability of the results through advanced statistical methods such as Rasch modelling.

Text analytics for open-ended questions

Just as with automatic survey item creation, AI gives researchers a quick start in categorising open-ended survey responses and the longer, thicker narrations derived from qualitative surveys, significantly reducing the manual effort and time required. However, despite considerable progress in text analytics (the area where AI really shines), results remain far from ideal: they serve merely as “raw material” that requires further refinement by skilled analysts.
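A first-pass categorisation of open-ended responses, the kind of “raw material” AI hands a researcher, can be sketched with simple keyword matching. This is a toy stand-in: the categories, seed words and responses are invented, and real text-analytics pipelines use embeddings and topic models instead:

```python
# Invented seed keywords for each hypothetical feedback category.
CATEGORIES = {
    "pricing":  {"price", "cost", "expensive", "cheap"},
    "service":  {"staff", "support", "helpful", "rude"},
    "delivery": {"shipping", "late", "delivery", "arrived"},
}

def categorise(response: str) -> str:
    """Assign the category whose seed keywords overlap most with the
    response; fall back to 'uncategorised' when nothing matches."""
    words = set(response.lower().replace(".", "").replace(",", "").split())
    scores = {cat: len(words & seeds) for cat, seeds in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorised"
```

The unmatched bucket is precisely where the skilled analyst comes back in: refining categories, resolving ambiguous responses and catching meanings the automated pass missed.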

Synthetic human samples

This near-futuristic application of neural networks in survey research was born as an alternative perspective on a well-known AI weakness: its tendency to inherit biases (racial, gender, economic, etc.) from the data it is trained on.

In a very recent study, researchers demonstrated that when an LLM is correctly conditioned on a massive amount of “socio-demographic backstories from real human participants in multiple large surveys”, it can emulate response distributions whose fidelity goes “far beyond surface similarity” with an actual human sub-population: it is “nuanced” in capturing the multifaceted interplay between the attitudes, beliefs, behaviours and socio-cultural contexts that characterise the targeted population.
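In practice, this conditioning amounts to prepending a first-person socio-demographic backstory to the survey question before querying the model. A sketch of such prompt assembly; the field names and wording here are illustrative inventions, not the cited study’s actual template:

```python
def build_backstory_prompt(profile: dict, question: str) -> str:
    """Assemble a first-person backstory from a demographic profile and
    append the survey question, ready to send to an LLM."""
    backstory = (
        f"I am a {profile['age']}-year-old {profile['gender']} from "
        f"{profile['region']}. I work as a {profile['occupation']} and "
        f"would describe my politics as {profile['politics']}."
    )
    return f"{backstory}\n\nInterviewer: {question}\nMe:"

# Example with an invented respondent profile.
profile = {"age": 42, "gender": "woman", "region": "rural Ohio",
           "occupation": "nurse", "politics": "moderate"}
prompt = build_backstory_prompt(profile, "Do you trust national news media?")
```

Repeating this over thousands of real backstories and aggregating the model’s completions is what yields a synthetic response distribution for the sub-population of interest.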

These profound results open a plethora of possibilities for survey research. For example, AI can simulate hard-to-reach populations, not to replace them but to study them more effectively. Other novel opportunities include testing theories and hypotheses about human behaviour at grand scale and speed, using synthetic human samples to generate new hypotheses to be subsequently confirmed in human populations, pilot-testing questionnaires, and more.

Although AI can significantly enhance survey research, it’s crucial to bear in mind its well-known weaknesses:

  • An AI model can only be as good as the data samples it was trained on (and AI requires massive data samples). Such dependency can inadvertently reinforce existing biases, which, as we saw above, can be a desired trait but can also become problematic if not appropriately managed.
  • The internal logic behind the decisions of these deep neural networks is a black box, even to those who programmed them. Although AI can cluster survey respondents, responses or chunks of text in a certain way, indicating that those are distinctive categories, it provides no clue as to why; answering this question requires further analysis by a researcher.
  • Ironically, while machines greatly surpass humans in computational ability and in detecting patterns that, although physically present, are invisible to the human eye, only humans can hold universal laws that have no physical manifestation in massive data arrays, carrying them instead in the form of wisdom.

For these reasons, AI can encounter inputs for which its outputs are unpredictable, bizarre and even dangerous.

Therefore, there is an urgent need to cultivate essential skills for effective collaboration with AI, including a deep understanding of AI tools’ capabilities and limitations, thus allowing researchers to discern when to rely on AI-generated insights and when to apply their professional judgment.

This brings the argument back to the importance of the national education system, viewed within the context of a broad, distinctive national AI strategy of the kind that exists in many highly innovative nations such as Canada, China, Great Britain, Singapore, South Korea, the United States and the United Arab Emirates.

If research is the driver of national progress, then in its current form it clearly demands widely taught programming skills and a thorough understanding of AI and its various branches, including the crucial aspects of ethics and governance associated with such systems.

Dr Rais Hussin is the CEO of Technology Park Malaysia (MRANTI), a government agency aimed at accelerating demand-driven R&D in technology, positioning Malaysia as a nucleus for innovation and a leading technology producer nation.
