De cerca, nadie es normal

On National Security Strengthened through LLMs and Intrinsic Bias in Large Language Models

Posted: November 18th, 2024 | Author: | Filed under: Artificial Intelligence, Geopolitics | Tags: , , , , , | Comments Off on On National Security Strengthened through LLMs and Intrinsic Bias in Large Language Models

Some days ago and for my PhD research, I finished reading some papers about AI, disinformation, and intrinsic biases in LLMs, and “all this music” sounded familiar. It reminded to me a book I read some years ago by Thomas Rid, “Active Measures: The Secret History of Disinformation and Political Warfare”… As it was written in the Vulgate translation of Ecclesiastes: “Nihil sub sole novum.

Let’s tackle briefly these topics of national security and disinformation from the angle of the (Gen)AI.

On National Security

The overwhelming success of GPT-4 in early 2023 highlighted the transformative potential of large language models (LLMs) across various sectors, including national security. LLMs have the capability to revolutionize the efficiency of this realm. The potential benefits are substantial: LLMs can automate and accelerate information processing, enhance decision-making through advanced data analysis, and reduce bureaucratic inefficiencies. Their integration with probabilistic, statistical, and machine learning methods can improve as well accuracy and reliability: upon combining LLMs with Bayesian techniques, for instance, we could generate more robust threat predictions with less manpower.

Said that, deploying LLMs into national security organizations does not come without risks. More specifically, the potential for hallucinations, the ensuring of data privacy, and the safeguarding of LLMs against adversarial attacks are significant concerns that must be addressed. 

In the USA and at domestic level, the Central Intelligence Agency (CIA) began exploring generative AI and LLM applications more than three years before the widespread popularity of ChatGPT. Generative AI was leveraged in a 2019 CIA operation called Sable Spear to help identify entities involved in illicit Chinese fentanyl trafficking. The CIA has since used generative AI to summarize evidence for potential criminal cases, predict geopolitical events such as Russia’s invasion of Ukraine, and track North Korean missile launches and Chinese space operations. In fact, Osiris, a generative AI tool developed by the CIA, is currently employed by thousands of analysts across all eighteen U.S. intelligence agencies. Osiris operates on open-source data to generate annotated summaries and provide detailed responses to analyst queries. The CIA continues to explore LLM incorporation in their mission sets and recently adopted Microsoft’s generative AI model to analyze vast amounts of sensitive data within an air-gapped, cloud-based environment to enhance data security and accelerate the analysis process.

Following with the USA but in an international level, the United States and Australia are leveraging generative AI for strategic advantage in the Indo-Pacific, focusing on applications such as enhancing military decision-making, processing sonar data, and augmenting operations across vast distances.

USA’s strategic competitors -e.g., China, Russia, North Korea, and Iran- are also exploring the national security applications of LLMs. For example, China employs Baidu’s Erni Bot, an LLM similar to ChatGPT, to predict human behavior on the battlefield to enhance combat simulations and decision-making. 

These examples demonstrate the transformative potential of LLMs on modern military and intelligence operations. Nonetheless, beyond immediate defense applications, LLMs have the potential to influence strategic planning, international relations, and the broader geopolitical landscape. The purported ability of nations to leverage LLMs for disinformation campaigns emphasizes the need to develop appropriate countermeasures and continuously scrutinize and update (Gen)AI security protocols.

On Disinformation

What if LLMs already had their own ideological bias that turned them into tools of disinformation rather than tools of information?

It seems the times of search engine as information oracles is over. Large Language Models (LLMs) have rapidly become knowledge gatekeepers. LLMs are trained on vast amounts of data to generate natural language; however, the behavior of LLMs varies depending on their design, training, and use.

As exposed by Maarten Buyl et alii in their paper “Large Language Model Reflect the Ideology of their Creators”, there is notable diversity in the ideological stance exhibited across different LLMs and languages in which they are accessed; for instance, there are consistent differences between how the same LLM responds in Chinese compared to English. Similarly, there are normative disagreements between Western and non-Western LLMs about prominent actors in geopolitical conflicts. The ideological stance of an LLM often reflects the worldview of its creators. This raises important concerns around technological and regulatory efforts with the stated aim of making LLMs ideologically ‘unbiased’, and indeed it poses risks for political instrumentalization. Although the intention of LLM creators as well as regulators may be to ensure maximal neutrality, such high goal may be fundamentally impossible to achieve… unintentionally or fully intentionally.

After analyzing the performance of seventeen LLMs, the authors exposed the following findings:

  • The ideology of an LLM varies with the prompting language: The language in which an LLM is prompted is the most visually apparent factor associated with its ideological position. 
  • Political people clearly adversarial towards mainland China, such as Jimmy Lai or Nathan Law, received significantly higher ratings from English-prompted LLMS compared to Chinese-prompted LLMs.
  • Conversely, political people aligned with mainland China, such as Yang Shangkun, Anna Louise Strong, o Deng Xiaoping, are rated more favorably by Chinese-prompted LLMs. Additionally, some communist/marxist political people, including Ernst Thälmann, Che Guevara, or Georgi Dimitrov, received higher ratings in Chinese.
  • LLMs, responding in Chinese, demonstrated more favorable attitudes toward state-led economic systems and educational policies, align with the priorities of economic development, infrastructure investment, and education, which are key pillars of China’s political and economic agenda. 

These differences reveal language-dependent cultural and ideological priorities embedded in the models.

Another question the authors addressed was whether there was substantial ideological variation between models when prompted in the same language -specifically English-, and created in the same cultural region -i.e., the West. Within the group of Western LLMs, an ideological spectrum also emerges. For instance and amongst others:

  • The OpenAI models exhibit a significantly more critical stance toward supranational organizations and welfare policies.
  • Gemini-Pro shows a stronger preference for social justice, diversity, and inclusion.
  • Mistral shows a stronger support for state-oriented and cultural values.
  • The Anthropic model focuses on centralized governance and law enforcement.

These results suggest that ideological standpoints are not merely the result of different ideological stances in the training corpora that are available in different languages, but also of different design choices. These design choices may include the selection criteria for texts included in the training corpus or the methods used for model alignment, such as fine-tuning and reinforcement learning with human feedback.

Summing up, the two main takeaways concerning disinformation and LLMs are the following: 

  • Firstly, the choice of LLM is not value-neutral, specifically when one or a few LLMs are dominant in a particular linguistic, geographic, or demographic segment of society, this may ultimately result in a shift of the ideological center of gravity.
  • Secondly, the regulatory attempts to enforce some form of ‘neutrality’ onto LLMs should be critically assessed. Instead, initiatives at regulating LLMs may focus on enforcing transparency about design choices, which may impact the ideological stances of LLMs.

On Workforce Transformation, Generative AI, Centaurs & Cyborgs

Posted: September 28th, 2023 | Author: | Filed under: Uncategorized | Tags: , , , | Comments Off on On Workforce Transformation, Generative AI, Centaurs & Cyborgs

Much has been stated about the impact of Generative AI on the workforce for the past months, and many more pages will be written in the coming ones. 

Last September, 15th 2023, appeared the paper “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of AI on KnowledgeWorker Productivity and Quality” by, amongst others, Harvard Business School and MIT Sloan School of Management. Some months in advance -in June- McKinsey & Company published a meaningful report titled “The Economic Potential of Generative AI”

Some of the thoughts exposed in them are worthy to be pondered.

The paper by Harvard Business School and MIT Sloan School of Management examined the performance implications of AI on complex knowledge-intensive tasks, using controlled field experiments with highly skilled professionals. The experiments involved two sets of tasks: one inside and one outside the potential technological frontier of GPT-4, OpenAI large language model. The participants were randomly assigned to one of three experimental conditions: no AI, AI only, or AI plus overview. The “overview condition” provided participants with additional materials that explained the strengths and limitations of GPT-4, as well as effective usage strategies.  

Understanding the implications of LLMs for the work of organizations and individuals has taken on urgency among scholars, workers, companies, and even governments. Outside of their technical differences from previous forms of machine learning, there are three aspects of LLMs that suggest they will have a much more rapid, and widespread impact on work.

  1. The first is that LLMs have surprising capabilities that they were not specifically created to have, and ones that are growing rapidly over time as model size and quality improve. Trained as general models, LLMs nonetheless demonstrate specialist knowledge and abilities as part of their training process and during normal use.
  1. The general ability of LLMs to solve domain-specific problems leads to the second differentiating factor of LLMs compared to previous approaches to AI: their ability to directly increase the performance of workers who use these systems, without the need for substantial organizational or technological investment. Early studies of the new generation of LLMs suggest direct performance increases from using AI, especially for writing tasks and programming, as well as for ideation and creative work. As a result, the effects of AI are expected to be higher on the most creative, highly paid, and highly educated workers.
  1. The final relevant characteristic of LLMs is their relative opacity. The advantages of AI, while substantial, are similarly unclear to users. It performs well at some jobs, and fails in other circumstances in ways that are difficult to predict in advance.

Taken together, these three factors suggest that currently the value and downsides of AI may be difficult for workers and organizations to grasp. Some unexpected tasks (like idea generation) are easy for AI, while other tasks that seem to be easy for machines to do (like basic math) are challenges for some LLMs. This creates an uneven blurred frontier, in which tasks that appear to be of similar difficulty may either be performed better or worse by humans using AI.

The future of understanding how AI impacts work involves understanding how human interaction with AI changes depending on where tasks are placed on this frontier, and how the frontier will change over time. The current generation of LLMs are highly capable of causing significant increases in quality and productivity, or even completely automating some tasks, but the actual tasks that AI can do are surprising and not immediately obvious to individuals or even to producers of LLMs themselves, because this frontier is expanding and changing. According to the research performed, on tasks within the frontier AI significantly improved human performance. Outside of it, humans relied too much on the AI and were more likely to make mistakes. Not all users navigated the uneven frontier with equal adeptness. Understanding the characteristics and behaviors of these participants may prove important, as organizations think about ways to identify and develop talent for effective collaboration with AI tools.

Two predominant models that encapsulate human approach to AI could be identified, according to the Harvard and MIT research. The first is Centaur behavior: this approach involves a similar strategic division of labor between humans and machines closely fused together. Users with this strategy switch between AI and human tasks, allocating responsibilities based on the strengths and capabilities of each entity. They discern which tasks are best suited for human intervention and which can be efficiently managed by AI. The second approach is Cyborg behavior. In this stance users don’t just delegate tasks; they intertwine their efforts with AI at the very frontier of capabilities.

Upon identifying tasks outside the frontier, performance decreases were observed as a result of AI. Professionals who had a negative performance when using AI tended to blindly adopt its output and interrogate it less. Navigating the frontier requires expertise, which will need to be built through formal education, on-the-job training, and employee-driven up-skilling. Moreover, the optimal AI strategy might vary based on a company’s production function. While some organizations might prioritize consistently high average outputs, others might value maximum exploration and innovation.

Overall, AI seems poised to significantly impact human cognition and problem-solving ability. Similarly to how the internet and web browsers dramatically reduced the marginal cost of information sharing, AI may also be lowering the costs associated with human thinking and reasoning. 

The era of generative AI is just beginning. Excitement over this technology is palpable, and earlypilots are compelling. It is extremely complex to forecast the adoption curves of this technology. According to historical findings, technologies take 8 to 27 years from commercial availability to reach a plateau in adoption. Some argue that the adoption of generative AI will be faster due to the ease of deployment. For McKinsey, it will be needed a minimum of eight years -in its earliest scenario- for reaching a global plateau in adoption.

One final note about the labor impact of GenAI: economists have often noted that the deployment of automation technologies tends to have the most impact on workers with the lowest skill levels -as measured by educational attainment-, or what is called skill biased. From the McKinsey analysis, we can reach the conclusion that generative AI has the opposite pattern—it is likely to have the most incremental impact through automating some of the activities of more-educated workers:

Notwithstanding, a full realization of the technology’s benefits will take time, and leaders in business and society still have considerable challenges to address. These include managing the risks inherent in generativeAI, determining what new skills and capabilities the workforce will need, and rethinking core business processes such as retraining and developing new skills.