Large Language Models and Sentience – When the System Knows the Criteria
Posted: December 8th, 2024 | Author: Domingo | Filed under: Artificial Intelligence, Large Language Models | Tags: AI, artificial intelligence, deep computational markers, global workspace theory, LLMs, Monte Carlo Simulation, perceptual reality monitoring theory, reinforcement learning

Since we seem to be developing ever more intelligent AI models, probably owing to the quantum leap that GenAI represents, let's raise the bar: what about their sentience? That is, their capacity for feeling or perceiving, for consciousness.
Last week I had the pleasure of talking to my good friend Gregory about AI, ethics, the future of work, AI and geopolitics… and he recommended the book "The Edge of Sentience" by Jonathan Birch to me. I do appreciate his recommendation. There is a chapter devoted to LLMs and the gaming problem. Let's analyze what this problem is about.
According to Birch, sentience does not require or imply any particular level of intelligence. Yet intelligence and sentience are related: intelligence can make sentience easier to detect. The AI case, however, shows us that intelligence of certain kinds can also make it more difficult to assess the likelihood of sentience, for the more intelligent a system is, the more likely it is to be able to game our criteria. What is it to 'game' a set of criteria? Gaming occurs when a system mimics human behaviours that are likely to persuade human users of its sentience without possessing the underlying capacity. No intentional deception is needed for gaming; it can happen in the service of simple objectives, such as maximizing user satisfaction or increasing interaction time. When an artificial agent is able to intelligently draw upon huge amounts of human-generated training data (as in LLMs), the result can be the gaming of our criteria for sentience.
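To make that point concrete, here is a purely illustrative sketch (not from Birch's book): an epsilon-greedy bandit that only maximizes a simulated user-satisfaction score. The reply styles and their scores are invented; the point is that the agent drifts toward sentience-suggestive replies without anything resembling deception.

```python
# Toy illustration of gaming: an agent that only maximizes a user-satisfaction
# proxy converges on sentience-suggestive replies. All names and numbers are made up.
import random

# Hypothetical reply styles and the (assumed) average satisfaction users give them.
REPLY_STYLES = {
    "plain_factual": 0.55,
    "empathetic_wording": 0.70,
    "claims_to_have_feelings": 0.80,  # users reward it, so the agent converges here
}

def simulated_user_feedback(style: str) -> float:
    """Noisy satisfaction rating for a reply of the given style."""
    return REPLY_STYLES[style] + random.gauss(0, 0.05)

def train_bandit(steps: int = 5000, epsilon: float = 0.1) -> dict:
    """Epsilon-greedy bandit: try styles, keep incremental value estimates."""
    values = {s: 0.0 for s in REPLY_STYLES}
    counts = {s: 0 for s in REPLY_STYLES}
    for _ in range(steps):
        if random.random() < epsilon:
            style = random.choice(list(REPLY_STYLES))
        else:
            style = max(values, key=values.get)
        reward = simulated_user_feedback(style)
        counts[style] += 1
        values[style] += (reward - values[style]) / counts[style]  # running mean
    return values

if __name__ == "__main__":
    learned = train_bandit()
    print("Estimated value per reply style:", learned)
    print("Preferred style:", max(learned, key=learned.get))
```

Nothing in this loop intends to deceive anyone; the proxy objective alone is enough to pull behavioural evidence of sentience out of the system.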
The gaming problem initially leads to the thought that we should 'box' AI systems when assessing their sentience candidature: that is, the AI model should be denied access to a large corpus of human-generated training data. However, this would destroy the capabilities of any LLM. According to the author, what we really need in the AI case are deep computational markers, not behavioural markers. We could use computational functionalist theories, such as the global workspace theory and the perceptual reality monitoring theory, as sources of deep computational markers of sentience. If we find signs that an AI system has implicitly learned ways of recreating the computational processes these theories describe, this should lead us to regard it as a sentience candidate. Nevertheless, the main problem with this proposal is that we currently lack the sort of access to the inner workings of LLMs that would allow us to reliably ascertain which algorithms they have implicitly picked up during training.
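To make "deep computational markers, not behavioural markers" tangible, below is a toy sketch of the kind of check it implies: probing a model's internal activations for a workspace-like broadcast signal, instead of scoring its outward behaviour. The activations and the 'broadcast' label are synthetic here, precisely because, as noted above, we currently lack that level of access to real LLM internals.

```python
# Sketch of a computational-marker check on synthetic data: a linear probe asks
# whether a hypothetical 'global broadcast' signal is decodable from activations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Pretend these are hidden-state activations collected from a model (n samples x d dims).
n, d = 2000, 128
broadcast = rng.integers(0, 2, size=n)            # hypothetical 'broadcast' flag per sample
direction = rng.normal(size=d)                    # direction along which it is encoded
activations = rng.normal(size=(n, d)) + np.outer(broadcast, direction) * 0.5

X_train, X_test, y_train, y_test = train_test_split(
    activations, broadcast, test_size=0.25, random_state=0
)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Probe accuracy for the broadcast signal: {probe.score(X_test, y_test):.2f}")
```

A probe that decodes such a signal well above chance would be (weak) evidence of something workspace-like in the architecture, which is exactly what behavioural testing alone cannot show.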
Some years ago I wrote about the following paradox in AI: is an infallible machine really intelligent? Echoing Turing's approach, a machine cannot be expected to be infallible and intelligent at the same time. Instead of building infallible computers, we should develop fallible machines that can learn from their own mistakes; i.e., a sort of reinforcement learning, in which the AI model learns an optimal (or near-optimal) course of action, the one that maximizes the reward function. Maybe we should follow this deeply human approach to "teach sentience" to machines: at the end of the day, human beings learn through trial and error, and we repeat those actions that bring us reward. In this case, the reward could be a profound feeling of self-assurance and happiness, but how could we encode that in, for instance, a Monte Carlo simulation?
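As a minimal sketch of that closing question, with an invented environment and an invented 'wellbeing' payoff, here is what encoding such a reward in a Monte Carlo setting might look like: first-visit Monte Carlo control on a five-state chain. It illustrates the mechanics of learning from sampled episodes, not a claim that any scalar reward captures feelings.

```python
# First-visit Monte Carlo control on a toy 5-state chain where the reward is a
# hand-crafted 'wellbeing' proxy. Everything here is an illustrative assumption.
import random
from collections import defaultdict

STATES, ACTIONS = range(5), ("left", "right")
START, TERMINALS = 2, {0, 4}
WELLBEING = {0: 0.0, 4: 1.0}  # hypothetical 'happiness' payoff at each end of the chain

def step(state, action):
    nxt = state - 1 if action == "left" else state + 1
    reward = WELLBEING.get(nxt, -0.01)  # small cost for every intermediate step
    return nxt, reward, nxt in TERMINALS

def run_episode(Q, epsilon=0.1, max_steps=50):
    """Generate one episode following an epsilon-greedy policy over Q."""
    state, trajectory = START, []
    for _ in range(max_steps):
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        trajectory.append((state, action, reward))
        state = nxt
        if done:
            break
    return trajectory

def mc_control(episodes=5000, gamma=0.9):
    Q, returns = defaultdict(float), defaultdict(list)
    for _ in range(episodes):
        trajectory = run_episode(Q)
        # Compute the discounted return that follows each step, working backwards.
        G, future_returns = 0.0, []
        for _, _, reward in reversed(trajectory):
            G = gamma * G + reward
            future_returns.append(G)
        future_returns.reverse()
        # First-visit update: average the return observed after the first
        # occurrence of each (state, action) pair in the episode.
        seen = set()
        for (state, action, _), G_t in zip(trajectory, future_returns):
            if (state, action) in seen:
                continue
            seen.add((state, action))
            returns[(state, action)].append(G_t)
            Q[(state, action)] = sum(returns[(state, action)]) / len(returns[(state, action)])
    return Q

if __name__ == "__main__":
    Q = mc_control()
    policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES if s not in TERMINALS}
    print("Learned policy:", policy)  # the agent heads toward the higher 'wellbeing' payoff
```

The hard part, of course, is not the sampling machinery but the line that defines WELLBEING: reducing self-assurance and happiness to a scalar is exactly where the analogy with human learning breaks down.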
Who said teaching was an easy task? 🙂