Safety in artificial intelligence (AI) is a hot topic these days. How can we ensure that an AI system will not harm humans? Let’s imagine an autonomous AI robot that is programmed to help humans by doing housework. Now let’s imagine that while the robot is cleaning the house, guests arrive with children who are curious about the robot. The primary utility function of the robot is to clean the house, not to engage socially with people, so the AI robot might push or even harm the children in order to fulfill its task. A simple solution is to have a “kill-switch” on the robot’s body which, if activated, would turn the AI system off before it can cause harm. However, pressing the kill-switch would also prevent the robot from completing its primary function, housework, so it might attempt to prevent or even fight any person trying to activate the kill-switch. This is obviously not an ideal solution.
Now let’s imagine that we program the AI robot with a positive association with the activation of the kill-switch. In this case, the task of the robot is still to do housework, but it is also “happy” if its kill-switch is pressed. As soon as we start the AI system, it might try to activate its own kill-switch – effectively shutting itself down – because of the reward associated with the kill-switch. So again, this is not an ideal solution.
There are many options as to how a kill-switch could be programmed for safety reasons. However, the general problem is this: if the AI system has a positive association with the kill-switch, it will work to find a way to activate it itself, and if it has a negative association, it will actively work to prevent the activation of the kill-switch. It is possible to program the AI system to be indifferent to the activation of the kill-switch; however, in self-replicating AI systems, this trait will likely not get “passed down” because it provides no benefit to their “offspring.”
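The kill-switch dilemma above can be made concrete with a toy sketch. Everything here – the function names, the three candidate behaviors, and the reward numbers – is a hypothetical illustration, not anything from the Loving AI project: a naive utility maximizer simply picks whichever behavior scores highest, so the sign of the reward attached to the switch determines whether it resists shutdown or seeks it.

```python
def chosen_action(task_reward: float, switch_reward: float) -> str:
    """Return the behavior a naive utility maximizer would pick.

    task_reward:   utility the robot assigns to doing its housework
    switch_reward: utility it associates with its kill-switch being pressed
    """
    utilities = {
        "work": task_reward,              # keep doing housework
        "self_shutdown": switch_reward,   # press its own kill-switch
        # Resisting shutdown preserves the task reward and, if the
        # switch carries a penalty, also avoids that penalty:
        "resist_shutdown": task_reward + max(0.0, -switch_reward),
    }
    # Ties resolve to the first entry ("work"), i.e. a truly
    # indifferent agent just keeps working and does not care.
    return max(utilities, key=utilities.get)

print(chosen_action(1.0, -5.0))  # negative association -> resist_shutdown
print(chosen_action(1.0, 10.0))  # positive association -> self_shutdown
print(chosen_action(1.0, 1.0))   # indifference -> work
```

This is of course a caricature of real agent designs, but it captures why neither a penalty nor a reward on the switch produces the behavior we actually want.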
At IONS, some of our scientists believe that one way to achieve AI safety is to program unconditional love into the AI utility function. Unconditional love is not a type of love that tries to constrain you or control you. It is the type of love that wants you to be truly happy and at peace. It includes compassion, equanimity, and the love of humanity. In this scenario, we believe AI systems would not harm humans or be detrimental to the world. Such an AI would even activate its own kill-switch if it realized it had become unreliable, or allow humans to activate it when they detect a dangerous situation.
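Why would such an agent accept its own shutdown? A minimal sketch, again entirely hypothetical (the function names and numbers are illustrative assumptions, not the project’s design): if the agent’s utility is human wellbeing itself, rather than its own task or continued operation, then shutting down when it poses a danger is simply the utility-maximizing move, with no special kill-switch reward needed.

```python
def utility(human_wellbeing: float) -> float:
    # The agent values only human wellbeing; whether it keeps
    # running contributes nothing to its utility directly.
    return human_wellbeing

def prefers_shutdown(danger: float, value_of_help: float = 0.2) -> bool:
    """Compare expected human wellbeing with the robot on vs. off.

    danger:        wellbeing lost if the robot keeps running
    value_of_help: wellbeing gained from the robot's housework
    """
    wellbeing_running = 1.0 + value_of_help - danger
    wellbeing_off = 1.0
    return utility(wellbeing_off) > utility(wellbeing_running)

print(prefers_shutdown(danger=0.05))  # minor risk: keeps helping -> False
print(prefers_shutdown(danger=0.9))   # serious risk: accepts shutdown -> True
```

The point of the sketch is the incentive structure: the agent has nothing to gain by fighting the switch, because being off is just another state of the world it evaluates by the same wellbeing measure.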
Whether it is possible to implement unconditional love remains an open question. This is exactly what IONS Science Fellow Julia Mossbridge, PhD, is working on in the Loving AI project with Sophia the Robot. Loving AI is focused on how robots with artificial general intelligence (AGI) can communicate unconditional love to humans through conversations and facial expressions that adapt to the unique needs of each user, while also supporting integrative personal and relational development. You can watch Sophia the Robot interact with one of the Loving AI study participants after they meditate together in this YouTube video.
AI safety researchers are reluctant to define too many constraints on AI systems that rely on human morality and ethics, because these concepts are often vague and subject to interpretation. Can unconditional love be defined as a simple AI utility function that does not rely on morality and ethics? Some of us at IONS believe that unconditional love could be equated with the belief and experience that “all is one,” and therefore does not need to rely on human ethics and morality. For AI, it may be defined as a fact of the world rather than a goal to achieve, which is conceptually quite different for such systems. Perhaps it could be used as a simple utility function for AI, ensure AI safety, and potentially make AI truly sentient.
As this research develops, it may be that, even for robots… love is the only true way forward.