AI and Robotics Replications of Human Capabilities: Part 1


Humanity has long sought to replicate human capabilities with machines, mechanisms, technologies, computers, and computer programs. This drive first produced the shift from human labor to mechanization and automation. Today the trend continues in the form of robotization: AI and robotics.

[Image: a humanoid AI robot]
AI technologies connect computers with human intelligence.

Robotics vs AI

We use our bodies to collect input from receptors such as the eyes, ears, and skin. We then use our minds to derive useful information from that input, analyze it, and respond with output. Learning from input, or making sense of it, requires intelligence: thinking, a cognitive process.

Robotics is concerned only with collecting input and responding with output. It focuses on creating artificial human beings. The word “artificial” means that a robot is not a real human: it is a human replica built from a combination of metal, electronics, and software.

AI, on the other hand, is artificial intelligence. The word “intelligence” means thinking. In AI, intelligence results from algorithms, mathematical models, and computer programming. It teaches the robot how to act like a human: express emotions with its face, speak, recognize people, move, and so on. We use intelligence to solve problems, plan, learn, make sense of images and sounds, process information, and draw conclusions.

Developing AI and robots is akin to raising a child: you want AI and robotics to learn to do exactly what a human learns to do.

We as humans develop different capabilities at different stages of life. Some of these capabilities we are barely aware of; others we know well. In this article I break down the capabilities of a human being, in the chronological order in which we attain them. I will also list the AI and robotics technologies that teach machines to achieve the same things.

Making Sense of Tactile and Visual Inputs

We all have visual and tactile senses. They are the basis of our interactions with the world. We do not merely see or feel something; we use our minds to make sense of it. These sensations enable us to learn, make decisions, and respond with output. We call this intelligence. Here I present how humans and AI achieve intelligence by processing and responding to visual and tactile information.

Tactile Perception in Humans

Tactile sensory input begins in the mother’s womb. When a baby presses its feet against the belly, it learns about forces and pressures. The whole body is one big input machine that constantly receives stimuli from the environment. Our bodies produce output in the form of muscle contractions and movements.

After birth, a person learns to use the vestibular sense to keep balance on both feet, then to take first steps and walk. All of this relies on tactile and vestibular sensory input. We use that input to learn the necessary muscle output: achieving a certain pose, a sequence of moves, the manipulation of objects, and so on.

Tactile Perception in Robotics and AI

In robotic and AI tactile perception, the goal is to replicate the human ability to control the body. This is the realm of physics: AI and robots use knowledge from mechanics and kinematics to achieve the movements that humans are capable of.

In robotics, the human senses are replicated by sensors: accelerometers, gyroscopes, and force sensors. They provide real-time data on position, orientation, and the forces acting on the robot.
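
As a rough illustration of how such sensor readings can be turned into a state estimate, here is a minimal sketch of a complementary filter that fuses gyroscope and accelerometer data into a pitch-angle estimate. The sensor samples, gain, and time step are made up for illustration; they are not from any particular robot.

```python
import math

def complementary_filter(pitch, gyro_rate, accel_x, accel_z, dt, alpha=0.98):
    """Fuse a gyroscope rate and an accelerometer reading into a pitch estimate.

    pitch      -- previous pitch estimate in radians
    gyro_rate  -- angular velocity around the pitch axis (rad/s)
    accel_x/z  -- accelerometer readings along the body x and z axes (g)
    dt         -- time since the last sample (s)
    alpha      -- how much to trust gyro integration vs. the accelerometer
    """
    # Integrate the gyroscope: fast and smooth, but drifts over time.
    pitch_gyro = pitch + gyro_rate * dt
    # The accelerometer gives an absolute (but noisy) pitch from gravity.
    pitch_accel = math.atan2(accel_x, accel_z)
    # Blend the two: gyro for short-term changes, accelerometer for long-term correction.
    return alpha * pitch_gyro + (1 - alpha) * pitch_accel

# Hypothetical sensor samples: (gyro rad/s, accel_x g, accel_z g)
samples = [(0.02, 0.05, 0.99), (0.01, 0.06, 0.99), (-0.03, 0.04, 1.00)]
pitch = 0.0
for gyro, ax, az in samples:
    pitch = complementary_filter(pitch, gyro, ax, az, dt=0.01)
    print(f"estimated pitch: {math.degrees(pitch):.3f} deg")
```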

To turn sensory input into the desired movement output, robotics relies on technologies such as the following:

| Technology | Description | Outcomes |
| --- | --- | --- |
| Odometry | A method that uses motion sensor data to estimate the change in position over time | Estimating the position and orientation of the robot |
| Robot kinematics | Forward kinematics: calculating the position and orientation of the robot’s end-effector from joint angles. Inverse kinematics: determining the joint angles needed to reach a desired position | Designing and controlling robotic movements (see the sketch after the table) |
| Control algorithms such as PID (Proportional-Integral-Derivative) | Feedback controllers that continuously correct the difference between the desired and the measured state | Adjusting movements based on sensor input |
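
To make the kinematics entry in the table concrete, here is a minimal sketch of forward and inverse kinematics for a hypothetical two-link planar arm. The link lengths and target position are made up for illustration.

```python
import math

L1, L2 = 1.0, 0.8  # hypothetical link lengths

def forward_kinematics(theta1, theta2):
    """Position of the end-effector given the two joint angles (radians)."""
    x = L1 * math.cos(theta1) + L2 * math.cos(theta1 + theta2)
    y = L1 * math.sin(theta1) + L2 * math.sin(theta1 + theta2)
    return x, y

def inverse_kinematics(x, y):
    """Joint angles that place the end-effector at (x, y) -- one of the two possible solutions."""
    # Law of cosines gives the elbow angle.
    c2 = (x * x + y * y - L1 * L1 - L2 * L2) / (2 * L1 * L2)
    theta2 = math.acos(max(-1.0, min(1.0, c2)))
    # Shoulder angle: direction to the target minus the offset caused by the second link.
    theta1 = math.atan2(y, x) - math.atan2(L2 * math.sin(theta2), L1 + L2 * math.cos(theta2))
    return theta1, theta2

# Pick a reachable target, solve for joint angles, then verify with forward kinematics.
target = (1.2, 0.5)
t1, t2 = inverse_kinematics(*target)
print("joint angles (deg):", math.degrees(t1), math.degrees(t2))
print("reached position:", forward_kinematics(t1, t2))  # should be close to the target
```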

AI provides the following tools to analyze and process physical sensory data with the goal of achieving certain outcomes:

| Technology | Description | Outcomes |
| --- | --- | --- |
| Genetic algorithms | A computer science technique that uses natural selection to solve problems by evolving a population of candidate solutions | Learning to walk, hold balance, take the first steps, and move the body to achieve certain goals (see the sketch after the table) |
| Reinforcement learning | A type of machine learning in which an agent learns to make decisions by interacting with an environment | Robots try out different movements and receive feedback to optimize step patterns |
| Machine learning | A subset of artificial intelligence (AI) that enables computers to learn patterns from data and make decisions or predictions without being explicitly programmed | Predicting and adapting to disturbances, improving balance over time |
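
As a toy illustration of the genetic-algorithm entry in the table, the sketch below evolves a small vector of “gait parameters” toward a hypothetical target. The fitness function and all numbers are made up; a real robot would score candidates by simulating or executing the gait.

```python
import random

TARGET = [0.4, 1.2, -0.3]  # hypothetical "ideal" gait parameters

def fitness(candidate):
    """Higher is better: negative squared distance to the hypothetical target."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, TARGET))

def mutate(candidate, scale=0.1):
    return [c + random.gauss(0, scale) for c in candidate]

def crossover(a, b):
    return [random.choice(pair) for pair in zip(a, b)]

# Start with a random population and evolve it by selection, crossover, and mutation.
population = [[random.uniform(-2, 2) for _ in range(3)] for _ in range(30)]
for generation in range(100):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]  # selection: keep the fittest candidates
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

print("best parameters:", [round(p, 3) for p in population[0]])
print("fitness:", round(fitness(population[0]), 5))
```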

Vision in Humans

Vision develops along with the tactile senses. At birth, your visual system is very limited. You first learn to see and recognize colors; at this stage your entire visual input is just a disordered set of colored spots. You then learn to bring order to these spots: nearby spots of the same color are organized into areas. You learn to segment the 2D image into regions of uniform color and to distinguish regions by their edges.

The next step is learning to recognize patterns such as faces. Imagine yourself at five months old. You have learned that a face has a mouth at the bottom and eyes at the top. Then someone appears in front of you upside down, with the mouth above the eyes, and you see something weird. What is happening is that your visual system is not yet developed enough to perceive depth, so you cannot conclude that the person is simply leaning over your bed from behind your head. Your visual system is, for the moment, limited to a 2-dimensional plane of colors. On that plane things move and colors change, but most of it remains a random mess.

3D Perception

The visual system starts to perceive depth when the child learns to overlay the two visual inputs, one from each eye, in a way that reveals 3D space. You suddenly notice that there is space in front of you, beyond what you are currently looking at, and that by reaching your hand forward you can touch objects a certain distance away, something you previously believed impossible. Gaining three-dimensional perception helps you recognize objects better: a bike riding on the road, people’s bodies, water coming from the tap, and more. It also helps you understand why a bicycle suddenly disappears behind the wall of a building: it is farther away from you than the wall.

Vision in Robotics and AI

Robots capture visual data by means of sensors and cameras. Visual data is captured as raw pixel values, typically in formats like RGB images or video streams. Each image is represented as a grid of pixels, with each pixel containing color intensity values. 

Previously we explored how a young person learns to make sense of 2D image inputs. To do the same with AI, researchers have developed the corresponding algorithms. AI identifies patterns such as edges, textures, and shapes by applying filters to small regions of the image. It can also segment the image and highlight its important areas.
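
For example, a classical way to apply filters to small regions of an image is convolution with an edge-detecting kernel. The sketch below uses OpenCV to compute edge maps and a crude threshold-based segmentation; the input file name is a placeholder.

```python
import cv2
import numpy as np

# Load an image (placeholder path) and convert it to a grayscale pixel grid.
image = cv2.imread("scene.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# A small hand-made filter that responds to horizontal intensity changes.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)
edges_x = cv2.filter2D(gray, ddepth=cv2.CV_32F, kernel=sobel_x)

# Canny combines smoothing, gradient computation, and thresholding into an edge map.
edges = cv2.Canny(gray, 100, 200)

# A crude "segmentation": split the image into bright and dark regions.
_, segmented = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cv2.imwrite("edges.png", edges)
cv2.imwrite("segmented.png", segmented)
```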

AI can then use deep learning models, especially convolutional neural networks (CNNs), to extract features from the visual data. In a neural network, early layers detect simple patterns, while deeper layers recognize complex structures like faces or objects.
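
As a sketch of what such a network looks like in code, here is a minimal CNN written with PyTorch, assuming a made-up 10-class classification task. The early convolutional layers respond to simple local patterns, while the later layers combine them into higher-level features.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            # Early layers: small filters that pick up edges and simple textures.
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            # Deeper layers: combine simple patterns into more complex structures.
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.features(x)            # shape: (batch, 64, 1, 1)
        return self.classifier(x.flatten(1))

# One random "image" batch just to show the shapes flowing through the network.
model = TinyCNN()
dummy = torch.randn(1, 3, 64, 64)       # batch of one 64x64 RGB image
print(model(dummy).shape)               # torch.Size([1, 10])
```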

Further Developments in Vision

Once features are extracted, the AI interprets the visual data: locating and classifying objects in an image, categorizing images into predefined classes, breaking an image into regions and assigning a label to each pixel (e.g., sky, road, car), and others.
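
In practice, this kind of interpretation usually starts from a pretrained model rather than a network trained from scratch. Below is a minimal classification sketch using torchvision; the image path is a placeholder, and the weights API shown assumes a recent torchvision version.

```python
import torch
from torchvision import models
from PIL import Image

# Load a pretrained image classifier.
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()

# Preprocess a placeholder image the same way the network was trained.
preprocess = weights.transforms()
image = Image.open("scene.jpg").convert("RGB")
batch = preprocess(image).unsqueeze(0)   # shape (1, 3, H, W)

# Run the network and report the most likely class label.
with torch.no_grad():
    logits = model(batch)
probs = logits.softmax(dim=1)
top_prob, top_class = probs.max(dim=1)
print(weights.meta["categories"][top_class.item()], float(top_prob))
```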

Achieving 3D perception, that is, extracting depth values from two or more 2D images, can be done with computer stereo vision. Depth information from stereo vision is useful for agents navigating 3D space. Depth can also be used to estimate the distance to objects in view or to reconstruct a 3D scene from a set of 2D images.
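
Here is a minimal sketch of depth extraction with OpenCV’s block-matching stereo algorithm, assuming an already rectified left/right image pair. The file names, focal length, and baseline are placeholders.

```python
import cv2
import numpy as np

# Rectified left and right images of the same scene (placeholder file names).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching: for each pixel, search along the same row in the other image
# and record the horizontal shift (disparity) of the best-matching block.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point to float

# Depth is inversely proportional to disparity: depth = focal_length * baseline / disparity.
focal_length_px = 700.0   # hypothetical camera focal length in pixels
baseline_m = 0.1          # hypothetical distance between the two cameras in meters
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = focal_length_px * baseline_m / disparity[valid]

print("closest visible point (m):", float(depth[valid].min()))
```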

In the case of 3D object reconstruction, the pipeline that converts a set of 2D images into a printed 3D model can involve many steps, and depth extraction is only a small part of it. Other steps include combining multi-view depth maps into a single 3D model and converting point clouds into mesh models.
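
As a sketch of one of those later steps, converting a point cloud into a mesh, here is how it might look with the Open3D library. The input file name and the Poisson reconstruction depth parameter are placeholders, and the point cloud is assumed to come from the earlier depth-fusion steps.

```python
import open3d as o3d

# Load a point cloud produced by the earlier depth-fusion steps (placeholder file name).
pcd = o3d.io.read_point_cloud("object.ply")

# Surface reconstruction needs normals, so estimate them from local neighborhoods.
pcd.estimate_normals()

# Poisson reconstruction fits a smooth surface through the oriented points.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=8)

# Save the mesh; from here it could be cleaned up and sent to a 3D printer.
o3d.io.write_triangle_mesh("object_mesh.ply", mesh)
```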

However, even within the field of stereo vision, just finding the matching points in the two images requires dealing with image-block similarity metrics, search optimization, essential and transformation matrices, camera calibration, and image rectification.

Other examples of computer vision tasks for images and videos include recognizing activity in a video (for example, what a person is doing), reconstructing 3D worlds from 2D images, estimating how a camera has moved given two images of a scene captured before and after the movement, image captioning (describing the content of an image in text), searching images and videos based on an information query, and, most importantly, determining the meaning of an image or video.

By combining powerful algorithms with vast amounts of visual data, AI can “see” and interpret the world with increasing sophistication, mimicking and even surpassing human visual understanding in certain contexts.

Conclusion

The pursuit of replicating human capabilities through robotics and AI has led to remarkable advancements in machine intelligence, automation, and human-like perception. While robotics focuses on physical interactions with the world, AI enables cognitive functions such as learning, problem-solving, and decision-making. Together, these technologies have made significant strides in mimicking human sensory perception, including tactile feedback and visual processing.

In this article I have touched on the replication of human capabilities with AI and robotics, covering the human and robot tactile and visual systems. In the next article I will dive deeper into AI capabilities, exploring the use of intelligence to understand language.

