February 3, 2026

Intro to Robotics: ML vs Traditional Approaches

Physical-World AI as the Next Frontier

In recent years, we have witnessed the enormous success of Large Language Models (LLMs) in text-based applications. Copy editing, translation, document parsing, chatbots, Retrieval-Augmented Generation (RAG), and agentic systems have become easier than ever before.

This enormous success has led to the idea of applying the same approach to the physical world, namely, using the transformer architectures behind LLM breakthroughs for robotics.

We have all seen robotic arms, bipedal humanoids, and quadruped “robot dogs”. So far, however, their practical applications have been limited. Most real-world deployments are concentrated in factories and rely on traditional robotics approaches.

In this article, we briefly outline the current state of physical AI, its key challenges, and emerging applications.

Old Robotics vs New ML Robotics Approaches

Classical robotics, especially for fixed industrial processes, relies heavily on model-based control, kinematics, dynamics, manual system design, and explicit programming.

Programming is typically done in vendor-specific languages (such as ABB RAPID, KUKA KRL, or FANUC KAREL) or in general-purpose languages like C++ and Python through vendor SDKs.

Programming industrial robots is expensive and requires specialized expertise. This is one of the main reasons why small and medium-sized enterprises (SMEs) struggle to adopt robotics. A small depot or workshop often cannot afford the required setup and maintenance costs.

In contrast, ML-based robotics relies on demonstrations and data rather than explicit programming. The number of demonstrations required varies, from as few as 50 examples for imitation-learning approaches such as ACT (Action Chunking with Transformers) to much larger datasets for reinforcement learning or Vision-Language-Action (VLA) policies.

In imitation learning, the main challenge is collecting a suitable dataset and then allowing the robot to replicate the demonstrated behavior. For simple, well-defined tasks, this dataset can be relatively small, sometimes only a few dozen repetitions.
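To make this concrete, here is a minimal behavior-cloning sketch in PyTorch: a small network is trained to regress the demonstrated action from the observation. The 14-dimensional observation, 7-dimensional action, and network sizes are illustrative assumptions, not taken from any particular system.

```python
import torch
import torch.nn as nn

# Minimal behavior-cloning sketch: the policy is trained to regress the
# demonstrated action from the observation. The 14-dim observation,
# 7-dim action, and network sizes are illustrative assumptions.
policy = nn.Sequential(
    nn.Linear(14, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 7))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def train_step(obs_batch, action_batch):
    """One supervised step on (observation, action) pairs from demos."""
    loss = nn.functional.mse_loss(policy(obs_batch), action_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Random stand-in batch; a real run iterates over recorded demonstrations.
print(train_step(torch.randn(32, 14), torch.randn(32, 7)))
```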

As a result, ML robotics has the potential to democratize robotics, making it accessible to small and medium businesses rather than only large industrial players.

This shift from classical, model-based control toward data-driven, learning-based methods is reshaping what autonomous systems can do, and the frontier of robotics research is increasingly learning-centric.

Limitations and Challenges of ML Robotics

It is worth noting that ML robotics is not a full replacement for classical methods, at least not yet.

In practice, most successful robotic systems today are hybrid systems. They combine:

  • Learning-based perception and decision-making
  • Classical control, safety constraints, and low-level motion execution

Even state-of-the-art humanoids and manipulation systems still rely on traditional control loops for stability, safety, and real-time guarantees.

Data collection is still a major challenge. Although imitation learning can work with relatively small datasets, collecting high-quality demonstrations in the physical world remains costly and time-consuming. Teleoperation, simulation, and synthetic data are often required to scale.

How to Gather Data for ML Robotics: Simulation and Teleoperation

In ML robotics, a dataset is required to teach a robot how to perform a task. In the physical world, this usually means demonstrations: for example, a human shows the robot how to pick an object and place it into a bin. When the human produces these demonstrations by controlling the robot directly, the method is known as teleoperation.


Teleoperation can be performed using:

  • Mobile devices
  • VR controllers
  • Physical guiding arms (lead-through teaching)

In this setup, the human operator performs the task on behalf of the robot. During the demonstration, all relevant data is recorded, such as joint positions, gripper states, camera images, and force signals. This data is later used to train a so-called policy, which allows the robot to reproduce the demonstrated behavior.
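As an illustration, the sketch below records one teleoperated episode into arrays that can later be used for training. The three sensor callbacks (read_joint_positions, read_gripper_state, read_camera) are hypothetical placeholders for a real robot driver and camera interface.

```python
import numpy as np

# Sketch of an episode recorder for teleoperation. The three callbacks
# (read_joint_positions, read_gripper_state, read_camera) are hypothetical
# placeholders for a real robot driver and camera interface.
def record_episode(read_joint_positions, read_gripper_state, read_camera,
                   num_steps=500):
    episode = {"joints": [], "gripper": [], "images": []}
    for _ in range(num_steps):
        episode["joints"].append(read_joint_positions())   # e.g. 7 floats
        episode["gripper"].append(read_gripper_state())    # open/closed
        episode["images"].append(read_camera())            # HxWx3 array
    return {k: np.asarray(v) for k, v in episode.items()}

# Dry run with dummy sensors; a real setup reads from hardware instead.
demo = record_episode(lambda: np.zeros(7), lambda: 0.0,
                      lambda: np.zeros((64, 64, 3), np.uint8), num_steps=10)
np.savez("episode_000.npz", **demo)                        # one saved demo
```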

While effective, this way of collecting data is quite labor-intensive and time-consuming. As a result, an important alternative is simulation — that is, creating a virtual environment in which the robot can perform tasks.

Demonstrations can be collected not only in the real world but also inside simulation environments, for example:

  • MuJoCo
  • NVIDIA Isaac Sim
  • PyBullet
  • Gazebo

Simulation-based data collection offers several key advantages over real-world teleoperation. In simulation, robots can generate data at scale, run continuously, and explore a wide range of scenarios without risk of hardware damage or safety concerns.

In addition to human demonstrations, datasets can be generated without any human involvement. This is done by artificially creating demonstration trajectories using:

  • Heuristic controllers
  • Classical motion planners
  • Randomized exploration
  • Reinforcement learning agents

These automatically generated trajectories can serve as demonstrations for imitation learning or as training data for reinforcement learning policies.
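A minimal sketch of this idea, assuming a toy 2-D point-mass world: a proportional "move toward the goal" rule plays the role of the heuristic controller and generates trajectories that can be consumed exactly like human demonstrations. Real pipelines run the same idea inside full physics simulators.

```python
import numpy as np

# Sketch of human-free data generation in a toy 2-D point-mass "simulator".
# A proportional move-toward-the-goal rule plays the role of the heuristic
# controller; real pipelines run the same idea inside full physics engines.
def heuristic_policy(pos, goal, gain=0.5):
    return gain * (goal - pos)            # action = velocity toward goal

def generate_demo(goal, steps=100):
    pos, traj = np.zeros(2), []
    for _ in range(steps):
        action = heuristic_policy(pos, goal)
        traj.append((pos.copy(), action.copy()))
        pos = pos + action                # trivial point-mass dynamics
    return traj

# Each synthetic trajectory can be consumed like a human demonstration.
demos = [generate_demo(np.random.uniform(-1, 1, 2)) for _ in range(100)]
print(len(demos), "synthetic demonstrations")
```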

Policies: ACT and VLA

In the case of industrial robots, control is usually script-based. Simply put, the action sequence is hard-coded in a programming language and executed step by step, as in the sketch below.
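For illustration only, here is what such a hard-coded sequence might look like. The Robot class is a hypothetical stand-in for a vendor driver; real industrial controllers are typically programmed in vendor languages such as RAPID or KRL rather than Python.

```python
# Illustrative contrast: a classical, hard-coded pick-and-place script.
# `Robot` is a hypothetical stand-in for a vendor driver; real industrial
# controllers use vendor languages such as RAPID or KRL instead.
class Robot:
    def move_to(self, pose): print("move to", pose)
    def move_linear(self, dz): print("move linear, dz =", dz)
    def set_gripper(self, closed): print("gripper closed =", closed)

robot = Robot()
robot.move_to("above_pick")       # fixed, pre-taught poses
robot.set_gripper(closed=False)
robot.move_linear(dz=-0.10)       # descend 10 cm onto the object
robot.set_gripper(closed=True)
robot.move_linear(dz=+0.10)       # lift
robot.move_to("above_bin")
robot.set_gripper(closed=False)   # release into the bin
```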

In ML robotics, by contrast, the robot is controlled by a policy. A policy is typically a probability distribution:

P(next action | current state, sensor observations)

In other words, the policy determines what action the robot should take next based on its current state and sensor observations.

This probability distribution can be implemented in different ways, most commonly as a neural network with a specific architecture. The network maps sensor inputs (such as images, joint states, and forces) and the current state to robot actions.
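Here is a minimal sketch of such a policy, assuming a 64x64 camera image, a 7-dimensional joint state, and a 7-dimensional action; the network outputs a Gaussian distribution over the next action, matching the P(next action | current state, sensor observations) formulation above.

```python
import torch
import torch.nn as nn

# Minimal sketch of a learned policy, assuming a 64x64 RGB image, a
# 7-dim joint state, and a 7-dim action. The network returns a Gaussian
# over the next action, i.e. P(next action | current state, sensors).
class Policy(nn.Module):
    def __init__(self, action_dim=7):
        super().__init__()
        self.vision = nn.Sequential(          # small CNN image encoder
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten())
        self.trunk = nn.LazyLinear(256)       # infers fused feature size
        self.mean = nn.Linear(256, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, image, joint_state):
        feats = torch.cat([self.vision(image), joint_state], dim=-1)
        h = torch.relu(self.trunk(feats))
        return torch.distributions.Normal(self.mean(h), self.log_std.exp())

policy = Policy()
dist = policy(torch.randn(1, 3, 64, 64), torch.randn(1, 7))
action = dist.sample()                        # next action for the robot
```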

The two most common policy types used in modern ML robotics are ACT and VLA.

ACT is primarily used for imitation learning. Instead of predicting a single next action, ACT predicts a chunk of future actions at once. This makes execution smoother and more stable, especially for manipulation tasks.
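The sketch below shows the execution side of action chunking: the policy is queried only every k steps and returns k actions at once. The placeholder model and chunk size are assumptions; real ACT additionally smooths overlapping chunks with temporal ensembling.

```python
import torch

# Illustrative chunked execution, ACT-style. `chunk_policy` stands in for
# a trained network that maps an observation to a (k, action_dim) tensor
# of the next k actions; here it returns random values so the loop runs.
k, action_dim = 20, 7
chunk_policy = lambda obs: torch.randn(k, action_dim)

obs = torch.randn(14)                 # placeholder observation
for step in range(100):
    if step % k == 0:                 # query the policy only every k steps
        chunk = chunk_policy(obs)
    action = chunk[step % k]          # play back actions from the chunk
    # In a real system: send `action` to the robot, read a new observation
```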

VLA policies extend the idea of a policy by incorporating language as part of the input. In addition to vision and proprioception, the model also consumes a textual instruction, such as: “Pick up the red object and place it into the blue bin”.
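Conceptually, the language input just becomes one more feature vector. The toy sketch below embeds the instruction with a tiny whitespace tokenizer and an EmbeddingBag; real VLA models instead use a pretrained vision-language backbone, so everything here is a stand-in for illustration.

```python
import torch
import torch.nn as nn

# Toy sketch of the VLA idea: the instruction is embedded into a vector
# that is fed to the policy alongside vision and state features. The
# whitespace tokenizer and tiny vocabulary are stand-ins; real VLA models
# use a pretrained vision-language backbone instead.
vocab = {w: i for i, w in enumerate(
    "pick up the red object and place it into blue bin".split())}
embed = nn.EmbeddingBag(len(vocab), 32)   # mean-pools token embeddings

def encode_instruction(text):
    ids = torch.tensor([[vocab[w] for w in text.lower().split()]])
    return embed(ids)                     # (1, 32) instruction embedding

lang = encode_instruction("Pick up the red object and place it into the blue bin")
# `lang` would be concatenated with image and joint-state features before
# the action head, so the same policy can follow different commands.
print(lang.shape)
```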

Hosting the Policy

Hosting the policy is a separate and important question. In most ML robotics systems, the policy is implemented as a neural network, which cannot run on a simple microcontroller such as an STM32.

Instead, the policy must be hosted on more powerful compute platforms, such as:

  • NVIDIA Jetson (Nano, Xavier, Orin)
  • Raspberry Pi paired with an accelerator (Coral TPU, NPU)
  • x86 mini PCs (e.g., Intel NUC)
  • FPGAs, where the model is ported to Verilog or synthesized using hardware description tools
  • Cloud

The choice of where to host the policy has significant implications for performance, cost, and reliability.
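One common deployment path, sketched below under the assumption that the policy is a PyTorch module, is exporting it to ONNX so it can be compiled with TensorRT on a Jetson or served with ONNX Runtime on an x86 mini PC.

```python
import torch

# Sketch of a common deployment step: exporting a trained policy to ONNX,
# after which it can be compiled with TensorRT on a Jetson or served with
# ONNX Runtime on an x86 mini PC. The tiny MLP and the 14-dim observation
# are placeholders for a real trained policy network.
policy = torch.nn.Sequential(
    torch.nn.Linear(14, 64), torch.nn.ReLU(), torch.nn.Linear(64, 7))
policy.eval()
example_obs = torch.randn(1, 14)          # example input fixes the shapes
torch.onnx.export(
    policy, example_obs, "policy.onnx",
    input_names=["observation"], output_names=["action"],
)
```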

Cloud-based execution may be acceptable for high-level planning or monitoring, but low-level control loops must remain local because of latency and connectivity constraints.


Conclusion

ML robotics is an emerging approach that promises to significantly democratize the field of robotics. It has the potential to make robotic systems accessible not only to large-scale industrial players, but also to small and medium-sized enterprises (SMEs).

High-quality ML policies remain an area of active development, and the tools, models, and datasets required to build them are improving rapidly.

While classical robotics will continue to play a critical role (especially in safety-critical and highly structured environments) the integration of learning-based policies is reshaping what robots can do and who can afford to use them.


