AI for Robotics: Perception and Control

date: Feb 21, 2023
slug: perception-control
author:
status: Public
tags: Applied, Control, Blog
summary:
type: Post
thumbnail: DALL·E 2023-02-22 00.58.08 - AI for Robotics_ Perception and Control, digital art.png
updatedAt: Mar 12, 2023 10:30 PM
This post is based on a chat I had with my friend Ritvik Singh, an undergraduate researcher at the Vector Institute and a former deep learning and simulation intern at Nvidia.
I am offering a $20 reward for guest posts. If you would like to contribute to my blog as a guest contributor, please get in touch with me through my socials or email. I will try to set up a more formal system for this soon.

The Chat

Ritvik I would like to give some observations I’ve had from working with AI and robotics over the past two and a half years. Robotics can be divided into two main sub-problems: perception and control.
In terms of perception, computer vision has yet to have its GPT moment. Language capabilities have been scaled considerably, but computer vision hasn't received the same kind of attention. The community is so focused on making novel architectural changes that it fails to realize that out-of-distribution generalization is terrible on most CV tasks. This matters especially in robotics, where you frequently work with in-the-wild scenarios and things such as lighting and specular reflections make an enormous difference. This is where synthetic data needs to play a role (and it slowly is), offering dense annotations at scale. Speaking from personal experience on some projects, simple models plus synthetic data at scale can offer far more robustness than any fancy architecture.
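To make the synthetic-data point concrete: the robustness comes from randomizing the appearance factors (lighting, tint, noise) that break models in the wild. The sketch below is a minimal pixel-space stand-in; real pipelines randomize lighting and materials inside a renderer, and the function name and parameter ranges here are illustrative assumptions, not any particular pipeline's API.

```python
import numpy as np

def randomize_appearance(img, rng):
    """Simple photometric randomization of an HxWx3 float image in [0, 1].

    Hypothetical sketch: real synthetic-data pipelines vary lighting,
    materials, and camera parameters in the renderer itself; this
    pixel-space jitter only illustrates the idea.
    """
    brightness = rng.uniform(0.6, 1.4)            # global lighting change
    contrast = rng.uniform(0.7, 1.3)              # scene contrast change
    color_shift = rng.uniform(-0.1, 0.1, size=3)  # white-balance / tint shift
    out = (img - 0.5) * contrast + 0.5
    out = out * brightness + color_shift
    noise = rng.normal(0.0, 0.02, size=img.shape)  # sensor noise
    return np.clip(out + noise, 0.0, 1.0)

rng = np.random.default_rng(0)
img = np.full((64, 64, 3), 0.5)  # stand-in for a rendered frame
variants = [randomize_appearance(img, rng) for _ in range(4)]
```

Training on many such variants of each rendered frame is what pushes a model toward robustness under lighting it has never seen.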
In terms of control, we can use reinforcement learning to learn complex, multi-modal behaviours, far surpassing the capabilities of classical methods. However, the sim-to-real gap fundamentally plagues robotics, to the extent that different training seeds can drastically affect real-world transfer. This null-space in simulator fidelity is something we have no answers to, and it raises fundamental questions about the reliability and safety of these systems. Furthermore, task specification is still very much an open problem. We can reward-engineer specific solutions, but it's unclear what method could even enable some form of generalizable autonomy. It feels like reinforcement learning is ill-equipped to handle generalizability.
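A quick sketch of what "reward-engineering a specific solution" looks like in practice, for a hypothetical end-effector reaching task (the function, terms, and weights are all assumptions invented for illustration). Every term is a hand-tuned design choice for this one task, which is exactly why this style of task specification doesn't generalize:

```python
import numpy as np

def reach_reward(ee_pos, target_pos, action):
    """Hand-engineered reward for a hypothetical reaching task.

    Each term and weight below is baked in for this one task; a new
    task (pushing, pouring, opening a door) needs a new reward.
    """
    dist = np.linalg.norm(ee_pos - target_pos)
    distance_term = -dist                               # dense shaping: get closer
    success_bonus = 10.0 if dist < 0.02 else 0.0        # sparse success signal
    effort_penalty = -0.01 * np.sum(np.square(action))  # discourage large torques
    return distance_term + success_bonus + effort_penalty
```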
It feels like, to enable generalizable behaviour, we cannot disentangle the perception and control aspects, because it's unclear, for example, what representation of an environment should inform a controller. This is where language could come in: it is the most natural way to specify a task, and thereby the most natural route to generalizability. Work involving CLIP embeddings, which tries to create a unified representation of vision plus language, could be key.
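The core mechanic behind CLIP-style models is that an image encoder and a text encoder map into one shared embedding space, where cosine similarity scores how well an image matches a caption. The sketch below uses placeholder vectors in place of real encoder outputs (no CLIP model is loaded; the dimensions and vectors are assumptions), just to show the comparison a robot's task-grounding step would perform:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

# Placeholder vectors standing in for CLIP encoder outputs.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=512)  # e.g. encoding of "pick up the red mug"
matching_image_emb = text_emb + rng.normal(scale=0.3, size=512)  # similar scene
unrelated_image_emb = rng.normal(size=512)                       # unrelated scene

sim_match = cosine_similarity(text_emb, matching_image_emb)
sim_unrelated = cosine_similarity(text_emb, unrelated_image_emb)
```

In a real system, the embeddings would come from CLIP's pretrained encoders, and the robot would act on whatever the camera view scores highest against the language instruction.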
Jonathan I think I generally agree with your ideas. There have been huge improvements on the software and architecture side, but I haven't seen much impact from integrating those ideas with robots in the real world.
Ritvik Yeah, it's really hard; it's genuinely the final frontier for AI. Bringing intelligence into the messy, stochastic real world is such a difficult problem. It's why you don't see these systems deployed and only see heavily scripted demos.
Jonathan This might also be a problem related to safety - failure modes of software suck, but they can't directly harm humans. I think there will be a fundamental barrier to AI breakthroughs unless we allow robots to run around and learn from real-world interactions, but again, this poses a big safety risk. There is some interesting research being done in this area, but it is still quite primitive. I saw a video of a robotic arm controlled by natural language.
Ritvik This is why I fundamentally disagree with people like the CEO of Scale.ai who say that simulation is useless. I think simulation (e.g. high-fidelity physics simulators, ray-traced synthetic data pipelines) is the only way forward. Today I talked to a drone startup, and they said that synthetic data was the only way they could design controllers that avoid things like power lines.
(In response to the robot arm) I may sound like a cynic, but I take anything that comes out of many research labs with a huge grain of salt. Many things are just demos that offer zero generalization. Also, while language can provide high-level generalization, I should note that it's not dense enough for low-level control tasks.

Takeaways

  • I previously hadn't thought much about how AI would be integrated into robotics. In media, AI takes over the world using machines that are more powerful than humans (think Terminator). In reality, though, progress on robotic integration has severely lagged behind software improvements. Robots are completely overshadowed by flashy new toys like ChatGPT and DALL·E 2.
  • One of the biggest challenges with using AI for robotics is the amount of noise present in the real world.
  • Detailed and thoughtful simulations might address the previous point. They would also offer some robustness in terms of AI safety - we can check what a robot would do in a given scenario in simulation rather than deploying it in the real world (useful in the case of a strong AI - we don't want to unleash it on the world without knowing what it will do).
Thank you for the chat Ritvik!