Transcript
1L6ffIBrvHk • TWIST2: Scalable, Portable, Mocap-Free Humanoid Data Collection and Whole-Body Control (Unitree G1)
Kind: captions Language: en

Okay, let's be real. Have you ever been wrestling with a fitted sheet and just thought, "Man, there has to be a better way"? Well, that simple frustration actually gets us to a really big question in robotics. Seriously, why can't a robot fold your laundry yet? I mean, it seems like it should be simple, right? But for a robot, it's unbelievably complex. And you know, the main reason we don't have robot butlers buzzing around our homes boils down to one huge thing: a massive data problem. See, teaching a humanoid robot to move and interact just like a person, out in the messy real world, has been practically impossible to do at any kind of scale. Well, until now, that is.

All right, so let's break this down. For a long, long time, robotics researchers were stuck with a really frustrating trade-off. On one hand, you had the mocap lab. Think Hollywood special effects, right? You get incredibly precise full-body data, but it's crazy expensive, super complex, and totally stuck in one room. On the other hand, you've got a portable VR setup. This is way more affordable, and you can take it literally anywhere. The catch? It usually only gives you partial control, like just the arms and the head; the legs kind of just follow along with basic commands. So you were forced to choose: amazing high-quality data that's stuck in a lab, or kind of mediocre low-quality data that you can actually take out into the wild. And that choice created a massive bottleneck. I mean, think about it. We've seen huge breakthroughs in pretty much every other corner of AI, with things like language models and image generators, and it's all been fueled by massive amounts of data. But for humanoid robots, that data revolution just never happened. There was no good way to get enough high-quality real-world data to teach them to be genuinely useful.
So yeah, the bottom line is that all the old systems had to make some pretty major compromises. You had what's called decoupled control, which is kind of wild: one person controls the robot's arms while a completely different person drives the legs. Then there was partial control, where the legs just follow super basic speed commands. That's not how we move at all. The only way to get true full-body control was to go back to those giant, expensive, totally non-portable mocap labs.

But what if you didn't have to choose? What if you could get the best of both worlds? Well, that is exactly what this new system, called TWIST2, does. It's a breakthrough that basically shatters that old trade-off, and you can sum it up in just three simple words. First up, it is portable. The whole setup is designed to get out of the lab and into the real world: an office, your house, literally anywhere. Next, it's scalable. This thing is built from the ground up for efficient, massive data collection. The idea is that tons of different people can contribute data, which is exactly what's needed to solve that bottleneck we were talking about. And finally, and this is really the magic ingredient, it's holistic. TWIST2 gives you full, unified whole-body control. It's capturing all the tiny, subtle, coordinated movements of a person, from their feet right up to their head.

Okay, so how on earth does this actually work? Let's pop the hood and see what kind of hardware and software makes TWIST2 tick. What's so cool about this is how simple and accessible the parts are. We're talking about a regular off-the-shelf VR headset and just two little motion trackers you strap to your calves. That's it for the human side. Then on the robot, they've added a custom-designed neck that can move up and down and side to side, which is so important for giving it that active, human-like vision.
Then you've got the software, which is the brain of the operation, translating your movements into commands for the robot. And finally, there's a smart AI controller, called a reinforcement learning policy, that makes sure the robot carries out all those moves smoothly and without falling over. And the whole process is really elegant. It's so simple: a person just puts on the VR gear and starts doing the task. That's it. In real time, the software is watching every single move you make, walking, bending over, reaching for something, and translating it all into commands for the robot. The robot copies you. And the whole time this is happening, the system is recording everything from the robot's perspective, creating perfect high-quality data that can be used later to train a fully autonomous AI.

And get this: that custom piece of hardware, the little neck module that makes all that crucial active vision possible, costs about 250 bucks to build. That's it. That incredibly low cost is what blows this whole thing wide open. It's the key that unlocks this technology for researchers everywhere, and it truly democratizes the entire field.

All right, so that's the tech, but what can you actually do with it? Let's check out some of the real-world results, because they are pretty awesome. So here's that scalable idea in action. Look at these numbers. In just 18 and 12 minutes, one single person collected 98 successful demos of a two-handed task. 98! For a tougher mobile task, they still got 46 demos in less than 20 minutes. And look at that last column: 100% success rate. Just wow. This is a game-changing pace for collecting high-quality data for humanoids. And that incredible efficiency means the robot can now perform really complex long-term tasks, things that need both delicate hand movements and the ability to move around.
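To make the pipeline concrete, here's a minimal sketch of that teleoperation loop in Python. Everything here is illustrative, not the project's real API: the function names (`retarget`, `rl_policy`, `teleop_step`) and the `HumanPose` fields are assumptions standing in for the actual retargeting and learned whole-body controller, but the three-step flow (translate human motion, execute it stably, log everything for training) follows the description above.

```python
from dataclasses import dataclass

@dataclass
class HumanPose:
    # Sparse observations from the portable rig described above:
    # one VR headset plus two motion trackers strapped to the calves.
    head: tuple     # headset pose
    hands: tuple    # left/right hand poses from the headset's hand tracking
    calves: tuple   # left/right calf tracker poses

def retarget(pose):
    # Stand-in for kinematic retargeting: sparse human keypoints
    # are mapped to whole-body joint targets for the robot.
    return {"upper_body": pose.hands, "lower_body": pose.calves, "neck": pose.head}

def rl_policy(joint_targets, robot_state):
    # Stand-in for the reinforcement learning policy: it tracks the
    # targets while keeping the robot balanced. Here it just echoes them.
    return {"motor_commands": joint_targets}

def teleop_step(pose, robot_state, log):
    targets = retarget(pose)                   # 1. translate the human's motion
    action = rl_policy(targets, robot_state)   # 2. execute it smoothly, no falls
    log.append({"obs": robot_state, "action": action})  # 3. record for training
    return action

log = []
pose = HumanPose(head=(0.0, 0.0, 1.6),
                 hands=((0.3, 0.2, 1.2), (-0.3, 0.2, 1.2)),
                 calves=((0.1, 0.0, 0.4), (-0.1, 0.0, 0.4)))
action = teleop_step(pose, robot_state={"t": 0}, log=log)
print(len(log))  # one logged (observation, action) pair per control step
```

Running this loop at the robot's control rate is what produces the dataset: every step the human demonstrates adds one observation/action pair, which is exactly the data an autonomous policy is later trained on.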
We're talking about folding multiple towels in a row, which needs precise pinching and whole-body movement, or grabbing baskets, walking through a dorm with them, and setting them down. It can even do dynamic stuff like kicking a soccer ball.

This user study is fascinating because it shows just how much every single piece of the system matters. Okay, so look at that first bar on the left. With the full TWIST2 system, it took people about 68 seconds to collect 10 demos. Not bad. But now look what happens when you take away the stereo vision: the time jumps up to 98 seconds. And if you take away that active neck module, it takes over 112 seconds. This chart is perfect proof that those design choices are absolutely critical for making the system easy and fast to use.

But you know, the impact of TWIST2 is actually much bigger than just what this one robot can do. It's really about empowering the entire research community. And this quote from the project just says it all: "Humanoid data is better when universally sharable." I love that. Their goal isn't just to build one cool system; it's to create a foundation that everyone else can build on top of. And they're really putting that philosophy into practice with a few key principles. First, the idea that no data set is too small. Every little bit helps. Second, by getting everyone to use the same standardized, affordable hardware, the whole community can move forward faster together. And finally, using a single unified data format means that an AI model trained by one lab can easily be used and improved by another. It's literally creating a rising tide that lifts all boats in the world of robotics.

So this brings us all the way back to the beginning. For the very first time, we have a system that is portable, scalable, and holistic: a way to finally collect the data we need to train truly capable humanoid robots. That bottleneck we talked about? It's been broken.
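For a sense of scale on that user study, the quoted times (68 s for the full system, 98 s without stereo vision, 112 s without the active neck, each for 10 demos) translate into relative slowdowns with a quick calculation:

```python
baseline = 68  # seconds to collect 10 demos with the full system, as quoted above
ablations = {"no stereo vision": 98, "no active neck": 112}

for name, seconds in ablations.items():
    slowdown = (seconds - baseline) / baseline * 100
    print(f"{name}: {seconds} s, {slowdown:.0f}% slower than the full system")
```

So removing stereo vision costs roughly a 44% slowdown, and removing the active neck costs roughly 65%, which is why those two design choices matter so much for collection speed.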
And that leaves us with one final and really exciting question to think about. Now that pretty much anyone can teach a robot, what's the first thing we should teach them to do?