Recapping Our Reddit AMA with Helm.ai CEO Vlad Voroninski
We dropped by Reddit to answer questions about AI, autonomous driving, Deep Teaching, and more. Here we share some highlights…
Announcing Deep Teaching and coming out of stealth has been a whirlwind tour for us. We’ve been featured in Forbes, sat on a panel at The Information’s Autonomous Vehicles Summit, and even relaunched the Helm.ai website.
As part of our push, we were thrilled when Helm.ai CEO Vlad Voroninski recently held court at Reddit’s r/IAmA. During the discussion, Vlad fielded plenty of great questions from curious minds and had the opportunity to share some insights into our Unsupervised Learning philosophy and much more.
Many of the key takeaways from that discussion centered on Deep Teaching, which enables us to train neural networks without human annotation or simulation in order to advance AI systems, particularly autonomous vehicle technology, though its potential impact reaches far beyond that specific application.
In short, we feed thousands of dash-cam videos from around the world into our algorithm, so the system can learn without manual human labeling of images or video. This speeds up the learning process, makes the software far more scalable, and increases the efficiency with which we can extract accurate information to inform the driving AI.
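Helm.ai hasn’t published the internals of Deep Teaching, but the general idea of pulling a training signal out of unlabeled footage can be illustrated with a toy self-supervised setup. The minimal PyTorch sketch below trains a small network on next-frame prediction, so the pixels of the following frame stand in for human labels; the FramePredictor model and the dummy frame pairs are illustrative assumptions, not Helm.ai’s actual pipeline.

```python
# Minimal sketch of learning from unlabeled video via a self-supervised
# pretext task (next-frame prediction). This is NOT Helm.ai's Deep Teaching
# algorithm, whose details are unpublished; it only illustrates the general
# idea of extracting training signal from raw footage without labels.
import torch
import torch.nn as nn

class FramePredictor(nn.Module):
    """Tiny encoder-decoder that predicts frame t+1 from frame t."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

    def forward(self, frame):
        return self.decoder(self.encoder(frame))

model = FramePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in for a loader over consecutive (frame_t, frame_t+1) pairs cut
# from unlabeled dash-cam clips; no human annotation is involved.
dummy_batches = [(torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64))
                 for _ in range(10)]

for frame_t, frame_t1 in dummy_batches:
    pred = model(frame_t)
    loss = nn.functional.mse_loss(pred, frame_t1)  # pixels supply the "labels"
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In a setup like this, the value of the pretext task is the representation the encoder learns from raw video, which can then be specialized to perception tasks such as segmentation.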
Below, we’ve included a few of our favorite questions and answers about Deep Teaching, lightly edited for clarity:
Q: That looks like an impressive system. What features are you developing to allow autonomy to continue in heavy fog, dust storms or poorly lit areas? — Reddit User HHS2019
A: For any given scenario, if it’s theoretically possible to infer enough information from sensor data about the environment to drive safely, we will be able to approach that limit using deep teaching, by training for the appropriate sensors. More specifically, we can use radar data in heavy fog/dust storms, and HDR cameras + active illumination like headlights for poorly lit areas.
However, safe driving isn’t guaranteed in every scenario, whether for an AI system or a human; it depends on what information the sensor data contains and on the physical properties of the environment. Just like a human driver, the AI system can decide to pull over in certain intractable situations.
Q: When do you think we are going to have AI-designed products? What type of products? I know AI already helps with a ton of things, but I’m talking about a whole product designed by the computer. — Reddit User UbajaraMalok
A: It will be a gradient. There have already been art pieces created entirely by AI and sold as commercial products, and humans and AI will continue to be involved in creating products at various levels. As for a product category developed end-to-end by an AI, one that humans had never foreseen as a potential product, that could certainly happen this decade.
Q: It was my impression that unsupervised learning is typically used to get a good representation, which can later be fine-tuned for different tasks. Does deep teaching learn to perform self-driving end-to-end, down to the actuation level? Or does it produce something intermediate, like segmentations? — Reddit User pelicoptah
A: Deep teaching here produces all the intermediate representations like segmentation. The rest of the stack in this case, including path planning and control, is classical. It’s important to be able to train accurate networks for every subproblem, both from a reliability and validation perspective. A fully end-to-end approach that goes from images directly to controls has limited value when developing self-driving systems, since there is very little learning signal, and validation becomes intractable.
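To make that modular split concrete, here is a rough sketch of learned perception feeding a classical planning and control stack. All of the names below (PerceptionOutput, perceive, plan_path, control) are hypothetical placeholders chosen for this illustration, not Helm.ai’s software interfaces.

```python
# Rough sketch of a modular self-driving stack: neural networks handle
# perception subproblems (segmentation, detection, depth), while path
# planning and control remain classical. Names are illustrative placeholders.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PerceptionOutput:
    lane_mask: List[List[int]]            # per-pixel lane segmentation (toy)
    obstacles: List[Tuple[float, float]]  # (x, y) positions in the ego frame

def perceive(image) -> PerceptionOutput:
    """Learned component: trained networks produce intermediate representations."""
    # A real system would run segmentation/detection/depth networks here.
    return PerceptionOutput(lane_mask=[[0]], obstacles=[(12.0, 0.5)])

def plan_path(perception: PerceptionOutput) -> List[Tuple[float, float]]:
    """Classical component: sample or optimize a trajectory that stays
    in-lane and keeps clear of detected obstacles."""
    return [(float(i), 0.0) for i in range(10)]  # straight-line placeholder

def control(path: List[Tuple[float, float]], speed: float) -> Tuple[float, float]:
    """Classical component: a tracker (e.g. pure pursuit / PID style)
    converts the planned path into steering and throttle commands."""
    steering, throttle = 0.0, 0.3
    return steering, throttle

commands = control(plan_path(perceive(image=None)), speed=10.0)
print(commands)
```

Because each stage exposes an interpretable intermediate output, it can be tested and validated on its own, which is the reliability argument made in the answer above.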
Q: You mentioned that you’re not using lidar, GPS, or maps in your demo. In a production system, what sort of sensors would you add? And how granular would the maps be?
A: For L2+ we built a reference stack with cameras and radars, as well as fusion with a semantic map for navigation purposes.
For L4, lidar is also involved initially; it would be weaned off over time to rely more and more on vision and lower the cost of the stack. We’re also using vision to automate how HD maps are created and updated, improving both cost and accuracy.
Q: Do you think precise depth estimation (let’s say below 10 cm accuracy at 5–10 m distance) with a very low error rate on small, diverse, moving objects (e.g. kids) is possible using only a camera?
A: There are theoretical limits on depth precision using cameras, as a function of the baseline, the distance to the object, and the validity of monocular priors. The goal isn’t to replicate the depth precision of a lidar system with a vision system; the goal is to build an AI stack that can drive much better than a human while also being fully interpretable.
That goal doesn’t require centimeter-level depth prediction at every distance, just as humans don’t need to predict depth at that level of accuracy to drive well (provided they’re not distracted, inebriated, overly aggressive, etc.). We will eventually have vision-only AI systems that do just as well as humans at depth prediction … all other variables being equal.
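For a rough sense of the theoretical limits mentioned above, the standard stereo-vision relation puts depth uncertainty at roughly dz ≈ z² · dd / (f · B), where z is the distance, B the camera baseline, f the focal length in pixels, and dd the disparity error. The sketch below plugs in illustrative numbers (30 cm baseline, 1000 px focal length, quarter-pixel disparity noise); these are assumptions for the example, not Helm.ai’s camera specifications.

```python
# Back-of-envelope depth-uncertainty estimate for a stereo camera pair,
# using the standard relation dz ~= z^2 * dd / (f * B). The parameters
# below are illustrative assumptions, not any Helm.ai sensor setup.
def stereo_depth_error(z_m: float, baseline_m: float,
                       focal_px: float, disparity_err_px: float) -> float:
    """Approximate depth error (meters) at distance z_m."""
    return (z_m ** 2) * disparity_err_px / (focal_px * baseline_m)

for z in (5.0, 10.0, 30.0):
    err = stereo_depth_error(z, baseline_m=0.3, focal_px=1000.0,
                             disparity_err_px=0.25)
    print(f"distance {z:4.1f} m -> depth error ~{err * 100:.1f} cm")
```

With these assumptions the error is a couple of centimeters at 5 m, under 10 cm at 10 m, and grows quadratically from there, which is why baseline, resolution, and monocular priors all matter at range.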
We definitely enjoyed the opportunity to provide insight into our methodology. We encourage you to swing by the Reddit r/IAmA thread for the full scoop, and we can’t wait to share more with you in the coming weeks. Interested in learning more about how Helm.ai’s approach to Unsupervised Learning and AI for autonomous driving differs from the rest of the pack? Be sure to check out the Helm.ai YouTube channel for more context, or visit the Helm.ai website.