Because the coronavirus pandemic has confined people to their homes, it has also slowed autonomous vehicle development. The reason is a lack of data. Data is often described as the oil of today: it is the most important building block for training AI models. During training, an artificial intelligence model uses learning algorithms to extract what it needs from the data it is given and builds a prediction mechanism from it. Autonomous vehicles are no different. For example, for a car to recognize traffic lights, you have to show it hundreds, perhaps thousands, of traffic light photos, much as you would teach a baby. The pandemic lockdowns, however, have made it much harder to collect the data needed to train autonomous vehicles. Autonomous vehicle manufacturers are therefore turning to synthetic datasets created with engines such as Unity.
Real-world data is truly priceless. However, in a presentation at the recent Transform 2020 conference, Unity machine learning engineer Cesar Romero said that Unity supports the training of AI models for autonomous vehicles, robots and more with synthetic datasets. Although Unity is best known for its game engine, it also offers tools for the transportation, film, architecture, engineering and construction industries. Working across all these areas, the company stressed that some of the data it needs is very difficult to obtain from the real world, which is why synthetic datasets matter.
Synthetic Datasets Can Eliminate Data Privacy Violations
As the Unity representative put it: “First of all, we have a GDPR (General Data Protection Regulation) problem with data. This regulation emphasizes that collected data belongs to the individual, not to the company collecting it. That makes it very difficult to collect data without violating users' rights. With simulated, synthetic data there is no such problem. Since the data is completely synthetic, there is no confidentiality to breach and no ownership to question.” Rare events are another example. For autonomous vehicles to learn how to handle accidents, they must be shown images of accidents. Such images are so hard to find in real life that sometimes there is simply not enough data to train the model. If you can instead define how cars behave in a simulation environment and observe what happens in an accident, you can generate data for your model at little expense.
Training objects created in a simulation environment do not have to be two-dimensional. You can create 3D objects, rotate them, zoom in and out, and change the background. This gives you many variations of the same training data.
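Variation of this kind is commonly produced by randomizing scene parameters independently for each generated sample. The following is a minimal sketch in Python, assuming a hypothetical renderer that would turn each parameter set into an image. The `SceneParams` fields, the value ranges and the function names are our own illustrative assumptions, not Unity's API.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    """One randomized scene configuration (hypothetical fields)."""
    object_name: str      # which 3D object to place in the scene
    yaw_deg: float        # rotation around the vertical axis
    pitch_deg: float      # tilt up/down
    roll_deg: float       # tilt sideways
    scale: float          # <1 simulates zooming out, >1 zooming in
    background_id: int    # index into an assumed set of background images


def sample_scene(object_names, num_backgrounds, rng):
    """Draw one scene with a random object, pose, scale and background."""
    return SceneParams(
        object_name=rng.choice(object_names),
        yaw_deg=rng.uniform(0.0, 360.0),
        pitch_deg=rng.uniform(-30.0, 30.0),
        roll_deg=rng.uniform(-10.0, 10.0),
        scale=rng.uniform(0.5, 2.0),
        background_id=rng.randrange(num_backgrounds),
    )


def generate_dataset(object_names, num_backgrounds, n, seed=0):
    """Generate n scene configurations; the same seed gives the same list."""
    rng = random.Random(seed)
    return [sample_scene(object_names, num_backgrounds, rng) for _ in range(n)]


if __name__ == "__main__":
    for scene in generate_dataset(["traffic_light", "stop_sign"], 10, 3, seed=1):
        print(scene)
```

Because the generator itself chooses which object appears in each scene, the label for every rendered image is known without manual annotation. This is one reason synthetic data can be cheaper to produce than labeled real-world photos.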
As mentioned, synthetic data costs less than real-world data. A graphic presented by Cesar Romero illustrates this.
Romero also cited the Synthia dataset, which is used to train autonomous vehicles, as an example of a synthetic dataset. From the research literature, he pointed to the 2017 paper “Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World,” which applies the technique to training robot arms. As a third example, he noted that the synthetic data used by the Google Cloud AI Research team in a model developed to detect supermarket items proved more effective than real data.
Romero stated that SynthDet, which Unity currently offers as open source, can be used for simulations and synthetic data generation. He added that SynthDet is meant to run on your own hardware, and that when large-scale simulations are required, Unity Simulation, which Unity offers as a cloud service, can be used instead.