
Synthetic data and artificial intelligence. Is the connection unbreakable?

A few months ago, our blog on Medium published the article "Let's talk about synthetic data? Usefulness and Anonymity: How to Find a Balance", in which we explained why synthetic data is becoming an invaluable tool for a technological future that is approaching faster than expected.

Why is this tool really so important?

Is there a privacy risk?

It is believed that synthetic data has become an extremely important tool in AI projects because it can feed machine learning models and predictive analytics without the privacy risk that comes with real personal data. Why is that?

Synthetic data has the same mathematical and statistical properties as the original data and preserves the correlations between variables, so trends in the original data set are also reflected in the generated one. But it does not contain information that could jeopardize privacy. Unlike data that has simply been "de-identified" (stripped of identifying details), synthetic data sets are completely separate and cannot be linked back to their source.
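
To make the idea of preserved correlations concrete, here is a minimal sketch in Python. It is not how production synthetic data tools work: it assumes a deliberately simple model (one Gaussian variable and a second variable that depends on it linearly), and the "real" patient data, the variable names and the helper functions `pearson`, `fit_model` and `generate` are all invented for illustration. The point is only that rows drawn from the fitted model are new, yet show roughly the same correlation as the data the model learned from.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_model(xs, ys):
    """Learn a toy generative model: x is Gaussian, y is linear in x plus Gaussian noise."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    var_x = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var_x
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return {
        "mean_x": mx,
        "sd_x": statistics.pstdev(xs),
        "slope": slope,
        "intercept": intercept,
        "noise_sd": statistics.pstdev(residuals),
    }


def generate(model, n_rows, seed=0):
    """Draw new (x, y) rows from the fitted model instead of copying real rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        x = rng.gauss(model["mean_x"], model["sd_x"])
        y = model["intercept"] + model["slope"] * x + rng.gauss(0, model["noise_sd"])
        rows.append((x, y))
    return rows


# Hypothetical "real" data: age and blood pressure for 5,000 invented patients.
rng = random.Random(42)
real_age = [rng.gauss(50, 12) for _ in range(5000)]
real_bp = [90 + 0.6 * age + rng.gauss(0, 8) for age in real_age]

# Fit the model on the real data, then generate a synthetic data set from it.
model = fit_model(real_age, real_bp)
synthetic = generate(model, n_rows=5000)
synthetic_age = [row[0] for row in synthetic]
synthetic_bp = [row[1] for row in synthetic]

# Compare the age/blood-pressure correlation in both data sets.
real_corr = pearson(real_age, real_bp)
synthetic_corr = pearson(synthetic_age, synthetic_bp)
print(f"correlation in real data:      {real_corr:.2f}")
print(f"correlation in synthetic data: {synthetic_corr:.2f}")
```

The two printed correlations should land close to each other. Real projects use far richer models, and matching statistics alone says nothing about how private the generated rows are.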

The benefits of synthetic data could revolutionize research. In medicine, for example, it not only reduces bias by modeling patients from underrepresented groups, but also addresses the conflicting results often seen in pediatric and rare disease studies because of small patient numbers. All of this makes a more compelling case for synthetic data, showing us how far we've come and how far we can go.

Why are synthetic data and AI inextricably linked?

As privacy laws tighten, the use of synthetic data increases. By 2024, according to the American research company Gartner Inc., synthetic data will account for 60% of all information used to develop artificial intelligence and analytics projects.

Synthetic data and AI have a closed, mutually beneficial relationship: synthetic data is created with the help of AI, and AI models are in turn trained on synthetic data.

"You start with a real data set - for example, clinical trial data - and train an AI model to learn patterns in that data," says Dr. Khaled El Emam, Canada Research Chair in Medical AI at the University of Ottawa. "Then you can generate new data from the AI model."

"Synthetic data enables engineers and developers to work on innovations that would normally require real data, which is increasingly difficult to obtain," said Ms. Colucci, co-founder of a San Francisco startup that creates synthetic computer vision data for developers who are looking to rapidly build AI models for applications ranging from warehouse security and robotic inventory to virtual fitness coaching.

"For example, to create a warehouse system that automatically detects spills, you would have to feed machine learning hundreds, even thousands of images that would teach it to recognize what a spill looks like," says Ms. Colucci. "You can either go and photograph different kinds of spills - different sizes, shapes, colors, textures and in different lighting - or create synthetic images of spills based on multiple real-world images."

Will synthetic data replace real data?

The benefits of synthetic data are promising. In medical research, it can improve research quality and outcomes, for example by modeling patients from underrepresented racial and socioeconomic groups to reduce research bias. This ability to model data could also solve a persistent problem in research into treatments for pediatric and rare diseases: the small patient populations that have historically made it difficult to prove whether new drugs work.

For example, the non-profit organization Health Cities collected synthetic data for a project aimed at preventing opioid addiction.

"We have about 400,000 data points over seven years, which includes pharmacy data, emergency room visits, diagnostic data and administrative data," says Health Cities CEO Reg Joseph. "With this, we can begin to look at prescribing and use habits and all sorts of metrics to find patterns that can help inform addiction prevention practices."

At the Massachusetts Institute of Technology in Cambridge, Massachusetts, a group of scientists has created an open-source platform to give other organizations access to software for creating synthetic data.

The platform, the Synthetic Data Vault, has had no shortage of users, according to co-founder Kalyan Veeramachaneni, principal research scientist at MIT's Laboratory for Information and Decision Systems.

But…

For now, synthetic data will not replace real data. Researchers who use synthetic data to reach a conclusion usually go on to compare their results with real data. The two work side by side and will continue to do so in the near future.

"We don't know enough - we're still trying to figure out how reliable our synthetic data is, and we're seeing in publications that artificial intelligence is prone to systematic errors," says Health Cities' Mr. Joseph. "But a future for both synthetic data and AI definitely exists."

Contact us:

business@avitar.legal

Authors:

3.17.2023 17:15