You probably know that the new generation of generative AI tools that have exploded in popularity can generate words, images, and even videos that closely resemble those created by humans. But did you know that they can also be used to generate data itself?
Modern artificial intelligence (AI) works by recognizing patterns in data and using them to answer questions or predict what will happen next. Generative AI, such as OpenAI’s ChatGPT, goes a step further and creates more data that follows the rules of the data it was trained on.
However, real data presents complications. Collection can be difficult and expensive, and creates security and privacy obligations.
For example, consider a dataset containing thousands of human faces that is used to train a facial recognition algorithm. You have to find thousands of people, take their photos, and get their permission to store and use their data. Then, countless checks and balances must be followed to ensure the data is free of harmful bias.
One solution is synthetic data. This is machine-generated data that closely resembles real-world data and can be used for many of the same purposes.
Snowflake is one of the world’s largest data-as-a-service companies, offering analytics services as well as a data marketplace covering thousands of topics, including healthcare, finance, and retail.
It is now powering these services with AI-generated synthetic datasets, and it is putting generative AI to use in several other interesting applications. Let’s take a look!
First, what is synthetic data?
Synthetic data is information that is artificially generated rather than collected from the real world, in such a way that it has the same characteristics as a real-world dataset.
Generative AI is particularly suited to this task, as it can easily analyze any dataset and create synthetic data that closely matches it. This means companies can train AI algorithms and run tests and simulations without exposing personal or sensitive information that real-world data may contain.
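To make the idea concrete, here is a minimal sketch of the "analyze a dataset, then generate data that matches it" loop. It is a deliberately simple statistical stand-in, not generative AI and not how Snowflake or its data providers work: it fits a mean and standard deviation to each column and samples new rows from those distributions. The dataset values and the function names `fit_column_stats` and `sample_synthetic` are invented for illustration.

```python
import random
import statistics

# Toy "real" dataset with invented values, used only for illustration.
real_rows = [
    {"age": 23, "basket_value": 18.5},
    {"age": 35, "basket_value": 42.0},
    {"age": 41, "basket_value": 55.3},
    {"age": 29, "basket_value": 27.9},
    {"age": 52, "basket_value": 61.2},
    {"age": 38, "basket_value": 47.4},
]


def fit_column_stats(rows):
    """Estimate the mean and sample standard deviation of each numeric column."""
    stats = {}
    for col in rows[0]:
        values = [row[col] for row in rows]
        stats[col] = (statistics.mean(values), statistics.stdev(values))
    return stats


def sample_synthetic(stats, n, seed=0):
    """Draw n synthetic rows, sampling each column from a fitted normal distribution."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        {col: rng.gauss(mu, sigma) for col, (mu, sigma) in stats.items()}
        for _ in range(n)
    ]


stats = fit_column_stats(real_rows)
synthetic_rows = sample_synthetic(stats, n=1000)
```

None of the synthetic rows is a copy of a real record, yet the column averages stay close to the originals. Because each column is sampled independently, this sketch loses any relationship between columns (for example, older customers spending more). Capturing that kind of structure is where generative models earn their keep.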
In the financial sector, it is used to train fraud detection algorithms to identify intentionally falsified transactions, in the medical sector to avoid using sensitive patient data, and in the retail and marketing sector to model synthetic customers and analyze their purchasing behavior.
According to a Gartner survey, business leaders are most likely to turn to synthetic data because of the accessibility, complexity, and availability challenges of real-world data. The survey also found that partially synthetic datasets, in which real-world data is augmented with synthetic data, are more commonly used than fully synthetic datasets.
Generating synthetic data allows companies to create the information they need to fill gaps in existing records or create entirely new datasets. This does not remove the need for the real-world data that synthetic data is modeled on in the first place. But when used effectively, it can reduce costs, speed up training of machine learning models, and help companies automate and make better decisions.
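A partially synthetic dataset can be sketched in a few lines. The example below is an assumption-laden illustration, not a description of any vendor's method: it fills a gap in a toy set of transactions, where fraud cases are rare, by copying real rows of the under-represented label and slightly perturbing their numeric fields. The helper name `augment_to_balance`, the field names, and all values are hypothetical.

```python
import random
from collections import Counter

# Toy transactions with invented values; fraud is under-represented.
real_rows = [
    {"amount": 25.0, "label": "legit"},
    {"amount": 310.0, "label": "legit"},
    {"amount": 48.5, "label": "legit"},
    {"amount": 120.0, "label": "legit"},
    {"amount": 9800.0, "label": "fraud"},
]


def augment_to_balance(rows, label_key, numeric_keys, jitter=0.05, seed=0):
    """Return the real rows plus enough synthetic rows to balance every label.

    Each synthetic row copies a random real row with the same label and scales
    its numeric fields by a random factor within +/- `jitter`.
    """
    rng = random.Random(seed)
    counts = Counter(row[label_key] for row in rows)
    target = max(counts.values())
    combined = [dict(row, source="real") for row in rows]
    for label, count in counts.items():
        pool = [row for row in rows if row[label_key] == label]
        for _ in range(target - count):
            base = rng.choice(pool)
            new_row = dict(base, source="synthetic")
            for key in numeric_keys:
                new_row[key] = base[key] * (1 + rng.uniform(-jitter, jitter))
            combined.append(new_row)
    return combined


augmented = augment_to_balance(real_rows, "label", ["amount"])
```

Tagging every row with a `source` field keeps the real and synthetic records distinguishable, which matters when you later need to audit what a model was trained on. Jittering copies is about the crudest augmentation there is; a generative model would aim to produce more varied and realistic records.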
Generating synthetic data in Snowflake
Snowflake sells data to businesses through the Snowflake Marketplace, one of the world’s largest B2B data intermediaries.
In addition to thousands of real-world datasets, Snowflake now provides access to synthetic datasets created by generative AI algorithms. As an example, the synthetic human face dataset from San Francisco-based Synthesis AI consists of 5,000 individual images of different human faces.
In the past, facial recognition algorithms have been criticized and even banned because of concerns about bias in the datasets used to train them. That bias has led to differences in their ability to identify people from different ethnic backgrounds, and to accusations that they may be unfair or discriminatory.
Using synthetic data in this way can help address these issues by allowing you to create datasets tailored to the level of representation and comprehensiveness you require (note: this is not a claim that it solves them completely).
Synthetic data has been around since before the advent of generative AI, but a new class of generative algorithms means datasets can quickly scale to the size you need. Datasets created in this way can also be easily customized to suit the needs of different customers around the world.
It also provides synthetic financial data, such as a dataset from Clearbox AI consisting of simulated mortgage applications designed to mimic both legitimate and fraudulent applications. The data in these sets is augmented with data created by generative AI.
Snowflake has revealed that it expects synthetic data generated by AI to play a key role in its future business. As generative models such as large language models (LLMs) become more sophisticated, they can create synthetic data that more accurately reflects the real world, leading to cheaper and more efficient insights for businesses.
How else is Generative AI used in Snowflake?
In addition to providing access to synthetic AI-generated data, Snowflake has created a number of tools based on generative AI that customers can use.
Thanks to this year’s acquisition of Neeva, a search startup founded by former Google employees, the company is implementing natural language queries for datasets. This effectively allows users to interact with data and gain insights by asking straightforward questions rather than performing traditional data science analysis. CEO Frank Slootman told VentureBeat, “Working with data through natural language is becoming more common. This increases the opportunity for non-technical users to extract value from data.”
It has also launched a partnership with Nvidia, using the chipmaker’s NeMo LLM to create a platform that allows Snowflake users to build generative AI applications that can access Snowflake data, such as chatbots and search engines.
Another LLM initiative is its Document AI tool, which allows users to query documents such as legal contracts and invoices and extract their meaning. It was developed using technology Snowflake gained when it acquired the Polish natural language platform Applica in 2022.
In summary, it’s clear that Snowflake has high hopes for generative AI, building tools that help create synthetic data and analyze and extract value from it.