Synthetic Data Panel Discussion

ESOMAR’s Art & Science of Innovation Summer Conference wrapped up in Chicago to great success. Arundati Dandapani, MLitt, CAIP, CIPP/C, CIPM, Founder and CEO of Generation1.ca, discussed synthetic data with several other leaders from around the world, including ESOMAR President Ray Poynter; Ipsita Ghosh, Global Lead of Consumer Insights at Kimberley Clarke, USA; and Leonardo Valente, CTO of LivePanel in Argentina. The panel was led by Dharmendra Jain, Founder and CEO of Actnable AI in Kenya, who also sits on the ESOMAR Council.

One use case Founder Arundati will include in her ESOMAR Congress 2024 presentation is understanding hard-to-reach immigrant populations with augmented synthetic data: collecting information that is sensitive yet well curated, with strong human oversight and transparent, explainable and responsible use of AI, backed by top regulatory frameworks and data protection best practices.


Overall, the panel seemed to conclude that synthetic data alone is not enough and that augmented synthetic data has more value at this stage of the technology’s maturity, but that we should not be afraid to experiment. We should not underestimate the possibilities that AI advancements have added to the potential of synthetic data. Its low cost, scalable value to enterprises and privacy-preserving potential (secure anonymization and de-identification of personal and sensitive information), combined with strong human oversight and humans in the loop, make this alternative promising.

Synthetic data, or artificially generated “fake” data, solves a problem of data scarcity by allowing access to a wider range of data more cheaply, at scale and in privacy-preserving ways. Machine learning and artificial intelligence tools add scale and productivity to synthetic data, promising more possibilities and explorations in data insights, analysis, prediction, and change.

Synthetic data can be fully synthetic, partially synthetic, or augmented with real datasets. Its historic roots trace back to de-identification applications (data cleaning). Founder Arundati has long been saying we are moving into a post-people world, with population decline on the global horizon forcing researchers to look for opportunities to preserve human truths and culture.

The risks of using synthetic data today include re-identification of the source data when the synthetic data is derived from a model trained on real data. Moreover, depending on whether the information has been fully anonymized or only de-identified, different laws might apply, depending on your jurisdiction and the use case for that data. The important caveat is that whether your synthetic data is fully synthetic or partially synthetic, transparency with users is key.

The biggest differentiator in the use of advanced technologies like synthetic data is trust when working across clients and stakeholders. Can we assure users of synthetic data that the privacy measures being followed are embedded in our data governance structures and that the quality of human oversight is strong? Do we have strong safeguards and frameworks that inform our organizations’ AI policies in sustainable ways (e.g. using Privacy by Design)? It is important to have an AI governance plan and team that determine the responsible use of such artificially generated data across the end-to-end data lifecycle in any organization. A well-thought-out, fit-for-purpose and overarching AI governance strategy will also help us achieve the right privacy-versus-utility tradeoff, one that is meaningful to our various organizations in their use of synthetic data.

Currently, uses of synthetic data in research are more notable in the public sector. A strong example is regions where jurisdictions and grids are not established and are hard to map, and where access to public services like healthcare is not easy to track or measure. In healthcare, synthetic data also allows researchers to formulate hypotheses and obtain estimated analytical results without jeopardizing patients’ privacy, saving time for patients, hospitals and other stakeholders working to save the sick, and advancing research and diagnoses on diseases and medical conditions.

In addition, synthetic data facilitates cross-sectoral and cross-organizational collaboration and exchanges, and lessens the difficulties public organizations face in grafting together different models of use and service processes. The US Census Bureau used a related technique, a system of differential privacy that adds statistical noise to an existing dataset to protect the release of census data, “balancing privacy and accuracy in a surgical way” that met with both praise and criticism. Both Forrester and Gartner have been quite vocal in highlighting the positive significance of synthetic data to AI and other tech advancements in society.
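To make the idea of "adding statistical noise" concrete, here is a minimal Python sketch of the textbook Laplace mechanism applied to a table of counts. It is an illustration only and is not the Census Bureau's actual system; the function names, the example counts and the choice of `epsilon` are all assumptions made for this sketch.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, epsilon, seed=None):
    """Return a copy of `counts` with Laplace noise added to every cell.

    Assumes each person appears in exactly one cell, so adding or removing
    one person changes the table by at most 1 (sensitivity 1). A smaller
    epsilon means more noise: stronger privacy, lower accuracy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = 1.0 / epsilon
    return {cell: value + laplace_noise(scale, rng) for cell, value in counts.items()}


# Hypothetical counts for illustration; not real census figures.
true_counts = {"district_a": 120, "district_b": 45, "district_c": 8}
released = noisy_counts(true_counts, epsilon=0.5, seed=42)
```

The privacy-versus-accuracy balance mentioned above shows up in the single `epsilon` parameter: small cells such as `district_c` are distorted proportionally more than large ones, which is one reason this kind of approach draws both praise and criticism.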

According to our founder, the use of synthetic data can even circumvent challenges in survey design in regions and countries where asking certain demographic survey questions is not legal (e.g. religion, sexuality, ethnicity, etc.). Moreover, synthetic data can offer privacy-preserving alternatives, where we can simulate a village, township or country with computerized data that is not mapped to any real PII. While synthetic data started out as a de-identification technique in the 1980s, it has returned with a greatly expanded scope, powered by advancements in machine learning and AI, to offer use cases that simulate whole populations or research ecosystems and deliver results that can be both meaningful and privacy-preserving. Furthermore, when Bill C-27 becomes law in Canada, there will be clear definitions of anonymized data and de-identified data, with an assigned status for each that sets out where it falls in the privacy regime, where the law applies and where it might be exempt from privacy laws.
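As a rough illustration of simulating a village with data that is not mapped to any real person, the Python sketch below draws fully synthetic records from aggregate proportions. The attribute names and proportions are hypothetical placeholders, not real statistics, and the sketch samples each attribute independently, which ignores the correlations a production-grade generator would need to preserve.

```python
import random

# Hypothetical aggregate proportions, for illustration only.
MARGINALS = {
    "age_group": {"0-17": 0.25, "18-64": 0.60, "65+": 0.15},
    "household_size": {"1": 0.20, "2-4": 0.60, "5+": 0.20},
}


def simulate_population(n, marginals, seed=None):
    """Generate n synthetic people by sampling each attribute independently
    from its aggregate distribution. No record corresponds to a real person."""
    rng = random.Random(seed)
    people = []
    for i in range(n):
        person = {"synthetic_id": i}  # arbitrary index, not a real identifier
        for attribute, distribution in marginals.items():
            categories = list(distribution)
            weights = list(distribution.values())
            person[attribute] = rng.choices(categories, weights=weights, k=1)[0]
        people.append(person)
    return people


village = simulate_population(500, MARGINALS, seed=1)
```

Because the records are built only from aggregates, there is no source row to re-identify. A partially synthetic or augmented dataset, by contrast, starts from real records, which is where the re-identification and transparency concerns discussed earlier come in.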

Want to hear lots more? Come join our founder at ESOMAR’s Congress 2024.

