“…asked whether he was worried about regulatory probes into ChatGPT’s potential privacy violations. Altman brushed it off, saying he was ‘pretty confident that soon all data will be synthetic data.’” (Financial Times, 2023)
Sam Altman’s remark that soon “all data will be synthetic” is not just a provocation—it captures how debates on synthetic data often unfold through bold claims that sidestep the pressing issues. What, in fact, are synthetic data? Why do they matter? And how can we critically understand their world-making potential, especially when viewed from the vantage point of the Global South?
This post argues that synthetic data are not neutral technical tools but socially constructed representations whose world-making potential is deeply contested. They are both fictions and frictions: fictions because they are constructed representations of reality, embedded with the assumptions of generative models; frictions because they produce practical and epistemological tensions when deployed within infrastructures such as finance. These tensions are particularly acute in Global South contexts, where synthetic data narratives often obscure existing infrastructural and governance challenges. We draw on our academic study of responses to synthetic data in Latin America to illustrate distinctions between dreams and realities as well as to push for reimagining what synthetic data are and could be.
Fictions of Synthetic Data in Finance
The European Data Protection Supervisor defines synthetic data as “artificial data that [are] generated from original data and a model that is trained to reproduce the characteristics and structure of the original data.” While this framing is technically accurate and useful for legal purposes, it misses the extent to which synthetic data are narrative-laden fictions.
Synthetic data are promoted as risk-free, privacy-preserving, and innovation-friendly. In finance, synthetic data are said to enable experimentation without endangering consumer privacy, as well as to simulate risk scenarios without exposing real data.
Yet the term “synthetic” is not an innocent descriptor. In finance especially, “synthetic” evokes the ghosts of the risky instruments at the heart of global market collapse. The 2007–08 financial crisis, precipitated in part by synthetic products like collateralized debt obligations, left a lasting stigma on anything labelled synthetic, and financial professionals still approach such products with caution. It is no coincidence that the Financial Times interview with Sam Altman cited above opted for the term “computer-generated data” rather than “synthetic”. Even though the meaning and uses of what counts as synthetic have evolved since the financial crisis, the term persistently evokes worries in this sector, whether in relation to the synthetic risk transfer market or to so-called synthetic stablecoins.
In financial contexts, narratives around synthetic data are therefore marked by a persistent tension: they are presented as innovative and promising yet remain overshadowed by enduring associations with risk and opacity. This specific context prompts us to posit that synthetic data must be understood not only as technical artifacts but also as infrastructural narratives.
Synthetic data operate within wider socio-material systems that define what counts as scarcity, what is framed as innovation, and whose expertise is deemed legitimate. This is why it is necessary to undertake what Science and Technology Studies scholars call “infrastructural inversion”: identifying frictions between synthetic data narratives and infrastructural realities.
Frictions of Synthetic Data: The Case of Latin America
Latin America, and particularly Brazil’s PIX payments system, highlights the disconnect between the Global North’s narratives of synthetic data (as a solution to “data droughts”), and the actual realities of data abundance, governance bottlenecks, and human capacity constraints in the Global South.
Institutions in the Global North frequently portray the South as suffering from a lack of data, arguing that synthetic data could fill these gaps as well as accelerate financial inclusion. Yet our research into the realities of synthetic data in Latin America, particularly Brazil’s PIX system, illustrates the opposite. Far from a data drought, the challenge is more accurately described as a persistent data deluge, one that requires resources for actively managing too much rather than too little data, undermining a key justification for producing, selling, and utilising synthetic data.
PIX is an instant payment system developed and operated by Brazil’s Central Bank. Since November 2020, it has enabled real-time transfers and payments in Brazilian reais, available 24/7, including weekends and holidays, without interruptions or transaction fees, according to the Banco Central do Brasil’s definition. In 2024, it processed over 6 billion financial transactions monthly, handling more than 2 trillion reais (approximately $338 billion) and accounting for three quarters of all daily financial transactions in Brazil. It is the country’s most widely used payment method, surpassing debit and credit cards, and the second-largest national payment system in the world. In fact, PIX accounted for 15 percent of all global instant payments, an astonishing figure for a system less than five years old.
By drastically reducing merchant fees compared to credit cards, PIX enabled millions of small-scale vendors to participate in digital finance. Street markets and neighbourhood stores increasingly adopted PIX QR codes, making instant digital payment a normalized everyday practice. PIX is many citizens’ first sustained interaction with the formal financial system. It therefore functions simultaneously as a payment infrastructure and as an inclusion mechanism: a Digital Public Infrastructure (DPI) with potential to become core to Latin America’s financial backbone. In 2025, Uruguay made it available to Brazilians in Uruguay and to Uruguayan bank account holders in Brazil. In 2023, Argentina expressed interest in linking to PIX, and discussions of regional interoperability have been underway. Peruvian regulators explicitly cited PIX as a model for developing their own real-time payment infrastructure, and European and East Asian institutions have also studied PIX closely.
Our study of PIX, conducted in the context of efforts to promote the use of synthetic data in Latin America, shows that the system challenges Global North narratives of data scarcity: synthetic data do not emerge as a need for the region’s most relevant digital financial infrastructure. As one PIX official put it in an interview we conducted in late 2024: “What we need to improve, in the predictive terrain, is already being addressed through real data.” In other words, the challenge for Brazil is not the absence of data but the human and organizational capacity to process and make sense of the volumes already being produced. This resonates with broader critiques of synthetic data narratives, which often obscure the socio-technical labour required to organize, curate, and interpret financial data.
Brazilian officials have themselves narrated PIX as a potential node for international integration. In 2024, Gabriel Galípolo, President of the Central Bank, remarked: “PIX has the potential to integrate with international instant payment systems. Today, technology is no longer a barrier to this connection”. His colleague Otávio Damaso reinforced this perspective, explaining: “When parties are interested, there are no technical issues. When they are not, all technical problems arise.” Such statements make clear that the true bottlenecks faced in the region are political and organizational, not technical or data-related. The core challenge lies not in the absence of data but in governance frameworks, organizational resources, and skilled expertise.
Interviews we conducted with Latin American financial regulators in 2024 confirmed that staff shortages and gaps in advanced technical expertise, particularly in machine learning and data science, were far more restrictive than the absence of data. PIX itself faces human resource constraints. Interviews with Central Bank of Brazil staff underscored how waves of retirement and demands for improved working conditions had reduced available expertise. Although PIX generates staggering amounts of data (around 200 million transactions daily), the Central Bank has struggled to maintain sufficient analytic capacity to keep pace. This mismatch has created vulnerabilities in areas such as cybersecurity and fraud detection, vulnerabilities that cannot be solved by layering synthetic data on top of existing infrastructures.
Risks and Consequences
PIX demonstrates how Global South infrastructures are frequently rendered invisible in Global North framings of synthetic data’s value and potential. Narratives of “data drought” position the South as an empty space awaiting artificial supplementation. Yet PIX demonstrates that the reality is more complex: there is no absence of data, but rather an abundance that requires human governance, coordination, and expertise. The fiction of data scarcity risks shifting attention away from these pressing socio-technical infrastructural challenges. Synthetic data appear to offer little added value in this context. While they may have niche uses in training fraud detection systems or stress-testing cybersecurity scenarios, they do not address the fundamental challenges facing evolving financial infrastructures of the Global South.
By framing the Latin American region as data-deficient, synthetic data narratives risk obscuring the pressing realities of governance, labour capacity, and political coordination. The PIX case illustrates how synthetic data fictions from the Global North collide with infrastructural frictions in the Global South. Rather than filling a void, synthetic data risk becoming a distraction from grounded, context-specific innovations already underway, some of which may need synthetic data, but not across all sectors, nor region-wide. And there is even more to it.
Synthetic data are narrated as risk-free technologies, particularly in finance, where their appeal lies in the promise of experimentation without endangering privacy or destabilizing markets. Yet closer examination reveals that these fictions generate new frictions, producing risks that are often obscured by the optimism surrounding their deployment. The claim that synthetic data preserve privacy is deeply contested. Even when individual identifiers are removed, the process of anonymization can expose sensitive information, thereby undermining the very trust that synthetic data are supposed to secure. The fiction of anonymity, then, produces its own set of vulnerabilities.
Producing synthetic data also does not eliminate bias. Because they are derived from existing datasets and generative models, synthetic data typically reproduce the same social and structural inequalities embedded in their source material. Rather than correcting for bias, they can recast it in new forms, making it more difficult to detect and contest. Furthermore, producing synthetic data – like all digital activities – carries an environmental cost. The computational intensity of generative models leaves a “dirty footprint” in the form of energy consumption, raising questions about sustainability that are largely absent from dominant narratives.
Synthetic data, finally, contribute to opacity within speculative domains such as decentralized finance, or DeFi. Their growing production, use, and circulation in the burgeoning blockchain and synthetic crypto-asset ‘space’ support heightened speculative activity, amplifying the kinds of risks and uncertainties that financial regulators are tasked with managing. In this respect, synthetic data echo the dynamics of earlier synthetic financial instruments: they offer the illusion of control while intensifying instability. Taken together, these issues lead us to conclude that, far from reducing risks, synthetic data merely redistribute them in new ways. While promising innovation, neutrality, and privacy, in practice synthetic data frequently generate additional layers of risk that require a reimagining of what they are and can do.
Toward Counter-Fictions
As we’ve seen, infrastructural frictions consistently constrain the realization of synthetic data fictions. Narratives of data drought fail to recognize the abundance of transactional data produced through systems like Brazil’s PIX. More importantly, synthetic data risk bypassing real priorities of human governance and institutional coordination that are far more pressing than artificial data generation. What is needed are counter-fictions: alternative narratives that resist technocratic determinism and foreground accountability, equity, and democratic contestation. Counter-fictions about synthetic data should be attentive to the socio-political contexts of the Global South, where infrastructural realities diverge sharply from Global North framings.
Synthetic data are not inherently harmful, nor are they transformative on their own. Their world-making potential depends on the infrastructures into which they are embedded, and on the narratives that shape their use. To ensure more equitable futures, synthetic data narratives must move beyond fictions of scarcity and neutrality, and instead be situated in the human and institutional capacities that can make existing data meaningful, accountable, and inclusive.