Is data laundering the next big heist of big tech?
During the rise of social media and search, consumers sold their souls for pictures of cats and the ability to share. Now the companies are coming for the rest.
We have all heard the saying "If you are not paying for the product, you are the product." It's a clever way to say that our attention is valuable, and companies like Meta, Google, and Apple are happy to monetize it via apps, clicks, ads, and any other means they can think up.
Now these same companies want to use our conversations and content (some of it copyrighted) to power the training of their A.I. models. Up until now, they have essentially laundered our data: because a trained model's output is a blend of many different works rather than a direct copy of any one, using the data for training has effectively reduced the chance of violating copyright or privacy terms. In fact, one could argue the A.I. was merely "inspired" by the original content and created something entirely new and original.
The irony is that these same companies are now seeing their own content used to train other A.I. models without their consent. This double-edged sword puts all of us in the middle of a standoff where data laundering may be a legitimate way to build next-generation A.I. and quid pro quo is the law of the land.
The Breakdown
In the rapidly evolving landscape of Artificial Intelligence (AI) and machine learning, companies are progressively altering their terms of service and privacy policies to cater to the growing need for data to train A.I. models. These adjustments have significant implications for privacy, ethical use, and consent, sparking widespread discussions and concerns.
Privacy Policy Change
In July, Google quietly modified its privacy policy to specify that publicly available information may be used to train its A.I. models. The new policy explicitly states that Google uses publicly available information to help train its language A.I. models and build products and features like Google Translate, Bard, and Cloud A.I. capabilities. The change underscores Google's commitment to integrating A.I. more deeply into its services by leveraging publicly accessible data.
Wording Change
The strategic change in wording is not merely a formality; it represents a broader industry trend. Companies, including Google, are recognizing the immense value in publicly available data for training A.I. models that power essential features and products.
Industry Trend
Google is not alone in this pursuit. Other tech companies, such as Snap and Meta, are also revising their terms of service to incorporate clauses that allow user data to be used in A.I. and machine learning models. This collective shift indicates an industry-wide move towards embracing A.I. and ensuring that vast reservoirs of public data are utilized for these purposes.
User Data Utilization
Specifically, companies like Snap and Meta have been transparent in their updates, informing users that their public posts, along with data shared with A.I. chatbots, will be used to train these models. This utilization of user data aims to enrich the training datasets, thereby enhancing the performance and capabilities of A.I.-driven features.
User Concerns
However, as these changes take effect, they raise significant concerns among users, particularly those in creative fields like writing, illustration, and visual arts. The primary worry is that their work, shared online, could be incorporated into training datasets without explicit consent, potentially threatening their livelihoods and intellectual property rights.
Broader Implications
These policy updates reflect a considerable shift in how companies treat user data, emphasizing the central role A.I. is poised to play in the future of tech products and services. The aggregation and use of publicly available data signal a move towards more integrated and powerful A.I. applications.
Regulatory and Ethical Issues
Finally, these modifications underscore ongoing tensions and debates over privacy, consent, and the ethical use of data in A.I. development. As companies increasingly rely on public data, striking the right balance between innovation and user rights becomes ever more crucial. Regulatory bodies may need to step in to ensure that these practices are fair and transparent, protecting users' interests while allowing technological advancements to flourish.
The Big Question: Is Training on Data Enough to Bypass Privacy or Copyright Concerns?
A significant question arising from these developments is whether training A.I. models on publicly available data is sufficient to bypass privacy or copyright concerns. Companies argue that the output generated by A.I. — whether text, images, or other content — differs enough from the original data to mitigate these issues. However, this stance is not without controversy.
Privacy Implications
There is an ongoing debate over whether the utilization of public data infringes on individuals' privacy rights. Even if the A.I.'s output is not a direct copy, the patterns and insights derived from the data could still constitute a breach of privacy.
Copyright Concerns
The matter of copyright is equally contentious. Creators argue that using their works in A.I. training datasets without explicit permission violates their intellectual property rights, regardless of whether the generated content is sufficiently altered. The transformative nature of the A.I.'s output is a critical factor in this debate, yet it does not wholly exempt these practices from ethical and legal scrutiny.
Final Thoughts
As A.I. continues to evolve, companies are making critical changes to their terms of service and privacy policies to harness the potential of user data. While this paves the way for innovative A.I.-driven products and features, it also raises challenges around privacy, consent, and ethical data use. Chief among them is the question of whether training on data can indeed circumvent privacy or copyright concerns so long as the A.I.'s output is altered enough. Users, particularly in creative sectors, must stay informed and engaged in these developments to ensure their rights and interests are safeguarded.