On the way to 100 years of Flickr

A report on archival strategies

By Ashley Kelleher Skjøtt

Flickr is an important piece of social history: it pioneered user-driven curation, through folksonomic tags and through a publicly accessible platform at scale, crystallising the Web 2.0 internet. By applying tags to their own images and those of others, Flickr’s users contributed significantly to the emergence of commons culture. These collective practices became a core tenet of Flickr’s design ethos as a platform, decentralising and democratising the role of curation.

Of course, Flickr was not alone in pioneering this. Hashtags and social sharing on other platforms added momentum to a broader, democratising shift that gave users agency over what they shared, experienced, and categorised. This shift in curatorial agency is just one aspect of Flickr’s significance as a living piece of social history.

Flickr continues to be one of the largest public collections of photographs on the planet, comprising tens of billions of images, and it celebrated its 20th birthday in February 2024. The challenge of archiving Flickr at scale, then, may become one of designing preservation processes that can themselves be decentralised.

In August 2023, I learnt from a dear friend and colleague, Dan Pett, that the Flickr Foundation, newly based in London, was beginning to build an innovative archival practice for the platform. Given my interest in digital cultural memory systems, an interest for which I have moved continents, I was determined to contribute in some way to the Foundation’s new goal. After exploring and discussing the space with George Oates, Director of the Flickr Foundation, we agreed that a practice-based information-gathering exercise could be useful in building an understanding of what such a practice might involve.

So, what would an archive for Flickr look like?

Flickr is a living social media environment, with up to 25 million images uploaded each day. The company has also passed through a number of different parent companies over its 20-year lifetime, already a remarkable timespan by social media standards, which makes a stark case for working to ensure the availability of its contents into the long future. This is a priority shared today between Flickr itself and the new Flickr Foundation.

I have prepared a report of findings, written over a deliberately slow period, which aims to present a colloquial yet current answer to the question of archival practice for Flickr as a unique case, both in its scale and in defining what should be prioritised for preservation. Presuming that the platform is not invulnerable to media obsolescence, what on earth (or in space) should an archive preserving the best of Flickr look like today? The work of asking this question again and again through the days, months, years, and decades to come leads us to the Foundation’s own question: what does it look like to ensure Flickr lasts for one hundred years?

REPORT: 20 Years of Flickr: Archiving the Living Environment

This information-gathering exercise consisted of seven interviews with sector peers across a wide range of practice, from academia and the museum world to a small company and a global design practice. My sincere thanks to:

  • Alex Seville (Head of Flickr),
  • Cass Fino-Radin (Small Data Industries),
  • Richard Palmer (V&A Museum),
  • Annet Dekker (University of Amsterdam),
  • Jenny Basford (British Library),
  • Matthew Hoerl (Arch Mission Foundation), and
  • Julie May (Bjarke Ingels Group)

Each of them generously took the time to share their thoughts on the prospect, reflections on their own work, and expertise in the area.

The report sets out to define the value of what should be preserved of Flickr as (1) a social platform, (2) a network-driven community, (3) a collection of uniquely user-generated metadata, and (4) an invaluable image collection, specifically of photography. It then discusses the risks identified through the course of the interviews, and works through ten areas of practice that the Foundation’s archival plan can address, divided into long- and short-term initiatives. The report closes with six recommendations for the present.

An archive for Flickr which honours its considerable legacy should be created in the same vein. One interviewee reflected that the work of the archivist is to select what to preserve. This is, effectively, curation – the curation of archival material. It follows, then, that if a central innovation of Flickr as a platform was to democratise the application of curatorial tools – enabling tags as metadata based in natural language, at scale – then the approach to archiving such a platform should follow this model in allowing its selection to be driven by users. What about a “preserve” tag?
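To make this concrete: Flickr’s public API already supports tag-based search, so a user-driven selection pass could begin by simply harvesting everything that carries such a tag. Below is a minimal sketch in Python using the real flickr.photos.search method; the “preserve” tag itself is hypothetical, and the API key is a placeholder, not part of any existing workflow.

```python
# Minimal sketch: harvest public photos that users have tagged "preserve",
# via Flickr's flickr.photos.search API method.
# The "preserve" tag is hypothetical; supply your own API key.
import requests

API_URL = "https://api.flickr.com/services/rest/"

def photos_tagged_preserve(api_key: str, page: int = 1) -> list[dict]:
    """Return one page of public photos carrying the 'preserve' tag."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "tags": "preserve",
        "per_page": 100,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }
    resp = requests.get(API_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["photos"]["photo"]

# Each record includes an id, owner, and title that an archival pipeline
# could queue for preservation review.
```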

Thanks to Flickr and other internet pioneers, this is far from a revolutionary idea – and it is one worth building an archival practice around, so that coming generations can access the stories we want to tell about Flickr: the story of the internet, of the commons, of building open structures to find new images, and of what it means to be a community, online.

Introducing Eryk Salvaggio, 2024 Research Fellow

Eryk Salvaggio is a researcher and new media artist interested in the social and cultural impacts of artificial intelligence. His work, which is centered in creative misuse and the right to refuse, critiques the mythologies and ideologies of tech design that ignore the gaps between datasets and the world they claim to represent. A blend of hacker, policy researcher, designer and artist, he has been published in academic journals, spoken at music and film festivals, and consulted on tech policy at the national level.

Ghosts in the Archives Become Ghosts in the Machines

I’m honored to be joining the Flickr Foundation to imagine the next 100 years of Flickr, thinking critically about the relationships between datasets, history, and archives in the age of generative AI.

AI is thick with stories, but we tend to only focus on one of them. The big AI story is that, with enough data and enough computing power, we might someday build a new caretaker for the human race: a so-called “superintelligence.” While this story drives human dreams and fears—and dominates the media sphere and policy imagination—it obscures the more realistic story about AI: what it is, what it means, and how it was built.

The invisible stories of AI are hidden in its training data. They are human: photographs of loved ones, favorite places, things meant to be looked at and shared. Some of them are tragic or traumatic. When we look at the output of a large language model (LLM), or the images made by a diffusion model, we’re seeing a reanimation of thousands of points of visual data — data that was generated by people like you and me, posting experiences and art to other people over the World Wide Web. It’s the story of our heritage, our archives, and the vast body of human visual culture.

I approach generated images as a kind of seance, a reanimation of these archives and data points which serve as the techno-social debris of our past. These images are broken down — diffused — into new images by machine learning models. But what ghosts from the past move into the images these models make? What haunts the generated image from within the training data? 

In “Seance of the Digital Image” I began to seek out the “ghosts” that haunt the material that machines use to make new images. In my residency with the Flickr Foundation, I’ll continue to dig into training data, particularly the Flickr Commons collection, to see the ways it shapes AI-generated images. These will not be one-to-one correlations, because that is not how these models work.

So how do these diffusion models work? How do we make an image with AI? The answer to this question is often technical: a system of diffusion, in which training images are broken down into noise and reassembled. But this answer ignores the cultural component of the generated image. Generative AI is a product of training datasets scraped from the web, and entangled in these datasets are vast troves of cultural heritage data and photographic archives. When training data-driven AI tools, we are diffusing data, but we are also diffusing visual culture. 
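For readers who want the technical half of that answer made concrete, here is a minimal sketch of the forward “noising” process, assuming a standard DDPM-style linear schedule; the step count and variance values are illustrative defaults, not those of any particular production model.

```python
# Minimal sketch of forward diffusion: an image is progressively mixed
# with Gaussian noise until only noise remains; a generative model is
# then trained to reverse the process. Schedule values are illustrative.
import numpy as np

T = 1000                              # number of noising steps
betas = np.linspace(1e-4, 0.02, T)    # per-step noise variance
alphas_bar = np.cumprod(1.0 - betas)  # cumulative signal retention

def noise_image(x0: np.ndarray, t: int, rng=None) -> np.ndarray:
    """Jump directly to step t: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal(x0.shape)
    a_bar = alphas_bar[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps

# By t = T-1 virtually nothing of the original image survives; image
# generation runs this in reverse, shaping pure noise into a picture.
```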

Eryk Salvaggio: Flowers Blooming Backward Into Noise (2023) from ARRG! on Vimeo.

In my research, I have developed a methodology for “reading” AI-generated images as the products of these datasets, as a way of interrogating the biases that underwrite them. I have since taken an interest in using this way of reading to understand the lineage, or genealogy, of generated images: what stew do these images make with our archives? Where does a model learn the concept of what represents a person, or a tree, or even an archive? Again, we know the technical answer. But what is the cultural answer to this question?

By looking at generated images and the prompts used to make them, we’ll build a way to map their lineages: the history that shapes and defines key concepts and words for image models. My hope is that this endeavor shows us new ways of looking at generated images, and surfaces new stories about what such images mean.

As the tech industry continues building new infrastructures on this training data, our window of opportunity for deciding what we give away to these machines is closing, and understanding what is in those datasets is difficult, if not impossible. Much of the training data is proprietary, or has been taken offline. While we cannot map generated images to their true training data, massive online archives like Flickr give us insight into what that data might be. Through my work with the Flickr Foundation, I’ll look at images from institutions and users to think about what these images mean in an era of generated imagery.

In this sense, I will interrogate what haunts a generated image, but also what haunts the original archives: what stories do we tell, and which do we lose? I hope to reverse the generated image in a meaningful way: to break the resulting image apart, tracing the correlations between the datasets that train these models, the archives that built those datasets, and the images that emerge from those entanglements.