The Ghost Stays in the Picture, Part 1: Archives, Datasets, and Infrastructures

Eryk Salvaggio is a 2024 Flickr Research Fellow, diving into the relationships between images, their archives, and datasets through a creative research lens. This two-part series focuses on the ways archives such as Flickr can shape the outputs of generative AI in ways akin to a haunting.

“The Absence Becomes the Thing.”
– Rindon Johnson,
from The Law of Large Numbers

Every image generated by AI calls up a line of ghosts. They haunt the training data, where the contexts of photographs are reduced to the simplest of descriptions. They linger in the decisions engineers and designers made about which labels to use. The ghosts that haunt the generated image are hidden by design, but we can find them through their traces. We just need to know how to look.

As an artist, I rarely find the images created by AI systems interesting solely as photographs. I find the absences that structure these images, and the stories told in the gaps, far more compelling. The images themselves recycle the tropes of their training data. By design, they lean into the most common patterns, changing the details the way a lazy student changes the words of a plagiarized essay.

I don’t turn to generative AI for beautiful images. I look for evidence of ghosts.

What exactly is a ghost in an AI system? It’s a structure or decision that haunts information in barely discernible, even invisible, ways. Datasets are shaped by absences, and those absences shape the image. As a diffusion model seeks the path to an image, the absence of pathways constrains what is possible. We can read these paths by looking at AI images critically, addressing the negative space of what appears on our screens. Who are the people we don’t see? What are the stories these images cannot tell?

This can mean absences in representation. When we have thousands of photographs of white children tagged as “girls,” but few of black children, black girls are absent from the images. Absence haunts the generated image, shaping it: we will see mostly white girls because black girls have been pushed to the edges. This is not a glib hypothetical. It is precisely what I found in 2019, when I analyzed a dataset used for training image generation tools and automated surveillance systems. The pattern holds today. Victorian-era portraits of white girls are prevalent in the training data for generative AI systems such as Stable Diffusion. Black girls are absent, with highly sexualized images of adult women taking their place.

Infrastructure makes ghosts, too. We build complex systems one step at a time, like a set of intersecting hallways. Artificial intelligence is, at its heart, a means of automating decisions. AI systems carry decisions from the past into the future. Once we inscribe these decisions into code, the code becomes infrastructure, subsumed into a labyrinth assembled from the code of others and code yet to be written. As we renovate these structures through new code or system upgrades, the logic of a particular path is lost. We may need to build new walls around it. But when we bury code, we bury decisions beneath a million lines of if/then statements and the weights and biases of machine learning. Left unchallenged, the world that has slipped past us continues to shape the predictions of these systems in ways we cannot understand.

This is true of most data-driven, automated systems, whether we are talking about resume filters or parole decisions. For the generated photograph, these decisions include how we test and calibrate image recognition systems, and how we iterate on these systems with every new model launch and interface.

Diffusion models, at the core of image generation systems, are an entanglement of systems. They rely on one system to label images, examining how pixels are clustered and matching them with human descriptions. We relied on underpaid human workers to test these systems, comparing that tool’s results with what they saw themselves. These comparisons were recorded and integrated into the memory of the model. The actions of those people were fused into the infrastructure of the model, shaping decisions long after they stopped working on the dataset.

We tend to make up stories about synthetic images based on what’s depicted. That is the work of the human imagination: a way of making sense of media based on references we’ve seen before. That is a ghost story, too. But if we want to meet the ghosts that shape AI-generated images, we have to dig deeper into the systems that produce them. The AI ghost story is a story of the past reaching into the present, and to understand it, it helps to know the lineage of those decisions.

Image synthesis has a history, and that history lingers in the black boxes of neural nets as they shape noisy pixels into something recognizable. Part of that story is the datasets, but data is put to a vast number of uses. One of those uses is training larger systems to sort and handle ever larger volumes of data.

Data shapes data infrastructure. Patterns found in small sets of data are applied to larger volumes of data. These patterns are invoked again whenever we call on these systems in the future. The source data is always an incomplete model of things. Nonetheless, it is applied to larger datasets, which inherit and amplify the gaps, absences, and decisions of the past.

This is part of my creative research work on the seance of the digital archive. It focuses not only on data, but the lineage of data and the decisions made using that data to shape larger systems. A key piece of this lineage, and one that merits deeper exploration, is Flickr.

The Archive and the Dataset

With the rise of generative AI, vast troves of visual, textual, and sonic cultural heritage data have been folded into models that create new images, text, even music. But images are host to a special kind of spectral illusion. Most images shared online were never intended to become “data,” and in many ways, this transformation into data is at odds with the real value at the heart of what these archives preserve.

What is the difference between an archive and a dataset? We are dealing with many levels of abstraction here: an archive consists of individual objects designed to serve some human purpose. These objects may then be curated into a collection. It may be a collection of pamphlets, political cartoons, or documentary photographs. It may be an amateur photographer aiming to preserve a snapshot of a birthday party at which their daughter and granddaughter celebrated alongside one another. Flickr, as a photo-sharing website, is host to all of these. The miracle of data, compression, and the world wide web is that the same infrastructures can hold moments important to “history” as well as moments important to the individual. It preserves images from cultural heritage institutions and family beach outings alike.

Flickr is three things at once: an archive and a dataset, most famously. But it is also a kind of data infrastructure. Let’s explore these one by one.

Flickr is an archive. It is a website that preserves history. It holds digital copies of historical artifacts for individual reflection and context. Flickr is a website for memories, stored in its copies of images, snapshots, aids to the remembrance of personal stories. These are assembled into an archive, a collective photo album. Flickr as an archive is a place where the context of an individual item is preserved. But we make sense of this archive socially. Meanings change as users sort these images, tag them, and reuse them (with permission) across the web. The archive is a collection of images with their own history beyond the website itself.

Flickr is a dataset. Flickr images can be described, at scale, in pure numbers. In 2011, the website claimed to have 6 billion images, and it has more recently boasted of having “tens of billions” of photos, with estimates of 25 million uploads per day. By contrast, the largest widely used image dataset in machine learning, LAION-5B, contains roughly 5.85 billion images. Flickr as a massive, expanding dataset poses a particular set of challenges for thinking about its future. One of these is the daunting task of sorting and understanding all of those images. The dataset, then, is really just the archive viewed through the abstraction of scale: billions of images seen as one dataset, with each image merely a piece of the collective whole. When we view Flickr as a dataset, we focus on the ways the entirety of that set can be preserved and understood.

But in shifting our lens of focus from an archive to a dataset, individual details become less important. In changing scales in this way, it’s important to move fluidly between them — much as we close one eye, then the other, as we look at the letters of the eye exam. If we want to tackle the myopia of design decisions, we must get used to shifting between these two views, rather than treating one as the sole way we see the world.

What does it mean for Flickr to be “infrastructure” for AI? It helps to define this slippery term, so I turn to a definition used by the Initiative for Public Digital Infrastructure at UMass Amherst:

“Infrastructures are fundamental systems that allow us to build other systems—new houses and businesses rely on the infrastructures of electric power lines, water mains, and roads—and infrastructures are often invisible so long as they work well.”

With regard to images in particular, Katrina Sluis describes the shift in meaning attributed to images as their context moves from archives to data infrastructures:

“Photographic culture is now being sustained by a variety of agents who sit outside the traditional scope of cultural institutions. I’m thinking here of the computer scientist, web designer, Silicon Valley entrepreneur or Amazon Mechanical Turker. And of course, these are actors who are not engaged with photographic culture and the politics of representation, the history of photography or the inherent polysemy of the image. In the computer science lab, the photograph remains relatively uncomplicated – it is ultimately a blob of information – whether materialized as a “picture” or left latent as data.”

Flickr’s images played an important role in shaping image recognition systems at the outset, and in turn, image generation systems. As a result of this entrenchment of images into AI, many Flickr images have become a form of “accidental infrastructure” for AI. I should be clear that Flickr has not trained a generative AI model of its own for the production of new images, nor has it arranged, as of this writing, for the sale of images for use in image training.

When we examine Flickr as infrastructure, we will see that these two worlds, archive and dataset, have come to occupy the same space, complicating our understanding of them both. Flickr’s movement from archive to dataset in the eyes of AI training isn’t permanent. It reflects a historical shift in the ways people understand and relate to images. So it is worth exploring how that shift changes what we see, and how ghosts from the archive come to “haunt” the dataset. In establishing these two lenses of focus, we might find strategies for shifting between the two. This can help us better articulate the context of images that have built, and likely will continue to build, the foundations of generative AI systems and the images these systems produce.

Flickers in the Infrastructure

How did Flickr’s transition from archive to dataset allow it to become a piece of AI infrastructure?

It started with one of the first breakthroughs in AI image generation: StyleGAN 2, which could produce nearly photorealistic images of human faces. Those faces were learned entirely from the FFHQ dataset, which NVIDIA made from 70,000 Flickr portraits. NVIDIA’s dataset drew on photographs from Flickr and, notably, warned that the dataset would inherit Flickr’s biases. The FFHQ dataset later went on to be used for image labeling and face recognition technologies as well.

We can easily trace the influence of that dataset on the faces StyleGAN 2 produced. In 2019, I did my own analysis of that dataset, looking at the collection image by image. In so doing, I examined the dataset through the lens of an archive: as a collection of individual photographs, and individual people. I discovered that fewer than 3% of the faces sampled from the dataset were of black women. As a result, the model was less likely to generate faces of black women, and when it did, they were less photorealistic than other faces. The absences were shaping what we saw.

If datasets are haunted, then the synthetic image is a seance: a way of generating a specter from the datasets. The word specter refers both to the appearance of a spirit and to the appearance of an image; it derives from the Latin spectrum. The synthetic image is a specter. It is an image that appears from an unknown place, one slice from a spectrum of possible images associated with a prompt. Absences in the dataset constrained the output of possible images. Without black women in the dataset, black women were not in the images. This is one way absences can haunt the synthetic image.

But there is another case study worth exploring: the ways Flickr haunts the infrastructures of AI. How did the dataset shape the automated decision-making processes that were then built into longer, more complex systems of image generation?

In part two of this blog post, we’ll look at YFCC100M, a dataset of 99.2 million photos released in June 2014. We’ll trace the path it has taken as it moved the world’s relationship to this collection of Flickr images from images, to an archive, to a dataset. Along the way, we’ll see how that dataset, by becoming a go-to reference for calibrating and testing image recognition and synthesis, became infused into the infrastructures of generated images.

On the way to 100 years of Flickr

A report on archival strategies

By Ashley Kelleher Skjøtt

Flickr is an important piece of social history. It pioneered user-driven curation at scale, through folksonomic tags and a publicly accessible platform, crystallising the web 2.0 internet. By applying tags to their own images and those of others, Flickr’s users contributed significantly to the emergence of commons culture. These collective practices became a core tenet of Flickr’s design ethos as a platform, decentralising and democratising the role of curation.

Of course, Flickr was not alone in pioneering this. Hashtags and social sharing on other platforms added momentum to a broader shift that was, overall, democratising, giving users agency over what they shared, experienced, and categorised. This shift in curatorial agency is just one aspect of Flickr’s significance as a living piece of social history.

Flickr continues to be one of the largest public collections of photographs on the planet, comprising tens of billions of images, and it celebrated its 20th birthday in February 2024. The challenge of archiving Flickr at scale, then, may become one of designing preservation processes that can themselves be decentralised.

In August 2023, I learnt from a dear friend and colleague, Dan Pett, that the Flickr Foundation, newly based in London, was beginning to build an innovative archival practice for the platform. With my interest in digital cultural memory systems, an interest for which I have moved continents, I was determined to contribute in some way to the Foundation’s new goal. After exploring and discussing the space with George Oates, Director of the Flickr Foundation, we agreed that a practice-based information-gathering exercise could be useful in building up an understanding of such a practice.

So, what would an archive for Flickr look like?

Flickr is a living social media environment, with up to 25 million images uploaded each day. The company has been acquired by a number of different parent companies over its 20-year lifetime (already a remarkable timespan by social media standards), and this history makes a stark case for working to ensure its contents remain available far into the future. This is a priority shared today by Flickr itself and the new Flickr Foundation.

I have prepared a report of findings, written over a deliberately slow period, that aims to present a colloquial yet current answer to the question of archival practice for Flickr as a unique case, both in its scale and in defining what should be prioritised for preservation. Presuming that the platform is not invulnerable to media obsolescence, what on earth (or space) should an archive preserving the best of Flickr look like today? The work of asking this question again and again through the days, months, years, and decades to come leads us to the Foundation’s own question: what does it look like to ensure Flickr lasts for one hundred years?

REPORT: 20 Years of Flickr: Archiving the Living Environment

This information-gathering exercise consisted of seven interviews with sector peers across a wide range of practice, spanning academia, a small company, a global design practice, and the museum world. My sincere thanks to:

  • Alex Seville (Head of Flickr),
  • Cass Fino-Radin (Small Data Industries),
  • Richard Palmer (V&A Museum),
  • Annet Dekker (University of Amsterdam),
  • Jenny Basford (British Library),
  • Matthew Hoerl (Arch Mission Foundation), and
  • Julie May (Bjarke Ingels Group)

Many thanks to each of them for generously taking the time to share their thoughts on the prospect, reflections on their own work, and expertise in the area.

The report sets out to define the value of what should be preserved for Flickr as (1) a social platform, (2) a network-driven community, (3) a collection of uniquely user-generated metadata, and (4) an invaluable image collection, specifically of photography. It then discusses the risks identified over the course of the interviews. Finally, it works through ten identified areas of practice that the Foundation’s archival plan can address, divided into long- and short-term initiatives. The report closes with six recommendations for the present.

An archive for Flickr that honours its considerable legacy should be created in the same democratising vein. One interviewee reflected that the work of the archivist is to select what to preserve. This is, effectively, curation: the curation of archival material. It follows, then, that if a central innovation of Flickr as a platform was to democratise the application of curatorial tools (enabling tags as metadata based in natural language, at scale), the approach to archiving such a platform should follow this model and allow its selection to be driven by users. What about a “preserve” tag?

Thanks to Flickr and other internet pioneers, this is far from a revolutionary idea, and it is one worth building an archival practice around, so that coming generations can access the stories we want to tell about Flickr: the story of the internet, of the commons, of building open structures to find new images, and of what it means to be a community online.

Introducing Eliza Gregory, research partner

Eliza Gregory is a social practice artist, a photographer, an educator and a writer.

Research is a key facet of the Flickr Foundation’s work. We are gathering a group of intersectional research partners to question the idea of a 21st-century image archive together, and Eliza is one of them.

Who ARE you, Eliza?

My name is Eliza Gregory. I’m a mom of two daughters, a wife/partner, a photographer, a social practice artist, a curator, and an educator. I like cake and noodles and I keep chickens. I have issues with chronic clutter. I am getting more and more interested in plants. This might be the result of middle age, or it might be related to feeling like connecting with plants is the roadmap back from total social and environmental collapse. Or both.

For about ten years I made work about cultural identity and cultural adaptation through a mixture of large format portraiture, interviews, events and relationships. Those projects focused on resettled refugee households in Phoenix, Arizona; mapping the wide array of Australian cultural identities (indigenous, recent-immigrant, and long-time-ago-immigrant; cultural identity tied to gender and sexuality, etc.) in the neighborhood where I lived in Melbourne; and immigration to the Bay Area in California over the last 40+ years.

More recently, I curated a show called Photography & Tenderness that investigates how we can hold photography accountable for the ways in which it has been used to build a racist society and somehow still use it to make something tender. That took place at Wave Pool Art Fulfillment Center in Cincinnati, OH as part of the Cincinnati FotoFocus 2022 Biennial.

And I’ve been working on a project I call [Placeholder], about holding and being held by place. It investigates relationships between people and land and asks what might happen if we acknowledged the fundamental rupture that has occurred between land and people, and began working to repair it. So far I’m mainly in the research phase of that work, but my research has taken place with my students at Sacramento State University, and with other artists, and I’ve pulled together two different exhibitions to invite audiences into that research at Axis Gallery, Sacramento, CA: [Placeholder] a studio visit with Eliza Gregory and [Placeholder]: florilegia.

I started out my career trained as a fine art photographer and a creative writer. I have always been interested in telling stories with pictures, but as soon as I tried my hand at it I got caught up in questions about the ethical implications of making an object about (i.e. objectifying) another person. I started to solve those problems by building out relationships and project structures that relied on exchange and accountability, and then went to grad school in Art & Social Practice at Portland State University. That program was a revelation for me and really provided the tools and the language I needed to keep building out my work in a way that felt good. In my experience, the dialogue around social practice is much more radical and useful and socially critical than the dialogue around photography, so I’ve really leaned into that space. But I still enjoy pictures and appreciate how powerful they can be.

Flickr is an interesting organization because it hosts a lot of pictures, but it also catalyzes a lot of relationships and interactions around those pictures. So Flickr represents an institution based around social practice and photography, in a certain way.

Why did you join as a research partner at flickr.org?

What is the relationship between justice and photographic representation? That is a question I think about a lot.

The human brain likes to simplify things. It’s how we are able to perceive so much and yet still focus on a single task or idea. And it’s why we take something like a human being, with a whole life full of perceptions and feelings and paradoxes, and reduce them to a single descriptor: child. American. Woman. White. Cis-gendered. Hetero. Middle aged. Tall. Pink. (I had someone I was photographing once tell me I was “big and pink.” And…I couldn’t argue.) Or we take an individual from another species, who has a whole life full of specific experiences, and reduce it to just the species name: rat. Grey squirrel. Monarch. Or even more reductively: Tree. Butterfly.

Photographs basically do the same thing. You take a whole moment filled with a million different feelings, thoughts, respirations, scents, sensations, views and reduce it to one small, flat, rectangle. And we call that a picture. And we equate it with “truth.”

That’s a problematic process, based on a problematic (though necessary and useful) human tendency. It’s inherently reductive. And yet we see it as a mechanism for communication, inquiry and learning. Photography can be a mechanism for those things, certainly. I used it for that purpose in a project called Massive Urban Change, where I photographed a dynamic urban environment that you can never fully take in SO that it would hold still; so that you could look at it more closely. But that reductive quality of photography can be used for radically different ends. It has also been a tool for building racist societies; for creating and cementing stereotypes; for mapping natural resources for extraction and destruction. Sometimes photography obfuscates truly important complexities by reducing things too much.

A lot of my work has been about interrogating the process of making photographs, especially of people (and now of places) to try to understand when photography is doing what we like to tell ourselves it’s doing, and when it’s doing something else.

I want to know, how do photographs shape the stories we tell ourselves, and how do those stories, in turn, shape society?

Thinking about Flickr is a way of approaching some of these questions. And thinking about how to conserve Flickr adds a whole new dimension to them. I wanted to work with the Flickr Foundation mostly because I like the people it is bringing together. There is so much work going on around archiving images and cataloging images and reading images and finding certain images that goes beyond what I know as a maker of images. I love getting to be at the table with people who work on photography from such different angles. It helps blast me out of my normal frame of reference.

I also want to be bringing my students into photographic dialogues that are larger than our classroom. The Flickr Foundation is actively thinking about how to intersect with students and curriculum design. I want to create opportunities for my students to do meaningful work, and I see the Flickr Foundation as a partner in that.

Finally, I really love exhibitions. In some ways, exhibitions seem to be heading toward obsolescence, much like museums themselves. Both those structures are built on gatekeeping, colonial hierarchies, and a top-down, hierarchical flow of knowledge. So in the social practice dialogues I am a part of, sometimes the exhibition as a form feels sort of passé. But I love it as a way of creating experiences for people, of shaping or catalyzing dialogues, of giving people a gift. And the Flickr Foundation feels like a partner that I could potentially build visual experiences (exhibitions!) with.

What do you think will be the hardest parts of achieving the Flickr Foundation’s 100-year plan?

The questions around how to conserve digital material for a hundred years are HARD. That’s what I learned from bringing some of those questions to a group of senior photography students at Sacramento State University this fall. George has been delivering a 100-year plan workshop to various groups, and we conducted a version of that experience with my students. It’s basically asking people to think about what digital images will look like, consist of, and be viewed through in 100 years. As well as, What will it take to preserve a digital image we have now for that long? And how do you build an organization that can do that?

George had us start with finding an image of a place that’s meaningful to us, and then going out and trying to find the oldest photograph we can of that same place. Right away, that activity makes you think about how we view places, and what photographs we have access to, and what places we have access to visually. I once asked a group of photo history students, What is a photograph you wish you could see that’s impossible to make? A really surprising number of them said, “I wish I could see a picture of the pyramids being constructed!” That feels like a complementary mind-exercise to me, because we are so used to being able to see anything and everything we want in pictures. It’s important to remember that they haven’t always existed. And to contemplate what is un-photographable.

Then my students and I struggled to project our imaginations even into the near future to anticipate how technology will change, how behaviors will change around technology (both as it currently exists, and in terms of platforms and processes that haven’t been invented yet), and what it will mean to actually translate a jpg into multiple new file formats without losing whatever data make it a recognizable image in the first place.

Everything about this seems hard to me. The only things I’ve been able to hang on to so far, and visualize, are some of the foundation’s ideas around ritual—perhaps there will be a ritualized translation from one format to another every five or ten years. The idea that conserving something by allowing it to change feels very resonant—perhaps that is a shift in perspective that we are approaching on many fronts at once, from interpersonal relations (growth mindset!) to global ecology (I’m thinking of Anna Tsing’s book The Mushroom at the End of the World).

The scale is also difficult to fathom. 50 billion images is…so many images. And the collection is likely to grow. So the usual questions around archives are present too—what do we keep? What do we throw away? How does someone access the resource? How does someone FIND what they are looking for? (And along the way can we help them maybe find a few things they aren’t looking for but need or want to see?)

At the end we made zines to try to pull our thoughts together.

How do you hope to use the partnership to further your own research?

In my current artistic work, I research intergenerational narratives, both because placing ourselves within them in our families leads to improved mental health and because thinking about intergenerational narratives shifts our understanding of stewardship of the land that cares for us. I’m also a photographer. So the question, How do we approach the conservation of digital images for future generations? relates to HOW we are going to tell those intergenerational stories. I think that some of the long-term storytelling strategies we’ve lost track of or never understood within British-influenced contemporary American colonist culture, such as oral history and land-based, place-based knowledge, are tools we might turn to. But right now we are so image-obsessed that pictures will be in the mix too, and they might be the bridge that gets us to new (or old!) styles of connection, communication and storytelling.

Eliza Gregory is an artist and educator. She makes complex projects that unfold over time to reveal compassion, insight and new social forms.
www.elizagregory.org

With apologies to Eliza for leaving it so long to post this! ❤️
– George

Data Lifeboat Update 2a: Deeper research into the challenge of archiving social media objects

By Jenn Phillips-Bacher

For all of us at Flickr Foundation, the idea of Flickr as an archive in waiting inspires our core purpose. We believe the billions of photos that have amassed on Flickr in the last 20 years have potential to be the material of future historical research. With so much of our everyday lives being captured digitally and posted to public platforms, we – both the Flickr Foundation and the wider cultural heritage community – have begun figuring out how to proactively gather, make available, and preserve digital images and their metadata for the long term.

In this blog post, I’m setting my sights beyond technology to consider the institutional and social aspects that enable the collection of digital photography from online platforms.

It’s made of people

Our Data Lifeboat project is now underway. Its goal is to build a mechanism for assembling and decentralizing slivers of Flickr photos for potential future users. (You can read project update 1 and project update 2 for the background.) The outcome of the first project phase will be one or more prototypes we will show to our Flickr Commons partners for feedback. We’re already looking ahead to the second phase, where we will work with cultural heritage institutions within the wider Flickr Commons network to make sure that anything we put into production best suits cultural heritage institutions’ real-world needs.

We’ve been considering multiple possible use cases for creating and, importantly, docking a Data Lifeboat in a safe place. The two primary institutional use cases we see are:

  1. Cultural heritage institutions want to proactively collect born digital photography on topics relevant to their collections
  2. In an emergency situation, cultural heritage institutions (and maybe other Flickr members) want to save what they can from a sinking online platform, whether that’s photos they’ve uploaded themselves or, more generously, anything else they can rescue. (And let me be clear: Flickr.com is thriving! But it’s better to design for a worst-case scenario than to find ourselves scrambling for a solution with no time to spare.)

We are working towards our Flickr Commons members (and other interested institutions) being able to accept Data Lifeboats as archival materials. For this to succeed, “dock” institutions will need to:

  • Be able to use it, and have the technology to accept it
  • Already have a view on collecting born digital photography, and ideally this type of media is included in their collection development strategy. (This is probably more important.)

This isn’t just a technology problem. It’s a problem made of everything else the technology is made of: people who work in cultural heritage institutions, their policies, organizational strategies, legal obligations, funding, commitment to maintenance, the willing consent of people who post their photos to online platforms and lots more.

Preserving born digital photos from the web requires the enthusiastic backing of institutions (which are fundamentally social creatures) to do what they’re designed to do: save and ensure access to the raw material of future research.

Collecting social photography

I’ve been doing some background research to inform the early stages of Data Lifeboat development. I came across the 2020 Collecting Social Photography (CoSoPho) research project, which set out to understand how photography is used in social media in order to develop methods for collecting it and transmitting it to future generations. Their report, ‘Connect to Collect: approaches to collecting social digital photography in museums and archives’, is freely available as a PDF.

The project collaborators were:

  • The Nordic Museum / Nordiska Museet
  • Stockholm County Museum / Stockholms Läns Museum
  • Aalborg City Archives / Aalborg Stadsarkiv
  • The Finnish Museum of Photography / Finlands Fotografiska Museum
  • Department of Social Anthropology, Stockholm University

The CoSoPho project was a response to the current state of digital social photography and its collection/acquisition – or lack thereof – by museums and archives.

Implicit in the team’s research is the premise that digital photography from online platforms is worth collecting. Three big questions were central to their research:

  1. How can data collection policies and practices be adapted to create relevant and accessible collections of social digital photography?
  2. How can digital archives, collection databases and interfaces be relevantly adapted – considering the character of the social digital photograph and digital context – to serve different stakeholders and end users?
  3. How can museums and archives change their role when collecting and disseminating, to increase user influence in the whole life circle of the vernacular photographic cultural heritage?

There’s a lot in this report that is relevant to the Data Lifeboat project. The team’s research focussed on ‘digital social photography’, taken to mean any born digital photos taken for the purpose of sharing on social media. It interrogates Flickr alongside Snapchat, Facebook and Instagram, as well as region-specific social media sites like IRC-Galleria (a Finnish social media platform from the early 2000s).

I would consider Flickr a bit different from the other apps mentioned, although the report doesn’t address the Flickr-specific use cases that set it apart, such as:

  • Showcasing photography as craft
  • Using Flickr as a public photo repository or image library where photos can be downloaded and re-used outside of Flickr, unlike walled garden apps like Instagram or Snapchat.

The ‘massification’ of images

The CoSoPho project highlighted the challenges of collecting today’s digital photos while simultaneously digitizing analog images from the past, something cultural heritage institutions have been actively doing for many years. Anna Dahlgren describes this as a “‘massification’ of images online”. The complexity of digital social photos, with their continually changing and growing web of connections, combined with the unstoppable growth of social platforms, poses particular challenges for libraries, archives and museums seeking to collect and preserve them.

Collecting digital photos requires a concerted effort to change the paradigm:

  • from static accumulation to dynamic connection
  • from hierarchical files to interlinked files
  • and from pre-selected quantities of documents to aggregation of unpredictably variable image and data objects.

Dahlgren argues that “…in order to collect and preserve digital cultural heritage, the infrastructure of memory institutions has to be decisively changed.”

The value of collecting and contributing

“Put bluntly, if images on Instagram, Facebook or any other open online platform should be collected by museums and archives what would the added value be? Or, put differently, if the images and texts appearing on these sites are already open and public, what is the role of the museum, or what is the added value of having the same contents and images available on a museum site?” (A. Dahlgren)

Those of us working in the cultural heritage sector can imagine many good responses to this question. At the Flickr Foundation, we look to our recent internet history and how many web platforms have been taken offline. Our digital lives are at risk of disappearing. Museums, libraries and archives have that long-term commitment to preservation. They are repositories of future knowledge, and expect to be there to provide access to it.

Cultural heritage institutions that choose to collect from social online spaces can forge a path for a multiplicity of voices within collections, moving beyond standardized metadata toward richer, more varied descriptions from the communities from which the photos are drawn. There is significant potential to collect in collaboration with the publics the institution serves. This is a great opportunity to design a more inclusive ethics of care into collections.

But what about potential contributors whose photos are being considered for collection by institutions? What values might they apply to these collections?

CoSoPho uncovered useful insights about how people participating in community-driven collecting projects considered their own contributions. Contributors wanted to be selective about which of their photos would make it into a collection; this could be for aesthetic reasons (choosing the best, most representative photos) or concerns for their own or others’ anonymity. Explicit consent to include one’s photos in a future archive was a common theme – and one which we’re thinking deeply about.

Overall, people responded positively to the idea of cultural institutions collecting digital social photos (they too can be part of history!), and they also thought it important that the community from which those photos are drawn has a say in what is collected and how it’s made available. Future user researchers at Flickr Foundation might want to explore contributor sentiment even further.

What’s this got to do with Data Lifeboats?

As an intermediary between billions of Flickr photos and cultural heritage institutions, we need to create the possibilities for long-term preservation of this rich vein of digital history. These considerations will help us to design a system that works for Flickr members and museums and archives.

Adapting collection development practices

All signs point to cultural heritage institutions needing to prepare to take on born digital items. Many are already doing this as part of their acquisition strategies, but most often this born digital material comes entangled in a larger archival collection.

If institutions aren’t ready to proactively collect born digital material from the public web, this is a risk to the longevity of this type of knowledge. And if this isn’t a problem that currently matters to institutions, how can we convince them to save Flickr photos?

As we move into the next phase of the Data Lifeboat project, we want to find out:

  • Are Flickr Commons member institutions already collecting, or considering collecting, born digital material?
  • What kinds of barriers do they face?

Enabling consent and self-determination

CoSoPho’s research surfaced the critical importance of consent, ownership and self-determination in determining how public users/contributors engage with their role in creating a new digital archive.

  • How do we address issues of consent when preserving photos that belong to creators?
  • How do we create a system that allows living contributors to have a say in what is preserved, and how it’s presented?
  • How do we design a system that enables the informed collection of a living archive?
  • Is there a form of donor agreement or an opt-in to encourage this ethics of care?

Getting choosy

With 50 billion Flickr photos, not all of them visible to the public or openly licensed, we are working from the assumption that the Data Lifeboat needs to enable selective collecting.

  • Are there acquisition practices and policies within Flickr Commons institutions that can inform how we enable users to choose what goes into a Data Lifeboat?
  • What policies for protecting data subjects in collections need to be observed?
  • Are there existing paradigms for public engagement for proactive, social collecting that the Data Lifeboat technology can enable?

Co-designing usable software

Cultural heritage institutions have massively complex technical environments with a wide variety of collection management systems, digital asset management systems and more. This complexity often means that institutions miss out on chances to integrate community-created content into their collections.

The CoSoPho research team developed a prototype for collecting digital social photography. That work attempted to address some of the significant technical challenges that the Flickr Foundation is already considering:

  • Individual institutions need reliable, modern software that interfaces with their internal systems; few institutions have internal engineering capacity to design, build and maintain their own custom software
  • Current collection management systems don’t have much room for community-driven metadata; this information is often wedged into local data fields
  • Collection management systems lack the ability to synchronize data with social media platforms (and vice versa) if the data changes. That makes it more difficult to use third-party platforms for community description and collecting projects.

So there’s a huge opportunity for the Flickr Foundation to contribute software that works with this complexity to solve real challenges for institutions. Co-design (that is, a design process that draws on your professional expertise and institutional realities) is the way forward!

We need you!

We are working on the challenge of keeping Flickr photos visible for 100 years and we believe it’s essential that cultural heritage institutions are involved. Therefore, we want to make sure we’re building something that works for as many organizations as possible – both big and small – no matter where you are in your plans to collect born digital content from the web.

If you’re part of the Flickr Commons network already, we are planning two co-design workshops for Autumn 2024, one to be held in the US and the other likely to be in London. Keep your eyes peeled for Save-the-Date invitations, or let us know you’re interested, and we’ll be sure to keep you in the loop directly.

This work is supported by the National Endowment for the Humanities.

Research diary: long-term thinking and lots of reading

New Research Fellow Jenn Phillips-Bacher shares what she’s been working on at the Flickr Foundation

It’s hard to believe that it’s already been two months since I joined the Flickr Foundation as a Research Fellow. Now that I’m settled in at HQ, I’m ready to share what I’ve been working on.

My starting point for this fellowship was to explore the long-term implications of digital collections access. I wanted to spend some time on the idea of tending to an ‘end of life’ for a collection – whether that’s through intentional institutional policies like digital weeding, or catastrophic loss through climate change. 

One of the first pieces I read was Dr. Temi Odumosu’s article The Crying Child: On Colonial Archives, Digitization, and Ethics of Care in the Cultural Commons, where she writes:

“…the opportunities for intervening both in back-end collections practices and web user experience, which insists on a more conscientious data flow around the commons, feels like something approximating practical ethics.” 

The phrase conscientious data flow has become a generative force for my research so far—I might as well have it tattooed on my arm. It’s made me think about the whole lifecycle of a digital object: how a photo or other object is selected for digitization and public access, what happens to it when people view and interact with it, and what traces it leaves behind.

In focussing on the lifecycle of an object, my reading has coalesced around three main areas:

  • Ethics of care throughout the life of a digital object
  • Responsible data stewardship and radical transparency
  • Climate impacts of unconstrained digital collections

Alongside these themes, I’m also getting more familiar with AI (no, really, what have I missed?), the decentralized web and the indieweb, Personal Knowledge Management systems, and generally how to be a good, care-full citizen of the Web. 

Here are some highlights:

Ethics of care

Following on from Dr. Odumosu’s work, I delved into the brilliant work of The Shift Collective, who work directly with small community-based archives and memory workers to explore the cultural, financial and technological systems in which they operate. Their extensive research demonstrates how those systems must change to enable autonomy, equity and sustainability for the communities they serve.

I’ve also been digging into the CARE Principles for Indigenous Data Governance, principles that set out how, as researchers or institutions working with Indigenous or marginalized communities, we can put autonomy back into the hands of those whose data (or content, objects or cultural heritage) is in the public realm. 

These principles are crucial grounding for the Flickr Foundation. We need to be aware of potential imbalances of power as a non-profit tech company that builds software for Flickr Commons and its preservation. To embody the CARE Principles in Flickr Foundation’s work means to design interventions that allow community control over their digital heritage and its preservation. 

Responsible data stewardship 

Flickr Foundation’s mission to keep Flickr photos visible for 100 years implies that we need new mechanics to move content around the web (whatever the web looks like in 20, 50, 75 years) and keep it somewhere where people can find it. What contextual information needs to travel with the Flickr photos to enable future generations to use them? And are we at that point even now, given that there are Flickr APIs that allow programmatic access to the Flickr corpus? What documentation is needed to support the ethical use of any slice of Flickr’s content? 

My introduction to this topic was the hot-off-the-press Datasheets for Digital Cultural Heritage (October 2023), which proposes a standardized template for cultural institutions to collaboratively document their open data sets derived from the digitization process. I’ve been working my way back through the history of the datasheet as a method of transparency, looking at the work of Emily Bender, Timnit Gebru, Mahima Pushkarna and other significant researchers in academia and industry, and following it through to current uses by Hugging Face and the Smithsonian.

I’ve been working on mocking up a datasheet specifically for Flickr Commons. I’ll be ready to make this available for feedback from the Flickr Commons community in January 2024. 

Climate impact of digital collections

Though perhaps not as directly connected to the work of the Flickr Foundation, I’m keen to find out what the GLAM sector is doing to understand and plan for the long-term preservation and short-term access to its digital collections in light of the climate crisis. My sense is that culture workers are in the early days of considering the carbon costs of digital activities. And while climate change is a systemic issue that must be addressed through global cooperation, government policy and regulation, every one of us will need to make changes in the future.

If every job is a climate job, what does that mean for people working in cultural heritage? Will energy considerations make their way into collection development and retention policies, for example? 

And that’s not all

I’m fortunate to be based in the London office with two-thirds of the permanent team. Being part of Flickr Foundation HQ gives me a well-rounded picture of the breadth of its activities, and gives me a chance to work on software projects. I’ve chipped in on some user interface design ideation, helped to test Flickypedia before its launch, and started working up some design ideas for a Flickr to IIIF toy to help Flickr Commons members make their Flickr photos interoperable with alternative platforms.

If you’re interested in following along with what I’m reading, I’m keeping a list on my Pinboard.

And if there’s something you think is a must-read, send it my way!

Image credit: Bedbril / Glasses for reading in bed. Nationaal Archief / Flickr Commons.

Welcome, Jenn!

Meet the Foundation’s first ever Research Fellow!

It is with great pleasure that I introduce you to the Flickr Foundation’s inaugural research fellow, Jenn. In her own words…

Hi, I’m Jenn Phillips-Bacher, the Flickr Foundation’s first-ever Research Fellow. I’ve been a Flickr user since 2007, when my first public photos were taken on a point-and-shoot digital camera. Oh, how the quality of photos has improved since then! It’s an absolute marvel to be able to trawl decades’ worth of (ever-improving) photography, still in one place.

Before joining Flickr Foundation, I was most recently a Product Manager at Wellcome Collection, working to make its library and archive collections accessible to as many people as possible. Before that, I was a content strategist at the UK’s Government Digital Service, where I focussed on tagging and taxonomies to help people find stuff. I’ve also been a web editor, project manager, reference librarian and technology trainer, all within the GLAM (that’s galleries, libraries, archives and museums) world.

My modus operandi for the 20+ years of my career has been to 1) find interesting work to do with kind people and 2) labor for the public good. That’s why I am delighted and honored to be part of Flickr Foundation’s efforts to preserve and sustain our digital heritage.

So what does it mean to be a research fellow?

Given my career history, I’d never considered that I could be a Research Fellow. I used to think research fellowships were reserved for academics (“real” researchers), which I resolutely am not. I’m still figuring out what it does mean to be a research fellow, but here’s where I’ve settled for now: a research fellowship allows me to take time out of normal life for learning and thinking while offering a practical benefit to the Flickr Foundation. That means I’ll use my research skills honed as a librarian and product manager to seek out existing knowledge and expertise, connecting the dots along the way, in order to help shape the Flickr Foundation’s work.

As the fellowship progresses, I’ll write more about what it’s like to move from a digital practitioner role into a Research Fellow role.

My research focus

My research is aimed at the Content Mobility program, where I’m specifically interested in how we might design a Data Lifeboat: not only the logistics of creating a portable archive of any facet of Flickr, but also how to plan for a digital collection’s ‘good ending’. I’ve always been interested in the idea of digital weeding—removing digital collections that no longer serve their purpose, as librarians do with physical materials. As we become more aware of the environmental impact of any digital activity, including online access and long-term preservation, we need to be even more intentional with what we save and what we let go.

As a complementary bit of research, I’ll be digging into the carbon costs of digital collections. I’m curious to see whether there’s something useful to do here that would help the GLAM sector make carbon-conscious digital collection decisions. (If you or anyone you know is already doing this work, I’d love to meet you/them!)

What else? When not working, I can be found nosing around galleries and museums and perambulating around cities in search of human-friendly architecture and good cafes. And like anyone who’s ever lived in Chicago, I have Opinions on hot dogs.

Superdawg drive-in

Photo by jordanfischer, CC BY 2.0.