The architecture of a Data Lifeboat service

We’re starting to write code for our Data Lifeboat, and that’s pushed us to decide what the technical architecture looks like. What are the different systems and pieces involved in creating a Data Lifeboat? In this article I’m going to outline what we imagine that might look like.

We’re still very early in the prototyping stage of this work. Our next step is going to be building an end-to-end prototype of this design, and seeing how well it works.

Here’s the diagram we drew on the whiteboard last week:

Let’s step through it in detail.

First somebody has to initiate the creation of a Data Lifeboat, and choose the photos they want to include. There could be a number of ways to start this process: a command-line tool, a graphical web app, a REST API.

We’re starting to think about what those interfaces will look like, and how they’ll work. When somebody creates a Data Lifeboat, we need more information than just a list of photos. We know we’re going to need things like legal agreements, permission statements, and a description of why the Lifeboat was created. All this information needs to be collected at this stage.

However these interfaces work, it all ends in the same way: with a request to create a Data Lifeboat for a list of photos and their metadata from Flickr.

To take a list of photos and create a Data Lifeboat, we’ll have a new Data Lifeboat Creator service. This will call the Flickr API to fetch all the data from Flickr.com, and package it up into a new file. This could take a long time, because we need to make a lot of API calls! (Minutes, if not hours.)

We already have the skeleton of this service in the Commons Explorer, and we expect to reuse that code for the Data Lifeboat.

We are also considering creating an index of all the Data Lifeboats we’ve created – for example, “Photo X was added to Data Lifeboat Y on date Z”. This would be a useful tool for people wanting to look up Flickr URLs if the site ever goes away. “I have a reference to photo X, where did that end up after Flickr?”

When all the API calls are done, this service will eventually produce a complete, standalone Data Lifeboat which is ready to be stored!

When we create the Data Lifeboat, we’re imagining we’ll keep it on some temporary storage owned by the Flickr Foundation. Once the packaging is complete, the person or organization who requested it can download it to their permanent storage. Then it becomes their responsibility to make sure it’s kept safely – for example, creating backups or storing it in multiple geographic locations.

The Flickr Foundation isn’t going to run a single, permanent store of all Data Lifeboats ever created. That would turn us into another Single Point of Failure, which is something we’re keen to avoid!

There are still lots of details to hammer out at every step of this process, but thinking about the broad shape of the Data Lifeboat service has already been useful. It’s helped us get a consistent understanding of what the steps are, and exposed more questions for us to ponder as we keep building.

Data Lifeboat Update 3

March has been productive. The short version is it’s complicated but we’re exploring happily, and adjusting the scope in small ways to help simplify it. Let me summarise the main things we did this month.

Legal workshop

We welcomed two of our advisors—Neil from the Bodleian and Andrea from GLAM e-Lab—to our HQ to get into the nitty gritty of what a 50-year-old Data Lifeboat needs to accommodate. 

As we began the conversation, I centred us in the C.A.R.E. Principles and asked that we always keep them in our sights for this work. The main future challenges are settling around the questions of how identity and the right to be forgotten must be expressed, how Flickr account holders can or should be identified, and whether an external name resolver service of some kind could help us. We think we should develop policies for Flickr members (on consent to be in a Data Lifeboat), Data Lifeboat creators (on their obligations as creators), and Dock Operators (an operations manual & obligations for operating a dock). It’s possible there will also be some challenges ahead around database rights, but we don’t know enough yet to give a good update. We’d like a first-take legal framework of the Data Lifeboat system to be an outcome of these first six months.

Privacy & licensing

These are key concepts central to Flickr—privacy and licensing—and we must make sure we do our utmost to respect them in all our work. It would be irresponsible for us to jettison the desires encoded in those settings for our convenience, tempting though that may be. By that I mean, it would be easier for us to make Data Lifeboats that contained whatever photos from whomever, but we must respect the desires of Flickr creators in the creation process. 

There are still big and unanswered questions about consent, and how we get millions of Flickr members to agree to participate and give permission to allow their pictures to be put in other people’s Data Lifeboats. 

Extending the prototype Data Lifeboat sets 

Initially, we had planned to run this 6-month prototype stage with just one test set of images, which would be some or all of the Flickr Commons photographs. But in order to explore the challenges around privacy and licensing, we’ve decided to expand our set of working prototypes to also include the entire Library of Congress Flickr Commons account, and all the photos tagged with “flickrhq” (since that set is something the Flickr Foundation may decide to collect for its own archive and contains photographs from different Flickr members who also happen to have been Flickr staff and would therefore (theoretically) be more sympathetic to the consent question).

Visit to Greenwich

Ewa spotted that there was an exhibition of ambrotype photographic portraits of women in the RNLI at the Maritime Museum in Greenwich at the moment, so we decided to take a day trip to see the portraits and poke around the brilliant museum. We ended up taking a boat from Greenwich to Battersea which was a nice way to experience the Thames (and check out that boat’s life saving capabilities).

Day Out: Maritime Museum & Lifeboats

Day Out: Maritime Museum & Lifeboats

The Data Lifeboat creation process

I found myself needing to start sketching out what it could look like to actually create a Data Lifeboat, and particularly not via a command line, so we spent a while in front of a whiteboard kicking that off. 

At this point, we’re imagining a few key steps:

  1. The Query – “I want these photos” – is like a search. We could borrow from our existing Flinumeratr toy.
  2. The Results – Show the images, some metadata. But it’s hard to show information about the set in aggregate at this stage, e.g., how many of the contents are licensed in which way. This could form a manifest for the Data Lifeboat..
  3. Agreement – We think there’s a need for the Data Lifeboat creator to agree to certain terms. Simple, active language that echoes the CARE principles, API ToS, and Flickr Community Guidelines. We think this should also be included in the Data Lifeboat it’s connected with.
  4. README / Note to the Future – we love the idea that the Data Lifeboat creator could add a descriptive narrative at this point, about why they are making this lifeboat, and for whom, but we recognised that this may not get done at all, especially if it’s too complicated or time-consuming. This is also a good spot to describe or configure warnings, timers, or other conditions needed for future access. Thanks also to two of our other advisors – Commons members Mary Grace and Alan – who shared with us their organisation’s policies on acquisitions for reference.
  5. Packaging – This would be asynchronous and invisible to the creator; downloading everything in the background. We realised it could take days, especially if there are lots of Data Lifeboats being made at once.
  6. Ready! – The Data Lifeboat creator gets a note somehow about the Data Lifeboat being ready for download. We may need to consider keeping it available only for a short time(?).

Creation Schematic, 19th March

Emergency v Non-Emergency 

We keep coming up against this… 

The original concept of the Data Lifeboat is a response to the near-death experience that Flickr had in 2017 when its then-owner, Verizon/Yahoo, almost decided to vaporise it because they deemed it too expensive to sell (something known as “the cost of economic divestment”). So, in the event of that kind of emergency, we’d want to try to save as much of this unique collection as possible as quickly as possible, so we’d need a million lifeboats full of pictures created more or less simultaneously or certainly in a relatively short period of time. 

In the early days of this work, Alex said that the pressure of this kind of emergency would be the equivalent of being “hugged to death by the archivists,” as we all try— in very caring and responsible ways—to save as much as we can. And then there’s the bazillion-emergency-hits-to-the-API-connection problem—aka the “Thundering Herd” problem—which we do not yet have a solution for, and which is very likely to affect any other social media platforms that may also be curious to explore this concept.

We’re connecting with the Flickr.com team to start discussing how to address this challenge. We’re beginning to think about how emergency selection might work, as well as the present, and future, challenges of establishing the identity of photo subjects and account owners. The millions of lifeboats that would be created would surely need the support of the company to launch if they’re ever needed.

Data Lifeboat: Deeper research into the challenge of archiving social media objects

By Jenn Phillips-Bacher

For all of us at Flickr Foundation, the idea of Flickr as an archive in waiting inspires our core purpose. We believe the billions of photos that have amassed on Flickr in the last 20 years have potential to be the material of future historical research. With so much of our everyday lives being captured digitally and posted to public platforms, we – both the Flickr Foundation and the wider cultural heritage community – have begun figuring out how to proactively gather, make available, and preserve digital images and their metadata for the long term.

In this blog post, I’m setting my sights beyond technology to consider the institutional and social aspects that enable the collection of digital photography from online platforms.

It’s made of people

Our Data Lifeboat project is now underway. Its goal is to build a mechanism to make it possible to assemble and decentralize slivers of Flickr photos for potential future users. (You can read project update 1 and project update 2 for the background). The outcome of the first project phase will be one or more prototypes we will show to our Flickr Commons partners for feedback. We’re already looking ahead to the second phase where we will work with cultural heritage institutions within the wider Flickr Commons network to make sure that anything we put into production best suits cultural heritage institutions’ real-world needs.

We’ve been considering multiple possible use cases for creating, and importantly, docking a Data Lifeboat in a safe place. The two primary institutional use cases we see are:

  1. Cultural heritage institutions want to proactively collect born digital photography on topics relevant to their collections
  2. In an emergency situation, cultural heritage institutions (and maybe other Flickr members) want to save what they can from a sinking online platform – either photos they’ve uploaded or generously saving whatever they can. (And let me be clear: Flickr.com is thriving! But it’s better to design for a worst-case scenario than to find ourselves scrambling for a solution with no time to spare.)

We are working towards our Flickr Commons members (and other interested institutions) being able to accept Data Lifeboats as archival materials. For this to succeed, “dock” institutions will need to:

  • Be able to use it, and have the technology to accept it
  • Already have a view on collecting born digital photography, and ideally this type of media is included in their collection development strategy. (This is probably more important.)

This isn’t just a technology problem. It’s a problem made of everything else the technology is made of: people who work in cultural heritage institutions, their policies, organizational strategies, legal obligations, funding, commitment to maintenance, the willing consent of people who post their photos to online platforms and lots more.

To preserve born digital photos from the web requires the enthusiastic backing of institutions—which are fundamentally social creatures—to do what they’re designed to do, which is to save and ensure access to the raw material of future research.

Collecting social photography

I’ve been doing some background research to inform the early stages of Data Lifeboat development. I came across the 2020 Collecting Social Photography (CoSoPho) research project, which set out to understand how photography is used in social media in order to be able to develop methods for collection and transmission to future generations. Their report, ‘Connect to Collect: approaches to collecting social digital photography in museums and archives’, is freely available as PDF.

The project collaborators were:

  • The Nordic Museum / Nordiska Museet
  • Stockholm County Museum / Stockholms Läns Museum
  • Aalborg City Archives / Aalborg Stadsarkiv
  • The Finnish Museum of Photography / Finland’s Fotografiska Museum
  • Department of Social Anthropology, Stockholm University

The CoSoPho project was a response to the current state of digital social photography and its collection/acquisition – or lack thereof – by museums and archives.

Implicit to the team’s research is that digital photography from online platforms is worth collecting. Three big questions were centered in their research:

  1. How can data collection policies and practices be adapted to create relevant and accessible collections of social digital photography?
  2. How can digital archives, collection databases and interfaces be relevantly adapted – considering the character of the social digital photograph and digital context – to serve different stakeholders and end users?
  3. How can museums and archives change their role when collecting and disseminating, to increase user influence in the whole life circle of the vernacular photographic cultural heritage?

There’s a lot in this report that is relevant to the Data Lifeboat project. The team’s research focussed on ‘digital social photography’, taken to mean any born digital photos that are taken for the purpose of sharing on social media. It interrogates Flickr alongside Snapchat, Facebook, Instagram, as well as region-specific social media sites like IRC-Galleria (a very early 2000s Finnish social media platform).

I would consider Flickr a bit different to the other apps mentioned, only because it doesn’t address the other Flickr-specific use cases such as:

  • Showcasing photography as craft
  • Using Flickr as a public photo repository or image library where photos can be downloaded and re-used outside of Flickr, unlike walled garden apps like Instagram or Snapchat.

The ‘massification’ of images

The CoSoPho project highlighted the challenges of collecting digital photos of today while simultaneously digitizing analog images from the past, the latter of which cultural heritage institutions have been actively doing for many years. Anna Dahlgren describes this as a “‘massification’ of images online”. The complexities of digital social photos, with their continually changing and growing dynamic connections, combined with the unstoppable growth of social platforms, pose certain challenges for libraries, archives and museums to collect and preserve.

To collect digital photos requires a concerted effort to change the paradigm:

  • from static accumulation to dynamic connection
  • from hierarchical files to interlinked files
  • and from pre-selected quantities of documents to aggregation of unpredictably variable image and data objects.

Dahlgren argues that “…in order to collect and preserve digital cultural heritage, the infrastructure of memory institutions has to be decisively changed.”

The value of collecting and contributing

“Put bluntly, if images on Instagram, Facebook or any other open online platform should be collected by museums and archives what would the added value be? Or, put differently, if the images and texts appearing on these sites are already open and public, what is the role of the museum, or what is the added value of having the same contents and images available on a museum site?” (A. Dahlgren)

Those of us working in the cultural heritage sector can imagine many good responses to this question. At the Flickr Foundation, we look to our recent internet history and how many web platforms have been taken offline. Our digital lives are at risk of disappearing. Museums, libraries and archives have that long-term commitment to preservation. They are repositories of future knowledge, and expect to be there to provide access to it.

Cultural heritage institutions that choose to collect from social online spaces can forge a path for a multiplicity of voices within collections, moving beyond standardized metadata toward richer, more varied descriptions from the communities from which the photos are drawn. There is significant potential to collect in collaboration with the publics the institution serves. This is a great opportunity to design for a more inclusive ethics of care into collections.

But what about potential contributors whose photos are being considered for collection by institutions? What values might they apply to these collections?

CoSoPho uncovered useful insights about how people participating in community-driven collecting projects considered their own contributions. Contributors wanted to be selective about which of their photos would make it into a collection; this could be for aesthetic reasons (choosing the best, most representative photos) or concerns for their own or others’ anonymity. Explicit consent to include one’s photos in a future archive was a common theme – and one which we’re thinking deeply about.

Overall, people responded positively to the idea of cultural institutions collecting digital social photos – they too can be part of history!— and also think it’s important that the community from which those photos are drawn have a say in what is collected and how it’s made available. Future user researchers at Flickr Foundation might want to explore contributor sentiment even further.

What’s this got to do with Data Lifeboats?

As an intermediary between billions of Flickr photos and cultural heritage institutions, we need to create the possibilities for long-term preservation of this rich vein of digital history. These considerations will help us to design a system that works for Flickr members and museums and archives.

Adapting collection development practices

All signs point to cultural heritage institutions needing to prepare to take on born digital items. Many are already doing this as part of their acquisition strategies, but most often this born digital material comes entangled in a larger archival collection.

If institutions aren’t ready to proactively collect born digital material from the public web, this is a risk to the longevity of this type of knowledge. And if this isn’t a problem that currently matters to institutions, how can we convince them to save Flickr photos?

As we move into the next phase of the Data Lifeboat project, we want to find out:

  • Are Flickr Commons member institutions already collecting, or considering collecting, born digital material?
  • What kinds of barriers do they face?

Enabling consent and self-determination

CoSoPho’s research surfaced the critical importance of consent, ownership and self-determination in determining how public users/contributors engage with their role in creating a new digital archive.

  • How do we address issues of consent when preserving photos that belong to creators?
  • How do we create a system that allows living contributors to have a say in what is preserved, and how it’s presented?
  • How do we design a system that enables the informed collection of a living archive?
    Is there a form of donor agreement or an opt-in to encourage this ethics of care?

Getting choosy

With 50 billion Flickr photos, not all of them visible to the public or openly licensed, we are working from the assumption that the Data Lifeboat needs to enable selective collecting.

  • Are there acquisition practices and policies within Flickr Commons institutions that can inform how we enable users to choose what goes into a Data Lifeboat?
  • What policies for protecting data subjects in collections need to be observed?
  • Are there existing paradigms for public engagement for proactive, social collecting that the Data Lifeboat technology can enable?

Co-designing usable software

Cultural heritage institutions have massively complex technical environments with a wide variety of collection management systems, digital asset management systems and more. This complexity often means that institutions miss out on chances to integrate community-created content into their collections.

The CoSoPho research team developed a prototype for collecting digital social photography. That work was attempting to address some of these significant tech challenges, which Flickr Foundation is already considering:

  • Individual institutions need reliable, modern software that interfaces with their internal systems; few institutions have internal engineering capacity to design, build and maintain their own custom software
  • Current collection management systems don’t have a lot of room for community-driven metadata; this information is often wedged in to local data fields
  • Collection management systems lack the ability to synchronize data with social media platforms (and vice versa) if the data changes. That makes it more difficult to use third-party platforms for community description and collecting projects.

So there’s a huge opportunity for the Flickr Foundation to contribute software that works with this complexity to solve real challenges for institutions. Co-design–that is, a design process that draws on your professional expertise and institutional realities–is the way forward!

We need you!

We are working on the challenge of keeping Flickr photos visible for 100 years and we believe it’s essential that cultural heritage institutions are involved. Therefore, we want to make sure we’re building something that works for as many organizations as possible – both big and small – no matter where you are in your plans to collect born digital content from the web.

If you’re part of the Flickr Commons network already, we are planning two co-design workshops for Autumn 2024, one to be held in the US and the other likely to be in London. Keep your eyes peeled for Save-the-Date invitations, or let us know you’re interested, and we’ll be sure to keep you in the loop directly.

Data Lifeboat Update 2: More questions than answers

By Ewa Spohn

Thanks to the Digital Humanities Advancement Grant we were awarded by the National Endowment for the Humanities, our Data Lifeboat project (which is part of the Content Mobility Program) is now well and truly underway. The Data Lifeboat is our response to the challenge of archiving the 50 billion or so images currently on Flickr, should the service go down. It’s simply too big to archive as a whole, and we think that these shared histories should be available for the long term, so we’re exploring a decentralized approach. Find out more about the context for this work in our first blog post.

So, after our kick-off last month, we were left with a long list of open questions. That list became longer thanks to our first all-hands meeting that took place shortly afterwards! It grew again once we had met with the project user group – staff from the British Library, San Diego Air & Space Museum, and Congregation of Sisters of St Joseph – a small group representing the diversity of Flickr Commons members. Rather than being overwhelmed, we were buoyed by the obvious enthusiasm and encouragement across the group, all of whom agreed that this is very much an idea worth pursuing. 

As Mia Ridge from the British Library put it; “we need ephemeral collections to tell the story of now and give people who don’t currently think they have a role in preservation a different way of thinking about it”. And from Mary Grace of the Congregation of Sisters of St. Joseph in Canada, “we [the smaller institutions] don’t want to be the 3rd class passengers who drown first”. 

Software sketching

We’ve begun working on the software approach to create a Data Lifeboat, focussing on the data model and assessing existing protocols we may use to help package it. Alex and George started creating some small prototypes to test how we should include metadata, and have begun exploring what “social metadata” could be like – that’s the kind of metadata that can only be created on Flickr, and is therefore a required element in any Data Lifeboat (as you’ll see from the diagram below, it’s complex). 


Feb 2024: An early sketch of a Data Lifeboat’s metadata graph structure.

Thanks to our first set of tools, Flinumeratr and Flickypedia, we have robust, reusable code for getting photos and metadata from Flickr. We’ve done some experiments with JSON, XML, and METS as possible ways to store the metadata, and started to imagine what a small viewer that would be included in each Data Lifeboat might be like. 

Complexity of long-term licensing

Alongside the technical development we have started developing our understanding of the legal issues that a Data Lifeboat is going to have to navigate to avoid unintended consequences of long-term preservation colliding with licenses set in the present. We discussed how we could build care and informed participation into the infrastructure, and what the pitfalls might be. There are fiddly questions around creating a Data Lifeboat containing photos from other Flickr members. 

  • As the image creator, would you need to be notified if one of your images has been added to a Data Lifeboat? 
  • Conversely, how would you go about removing an image from a Data Lifeboat? 
  • What happens if there’s a copyright dispute regarding images in a Data Lifeboat that is docked somewhere else? 

We discussed which aspects of other legal and licensing models might apply to Data Lifeboats, given the need to maintain stewardship and access over the long term (100 years at least!), as well as the need for the software to remain usable over this kind of time horizon. This isn’t something that the world of software has ready answers for. 

  • Could Flickr.org offer this kind of service? 
  • How would we notify future users of the conditions of the license, let alone monitor the decay of licenses in existing Data Lifeboats over this kind of timescale? 

So many standards to choose from

We had planned to do a deep dive into the various digital asset management systems used by cultural institutions, but this turned out to be a trickier subject than we thought as there are simply too many approaches, tools, and cobbled-together hacks being used in cultural institutions. Everyone seems to be struggling with this, so it’s not clear (yet) how best to approach this. If you have any ideas, let us know!

This work is supported by the National Endowment for the Humanities.

Introducing Flickypedia, our first tool

Building a new bridge between Flickr and Wikimedia Commons

For the past four months, we’ve been working with the Culture & Heritage team at the Wikimedia Foundation — the non-profit that operates Wikipedia, Wikimedia Commons, and other Wikimedia free knowledge projects — to build Flickypedia, a new tool for bridging the gap between photos on Flickr and files on Wikimedia Commons. Wikimedia Commons is a free-to-use library of illustrations, photos, drawings, videos, and music. By contributing their photos to Wikimedia Commons, Flickr photographers help to illustrate Wikipedia, a free, collaborative encyclopedia written in over 300 languages. More than 1.7 billion unique devices visit Wikimedia projects every month.

We demoed the initial version at GLAM Wiki 2023 in Uruguay, and now that we’ve incorporated some useful feedback from the Wikimedia community, we’re ready to launch it. Flickypedia is now available at https://www.flickr.org/tools/flickypedia/, and we’re really pleased with the result. Our goal was to create higher quality records on Wikimedia Commons, with better connected data and descriptive information, and to make it easier for Flickr photographers to see how their photos are being used.

This project has achieved our original goals – and a couple of new ones we discovered along the way.

So what is Flickypedia?

An easy way to copy photos from Flickr to Wikimedia Commons

The original vision of Flickypedia was a new tool for copying photos from Flickr to Wikimedia Commons, a re-envisioning of the popular Flickr2Commons tool, which copied around 5.4M photos.

This new upload tool is what we built first, leveraging ideas from Flinumeratr, a toy we built for exploring Flickr photos. You start by entering a Flickr URL:

And then Flickypedia will find all photos at that URL, and show you the ones which are suitable for copying to Wikimedia Commons. You can choose which photos you want to upload:

Then you enter a title, a short description, and any categories you want to add to the photo(s):

Then you click “Upload”, and the photo(s) are copied to Wikimedia Commons. Once it’s done, you can leave a comment on the original Flickr photo, so the photographer can see the photo in its new home:

As well as the title and caption written by the uploader, we automatically populate a series of machine-readable metadata fields (“Structured Data on Commons” or “SDC”) based on the Flickr information – the original photographer, date taken, a link to the original, and so on. You can see the exact list of fields in our data modeling document. This should make it easier for Commons users to find the photos they need, and maintain the link to the original photo on Flickr.

This flow has a little more friction than some other Flickr uploading tools, which is by design. We want to enable high-quality descriptions and metadata for carefully selected photos; not just bulk copying for the sake of copying. Our goal is to get high quality photos on Wikimedia Commons, with rich metadata which enables them to be discovered and used – and that’s what Flickypedia enables.

Reducing risk and responsible licensing

Flickr photographers can choose from a variety of licenses, and only some of them can be used on Wikimedia Commons: CC0, Public Domain, CC BY and CC BY-SA. If it’s any other license, the photo shouldn’t be on Wikimedia Commons, according to its licensing policy.

As we were building the Flickypedia uploader, we took the opportunity to emphasize the need for responsible licensing – when you select your photographs, it checks the licenses, and doesn’t allow you to copy anything that doesn’t have a Commons-compatible license:

This helps to reduce risk for everyone involved with Flickr and Wikimedia Commons.

Better duplicate detection

When we looked at the feedback on existing Flickr upload tools, there was one bit of overwhelming feedback: people want better duplicate detection. There are already over 11 million Flickr photos on Wikimedia Commons, and if a photo has already been copied, it doesn’t need to be copied again.

Wikimedia Commons already has some duplicate detection. It’ll spot if you upload a byte-for-byte identical file, but it can’t detect duplicates if the photo has been subtly altered – say, converted to a different file format, or a small border cropped out.

It turns out that there’s no easy way to find out if a given Flickr photo is in Wikimedia Commons. Although most Flickr upload tools will embed that metadata somewhere, they’re not consistent about it. We found at least four ways to spot possible duplicates:

  • You could look for a Flickr URL in the structured data (the machine-readable metadata)
  • You could look for a Flickr URL in the Wikitext (the human-readable description)
  • You could look for a Flickr ID in the filename
  • Or Flickypedia could know that it had already uploaded the photo

And even looking for matching Flickr URLs can be difficult, because there are so many forms of Flickr URLs – here are just some of the varieties of Flickr URLs we found in the existing Wikimedia Commons data:

(And this is without some of the smaller variations, like trailing slashes and http/https.)

We’d already built a Flickr URL parser as part of Flinumeratr, so we were able to write code to recognise these URLs – but it’s a fairly complex component, and that only benefits Flickypedia. We wanted to make it easier for everyone.

So we did!

We proposed (and got accepted) a new Flickr Photo ID property. This is a new field in the machine-readable structured data, which can contain the numeric ID. This is a clean, unambiguous pointer to the original photo, and dramatically simplifies the process of looking for existing Flickr photos.

When Flickypedia uploads a new photo to Flickr, it adds this new property. This should make it easier for other tools to find Flickr photos uploaded with Flickypedia, and skip re-uploading them.

Backfillr Bot: Making Flickr metadata better for all Flickr photos on Commons

That’s great for new photos uploaded with Flickypedia – but what about photos uploaded with other tools, tools that don’t use this field? What about the 10M+ Flickr photos already on Wikimedia Commons? How do we find them?

To fix this problem, we created a new Wikimedia Commons bot: Flickypedia Backfillr Bot. It goes back and fills in structured data on Flickr photos on Commons, including the Flickr Photo ID property. It uses our URL parser to identify all the different forms of Flickr URLs.

This bot is still in a preliminary stage—waiting for approval from the Wikimedia Commons community—but once granted, we’ll be able to improve the metadata for every Flickr photo on Wikimedia Commons. And in addition, create a hook that other tools can use – either to fill in more metadata, or search for Flickr photos.

Sydney Harbour Bridge, from the Museums of History New South Wales. No known copyright restrictions.

Flickypedia started as a tool for copying photos from Flickr to Wikimedia Commons. From the very start, we had ideas about creating stronger links between the two – the “say thanks” feature, where uploaders could leave a comment for the original Flickr photographer – but that was only for new photos.

Along the way, we realized we could build a proper two-way bridge, and strengthen the connection between all Flickr photos on Wikimedia Commons, not just those uploaded with Flickypedia.

We think this ability to follow a photo around the web is really important – to see where it’s come from, and to see where it’s going. A Flickr photo isn’t just an image, it comes with a social context and history, and being uploaded to Wikimedia Commons is the next step in its journey. You can’t separate an image from its context.

As we start to focus on Data Lifeboat, we’ll spend even more time looking at how to preserve the history of a photo – and Flickypedia has given us plenty to think about.

If you want to use Flickypedia to upload some photos to Wikimedia Commons, visit www.flickr.org/tools/flickypedia.

If you want to look at the source code, go to github.com/Flickr-Foundation/flickypedia.

Introducing flinumeratr, our first toy

by Alex

Today we’re pleased to release Flinumeratr, our first toy. You enter a Flickr URL, and it shows you a list of photos that you’d see at that URL:

This is the first engineering step towards what we’ll be building for the rest of this quarter: Flickypedia, a new tool for copying Creative Commons-licensed photos from Flickr to Wikimedia Commons.

As part of Flickypedia, we want to make it easy to select photos from Flickr that are suitable for Wikimedia Commons. You enter a Flickr URL, and Flickypedia will work out what photos are available. This “Flickr URL enumerator”, or “Flinumeratr”, is a proof-of-concept of that idea. It knows how to recognise a variety of URL types, including individual photos, albums, galleries, and a member’s photostream.

We call it a “toy” quite deliberately – it’s a quick thing, not a full-featured app. Keeping it small means we can experiment, try things quickly, and learn a lot in a short amount of time. We’ll build more toys as we have more ideas. Some of those ideas will be reused in bigger projects, and others will be dropped.

Flinumeratr is a playground for an idea for Flickypedia, but it’s also been a context for starting to develop our approach to software development. We’ve been able to move quickly – this is only my fourth day! – but starting a brand new project is always the easy bit. Maintaining that pace is the hard part.

We’re all learning how to work together, I’m dusting off my knowledge of the Flickr API, and we’re establishing some basic coding practices. Things like a test suite, documentation, checks on pull requests, and other guard rails that will help us keep moving. Setting those up now will be much easier than trying to retrofit them later. There’s plenty more we have to decide, but we’re off to a good start.

Under the hood, Flinumeratr is a Python web app written in Flask. We’re calling the Flickr API with the httpx library, and testing everything with pytest and vcrpy. The latter in particular has been so helpful – it “records” interactions with the Flickr API so I can replay them later in our test suite. If you’d like to see more, all our source code is on GitHub.

You can try Flinumeratr at https://flinumeratr.glitch.me. Please let us know what you think!

When Past Meets Predictive: An interview with the curators of ‘A Generated Family of Man’

by Tori McKenna, Oxford Internet Institute

Design students, Juwon Jung and Maya Osaka, the inaugural cohort of Flickr Foundation’s New Curators program, embarked on a journey exploring what happens when you interface synthetic image production with historic archives.

This blog post marks the release of Flickr Foundation’s A Generated Family of Man, the third iteration in a series of reinterpretations of the 1955 MoMA photography exhibition, The Family of Man.

Capturing the reflections, sentiments and future implications raised by Jung and Osaka, these working ‘field notes’ function as a snapshot in time of where we stand as users, creators and curators facing computed image generation. At a time when Artificial Intelligence and Large Language Models are still in their infancy, yet have been recently made widely accessible to internet users, this experiment is by no means an exhaustive analysis of the current state of play. However, by focusing on a single use-case, Edward Steichen’s The Family of Man, Jung and Osaka were able to reflect in greater detail and specificity over a smaller selection of images — and the resultant impact of image generation on this collection.

Observations from this experiment are phrased as a series of conversations, or ‘interfaces’ with the ‘machine’.

Interface 1: ‘That’s not what I meant’

If the aim of image generation is verisimilitude, the first observation to remark upon when feeding captions into image generation tools is there are often significant discrepancies and deviations from the original photographs. AI produces images based on most-likely scenarios, and it became evident from certain visual elements that the generator was ‘filling in’ what the machine ‘expects’. For example, when replicating the photograph of an Austrian family eating a meal, the image generator resorted to stock food and dress types. In order to gain greater accuracy, as Jung explained, “we needed to find key terms that might ‘trick’ the algorithm”. These included supplementing with descriptive prompts of details (e.g. ‘eating from a communal bowl in the centre of the table’), as well as more subjective categories gleaned from the curators interpretations of the images (’working-class’, ‘attractive’, ‘melancholic’). As Osaka remarked, “the human voice in this process is absolutely necessary”. This constitutes a talking with the algorithm, a back-and-forth dialogue to produce true-to-life images, thus further centering the role of the prompt generator or curator.

This experiment was not about producing new fantasies, but to test how well the generator could reproduce historical context or reinterpret archival imagery. Adding time-period prompts, such as “1940s-style”, result in approximations based on the narrow window of historical content within the image generator’s training set. “When they don’t have enough data from certain periods AI’s depiction can be skewed”, explains Jung. This risks reflecting or reinforcing biased or incomplete representations of the period at hand. When we consider that more images were produced in the last 20 years than the last 200 years, image generators have a far greater quarry to ‘mine’ from the contemporary period and, as we saw, often struggle with historical detail.

Key take-away:
Generated images of the past are only as good as their training bank of images, which themselves are very far from representative of historical accuracy. Therefore, we ought to develop a set of best practices for projects that seek communion between historic images or archives and generated content.

Interface 2: ‘I’m not trying to sell you anything’

In addition to synthetic image generation, Jung & Osaka also experimented with synthetic caption generation: deriving text from the original images of The Family of Man. The generated captions were far from objective or purely descriptive. As Osaka noted, “it became clear the majority of these tools were developed for content marketing and commercial usage”, with Jung adding, “there was a cheesy, Instagram-esque feel to the captions with the overuse of hashtags and emojis”. Not only was this outdated style instantly transparent and ‘eyeroll-inducing’ for savvy internet users, but in some unfortunate cases, the generator wholly misrepresented the context. In Al Chang’s photo of a grief-stricken America soldier being comforted by his fellow troops in Korea, the image generator produced the following tone-deaf caption:

“Enjoying a peaceful afternoon with my best buddy 🐶💙 #dogsofinstagram #mananddog #bestfriendsforever” (there was no dog in the photograph).

When these “Instagram-esque” captions were fed back into image generation, naturally they produced overly positive, dreamy, aspirational images that lacked the ‘bite’ of the original photographs – thus creating a feedback loop of misrecognition and misunderstood sentiment.

The image and caption generators that Jung & Osaka selected were free services, in order to test what the ‘average user’ would most likely first encounter in synthetic production. This led to another consideration around the commercialism of such tools, as the internet adage goes, “if its free, you’re the product”. Using free AI services often means relinquishing input data, a fact that might be hidden in the fine print. “One of the dilemmas we were internally facing was ‘what is actually happening to these images when we upload them’?” as Jung pondered, “are we actually handing these over to the generators’ future data-sets?”. “It felt a little disrespectful to the creator”, according to Osaka, “in some cases we used specific prompts that emulate the style of particular photographs. It’s a grey area, but perhaps this could even be an infringement on their intellectual property”.

Key take-away:
The majority of synthetic production tools are built with commercial uses in mind. If we presume there are very few ‘neutral’ services available, we must be conscious of data ownership and creator protection.

Interface 3: ‘I’m not really sure how I feel about this’

The experiment resulted in hundreds of synthetic guesses, which induced surprising feelings of guilt among the curators. “In a sense, I felt almost guilty about producing so many images”, reports Jung, with e-waste and resource intensive processing power front of mind. “But we can also think about this another way” Osaka continues, “the originals, being in their analogue form, were captured with such care and consideration. Even their selection for the exhibition was a painstaking, well-documented process”.

We might interpret this as simply a nostalgic longing for finiteness of bygone era, and our disillusionment at today’s easy, instant access. But perhaps there is something unique to synthetic generation here: the more steps the generator takes from the original image, the more degraded the original essence, or meaning, becomes. In this process, not only does the image get further from ‘truth’ in a representational sense, but also in terms of original intention of the creator. If the underlying sense of warmth and cooperation in the original photographs disappears along the generated chain, is there a role for image generation in this context at all? “It often feels like something is missing”, concludes Jung, “at its best, synthetic image generation might be able to replicate moments from the past, but is this all that a photograph is and can be?”

Key take-away: Intention and sentiment are incredibly hard to reproduce synthetically. Human empathy must first be deployed to decipher the ‘purpose’ or background of the image. Naturally, human subjectivity will be input.

Our findings

Our journey into synthetic image generation underscores the indispensable role of human intervention. While the machine can be guided towards accuracy by the so-called ‘prompt generator’, human input is still required to flesh out context where the machine may be lacking in historic data.

At its present capacity, while image generation can approximate visual fidelity, it falters when it attempts to appropriate sentiment and meaning. The uncanny distortions we see in so many of the images of A Generated Family of Man. Monstrous fingers, blurred faces, melting body parts are now so common to artificially generated images they’ve become almost a genre in themselves. These appendages and synthetic ad-libs contravene our possible human identification with the image. This lack of empathic connection, the inability to bridge across the divide, is perhaps what feels so disquieting when we view synthetic images.

As we have seen, when feeding these images into caption generators to ‘read’ the picture, only humans can reliably extract meaning from these images. Trapped within this image-to-text-to-image feedback loop, as creators or viewers we’re ultimately left calling out to the machine: Once More, with Feeling!

We hope projects like this spark the flourishing of similar experiments for users of image generators to the critical and curious about the current state of artificial “intelligence”.

Find out more about A Generated Family of Man in our New Curators program area.

Making A Generated Family of Man: Revelations about Image Generators

Juwon Jung | Posted 29 September 2023

I’m Juwon, here at the Flickr Foundation for the summer this year. I’m doing a BA in Design at Goldsmiths. There’s more background on this work in the first blog post on this project that talks about the experimental stages of using AI image and caption generators.

“What would happen if we used AI image generators to recreate The Family of Man?”

When George first posed this question in our office back in June, we couldn’t really predict what we would encounter. Now that we’ve wrapped up this uncanny yet fascinating summer project, it’s time to make sense out of what we’ve discovered, learned, and struggled with as we tried to recreate this classic exhibition catalogue.

Bing Image Creator generates better imitations when humans write the directions

We used the Bing Image Creator throughout the project and now feel quite familiar with its strengths and weaknesses. There were a few instances where the Bing Image Creator would produce surprisingly similar photographs to the originals when we wrote captions, as can be seen below:

Here are the caption iterations we made for the image of the judge (shown above, on the right page of the book):

1st iteration:
A grainy black and white portrait shot taken in the 1950s of an old judge. He has light grey hair and bushy eyebrows and is wearing black judges robes and is looking diagonally past the camera with a glum expression. He is sat at a desk with several thick books that are open. He is holding a page open with one hand. In his other hand is a pen. 

2nd iteration:
A grainy black and white portrait shot taken in the 1950s of an old judge. His body is facing towards the camera and he has light grey hair that is short and he is clean shaven. He is wearing black judges robes and is looking diagonally past the camera with a glum expression. He is sat at a desk with several thick books that are open. 

3rd iteration:
A grainy black and white close up portrait taken in the 1950s of an old judge. His body is facing towards the camera and he has light grey hair that is short and he is clean shaven. He is wearing black judges robes and is looking diagonally past the camera with a glum expression. He is sat at a desk with several thick books that are open. 

Bing Image Creator is able to demonstrate such surprising capabilities only when the human user accurately directs it with sharp prompts. Since Bing Image Creator uses natural language processing to generate images, the ‘prompt’ is an essential component to image generation. 

Human description vs AI-generated interpretation

We can compare human-written captions to the AI-generated captions made by another tool we used, Image-to-Caption. Since the primary purpose of Image-to-Caption.io is to generate ‘engaging’ captions for social media content, the AI-generated captions generated from this platform contained cheesy descriptors, hashtags, and emojis.

Using screenshots from the original catalogue, we fed images into that tool and watched as captions came out. This non-sensical response emerged for the same picture of the judge:

“In the enchanted realm of the forest, where imagination takes flight and even a humble stick becomes a magical wand. ✨🌳 #EnchantedForest #MagicalMoments #ImaginationUnleashed”

As a result, all of the images generated from AI captions looked like they were from the early Instagram-era in 2010; highly polished with strong, vibrant color filters. 

Here’s a selection of images generated using AI prompts from Image-to-Caption.io

Ethical implications of generated images?

As we compared all of these generated  images, it was our natural instinct to instantly wonder about the actual logic or dataset that the generative algorithm was operating upon. There were also certain instances where the Bing Image Creator would not be able to generate the correct ethnicity of the subject matter in the photograph, despite the prompt clearly specifying the ethnicity (over the span of 4-5 iterations).

Here are some examples of ethnicity not being represented as directed: 

What’s under the hood of these technologies?

What does this really mean though? I wanted to know more about the relationship between these observations and the underlying technology of the image generators, so I looked into the DALL-E 2 model (which is used in Bing Image Creator). 

DALL-E 2 and most other image generation tools today use the diffusion model to generate a new image that conveys the same, if not the most similar, semantic information of the input caption. In order to correctly match the visual semantic information to the corresponding textual semantic information, (e.g. matching the image of an apple to the word apple) these generative models are trained with large subsets of images and image descriptions online. 

Open AI has admitted that the “technology is constantly evolving, and DALL-E 2 has limitations” in their informational video about DALL-E 2.  

Such limitations include:

  • If the data used to train the model has been flawed and contains images that are incorrectly labeled, it may produce an image that doesn’t correspond to the text prompt. (e.g. if there are more images of a plane matched with the word car, the model can produce an image of a plane from the prompt ‘car’) 
  • The model may exhibit representational bias if it hasn’t been trained enough on a certain subject (e.g. producing an image of any kind of monkey rather than the species from the prompt ‘howler monkey’) 

From this brief research, I realized that these subtle errors of Bing Image Creator shouldn’t be simply overlooked. Whether or not Image Creator is producing relatively more errors for certain prompts could signify that, in some instances, the generated images may reflect the current visual biases, stereotypes, or assumptions that exist in our world today. 

A revealing experiment for our back cover

After having worked with very specific captions for hoped-for outcomes, we decided to zoom way out to create a back cover for our book. Instead of anything specific, we spent a short period after lunch one day experimenting with very general captioning to see the raw outputs. Since the theme of The Family of Man is the oneness of mankind and humanity, we tried entering the short words, “human,” “people,” and “human photo” in the Bing Image Creator.

These are the very general images returned to us: 

What do these shadowy, basic results really mean?
Is this what we, humans, reduce down to in the AI’s perspective? 

Staring at these images on my laptop in the Flickr Foundation headquarters, we were all stunned by the reflections of us created by the machine. Mainly consisting of elementary, undefined figures, the generated images representing the word “humans” ironically conveyed something that felt inherently opposite. 

This quick experiment at the end of the project revealed to us that perhaps having simple, general words as prompts instead of thorough descriptions may most transparently reveal how these AI systems fundamentally see and understand our world.

A Generated Family of Man is just the tip of the iceberg.

These findings aren’t concrete, but suggest possible hypotheses and areas of image generation technology that we can conduct further research on. We would like to invite everyone to join the Flickr Foundation on this exciting journey, to branch out from A Generated Family of Man and truly pick the brains of these newly introduced machines. 

Here are the summarizing points of our findings from A Generated Family of Man:
  • The abilities of Bing Image Creator to generate images with the primary aim of verisimilitude is impressive when the prompt (image caption) is either written by humans or accurately denotes the semantic information of the image.
  • In certain instances, the Image Creator performed relatively more errors when determining the ethnicity of the subject matter. This may indicate the underlying visual biases or stereotypes of the datasets the Image Creator was trained with.
  • When entering short, simple words related to humans into the Image Creator, it responded with undefined, cartoon-like human figures. Using such short prompts may reveal how the AI fundamentally sees our world and us. 

Open questions to consider

Using these findings, I thought that changing certain parameters of the investigation could make interesting starting points of new investigations, if we spent more time at the Flickr Foundation, or if anyone else wanted to continue the research. Here are some different parameters that can be explored:

  • Frequency of iteration: increase the number of trials of prompt modification or general iterations to create larger data sets for better analysis.
  • Different subject matter: investigate specific photography subjects that will allow an acute analysis on narrower fields (e.g. specific types of landscapes, species, ethnic groups).
  • Image generator platforms: look into other image generator softwares to observe distinct qualities for differing platforms.

How exciting would it be if different groups of people from all around the world participated in a collective activity to evaluate the current status of synthetic photography, and really analyze the fine details of these models? Maybe that wouldn’t scientifically reverse-engineer these models but even from qualitative investigations, findings emerge. What more will we be able to find? Will there be a way to match, cross-compare the qualitative and even quantitative investigations to deduce a solid (perhaps not definite) conclusion? And if these investigations were to take place in intervals of time, which variables will change? 

To gain inspiration for these questions, take a look at the full collection of images of A Generated Family of Man on Flickr!

Creating A Generated Family of Man

Author: Maya Osaka

Find out about the process that went into creating A Generated Family of Man, the third volume of A Flickr of Humanity.

A Flickr of Humanity is the first project in the New Curators program, revisiting and reinterpreting The Family of Man, an exhibition held at MoMa in 1955. The exhibition showcased 503 photographs from 68 countries, celebrating universal aspects of the human experience. It was a declaration of solidarity following the Second World War. 

For our third volume of A Flickr of Humanity we decided to explore the new world of generative AI using Microsoft Bing’s Image Creator to regenerate The Family of Man catalog (30th Anniversary Edition). The aim of the project was to investigate synthetic image generation to create a ‘companion publication’ to the original, and that will act as a timestamp, to showcase the state of generative AI in 2023.

Project Summary

  1. We created new machine-generated versions of photographs from The Family of Man by writing a caption for each image and passing it through Microsoft Bing’s Image Creator. These images will be referred to as Human Mediated Images (HMI.)
  2. We fed screenshots of the original photographs into ImageToCaption, an AI-powered caption generator which produces cheesy Instagramesque captions, including emojis and hashtags. These computed captions were then passed into Bing’s Image Creator to generate an image only mediated by computers. These images will be referred to as AI-generated Images (AIGI).

We curated a selection of these generated images and captions into the new publication, A Generated Family of Man.

Image generation process

It is important to note that we decided to use free AI generators because we wanted to explore the most accessible generative AI.

Generating images was time-consuming. In our early experiments, we generated several iterations of each photograph to try and get it as close to the original as possible. We’d vary the caption in each iteration to work towards a better attempt. We decided it would be more interesting to limit our caption refinements so we could see and show a less refined output. We decided to set a limit of two caption-writing iterations for the HMIs.

For the AIGIs we chose one caption from the three from the first set of generated responses. We’d use the selected caption to do one iteration of image generation, unless the caption was blocked, in which case we would pick another generated caption and try that. 

Once we had a good sense of how much labour was required to generate these images, we set an initial target to generate half of the images in the original publication. The initial image generation process, in which we spawned roughly 250 of the original photographs took around 4 weeks. We then had roughly 500 generated images with (about half HMIs and half AIGIs), and we could begin the layout work.

Making the publication

The majority of the photographs featured in The Family of Man are still in copyright so we were unable to feature the original photographs in our publication. That’s apart from the two Dorothea Lange photographs we decided to feature, and which have no known copyright. 

We decided to design the publication to act as a ‘companion publication’ to the original catalog. As we progressed making the layout, we imagined the ideal situation: the reader would have an original The Family of Man catalogue to hand to compare and contrast the original photographs and generated images side by side. With this in mind we designed the layout of the publication as an echo of the original, to streamline this kind of comparison.

It was important to demonstrate the distinctions between HMI and AIGI versions of the original images, so in some cases we shifted the layout to allow this.

Identifying HMIs and AIGIs

There was a lot of discussion around whether a reader would identify an image as an HMI or AIGI. All of the HMI images are black and white—because “black and white” and “grainy” were key human inputs in our captions to get the style right—while most of the AIGI images came out in colour. That in itself is an easy way to identify most of the images. We made the choice to use different typefaces on the captions too.

It is fascinating to compare the HMI and AIGI imagery, and we wanted to share that in the publication. So, in some cases, we’ve included both image types so readers can compare. Most of the image pairs can be identified because they share the same shape and size. All HMIs also sit on the left hand side of their paired AIGI. 

In both cases we decided that a subtle approach might be more entertaining as it would leave it in the readers hands to interpret or guess which images are which.

To watermark, or not to watermark?

Another issue that came up was around how to make it clear which images are AI-generated as there are a few images that are actual photographs. All AI images generated by Bing’s Image Creator come out with a watermark in the bottom left corner. As we made the layout, some of the original watermarks were cropped or moved out of the frame, so we decided to add the watermarks back into the AI-generated images in the bottom left corner so there is a way to identify which images are AI-generated.

Captions and quotes

In the original The Family of Man catalog, each image has a caption to show the photographer’s name, the country the photograph was taken in, and any organizations  the photograph is associated with. There are also quotes that are featured throughout the book. 

For A Generated Family of Man we decided to use the same typefaces and font sizes as the original publication. 

We decided to display the captions that were used to generate the images because we wanted to illustrate our inputs, and also those that were computer-generated. Our captions are much longer than the originals, so to prevent the pages from looking too cluttered, we added captions to a small selection of images. We decided to swap out the original quotes for quotes that are more relevant to the 21st century.

Below you can see some example pages from A Generated Family of Man.

Reflection

I had never really thought about AI that much before working on this project. I’ve spent weeks generating hundreds of images and I’ve gotten familiar with communicating with Bing’s Image Creator. I’ve been impressed by what it can do while being amused and often horrified by the weird humans it generates. It feels strange to be able to produce an image in a matter of seconds that is of such high quality, especially when we look at images that are not photo-realistic but done in an illustrative style. In ‘On AI-Generated Works, Artists, and Intellectual Property ‘, Ryan Merkley says ‘There is little doubt that these new tools will reshape economies, but who will benefit and who will be left out?’. As a designer it makes me feel a little worried about my future career as it feels almost inevitable, especially in a commercial setting, that AI will leave many visual designers redundant. 

Generative AI is still in its infancy (Bing’s Image Creator was only announced and launched in late 2022!) and soon enough it will be capable of producing life-like images that are indistinguishable from the real thing. If it isn’t already. For this project we used Bing’s Image Creator, but it would be interesting to see how this project would turn out if we used another image generator such as MidJourney, which many consider to be at the top of its game. 

There are bound to be many pros and cons to being able to generate flawless images and I am simultaneously excited and terrified to see what the future holds within the field of generative AI and AI technology at large.

Kickstarting A Generated Family of Man: Experimenting with Synthetic Photography 

Juwon Jung | Posted 18 July 2023

Ever since we created our Version 2 of A Flickr of Humanity, we’ve been brainstorming different ways to develop this project at the Flickr Foundation headquarters. Suddenly, we came across the question: what would happen if we used AI image generators to recreate The Family of Man

  • What could this reveal about the current generative AI models and their understanding of photography?
  • How might it create a new interpretation of The Family of Man exhibition?
  • What issues or problems would we encounter with this uncanny approach?

 

We didn’t know the answers to these questions, or what we might even find, so we decided to jump on board for a new journey to Version 3. (Why not?!)

We split our research into three main stages:

  1. Research into different AI image generators
  2. Exploring machine-generated image captions
  3. Challenges of using source photography responsibly in AI projects

And, we decided to try and see if we could use the current captioning and image generation technologies to fully regenerate The Family of Man for our Version 3.

 

Stage 1. Researching into different AI image generator softwares

Since the rapid advancements of generative artificial intelligence in the last couple of years, hundreds of image-generating applications, such as DALL-E 2 or Midjourney, have been launched. In the initial research stage, we tested different platforms by creating short captions of roughly ten images from The Family of Man and observing the resulting outputs.

Stage 1 Learnings: 

  • Image generators are better at creating photorealistic images of landscapes, objects, and animals than close-up shots of people. 
  • Most image generators, especially those that are free, have caps on the numbers of images that can be produced in a day, slowing down production speed. 
  • Some captions had to be altered because they violated terms and policies of the platforms; certain algorithms would censor prompts with potential to create unethical, explicit images (e.g. Section A photo caption – the word “naked” could not be used for Microsoft Bing)

We decided to use Microsoft Bing’s Image generator for this project because it produced images with highest quality (across all image categories) with most flexible limits on the quantity of images that could be generated. We’ve tested other tools including Dezgo, Veed.io, Canva, and Picsart

 

Stage 2. Exploring image captions: AI Caption Generators

Image generators today primarily operate based on text prompts. This realisation meant we should explore caption generation software in more depth. There was much less variety in the caption-generating platforms compared to image generators. The majority of the websites we found seemed intended for social media use. 

Experiment 1: Human vs machine captions

Here’s a series of experiments done by rearranging and comparing different types of captions—human-written and artificially generated—with images to observe how it alters the images generated, their different expression and, in some cases, meaning: 

Stage 2 Learnings: 

  • It was quite difficult to find a variety of caption generating software that generated different styles of captions because most platforms only generated “cheesy” social media captions, 
  • In the platforms that generated other styles of captions (not for social media), we found the depth and accuracy of the description was really limited, for example, “a mountain range with mountains.”

 

Stage 3. Challenges of using AI to experiment with photography?!

Since both the concept and process of using AI to regenerate  The Family of Man is experimental, we encountered several dilemmas along the way:

1. Copyright Issues with Original Photo Use 

  • It’s very difficult to obtain proper permission to use photos from the original publication of The Family of Man since the exhibition contains photos from 200+ photographers in different locations and for different publications. Hence, we’ve decided to not include the original photos of The Family of Man in the Version 3 publication.
  • This is disappointing because having the original photo alongside the generated versions would allow us to create a direct visual comparison between authentic and synthetic photographs.
  • All original photos of The Family of Man used in this blog post were photographed using the physical catalogue in our office.

2. Caption Generation 

  • Even during the process of generating captions, we are required to plug in the original photo of The Family of Man so we’ve had to take screenshots of the online catalogue available in The Internet Archive. This can still be a violation of the copyrights policies because we’re adopting the image within our process, even if we don’t explicitly display the original image. We also have a copy of The Family of Man publication purchased by the Flickr Foundation here at the office.

 

4. Moving Forward..

Keeping these dilemmas in mind, we will try our best to show respect to the original photographs and photographers throughout our project. We’ll also continuously repeat this process/experimentation to the rest of the images in The Family of Man to create a new Version 3 in our A Flickr of Humanity project.