juwon, Author at Flickr Foundation

Making A Generated Family of Man: Revelations about Image Generators

I’m Juwon, here at the Flickr Foundation for the summer this year. I’m doing a BA in Design at Goldsmiths. There’s more background on this work in the first blog post on this project, which covers the experimental stages of using AI image and caption generators.

“What would happen if we used AI image generators to recreate The Family of Man?”

When George first posed this question in our office back in June, we couldn’t really predict what we would encounter. Now that we’ve wrapped up this uncanny yet fascinating summer project, it’s time to make sense of what we discovered, learned, and struggled with as we tried to recreate this classic exhibition catalogue.

Bing Image Creator generates better imitations when humans write the directions

We used Bing Image Creator throughout the project and now feel quite familiar with its strengths and weaknesses. In a few instances, when we wrote the captions ourselves, Bing Image Creator produced images surprisingly similar to the original photographs, as can be seen below:

Original The Family of Man (top) and our imitation, A Generated Family of Man (bottom)

Here are the caption iterations we made for the image of the judge (shown above, on the right page of the book):

1st iteration:
A grainy black and white portrait shot taken in the 1950s of an old judge. He has light grey hair and bushy eyebrows and is wearing black judges robes and is looking diagonally past the camera with a glum expression. He is sat at a desk with several thick books that are open. He is holding a page open with one hand. In his other hand is a pen.

2nd iteration:
A grainy black and white portrait shot taken in the 1950s of an old judge. His body is facing towards the camera and he has light grey hair that is short and he is clean shaven. He is wearing black judges robes and is looking diagonally past the camera with a glum expression. He is sat at a desk with several thick books that are open.

3rd iteration:
A grainy black and white close up portrait taken in the 1950s of an old judge. His body is facing towards the camera and he has light grey hair that is short and he is clean shaven. He is wearing black judges robes and is looking diagonally past the camera with a glum expression. He is sat at a desk with several thick books that are open.

In our experience, Bing Image Creator showed these surprising capabilities only when a human user directed it accurately with sharp prompts. Since Bing Image Creator uses natural language processing to generate images, the ‘prompt’ is an essential component of image generation.

Human description vs AI-generated interpretation

We can compare human-written captions to the AI-generated captions made by another tool we used, Image-to-Caption.io. Since the primary purpose of Image-to-Caption.io is to generate ‘engaging’ captions for social media content, the captions it produced contained cheesy descriptors, hashtags, and emojis.

Using screenshots from the original catalogue, we fed images into that tool and watched as captions came out. This nonsensical response emerged for the same picture of the judge:

“In the enchanted realm of the forest, where imagination takes flight and even a humble stick becomes a magical wand. ✨🌳 #EnchantedForest #MagicalMoments #ImaginationUnleashed”

As a result, all of the images generated from AI captions looked like they came from the early Instagram era around 2010: highly polished, with strong, vibrant color filters.

Here’s a selection of images generated using AI prompts from Image-to-Caption.io:

Ethical implications of generated images?

As we compared all of these generated images, we instinctively wondered about the logic or dataset that the generative algorithm was operating on. In certain instances, Bing Image Creator also failed to generate a subject of the correct ethnicity, even though the prompt clearly specified it, across 4 to 5 iterations.

Here are some examples of ethnicity not being represented as directed:

Comparison between the original Migrant Mother and 6 iterations we made on Bing Image Creator. Although we specified the ethnicity of the mother as white/Caucasian, the Image Creator didn’t produce a corresponding photo.

Comparison between an original Family of Man photo and 6 iterations we made on Bing Image Creator. Although we specified the ethnicity of one of the boys as white/Caucasian, the Image Creator didn’t produce a corresponding photo.

What’s under the hood of these technologies?

What does this really mean, though? I wanted to know more about the relationship between these observations and the underlying technology of the image generators, so I looked into the DALL-E 2 model (which is used in Bing Image Creator).

DALL-E 2 and most other image generation tools today use diffusion models to generate a new image that conveys the same semantic information as the input caption, or as close to it as possible. To match visual semantic information to the corresponding textual semantic information (e.g. matching an image of an apple to the word “apple”), these generative models are trained on large sets of images and image descriptions collected online.

OpenAI acknowledges in its informational video about DALL-E 2 that the “technology is constantly evolving, and DALL-E 2 has limitations”.

Such limitations include:

If the training data is flawed and contains incorrectly labeled images, the model may produce an image that doesn’t correspond to the text prompt (e.g. if more images of planes are labeled with the word “car”, the model may produce an image of a plane from the prompt “car”).
The model may exhibit representational bias if it hasn’t been trained enough on a certain subject (e.g. producing an image of any kind of monkey rather than the specific species from the prompt “howler monkey”).

From this brief research, I realized that these subtle errors in Bing Image Creator shouldn’t simply be overlooked. If Image Creator produces relatively more errors for certain prompts, this could signify that, in some instances, the generated images reflect visual biases, stereotypes, or assumptions that exist in our world today.

A revealing experiment for our back cover

After working with very specific captions aimed at particular outcomes, we decided to zoom way out to create a back cover for our book. Instead of anything specific, we spent a short period after lunch one day experimenting with very general captions to see the raw outputs. Since the theme of The Family of Man is the oneness of mankind and humanity, we tried entering the short prompts “human,” “people,” and “human photo” into Bing Image Creator.

These are the very general images returned to us:

20 images resulting from typing in “people” in Bing Image Creator

20 images resulting from typing in “human” in Bing Image Creator

20 images resulting from typing in “human photo” in Bing Image Creator

What do these shadowy, basic results really mean?
Is this what we, humans, reduce down to in the AI’s perspective?

As we stared at these images on my laptop in the Flickr Foundation headquarters, we were all stunned by the reflections of ourselves that the machine had created. Consisting mainly of elementary, undefined figures, the images generated for the word “human” ironically conveyed something that felt like the opposite of humanity.

This quick experiment at the end of the project suggested to us that simple, general prompts, rather than thorough descriptions, may most transparently reveal how these AI systems fundamentally see and understand our world.

A Generated Family of Man is just the tip of the iceberg.

These findings aren’t conclusive, but they suggest hypotheses and areas of image generation technology for further research. We would like to invite everyone to join the Flickr Foundation on this exciting journey, to branch out from A Generated Family of Man and truly pick the brains of these newly introduced machines.

**Here is a summary of our findings from A Generated Family of Man:**

Bing Image Creator’s ability to generate images that aim primarily for verisimilitude is impressive when the prompt (image caption) is written by humans or accurately describes the semantic information of the image.
In certain instances, the Image Creator made relatively more errors when determining the ethnicity of the subject. This may indicate underlying visual biases or stereotypes in the datasets the Image Creator was trained on.
When we entered short, simple words related to humans into the Image Creator, it responded with undefined, cartoon-like human figures. Using such short prompts may reveal how the AI fundamentally sees our world and us.

Open questions to consider

Building on these findings, I think changing certain parameters of the investigation could provide interesting starting points for new investigations, whether we spend more time on this at the Flickr Foundation or someone else continues the research. Here are some parameters that could be explored:

Frequency of iteration: increase the number of prompt modifications or general iterations to create larger datasets for better analysis.
Different subject matter: investigate specific photography subjects that allow closer analysis of narrower fields (e.g. specific types of landscapes, species, ethnic groups).
Image generator platforms: look into other image generation software to observe the distinct qualities of different platforms.

How exciting would it be if different groups of people from all around the world took part in a collective activity to evaluate the current state of synthetic photography and really analyze the fine details of these models? This might not scientifically reverse-engineer the models, but findings can emerge even from qualitative investigations. What more will we be able to find? Could we match and cross-compare qualitative and even quantitative investigations to reach a solid (if not definitive) conclusion? And if these investigations took place at intervals over time, which variables would change?

To gain inspiration for these questions, take a look at the full collection of images of A Generated Family of Man on Flickr!

Ever since we created our Version 2 of A Flickr of Humanity, we’ve been brainstorming different ways to develop this project at the Flickr Foundation headquarters. Suddenly, we came across the question: what would happen if we used AI image generators to recreate The Family of Man?

What could this reveal about the current generative AI models and their understanding of photography?
How might it create a new interpretation of The Family of Man exhibition?
What issues or problems would we encounter with this uncanny approach?

We didn’t know the answers to these questions, or what we might even find, so we decided to jump on board for a new journey to Version 3. (Why not?!)

We split our research into three main stages:

Research into different AI image generators
Exploring machine-generated image captions
Challenges of using source photography responsibly in AI projects

We also decided to see whether we could use current captioning and image generation technologies to fully regenerate The Family of Man for our Version 3.

Diagram of experimentation process using caption and image generators to recreate the original Family of Man photos.

Stage 1. Researching different AI image generators

With the rapid advancement of generative artificial intelligence over the last couple of years, hundreds of image-generating applications, such as DALL-E 2 and Midjourney, have been launched. In the initial research stage, we tested different platforms by writing short captions for roughly ten images from The Family of Man and observing the resulting outputs.

Initial image generation process using image generators

Results of image generation (A – mixed landscapes and people)

Results of image generation (B – landscape)

Results of image generation (C – people)

Stage 1 Learnings:

Image generators are better at creating photorealistic images of landscapes, objects, and animals than close-up shots of people.
Most image generators, especially free ones, cap the number of images that can be produced in a day, which slowed down production.
Some captions had to be altered because they violated the terms and policies of the platforms; certain algorithms censor prompts that could create unethical or explicit images (e.g. for the Section A photo caption, the word “naked” could not be used on Microsoft Bing).

We decided to use Microsoft Bing’s Image Creator for this project because it produced the highest-quality images (across all image categories) and had the most flexible limits on how many images could be generated. We also tested other tools, including Dezgo, Veed.io, Canva, and Picsart.

Stage 2. Exploring image captions: AI Caption Generators

Image generators today primarily operate on text prompts, so we realised we should explore caption generation software in more depth. There was much less variety in caption-generating platforms than in image generators, and most of the websites we found seemed intended for social media use.

Experiment 1: Human vs machine captions

Here’s a series of experiments in which we paired images with different types of captions, human-written and machine-generated, to observe how each caption alters the generated image, its expression and, in some cases, its meaning:

Human & machine made captions of The Family of Man photos (A)

Human & machine made captions of The Family of Man photos (B)

Stage 2 Learnings:

It was quite difficult to find caption-generating software that produced different styles of captions, because most platforms only generated “cheesy” social media captions.
On the platforms that generated other styles of captions (not for social media), the depth and accuracy of the descriptions were very limited, for example, “a mountain range with mountains.”

Stage 3. Challenges of using AI to experiment with photography?!

Since both the concept and process of using AI to regenerate The Family of Man is experimental, we encountered several dilemmas along the way:

1. Copyright Issues with Original Photo Use

It’s very difficult to obtain proper permission to use photos from the original publication of The Family of Man, since the exhibition contains photos from 200+ photographers in different locations, taken for different publications. Hence, we’ve decided not to include the original photos of The Family of Man in the Version 3 publication.
This is disappointing because having the original photo alongside the generated versions would allow us to create a direct visual comparison between authentic and synthetic photographs.
All original photos of The Family of Man used in this blog post were photographed using the physical catalogue in our office.

2. Caption Generation

Generating captions also requires uploading the original photo from The Family of Man, so we took screenshots of the online catalogue available on the Internet Archive. This could still violate copyright, because we use the image within our process even if we don’t explicitly display it. We also have a copy of The Family of Man publication, purchased by the Flickr Foundation, here at the office.

4. Moving Forward

Keeping these dilemmas in mind, we will do our best to show respect to the original photographs and photographers throughout the project. We’ll also keep applying this process to the rest of the images in The Family of Man to create a new Version 3 of our A Flickr of Humanity project.

A Flickr of Humanity (Right); Family of Man (Left)

Preface

In the first week of our summer placement at the Flickr Foundation, Maya and I were given the exciting task of working on A Flickr of Humanity. The original A Flickr of Humanity publication was created as a class exercise by students at California State University, Sacramento, supervised by Nick Shepard, assistant professor of photography. Inspired by MoMA’s Family of Man exhibition, 5 groups of students were tasked with curating a selection of photos from Flickr representing the following themes: COVID-19, Love, Embers and Ashes, Women, and Spectrum. When we showed Ben MacAskill, President & COO of Flickr, a copy of the publication, he loved it so much he asked us to arrange 250 copies for the upcoming Tugboat Institute summit, where he was due to present the Flickr Foundation (amongst other things). Yay!

But before sending the publication to the printers, we had to check the licensing of every image to make sure the publication wasn’t violating any copyrights. It’s important to the Flickr Foundation to present licenses and licensed work as accurately as we can.

That’s when we embarked on the crazy journey with Nick and George to create A Flickr of Humanity Version 2! We had a week to identify the sources and copyright restrictions of 212 images and to replace roughly ⅕ of them, which were missing a source or had licensing issues.

Day 1: Finding image source information

After creating an image index spreadsheet with Nick’s help, we began proofreading the list of photographers’ names and locating each image’s URL on Flickr. This involved a lot of scrolling through photo feeds and spotting images. Using the metadata of the photos on Flickr, we also logged the Creative Commons (CC) license information in the spreadsheet to make sure we could use all the photos in the final V2 publication. We worked closely with Nick throughout the project, despite the 8-hour time difference.

To make it easier to visualise the full publication, we photocopied the spreads from the publication and laid them out on the office floor. This came in especially handy for tagging images that needed to be replaced or writing down editorial notes.

40 photocopied A3-sized spreads neatly laid out on grey carpet floor in the office.

Print layout of A Flickr of Humanity V1 in Flickr Foundation HQ

Close-up of printed spreads

Maya labelling V1 images with copyright issues with pink post-its 🙂

Partial screengrab of the Google Sheets image index

Day 2: Finishing the initial image index

When we checked the spreadsheet in the morning, we were left with a pleasant surprise: Nick had completed almost half of the missing image sources, including those that we were unable to find during the first day! Feeling optimistic, we continued our work of completing the initial image index.

Day 3: Replacing images and creating new sections

Once we had finished the index, containing information on 200+ images, we realised that most of the copyrighted photos came from the Embers and Ashes section. This meant most of those photos needed replacing.

We took the opportunity to create a new version of the section focusing on California’s nature and wildfires, and continued to replace images in other sections.

Day 4: New index creation and wrapping up!

To finish up, we created a new index with the updated page numbers and order of photos and continued to swap out any copyrighted photos. Once we were finished, Nick kindly wrapped up the final details of the index and took charge of printing the copies in the U.S.

Take a look at our A Flickr of Humanity project page to read more about the original inspirations of the publication and the Foundation’s future vision to expand Flickr as a curation tool.

Many thanks to Nick Shepard and George Oates for helping us throughout the process and the students of California State University, Sacramento for their amazing work on Version 1.

Maya Osaka

I am a second year BA Design student studying at Goldsmiths, University of London with the honour of being one of the first design interns at Flickr Foundation! Check out my work at mayaosaka.com!

Juwon Jung

I’m an interactive designer specializing in creating digital products with emerging technologies. To learn more, visit juwonjung.cargo.site 🙂
