Data Lifeboat Update 2: More questions than answers

By Ewa Spohn

Thanks to the Digital Humanities Advancement Grant we were awarded by the National Endowment for the Humanities, our Data Lifeboat project (which is part of the Content Mobility Program) is now well and truly underway. The Data Lifeboat is our response to the challenge of archiving the 50 billion or so images currently on Flickr, should the service go down. It’s simply too big to archive as a whole, and we think that these shared histories should be available for the long term, so we’re exploring a decentralized approach. Find out more about the context for this work in our first blog post.

So, after our kick-off last month, we were left with a long list of open questions. That list became longer thanks to our first all-hands meeting that took place shortly afterwards! It grew again once we had met with the project user group – staff from the British Library, San Diego Air & Space Museum, and Congregation of Sisters of St Joseph – a small group representing the diversity of Flickr Commons members. Rather than being overwhelmed, we were buoyed by the obvious enthusiasm and encouragement across the group, all of whom agreed that this is very much an idea worth pursuing. 

As Mia Ridge from the British Library put it: “we need ephemeral collections to tell the story of now and give people who don’t currently think they have a role in preservation a different way of thinking about it”. And from Mary Grace of the Congregation of Sisters of St. Joseph in Canada: “we [the smaller institutions] don’t want to be the 3rd class passengers who drown first”.

Software sketching

We’ve begun working on the software approach to create a Data Lifeboat, focussing on the data model and assessing existing protocols we may use to help package it. Alex and George started creating some small prototypes to test how we should include metadata, and have begun exploring what “social metadata” could be like – that’s the kind of metadata that can only be created on Flickr, and is therefore a required element in any Data Lifeboat (as you’ll see from the diagram below, it’s complex). 


Feb 2024: An early sketch of a Data Lifeboat’s metadata graph structure.

Thanks to our first set of tools, Flinumeratr and Flickypedia, we have robust, reusable code for getting photos and metadata from Flickr. We’ve done some experiments with JSON, XML, and METS as possible ways to store the metadata, and started to imagine what a small viewer that would be included in each Data Lifeboat might be like. 

Complexity of long-term licensing

Alongside the technical development, we have started building our understanding of the legal issues that a Data Lifeboat is going to have to navigate to avoid the unintended consequences of long-term preservation colliding with licenses set in the present. We discussed how we could build care and informed participation into the infrastructure, and what the pitfalls might be. There are fiddly questions around creating a Data Lifeboat containing photos from other Flickr members.

  • As the image creator, would you need to be notified if one of your images has been added to a Data Lifeboat? 
  • Conversely, how would you go about removing an image from a Data Lifeboat? 
  • What happens if there’s a copyright dispute regarding images in a Data Lifeboat that is docked somewhere else? 

We discussed which aspects of other legal and licensing models might apply to Data Lifeboats, given the need to maintain stewardship and access over the long term (100 years at least!), as well as the need for the software to remain usable over this kind of time horizon. This isn’t something that the world of software has ready answers for. 

  • Could Flickr.org offer this kind of service? 
  • How would we notify future users of the conditions of the license, let alone monitor the decay of licenses in existing Data Lifeboats over this kind of timescale? 

So many standards to choose from

We had planned to do a deep dive into the various digital asset management systems used by cultural institutions, but this turned out to be a trickier subject than we thought, as there are simply too many approaches, tools, and cobbled-together hacks in use. Everyone seems to be struggling with this, so it’s not clear (yet) how best to approach it. If you have any ideas, let us know!

This work is supported by the National Endowment for the Humanities.


A Flickr of Humanity: Who is The Family of Man?

Author: Maya Osaka (Design Intern) Posted July 10th 2023

Please enjoy a progress report on our R&D as we continue to develop the A Flickr of Humanity project. It’s a deep dive into the catalogue of the 1955 The Family of Man exhibition.

The Family of Man was an exhibition held at MoMA in 1955.

Organized by Edward Steichen, the acclaimed photographer, curator, and director of MoMA’s Department of Photography, the exhibition showcased 503 photographs from 68 countries. It celebrated universal aspects of the human experience, and was a declaration of solidarity following the Second World War. Photos from the exhibition were published as a physical catalog, which is widely considered a photographic classic.

Tasked with doing some research into The Family of Man, I spent some time really looking at the book.

(The Family of Man 30th Anniversary Edition, 1986)

What I mean by ‘really looking at it’ is that, instead of just flicking through the pages and briefly glancing at the photos, I took the time to take in each image, and to notice the narrative told through the photographs and how Steichen curated the images to portray it. From this experience I was able to see a clear order to the book, which I listed in a spreadsheet. Each photo credits the photographer, where it was taken, and which client or publication it was for (e.g. Life Magazine).

The introduction in the book explains that the exhibition was “conceived as a mirror of the universal elements and emotions in the everydayness of life—as a mirror of the essential oneness of mankind throughout the world.”

As I explored the book, I found myself wanting to answer the following questions:

  1. Where were the photographers from?
  2. Where were the photos taken?
  3. How many female photographers were involved?
  4. Who were the most featured photographers? 

In order to answer these questions I created a master index of the photographs.

This shows where they appear in the book, the country depicted, the photographer, and which organization the image is associated with or was made for. From this ‘master’ spreadsheet I compiled three more views: photos by geography, the photographers’ biographical data, and the most featured photographers.

Here is what I discovered:

46% of the photos were taken in the USA (vs the rest of the world).

Of the 484 images in The Family of Man 30th Anniversary Edition (1986), 220 are from the USA. That’s 46% of all the photos. The most heavily featured countries after the USA were France (32 images), Germany (21 images) and England (15 images), all in Europe. Compared to the USA’s 46%, France, the runner-up, makes up only 7% of the total number of images.

The image is a screenshot of a section of the photos by geography spreadsheet.

 

75% of the images were shot in North America or Europe. 
  • North America: 231 images (of which 220 are from the USA)
  • Europe: 128 images
  • Asia: 69 images (including 12 images shot in Russia)
  • Africa: 24 images
  • South America: 12 images
  • Oceania: 8 images
  • Arctic: 3 images
  • Australia: 2 images

At this stage I will note that, as Russia spans Asia and Europe, Russia’s 12 images have been included within Asia’s statistics (not Europe’s). Also, the infographic excludes the 3 images taken in the Arctic, as they did not explicitly state which part of the Arctic they were taken in.
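For anyone who wants to re-run the arithmetic, here is a minimal Python sketch that recomputes the regional share from the counts listed above. It is not how the spreadsheet itself was built, and the helper name `share` is made up for illustration. One assumption to flag: the listed counts add up to 477 rather than 484, so the sketch takes percentages against the listed total.

```python
# Image counts per region, copied from the list above.
region_counts = {
    "North America": 231,
    "Europe": 128,
    "Asia": 69,  # includes the 12 images shot in Russia
    "Africa": 24,
    "South America": 12,
    "Oceania": 8,
    "Arctic": 3,
    "Australia": 2,
}


def share(counts, regions):
    """Return the percentage of all listed images that fall in `regions`.

    Assumption: the denominator is the sum of the listed counts,
    not the 484 images counted in the book.
    """
    total = sum(counts.values())
    return 100 * sum(counts[r] for r in regions) / total


total_listed = sum(region_counts.values())
na_europe = share(region_counts, ["North America", "Europe"])
print(f"{total_listed} images listed; North America + Europe = {na_europe:.0f}%")
# prints: 477 images listed; North America + Europe = 75%
```

Keeping the counts in one dictionary means a corrected number only has to be changed in one place before the percentages are recalculated.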

The image is a screenshot of a section of the photos by geography spreadsheet.

56% of the photographers were American.

Out of 251 known photographers, 155 were American. That is 56% of the total number of photographers. The next most common nationalities were German (17), then British and French (12 each); 15 photographers were unknown. It is important to note that some of the photographers held more than one nationality, and in these instances their birth nationality was counted. Information on the photographers’ nationalities was collected by searching for their names online and looking for credible sources.

The image is a screenshot of a section of the photographers’ biographical data spreadsheet.

17% of the photographers were female.

Out of the 251 known photographers 48 were women. That is 17% of the total number of photographers. 

Note: There was one photograph that was credited to Diane and Allan Arbus. I counted them as two separate individuals (one male, one female).

The image is a screenshot of the photographers’ biographical data spreadsheet.

Which photographers were featured most?

  1. Wayne Miller (11 photos)
  2. Henri Cartier-Bresson (9 photos)
  3. Alfred Eisenstaedt (8 photos), Dmitri Kessel (8 photos), Dorothea Lange (8 photos), Nat Farbman (8 photos), Ruth Orkin (8 photos).

The image is a screenshot of the most featured photographers spreadsheet. 

Conclusions

  1. The majority of photos were shot in the US and Europe. 
  2. More than half of the photographers were American.
  3. Most of the photographers were men.
  4. Among the top 10 most featured photographers were three women (Dorothea Lange, Ruth Orkin and Margaret Bourke-White).

Where are the lost photos?

On the back of The Family of Man (30th Anniversary Edition, 1986) it is stated that all 503 images from the original exhibition are showcased within the book. However, after checking through the book multiple times, the number of images that I have counted (excluding the images in the introduction, which show the exhibition itself and a portrait of Steichen) is 484. This means there are 19 images that are missing.

This mystery is currently being investigated by my fellow intern, Juwon Jung, who, as I write this, is cross-referencing the original MoMA exhibition master checklist with the book. We will keep you posted on whether this mystery gets solved!

Creating the Infographics

While collecting this data, I began to think about how this data could be visualized. Datasets on a spreadsheet are boring to look at and can struggle to effectively communicate what they mean. So I decided to create an infographic to showcase the datasets. 

Creating the infographics posed many creative challenges, especially because this was one of my first attempts at this sort of data visualization. One of the key challenges was to create visuals that are eye-catching but simple to read, and that communicate a clear message. In this case: that a disproportionately large number of the photos and photographers are of or from the USA, and that the majority of photographers were men.

In order to draw attention to those facts, I used a combination of techniques. Firstly, the statistics that I wanted to draw the most attention to are in the brightest shade of pink (the same pink as the Flickr Foundation logo). Secondly, the proportions of the pie chart and bar chart are accurate and highlight just how disproportionate the statistics are. A comment next to each chart states a percentage that further highlights the point being made.

George Oates (Executive Director at Flickr.org), who has extensive experience working in data visualisation, helped a lot with perfecting the look of the infographic. (Thanks George!)

Below you can see how the graphics evolved.
*Note that the statistics on previous versions are not accurate!