Update to Manual and new case study from BFI #432

Open: wants to merge 105 commits into base: main
Changes from 1 commit
105 commits
06df171
Create Case_study.md
digitensions Feb 19, 2024
91d4251
Update Case_study.md
digitensions Feb 19, 2024
57189d4
Update Case_study.md
digitensions Feb 19, 2024
ed30cb7
Update Case_study.md
digitensions Feb 19, 2024
7b67a82
Update Case_study.md
digitensions Feb 19, 2024
70f60ca
Update Case_study.md
digitensions Feb 19, 2024
c1009bc
Update Case_study.md
digitensions Feb 19, 2024
d560eec
Update Case_study.md
digitensions Feb 19, 2024
382061a
Update Case_study.md
digitensions Feb 19, 2024
7a1518f
Update Case_study.md
digitensions Feb 19, 2024
6af0e10
Update Case_study.md
digitensions Feb 19, 2024
ab6094e
Update Case_study.md
digitensions Feb 19, 2024
10cdb86
Update Case_study.md
digitensions Feb 19, 2024
e5086e2
Update Case_study.md
digitensions Feb 19, 2024
ce94d40
Update Case_study.md
digitensions Feb 19, 2024
0709d92
Update Case_study.md
digitensions Feb 19, 2024
91eb2ef
Update Case_study.md
digitensions Feb 19, 2024
23b656e
Update Case_study.md
digitensions Feb 19, 2024
aef832f
Update Case_study.md
digitensions Feb 19, 2024
f374a74
Update Case_study.md
digitensions Feb 19, 2024
191c5b8
Update Case_study.md
digitensions Feb 19, 2024
0d1450b
Update Case_study.md
digitensions Feb 19, 2024
07df454
Update Case_study.md
digitensions Feb 19, 2024
084925b
Update Case_study.md
digitensions Feb 19, 2024
af21eba
Update Case_study.md
digitensions Feb 19, 2024
0afbf39
Update Case_study.md
digitensions Feb 19, 2024
32d1077
Update Case_study.md
digitensions Feb 19, 2024
b31ccda
Update Case_study.md
digitensions Feb 19, 2024
615f78d
Update Case_study.md
digitensions Feb 19, 2024
a785a29
Update Case_study.md
digitensions Feb 19, 2024
b070188
Update Case_study.md
digitensions Feb 19, 2024
0706572
Update Case_study.md
digitensions Feb 19, 2024
1c1d2e7
Update Case_study.md
digitensions Feb 19, 2024
e5f50c9
Update Case_study.md
digitensions Feb 19, 2024
152b334
Update Case_study.md
digitensions Feb 19, 2024
70ad59f
Update Case_study.md
digitensions Feb 19, 2024
e18e266
Update Case_study.md
digitensions Feb 19, 2024
de03f16
Update Case_study.md
digitensions Feb 19, 2024
2f071ed
Update Case_study.md
digitensions Feb 19, 2024
d372683
Update Case_study.md
digitensions Feb 19, 2024
dc6d9a8
Update Case_study.md
digitensions Feb 19, 2024
182f3b1
Update Case_study.md
digitensions Feb 19, 2024
05cfc86
Update Case_study.md
digitensions Feb 28, 2024
52a1dd9
Update Case_study.md
digitensions Feb 28, 2024
1874ef0
Update Case_study.md
digitensions Feb 28, 2024
4db031c
Update Case_study.md
digitensions Feb 28, 2024
9033a3d
Update Case_study.md
digitensions Feb 28, 2024
b959546
Update Case_study.md
digitensions Feb 28, 2024
95de52d
Update Case_study.md
digitensions Feb 28, 2024
3ab1952
Update Case_study.md
digitensions Feb 28, 2024
9c8b3e2
Update Case_study.md
digitensions Feb 28, 2024
affc385
Update Case_study.md
digitensions Feb 28, 2024
a13bbd0
Update Case_study.md
digitensions Feb 28, 2024
1f9bce0
Update Case_study.md
digitensions Feb 28, 2024
1c1e232
Update Case_study.md
digitensions Mar 5, 2024
53c1d14
Update Case_study.md
stephenmcconnachie Mar 6, 2024
f33ed87
Update Case_study.md
stephenmcconnachie Mar 6, 2024
d499e96
Update Case_study.md
stephenmcconnachie Mar 6, 2024
2b2d3ca
Update Case_study.md
stephenmcconnachie Mar 6, 2024
7e7b865
Update Case_study.md
stephenmcconnachie Mar 6, 2024
bcd89e4
Merge pull request #2 from stephenmcconnachie/patch-4
digitensions Mar 7, 2024
7f0a90f
Merge pull request #1 from stephenmcconnachie/patch-3
digitensions Mar 7, 2024
5eeae06
Merge pull request #3 from stephenmcconnachie/patch-5
digitensions Mar 7, 2024
03870f3
Merge pull request #4 from stephenmcconnachie/patch-6
digitensions Mar 7, 2024
4c0f991
Merge pull request #5 from stephenmcconnachie/patch-7
digitensions Mar 7, 2024
5caa4ee
Update Case_study.md
digitensions Mar 7, 2024
d7a5939
Update User_Manual.md
digitensions Mar 7, 2024
6061dc2
Update User_Manual.md
digitensions Mar 7, 2024
0f4e63d
Update User_Manual.md
digitensions Mar 7, 2024
7490c52
Update User_Manual.md
digitensions Mar 7, 2024
3928717
Update User_Manual.md
digitensions Mar 7, 2024
6048c80
Update User_Manual.md
digitensions Mar 7, 2024
3e0d770
Update User_Manual.md
digitensions Mar 7, 2024
4742e9c
Update User_Manual.md
digitensions Mar 7, 2024
9983b2b
Update User_Manual.md
digitensions Mar 7, 2024
1842df5
Update User_Manual.md
digitensions Mar 7, 2024
3b9bbfc
Update User_Manual.md
digitensions Mar 7, 2024
cd1ebb4
Update User_Manual.md
digitensions Mar 7, 2024
3a8d404
Update User_Manual.md
digitensions Mar 7, 2024
656f4c3
Update User_Manual.md
digitensions Mar 7, 2024
4dd0d3b
Update User_Manual.md
digitensions Mar 7, 2024
bb0aa27
Update User_Manual.md
digitensions Mar 7, 2024
9e9ca24
Update Case_study.md
digitensions Apr 9, 2024
080ce97
Update Case_study.md
digitensions Apr 9, 2024
5e08699
Update User_Manual.md
digitensions Apr 9, 2024
4b8deee
Update User_Manual.md
digitensions Apr 9, 2024
1bd00b0
Update User_Manual.md
digitensions Apr 9, 2024
69522a5
Update Case_study.md
digitensions Apr 9, 2024
69ac58b
Update User_Manual.md
digitensions Apr 9, 2024
913f81a
Update Case_study.md
digitensions Apr 12, 2024
4fd2998
Update Case_study.md
digitensions Apr 12, 2024
f286b08
Update Case_study.md
digitensions Apr 12, 2024
89e35cf
Update Case_study.md
digitensions Apr 12, 2024
63e3ab4
Update Case_study.md
digitensions Apr 12, 2024
46d7b32
Merge branch 'MediaArea:main' into master
digitensions Apr 12, 2024
291e0e3
Update Case_study.md
digitensions Apr 12, 2024
224a7dd
Update Case_study.md
stephenmcconnachie Apr 15, 2024
24bb62a
Merge branch 'MediaArea:main' into master
digitensions Dec 6, 2024
d8deb5e
Update Case_study.md
digitensions Dec 6, 2024
0e2d282
Typos.
digitensions Dec 6, 2024
82ffb27
Merge branch 'MediaArea:main' into master
digitensions Jan 8, 2025
dce67af
Update Case_study.md
digitensions Jan 8, 2025
e2f4611
Update Case_study.md
digitensions Jan 8, 2025
cc29648
Update Case_study.md
digitensions Jan 8, 2025
69c9d12
Correct typos from Reto's review
digitensions Jan 8, 2025
Correct typos from Reto's review
digitensions authored Jan 8, 2025

Verified: this commit was created on GitHub.com and signed with GitHub’s verified signature.
commit 69c9d122e7d2023d8f5ccc6887d41d348c889b6b
8 changes: 4 additions & 4 deletions Doc/Case_study.md
@@ -51,7 +51,7 @@ For each image sequence processed the metadata of the first DPX is collected and

Next the first file within the image sequence is checked against a DPX policy created using [Media Area's MediaConch software](https://mediaarea.net/MediaConch) - ([BFI's DPX policy](https://github.com/bfidatadigipres/dpx_encoding/blob/main/rawcooked_dpx_policy.xml)). If it passes then we know it can be encoded by RAWcooked and by our current licence. Any that fail are assessed for possible RAWcooked licence expansion or possible anomalies in the DPX.

-The frame pixel size and colourspace of the sequence are used to calculate the potential reduction rate of the RAWcooked encode based on previous reduction experience. We make an assumption that 2K RGB will always be atleast one third smaller, so calculate a 1.3TB sequence will make a 1TB FFV1 Matroska. For 2K Luma and all 4K we must assume that very small size reductions could occur so map 1TB to 1TB. This step is necessary to control file ingest sizes to our Digital Preservation Infrastructure where we currently have a maximum verifiable ingest file size of 1TB. Where a sequence is over 1TB we have Python scripts to split that DPX sequence across additional folders depending on total size.
+The frame pixel size and colourspace of the sequence are used to calculate the potential reduction rate of the RAWcooked encode based on previous reduction experience. We make an assumption that 2K RGB will always be at least one third smaller, so calculate a 1.3TB sequence will make a 1TB FFV1 Matroska. For 2K Luma and all 4K we must assume that very small size reductions could occur so map 1TB to 1TB. This step is necessary to control file ingest sizes to our Digital Preservation Infrastructure where we currently have a maximum verifiable ingest file size of 1TB. Where a sequence is over 1TB we have Python scripts to split that DPX sequence across additional folders depending on total size.

| RAWcooked 2K RGB | RAWcooked Luma & RAWcooked 4K |
| -------------------- | ----------------------------- |
@@ -60,7 +60,7 @@ The frame pixel size and colourspace of the sequence are used to calculate the p

### <a name="muxing">Encoding the image sequence</a>

-To encode our image sequences we use the ```--all``` flag released in RAWcooked v21. This flag was a sponsorship development by [NYPL](https://www.nypl.org/), and sees several preservation essential flags merged into one. Most imporantly it includes the creation of checksum hashes for every image file in the sequence, with this data being saved into the RAWcooked reversibility file and embedded into the Matroska wrapper. This ensures that when decoded the retrieved sequence can be verified as bit-identical to the original source sequence.
+To encode our image sequences we use the ```--all``` flag released in RAWcooked v21. This flag was a sponsorship development by [NYPL](https://www.nypl.org/), and sees several preservation essential flags merged into one. Most importantly it includes the creation of checksum hashes for every image file in the sequence, with this data being saved into the RAWcooked reversibility file and embedded into the Matroska wrapper. This ensures that when decoded the retrieved sequence can be verified as bit-identical to the original source sequence.

Our RAWcooked encode command:
```
@@ -228,7 +228,7 @@ It decodes the FFV1 Matroska back to it's original form as a DPX image sequence,

We began using RAWcooked to convert 3 petabytes of 2K DPX sequence data to FFV1 Matroska for our *Unlocking Film Heritage* project. This lossless compression to FFV1 has saved us an estimated 1600TB of storage space, which has saved thousands of pounds of additional magnetic storage tape purchases. Undoubtedly this software offers amazing financial incentives with all the benefits of open standards and open-source tools. It also creates a viewable video file of an otherwise invisible DPX scan, so useful for viewing the unseen technology of film.

-Today, our workflow runs 24/7 performing automated encoding of business-as-usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered. This is usually indicated in error logs or when an image sequences doesn't make it to our Digital Preservation Infrastructure. Most often this is caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg while encoding a specific DPX scan. There can be many differences found in DPX metadata depending on the scanning technology used. Where errors are found by our automations these are reported to an error log named after the image seqeuence.
+Today, our workflow runs 24/7 performing automated encoding of business-as-usual DPX sequences with relatively little overview. There is a need for manual intervention when repeated errors are encountered. This is usually indicated in error logs or when an image sequences doesn't make it to our Digital Preservation Infrastructure. Most often this is caused by a new image sequence 'flavour' that we do not have covered by our RAWcooked licence, or sometimes it can indicate a problem with either RAWcooked or FFmpeg while encoding a specific DPX scan. There can be many differences found in DPX metadata depending on the scanning technology used. Where errors are found by our automations these are reported to an error log named after the image sequence.

Our 2K workflows could run multiple parallel processes with good efficiency, seeing as many as 32 concurrent encodings running at once against a single storage device. This was before we implemented the ```--all``` command which calculates checksums adding them to the reversibility data and runs a checksum comparison of the Matroska after encoding has completed which expands the encoding process. When introducing this command we reduced our concurrency, particularly as our workflow introduced a final ```--check``` pass against the Matroska file that automated the deletion of the DPX sequence, when successful. We also expanded our storage devices for RAWcooking and currently have 8 storage devices (a mix of Isilon, QNAPs and G-Rack NAS) generally set for between 2 and 8 concurrent encodings with the aim of not exceeding 32.

@@ -250,7 +250,7 @@ A separate 2K solo and parallel encoding test revealed much quicker encoding tim
* Parallel 2K RGB 16-bit DPX (367 GB) - MKV duration 11:34 - encoding time 2:40:00 - MKV was 27.6% smaller than the DPX
* Parallel 2K RGB 16-bit DPX (325 GB) - MKV duration 10:15 - encoding time 2:21:00 - MKV was 24.4% smaller than the DPX

-It provides us with great reassurance to implement the ```--all``` command and we remain highly satisfied with RAWcooked encoding of DPX sequences despite the reduction in our concurrent encodings. The embedded DPX hashes which ```--all``` includes are critical for long-term preservation of the digitised film. In addition there are checksums embedded in the slices of every video frame (up to 576 checksums *per* video frame) allowing granular analysis of any problems found with digital FFV1 preservation files, should they arise. This is thanks to the FFV1 codec, and it allows us to pinpoint exactly where digital damage may have ocurred. This means we can easily replace the impacted DPX files using our duplicate preservation copies. Open-source RAWcooked, FFV1 and Matroska allow open access to their source code which means reduced likelihood of obsolescence long into the future. Finally, we plan to begin testing RAWcooked encoding of TIFF image sequences with the intention of encoding DCDM image sequences to FFV1 also.
+It provides us with great reassurance to implement the ```--all``` command and we remain highly satisfied with RAWcooked encoding of DPX sequences despite the reduction in our concurrent encodings. The embedded DPX hashes which ```--all``` includes are critical for long-term preservation of the digitised film. In addition there are checksums embedded in the slices of every video frame (up to 576 checksums *per* video frame) allowing granular analysis of any problems found with digital FFV1 preservation files, should they arise. This is thanks to the FFV1 codec, and it allows us to pinpoint exactly where digital damage may have occurred. This means we can easily replace the impacted DPX files using our duplicate preservation copies. Open-source RAWcooked, FFV1 and Matroska allow open access to their source code which means reduced likelihood of obsolescence long into the future. Finally, we plan to begin testing RAWcooked encoding of TIFF image sequences with the intention of encoding DCDM image sequences to FFV1 also.

### <a name="tests">Useful test approaches</a>
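The size rule in the case study diff above can be expressed as a small calculation. Under that rule, 2K RGB DPX is assumed to shrink by at least a third, so up to 1.3TB of DPX maps to at most 1TB of FFV1 Matroska. 2K Luma and all 4K sequences are mapped 1TB to 1TB, against a 1TB ingest limit.

What follows is a minimal Python sketch under stated assumptions, not BFI's actual splitting scripts:
- Sizes are in decimal terabytes (10**12 bytes).
- The helper names `dpx_ceiling`, `folders_needed` and `split_sequence` are hypothetical.
- The split is a simple greedy pass that keeps frames in order, so each folder holds a contiguous run of the sequence.

```python
# Hypothetical sketch of the case study's size rule; not BFI's production scripts.
TB = 10**12  # assumption: decimal terabytes

MKV_LIMIT = TB  # maximum verifiable ingest size for a single Matroska (1TB)


def dpx_ceiling(resolution: str, colourspace: str) -> int:
    """Largest DPX sequence size (bytes) assumed to encode to <= 1TB of FFV1 Matroska."""
    if resolution == "2K" and colourspace == "RGB":
        # 2K RGB assumed to shrink by at least one third, so 1.3TB -> 1TB is a safe mapping
        return 13 * TB // 10
    # 2K Luma and all 4K: assume very little reduction, so map 1TB to 1TB
    return MKV_LIMIT


def folders_needed(total_bytes: int, resolution: str, colourspace: str) -> int:
    """Minimum number of folders the sequence must be split across (at least one)."""
    ceiling = dpx_ceiling(resolution, colourspace)
    return max(1, -(-total_bytes // ceiling))  # integer ceiling division


def split_sequence(frames: list[tuple[str, int]], ceiling: int) -> list[list[str]]:
    """Greedily group ordered (filename, size) pairs into contiguous parts.

    Each part stays at or below `ceiling`, except that a single frame larger
    than the ceiling ends up in a part on its own.
    """
    parts: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for name, size in frames:
        # Start a new part if adding this frame would push the current one over the ceiling
        if current and current_size + size > ceiling:
            parts.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        parts.append(current)
    return parts
```

For example, `folders_needed(2_000_000_000_000, "2K", "RGB")` gives 2, as does the same size for a 4K sequence. The 1.3TB ceiling is conservative: a full one-third reduction of 1.3TB would give roughly 0.87TB. When no single frame exceeds the ceiling, `folders_needed` is only a lower bound on the number of parts the greedy split produces. Uneven frame sizes can require one more folder.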
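The `--all` paragraph in the case study diff above relies on per-file checksums, so that a decoded sequence can be verified as bit-identical to its source. RAWcooked does this internally. The sketch below is a separate, hypothetical cross-check in Python, not RAWcooked's own implementation.

It rests on these assumptions:
- MD5 is the hash.
- DPX files carry a lowercase `.dpx` suffix.
- The helpers `file_md5`, `manifest` and `compare` are hypothetical.

It hashes every DPX file under the original and the decoded folders, then reports any file that is missing, unexpected or different.

```python
# Hypothetical cross-check of a decoded DPX sequence; not RAWcooked's implementation.
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in 1 MiB chunks so large DPX frames are not loaded whole."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest(folder: Path, suffix: str = ".dpx") -> dict[str, str]:
    """Map each file's path relative to `folder` to its MD5 (assumes lowercase suffix)."""
    return {
        str(p.relative_to(folder)): file_md5(p)
        for p in sorted(folder.rglob(f"*{suffix}"))
    }


def compare(source: Path, decoded: Path) -> list[str]:
    """Relative paths present in only one folder, followed by paths whose hashes differ."""
    a, b = manifest(source), manifest(decoded)
    problems = sorted(set(a) ^ set(b))
    problems += sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return problems
```

An empty list means every DPX file matched byte for byte. Hashing reads every byte of both sequences, so on multi-terabyte scans it is heavily I/O-bound. It complements the checksum data RAWcooked already embeds rather than replacing it.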