Home
Documentation and notes that don't need to be in the source directly will go into the wiki. 😄
Notes about progress, lessons learned, links to references.
TimC - Created a simple 3-slide PowerPoint for source material. Set up a basic Java project in IntelliJ IDEA; hacked up a Reader class to read the source1.pptx file. Used code snippets from Baeldung (#3) to read and extract data. After that success, extended the code to modify slides and write out a revised presentation. Used code snippets from the Tutorials Point (#2) reference, with some changes due to API updates in POI 5.0.0 versus 3.x. (Note: the class is now ill-named as "Reader", but this is just some temporary PoC code.)
I was able to list all slide layouts, modify the slide order (move #3 -> #2), add a slide (with a layout), and extract all Picture names (along with their byte[] data, which is discarded for now). Tried this on a more complex example slide deck (45MB) and noted that attached audio files (.mp3 and .wav) were also listed in the "Pictures" list.
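A likely reason the audio files show up next to the images is that both are stored side by side under ppt/media/ inside the .pptx package (this is an assumption about how the "Pictures" list is collected, not something confirmed from the POI source here). A minimal sketch to check what a deck actually holds, using only the Python standard library; the extension map and the function names are hypothetical and only for illustration:

```python
import io
import posixpath
import zipfile

# Hypothetical extension map for illustration; extend as needed.
KIND_BY_EXTENSION = {
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".gif": "image", ".svg": "image",
    ".mp3": "audio", ".wav": "audio",
}


def classify_media(pptx_file):
    """Return {kind: [file names]} for every entry under ppt/media/."""
    groups = {}
    with zipfile.ZipFile(pptx_file) as package:
        for entry in package.namelist():
            if not entry.startswith("ppt/media/") or entry.endswith("/"):
                continue  # skip other parts and directory entries
            name = posixpath.basename(entry)
            extension = posixpath.splitext(name)[1].lower()
            kind = KIND_BY_EXTENSION.get(extension, "unknown")
            groups.setdefault(kind, []).append(name)
    return groups


def build_media_demo():
    """Build a tiny in-memory ZIP that mimics the media layout of a .pptx."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name in ("image1.png", "image2.svg", "image4.jpg", "media1.mp3"):
            package.writestr("ppt/media/" + name, b"")
        package.writestr("ppt/slides/slide1.xml", "<sld/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for kind, names in sorted(classify_media(build_media_demo()).items()):
        print(kind, names)
```

To try it on a real deck, pass a path instead of the demo buffer, e.g. `classify_media("./source1.pptx")`.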
- Extract a pre-determined list of fields
- Dive deeper into the slide "name" field -- the one that is not easily accessible from the PPT GUI (see python-pptx #671)
- Updated the source deck XML manually to include a name attribute with a UUID; then re-zipped
- Embedded name UUID
<p:cSld name="cSld-name-9C3C9787-305F-4251-9A7D-2B0E5B0DC7E6">
- Successfully read in this source, manipulated it, and then read in the resulting "target1.pptx"
- Takeaway: UUID name moved WITH the slide as predicted (into position 2)
Read in a PPTX file: ./target1.pptx
Reading: ./target1.pptx
slide = null; name=Slide1
slide = Originally Slide #3; name=cSld-name-9C3C9787-305F-4251-9A7D-2B0E5B0DC7E6
slide = Slide2; name=Slide3
slide = Click to edit Master title style; name=Slide4
picture name: image1.png
picture format: PNG
picture name: image2.svg
picture format: SVG
picture name: image3.png
picture format: PNG
picture name: image4.jpg
picture format: JPEG
picture name: media1.mp3
picture format: null
layout = Name: /ppt/slideLayouts/slideLayout1.xml - Content Type: application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml
layout = Name: /ppt/slideLayouts/slideLayout10.xml - Content Type: application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml
layout = Name: /ppt/slideLayouts/slideLayout11.xml - Content Type: application/vnd.openxmlformats-officedocument.presentationml.slideLayout+xml
... inclusive 2-9 ...
Wrote to target2.pptx!
- Experiment with python-pptx, especially around the name field.
- Set up a Python virtualenv (pyenv virtualenv pyppt), updated pip, installed python-pptx.
- Ran sample script successfully to create a basic PPTX file from code.
- TIL that VS Code (with the Python/Jupyter extensions) supports Jupyter-style "cells" in plain .py files, with # %% comments marking cells that can be run individually.
- Opened the source1.pptx, added a unique name to the first slide:
ppt.slides[1].name="TC-NewID"
- In the resulting target file source1-pyout.pptx, after expanding, saw <p:cSld name="TC-NewID"> in the XML! Yay!
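The expand-and-inspect check above can also be scripted. This is a minimal sketch using only the standard library, assuming the usual package layout (ppt/presentation.xml, its ppt/_rels/presentation.xml.rels file, and slide parts under ppt/slides/). It walks p:sldIdLst to get the slides in presentation order and reads the name attribute of each p:cSld, so a moved slide should appear in its new position with its name intact. The function names and the demo name "cSld-name-DEMO" are hypothetical, and the demo package is stripped down (real relationship entries also carry a Type attribute).

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
NS = {"p": P_NS, "rel": REL_NS}


def slide_names_in_order(pptx_file):
    """Return [(part name, cSld name or "")] in presentation order."""
    with zipfile.ZipFile(pptx_file) as package:
        # Map relationship ids (rId...) to part names inside the ZIP.
        rels = ET.fromstring(package.read("ppt/_rels/presentation.xml.rels"))
        targets = {}
        for rel in rels.findall("rel:Relationship", NS):
            target = rel.get("Target")
            if target.startswith("/"):
                part = target.lstrip("/")  # absolute within the package
            else:
                part = posixpath.normpath(posixpath.join("ppt", target))
            targets[rel.get("Id")] = part

        # p:sldIdLst lists the slides in the order they are shown.
        presentation = ET.fromstring(package.read("ppt/presentation.xml"))
        result = []
        for slide_id in presentation.findall("p:sldIdLst/p:sldId", NS):
            part = targets[slide_id.get("{%s}id" % R_NS)]
            c_sld = ET.fromstring(package.read(part)).find("p:cSld", NS)
            name = c_sld.get("name", "") if c_sld is not None else ""
            result.append((part, name))
        return result


def build_slides_demo():
    """Build a minimal in-memory package with slide3 moved to position 2."""
    order = ["rId1", "rId3", "rId2"]
    names = {1: None, 2: None, 3: "cSld-name-DEMO"}
    slide_ids = "".join(
        '<p:sldId id="%d" r:id="%s"/>' % (256 + i, rid)
        for i, rid in enumerate(order))
    presentation = (
        '<p:presentation xmlns:p="%s" xmlns:r="%s"><p:sldIdLst>%s'
        '</p:sldIdLst></p:presentation>' % (P_NS, R_NS, slide_ids))
    rels = '<Relationships xmlns="%s">%s</Relationships>' % (
        REL_NS,
        "".join('<Relationship Id="rId%d" Target="slides/slide%d.xml"/>'
                % (n, n) for n in names))
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("ppt/presentation.xml", presentation)
        package.writestr("ppt/_rels/presentation.xml.rels", rels)
        for n, name in names.items():
            attr = ' name="%s"' % name if name else ""
            package.writestr(
                "ppt/slides/slide%d.xml" % n,
                '<p:sld xmlns:p="%s"><p:cSld%s/></p:sld>' % (P_NS, attr))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for part, name in slide_names_in_order(build_slides_demo()):
        print(part, repr(name))
```

Slides without a name attribute come back as an empty string here; tools may substitute their own default in that case.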
1. Apache POI HowTo for Shapes
2. TutorialsPoint: Apache POI PPT - Quick Guide
   - Dated 2017, but only a few methods seem to have changed so far.
3. Baeldung: Creating a MS PowerPoint Presentation in Java
   - At the time of writing, POI 5.0.0 is current; the Baeldung example used 3.17.
   - Learned that the object interfaces for "modern" PowerPoint (2007+ OOXML file format, .pptx extension) are XMLSlideShow, XSLFSlide and XSLFTextShape.