From e33ec812f07d9b2173fdb769d10b510ce545734d Mon Sep 17 00:00:00 2001
From: Muhammad Muqarrab
Date: Mon, 13 Jan 2025 15:14:58 +0500
Subject: [PATCH] updated content extraction and manipulation examples

---
 .../document-content-extraction/_index.md     | 12 --------
 .../extract-modify-document-content/_index.md | 10 -------
 .../remove-content-documents/_index.md        | 29 -------------------
 3 files changed, 51 deletions(-)

diff --git a/content/english/python-net/content-extraction-and-manipulation/document-content-extraction/_index.md b/content/english/python-net/content-extraction-and-manipulation/document-content-extraction/_index.md
index a2057bf36b..28f149e014 100644
--- a/content/english/python-net/content-extraction-and-manipulation/document-content-extraction/_index.md
+++ b/content/english/python-net/content-extraction-and-manipulation/document-content-extraction/_index.md
@@ -44,18 +44,6 @@ for paragraph in doc.get_child_nodes(doc.is_paragraph, True):
     text += paragraph.get_text()
 ```
 
-## Extracting Images
-
-To extract images from the document:
-
-```python
-for shape in doc.get_child_nodes(doc.is_shape, True):
-    if shape.has_image:
-        image = shape.image_data.to_bytes()
-        with open("image.png", "wb") as f:
-            f.write(image)
-```
-
 ## Managing Formatting
 
 Preserving formatting during extraction:
diff --git a/content/english/python-net/content-extraction-and-manipulation/extract-modify-document-content/_index.md b/content/english/python-net/content-extraction-and-manipulation/extract-modify-document-content/_index.md
index 09d365f951..54d6a3e7a7 100644
--- a/content/english/python-net/content-extraction-and-manipulation/extract-modify-document-content/_index.md
+++ b/content/english/python-net/content-extraction-and-manipulation/extract-modify-document-content/_index.md
@@ -40,16 +40,6 @@ for para in doc.get_child_nodes(asposewords.NodeType.PARAGRAPH, True):
     print(text)
 ```
 
-## Modifying Text
-
-You can modify text by directly setting the text of runs or paragraphs:
-
-```python
-for para in doc.get_child_nodes(asposewords.NodeType.PARAGRAPH, True):
-    if "old_text" in para.get_text():
-        para.get_runs().get(0).set_text("new_text")
-```
-
 ## Working with Formatting
 
 Aspose.Words allows you to work with formatting styles:
diff --git a/content/english/python-net/content-extraction-and-manipulation/remove-content-documents/_index.md b/content/english/python-net/content-extraction-and-manipulation/remove-content-documents/_index.md
index 27fb1c7638..db886c835b 100644
--- a/content/english/python-net/content-extraction-and-manipulation/remove-content-documents/_index.md
+++ b/content/english/python-net/content-extraction-and-manipulation/remove-content-documents/_index.md
@@ -51,19 +51,6 @@ for paragraph in doc.get_child_nodes(aw.NodeType.PARAGRAPH, True):
     paragraph.get_range().replace(text_to_remove, replacement, False, False)
 ```
 
-## Replacing Text
-
-Sometimes, you might want to replace certain text with new content. Here's an example of how to do it:
-
-```python
-text_to_replace = "old text"
-new_text = "new text"
-
-for paragraph in doc.get_child_nodes(aw.NodeType.PARAGRAPH, True):
-    if text_to_replace in paragraph.get_text():
-        paragraph.get_range().replace(text_to_replace, new_text, False, False)
-```
-
 ## Removing Images
 
 If you need to remove images from the document, you can use a similar approach. First, identify the images and then remove them:
@@ -94,22 +81,6 @@ for section in doc.sections:
     doc.remove_child(section)
 ```
 
-## Find and Replace with Regex
-
-Regular expressions offer a powerful way to find and replace content:
-
-```python
-import re
-
-pattern = r"\b\d{4}\b"  # Example: Replace four-digit numbers
-replacement = "****"
-
-for paragraph in doc.get_child_nodes(aw.NodeType.PARAGRAPH, True):
-    text = paragraph.get_text()
-    new_text = re.sub(pattern, replacement, text)
-    paragraph.get_range().text = new_text
-```
-
 ## Extracting Specific Content
 
 Sometimes, you might need to extract specific content from a document: