updated content extraction and manipulation examples
muqarrab-aspose committed Jan 13, 2025
1 parent 7e86ccf commit e33ec81
Showing 3 changed files with 0 additions and 51 deletions.
@@ -44,18 +44,6 @@ for paragraph in doc.get_child_nodes(doc.is_paragraph, True):
    text += paragraph.get_text()
```

## Extracting Images

To extract images from the document:

```python
for shape in doc.get_child_nodes(doc.is_shape, True):
    if shape.has_image:
        image = shape.image_data.to_bytes()
        with open("image.png", "wb") as f:
            f.write(image)
```
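
As written, the removed snippet saves every image to the same `image.png`, so in a document with several images only the last one would remain on disk. The following is a minimal sketch of one way to avoid that, using only the standard library. The helper `save_images` and the `image_N.png` naming are hypothetical, the byte strings stand in for whatever the document API returns, and the `.png` extension is an assumption about the image type.

```python
import tempfile
from pathlib import Path


def save_images(blobs, out_dir):
    """Write each bytes blob to its own numbered file and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, data in enumerate(blobs):
        # A per-image index keeps later images from overwriting earlier ones.
        path = out / f"image_{index}.png"
        path.write_bytes(data)
        paths.append(path)
    return paths


# Placeholder payloads; real input would be the extracted image bytes.
with tempfile.TemporaryDirectory() as tmp:
    written = save_images([b"first", b"second"], tmp)
    names = [p.name for p in written]
    print(names)
```

In the loop from the snippet, the same idea amounts to building the file name from a counter instead of a fixed string.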

## Managing Formatting

Preserving formatting during extraction:
@@ -40,16 +40,6 @@ for para in doc.get_child_nodes(asposewords.NodeType.PARAGRAPH, True):
print(text)
```

## Modifying Text

You can modify text by directly setting the text of runs or paragraphs:

```python
for para in doc.get_child_nodes(asposewords.NodeType.PARAGRAPH, True):
    if "old_text" in para.get_text():
        para.get_runs().get(0).set_text("new_text")
```

## Working with Formatting

Aspose.Words allows you to work with formatting styles:
@@ -51,19 +51,6 @@ for paragraph in doc.get_child_nodes(aw.NodeType.PARAGRAPH, True):
paragraph.get_range().replace(text_to_remove, replacement, False, False)
```

## Replacing Text

Sometimes, you might want to replace certain text with new content. Here's an example of how to do it:

```python
text_to_replace = "old text"
new_text = "new text"

for paragraph in doc.get_child_nodes(aw.NodeType.PARAGRAPH, True):
    if text_to_replace in paragraph.get_text():
        paragraph.get_range().replace(text_to_replace, new_text, False, False)
```

## Removing Images

If you need to remove images from the document, you can use a similar approach. First, identify the images and then remove them:
@@ -94,22 +81,6 @@ for section in doc.sections:
doc.remove_child(section)
```

## Find and Replace with Regex

Regular expressions offer a powerful way to find and replace content:

```python
import re

pattern = r"\b\d{4}\b" # Example: Replace four-digit numbers
replacement = "****"

for paragraph in doc.get_child_nodes(aw.NodeType.PARAGRAPH, True):
    text = paragraph.get_text()
    new_text = re.sub(pattern, replacement, text)
    paragraph.get_range().text = new_text
```
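
The behavior of the pattern in the removed snippet can be checked on a plain string, independent of any document library. This is a small sketch under that assumption; the helper name `mask_four_digit_numbers` and the sample sentence are made up for illustration.

```python
import re

# Same pattern as the removed snippet: a standalone run of exactly four digits.
pattern = r"\b\d{4}\b"
replacement = "****"


def mask_four_digit_numbers(text: str) -> str:
    """Replace every standalone four-digit number in text with asterisks."""
    return re.sub(pattern, replacement, text)


sample = "Invoice 2024 totals 12345 units; PIN 0042."
masked = mask_four_digit_numbers(sample)
print(masked)  # Invoice **** totals 12345 units; PIN ****.
```

The word boundaries (`\b`) are what leave the five-digit `12345` untouched: no boundary falls inside a longer run of digits, so only runs of exactly four digits match.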

## Extracting Specific Content

Sometimes, you might need to extract specific content from a document: