Function `baltic.loadNewick`

Barney Potter edited this page Oct 14, 2024 · 1 revision

loadNewick() Function

Description

The loadNewick() function in BALTIC is used to load a tree from a Newick file and process it into a BALTIC tree object. This function can handle various Newick formats and provides options for date extraction and time calibration.

Syntax

def loadNewick(tree_path, tip_regex='\|([0-9]+\-[0-9]+\-[0-9]+)', date_fmt='%Y-%m-%d',
               variableDate=True, absoluteTime=False, verbose=False, sortBranches=True)

Parameters

  • tree_path (str or file-like object): The path to the Newick file or a file-like object containing the Newick formatted tree.
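  • tip_regex (str): A regular expression used to extract dates from tip names; the first capture group should contain the date. Default is '\|([0-9]+\-[0-9]+\-[0-9]+)', which matches a literal "|" followed by a hyphen-separated date.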
  • date_fmt (str): The date format for the extracted dates. Default is '%Y-%m-%d'.
  • variableDate (bool): If True, accepts dates of varying precision (for example, year and month only) rather than requiring every date to match date_fmt in full. Default is True.
  • absoluteTime (bool): If True, converts the tree to absolute time using the tip dates encoded in tip names. Default is False.
  • verbose (bool): If True, prints verbose output during the process. Default is False.
  • sortBranches (bool): If True, sorts the branches of the tree after loading. Default is True.
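
To illustrate how tip_regex and date_fmt work together, the following minimal sketch applies the default pattern to a made-up tip name using only the standard library. The tip names and the helper extract_tip_date are hypothetical, and this is not BALTIC's internal code; it only shows that the captured group should be a string that date_fmt can parse.

```python
import re
from datetime import datetime

# Default pattern from the signature: a literal "|" followed by a captured date.
TIP_REGEX = r'\|([0-9]+\-[0-9]+\-[0-9]+)'
DATE_FMT = '%Y-%m-%d'

def extract_tip_date(tip_name, tip_regex=TIP_REGEX, date_fmt=DATE_FMT):
    """Return the date captured from tip_name, or None if the pattern does not match."""
    match = re.search(tip_regex, tip_name)
    if match is None:
        return None
    # Group 1 is the part of the tip name inside the parentheses of the pattern.
    return datetime.strptime(match.group(1), date_fmt)

print(extract_tip_date('A/Texas/01/2014|2014-03-27'))  # hypothetical tip name with a date
print(extract_tip_date('tip_without_date'))            # no "|date" suffix, so no match
```

Testing the pattern on a few real tip names in this way, before passing it to loadNewick(), is a quick check that the regex and the date format agree.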

Return Value

  • tree: The BALTIC tree object created from the Newick file.

Functionality

  1. Opens and reads the Newick file.
  2. Identifies the tree string within the file.
  3. Calls make_tree() to parse the tree string into a BALTIC tree object.
  4. Traverses the tree to set up internal structures.
  5. Optionally sorts the branches of the tree.
  6. If absoluteTime is True, extracts dates from tip names and calibrates the tree.
  7. Returns the processed BALTIC tree object.
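
Steps 1 and 2 can be pictured with the rough sketch below. It assumes the tree string is the part of a line starting at its first "(", which is an illustrative simplification rather than a description of BALTIC's implementation. The helper find_tree_string is hypothetical.

```python
import io

def find_tree_string(source):
    """Return the text from the first "(" on the first line that contains one, or None.

    `source` may be a path (str) or any iterable of lines, such as an open file.
    """
    handle = open(source, 'r') if isinstance(source, str) else source
    try:
        for line in handle:
            line = line.strip('\n')
            if '(' in line:
                return line[line.index('('):]
        return None
    finally:
        # Only close handles that this function opened itself.
        if isinstance(source, str):
            handle.close()

# A file-like object standing in for a Newick file with a leading comment line.
newick = io.StringIO("# comment line\n((A:1.0,B:1.0):0.5,C:1.5);\n")
print(find_tree_string(newick))
```

Accepting either a path or an already-open handle, as this sketch does, mirrors the two input types listed for tree_path.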

Use Cases

  1. Loading phylogenetic trees from Newick files into BALTIC for analysis or visualization.
  2. Importing trees generated by other phylogenetic software that output Newick format.
  3. Processing time-calibrated trees with date information encoded in tip names.
  4. Quickly creating BALTIC tree objects from standard Newick files.

Example

import baltic as bt
import matplotlib.pyplot as plt

# Load a simple Newick tree
tree = bt.loadNewick("path/to/simple_tree.newick", verbose=True)

# Print basic tree information
print(f"Number of tips: {len(tree.getExternal())}")
print(f"Number of internal nodes: {len(tree.getInternal())}")

# Visualize the tree
fig, ax = plt.subplots(figsize=(10, 8))
tree.plotTree(ax)
tree.addText(ax)
plt.show()

# Load a tree with dates in tip names and set absolute time
dated_tree = bt.loadNewick("path/to/dated_tree.newick",
                           tip_regex=r'\|([0-9]{4}-[0-9]{2}-[0-9]{2})',  # raw string avoids an invalid "\|" escape
                           date_fmt='%Y-%m-%d',
                           absoluteTime=True,
                           verbose=True)

# Print time span of the tree
tip_times = [tip.absoluteTime for tip in dated_tree.getExternal()]
print(f"Tree spans from {min(tip_times):.2f} to {max(tip_times):.2f}")

# Visualize the time-calibrated tree
fig, ax = plt.subplots(figsize=(12, 8))
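# Plot branches against absolute time so they share an x-axis with the labels
dated_tree.plotTree(ax, x_attr=lambda n: n.absoluteTime)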
dated_tree.addText(ax, x_attr=lambda n: n.absoluteTime)
ax.set_xlabel("Time")

plt.show()

Notes

  • The function can handle both file paths and file-like objects, allowing for flexibility in input sources.
  • The tip_regex parameter is crucial when working with dated tips. Ensure it correctly matches the date format in your tip names.
  • When absoluteTime is True, the function will attempt to extract dates from tip names and calibrate the tree. This requires that tip names contain date information in a consistent format.
  • The variableDate parameter allows some flexibility in date parsing, which can be useful for trees whose tip dates differ in precision (for example, some tips dated only to the month).
  • If date extraction fails when absoluteTime is True, the function will raise an AssertionError.
  • The sortBranches option is useful for ensuring a consistent layout of the tree, especially for visualization purposes.
  • This function is often used as the first step in BALTIC workflows involving Newick-formatted trees.
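
As background for the notes on absoluteTime, the sketch below shows one common way to turn a calendar date into a decimal year, the kind of numeric value a time-calibrated tree can be plotted against. It assumes dates are represented as decimal years. The helper to_decimal_year is hypothetical, and BALTIC's own conversion may differ in detail.

```python
from datetime import datetime

def to_decimal_year(date_string, date_fmt='%Y-%m-%d'):
    """Convert a calendar date to a decimal year (year plus the elapsed fraction of that year)."""
    parsed = datetime.strptime(date_string, date_fmt)
    start_of_year = datetime(parsed.year, 1, 1)
    start_of_next_year = datetime(parsed.year + 1, 1, 1)
    elapsed = (parsed - start_of_year).total_seconds()
    year_length = (start_of_next_year - start_of_year).total_seconds()
    return parsed.year + elapsed / year_length

print(to_decimal_year('2014-01-01'))  # 2014.0
print(to_decimal_year('2016-07-02'))  # 2016.5 (day 183 of a 366-day leap year)
```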