Commit

Merge pull request #293 from google/docs-updates
Docs updates
DonBraulio authored Oct 31, 2023
2 parents bd9630f + 6bb9f13 commit 1b6b604
Showing 12 changed files with 952 additions and 1,292 deletions.
2 changes: 1 addition & 1 deletion .git-hooks/pre-commit
@@ -2,7 +2,7 @@
#
files_to_check="docs/src/recipes/*.ipynb"
files_to_check+=" docs/src/user_guide.ipynb"
files_to_check+=" docs/src/tutorials/getting_started.ipynb"
files_to_check+=" docs/src/getting_started.ipynb"


for path in `git diff --name-only --staged $files_to_check`
2 changes: 1 addition & 1 deletion .github/workflows/test_notebooks.yaml
@@ -11,11 +11,11 @@ jobs:
# * can be used and all .ipynb files in that dir will be tested sequentially
path:
- user_guide
- getting_started
- recipes/*
- tutorials/anomaly_detection_supervised
- tutorials/anomaly_detection_unsupervised
- tutorials/bank_fraud_detection_with_tfdf
- tutorials/getting_started
- tutorials/heart_rate_analysis
- tutorials/loan_outcomes_prediction
- tutorials/m5_competition
6 changes: 3 additions & 3 deletions README.md
@@ -91,17 +91,17 @@ Check the [Getting Started tutorial](https://temporian.readthedocs.io/en/stable/

## Next steps

New users should refer to the [3 minutes to Temporian](https://temporian.readthedocs.io/en/stable/3_minutes/) page, which provides a
New users should refer to the [Getting Started](https://temporian.readthedocs.io/en/stable/getting_started/) guide, which provides a
quick overview of the key concepts and operations of Temporian.

After reading the 3 minute guide, visit the [User Guide](https://temporian.readthedocs.io/en/stable/user_guide/) for a deep dive into
After that, visit the [User Guide](https://temporian.readthedocs.io/en/stable/user_guide/) for a deep dive into
the major concepts, operators, conventions, and practices of Temporian. For a
hands-on learning experience, work through the [Tutorials](https://temporian.readthedocs.io/en/stable/tutorials/) or refer to the [API
reference](https://temporian.readthedocs.io/en/stable/reference/).

## Documentation

The documentation 📚 is available at [temporian.readthedocs.io](https://temporian.readthedocs.io/en/stable/). The [3 minutes to Temporian ⏰️](https://temporian.readthedocs.io/en/stable/3_minutes/) is the best way to start.
The documentation 📚 is available at [temporian.readthedocs.io](https://temporian.readthedocs.io/en/stable/). The [Getting Started guide](https://temporian.readthedocs.io/en/stable/getting_started/) is the best way to start.

## Contributing

2 changes: 1 addition & 1 deletion docs/.readthedocs.yaml
@@ -27,9 +27,9 @@ build:
- pip install -r docs/src/tutorials/requirements.txt

pre_build:
- tools/run_notebooks.sh docs/src/getting_started.ipynb
- tools/run_notebooks.sh docs/src/user_guide.ipynb
- tools/run_notebooks.sh $(ls docs/src/recipes/*.ipynb)
- tools/run_notebooks.sh docs/src/tutorials/getting_started.ipynb
# These are too slow
# - tools/run_notebooks.sh docs/src/tutorials/*.ipynb

2 changes: 1 addition & 1 deletion docs/mkdocs.yml
@@ -53,7 +53,7 @@ extra_css:
# Navigation bar
nav:
- Home: index.md
- 3 minutes to Temporian: 3_minutes.md
- Getting Started: getting_started.ipynb
- User Guide: user_guide.ipynb
- Recipes: recipes/
- Tutorials: tutorials/
69 changes: 0 additions & 69 deletions docs/src/3_minutes.md

This file was deleted.

@@ -6,9 +6,23 @@
"id": "b8dd3ccc",
"metadata": {},
"source": [
"# Getting Started with Temporian\n",
"# Getting Started\n",
"\n",
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google/temporian/blob/last-release/docs/src/tutorials/getting_started.ipynb)"
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google/temporian/blob/last-release/docs/src/tutorials/getting_started.ipynb)\n",
"\n",
"Temporian is an open-source Python library for preprocessing and feature engineering temporal data, to get it ready for machine learning applications 🤖.\n",
"\n",
"This guide will introduce you to the basics of the library, including how to:\n",
"- Create an `EventSet` and use it.\n",
"- Visualize input/output data using `EventSet.plot()` and interactive plots.\n",
"- Convert back and forth between `EventSet` and pandas `DataFrame`.\n",
"- Transform an `EventSet` by using **operators**.\n",
"- Work with `indexes`.\n",
"- Use common operators like `glue`, `resample`, `lag`, moving windows and arithmetics.\n",
"\n",
"If you're interested in a topic that is not covered here, the final section links to other parts of the documentation where you can continue learning.\n",
"\n",
"By reading this guide, you will learn how to implement a processing pipeline with Temporian, to get your data ready to train machine learning models by using straightforward operations and avoiding common mistakes."
]
},
{
@@ -51,17 +65,95 @@
"import numpy as np"
]
},
{
"cell_type": "markdown",
"id": "5e9bf8e3-07fd-4e4d-bdb8-d32a48a366c7",
"metadata": {},
"source": [
"## Part 1: Events and EventSets\n",
"\n",
"Events are the basic unit of data in Temporian. They consist of a timestamp and a set of feature values. Events are not handled individually, but are instead grouped together into **[`EventSets`](https://temporian.readthedocs.io/en/stable/user_guide/#events-and-eventsets)**.\n",
"\n",
"The main data structure in Temporian is the **[`EventSet`](https://temporian.readthedocs.io/en/stable/user_guide/#events-and-eventsets)**, and it represents **[multivariate and multi-index time sequences](https://temporian.readthedocs.io/en/stable/user_guide/#what-is-temporal-data)**. Let's break that down:\n",
"\n",
"- **multivariate:** indicates that each event in the time sequence holds several feature values.\n",
"- **multi-index:** indicates that the events can represent hierarchical data, and be therefore grouped by one or more of their features' values.\n",
"- **time sequence:** indicates that the events are not necessarily sampled at a uniform rate (in which case we would call it a *time series*).\n",
"\n",
"You can create an `EventSet` from a pandas DataFrame, NumPy arrays, CSV files, and more. Here is an example containing only 3 events and 2 features:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9cfb0cb1",
"metadata": {},
"outputs": [],
"source": [
"evset = tp.event_set(\n",
" timestamps=[1, 2, 3],\n",
" features={\n",
" \"feature_1\": [10, 20, 30],\n",
" \"feature_2\": [False, False, True],\n",
" },\n",
")\n",
"evset"
]
},
{
"cell_type": "markdown",
"id": "c8267798",
"metadata": {},
"source": [
"An `EventSet` can hold one or several time sequences, depending on its index.\n",
"\n",
"- If it has no index (e.g: above case), an `EventSet` holds a single multivariate time sequence.\n",
"- If it has one (or more) indexes, the events are grouped by their index values. This means that the `EventSet` will hold one multivariate time sequence for each unique value (or unique combination of values) of its indexes.\n",
"\n",
"Operators are applied on each time sequence of an `EventSet` independently. Indexing is the primary way to handle rich and complex databases. For instance, in a retail database, you can index on customers, stores, products, etc.\n",
"\n",
"The following example will create one sequence for `blue` events, and another one for `red` ones, by specifying that one of the features is an `index`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "46ebad7f-f4b4-4850-bc12-8552e55a3f6b",
"metadata": {},
"outputs": [],
"source": [
"# EventSet with indexes\n",
"evset = tp.event_set(\n",
" timestamps=[\"2023-02-04\", \"2023-02-06\", \"2023-02-07\", \"2023-02-07\"],\n",
" features={\n",
" \"feature_1\": [0.5, 0.6, np.nan, 0.9],\n",
" \"feature_2\": [\"red\", \"blue\", \"red\", \"blue\"],\n",
" \"feature_3\": [10.0, -1.0, 5.0, 5.0],\n",
" },\n",
" indexes=[\"feature_2\"],\n",
")\n",
"evset"
]
},
{
"cell_type": "markdown",
"id": "effc4483-9a1a-4e21-b376-3ed188ced821",
"metadata": {},
"source": [
"The last part of this tutorial shows some examples using `indexes` and operators."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "18cc96f7",
"metadata": {},
"source": [
"## Example Data\n",
"### Example Data\n",
"\n",
"This minimal data consists of just one `signal` with a `timestamp` for each sample.\n",
"For the following examples, we will generate some fake data which consists of a `signal` with a `timestamp` for each sample.\n",
"\n",
"The signal is a periodic sinusoidal `season` with a slight positive slope in the long run, which we call `trend`. Plus the ubiquitous `noise`."
"The signal is composed of a periodic `season` (sine wave), with a slight positive slope which we call `trend`. Plus the ubiquitous `noise`. We will include all these components as separate features, together with the resulting `signal`."
]
},
{
@@ -98,14 +190,11 @@
"id": "3f156949",
"metadata": {},
"source": [
"## Part 1: Loading Data\n",
"### Creating an EventSet from a DataFrame\n",
"\n",
"Any kind of signal is represented in Temporian as a **collection of events**, using the `EventSet` object.\n",
"As mentioned in the previous section, any kind of signal is represented in Temporian as a **collection of events**, using the `EventSet` object.\n",
"\n",
"In this case there's no `indexes` because we only have one sequence.\n",
"\n",
"Indices could be useful if we had multiple signals in parallel.\n",
"For example, imagine that we needed to work with signals from multiple sensor devices, or represent the sales from many stores or products: we could separate them by setting the correct features as indexes for each one."
"In this case there's no `indexes` because we only have one sequence. In the third part we'll learn how to use them and why they can be useful."
]
},
{
@@ -328,7 +417,7 @@
"### Exporting outputs from Temporian\n",
"You may need to use this data in different ways for downstream tasks, like training a model using whatever library you need. \n",
"\n",
"If you can't use the data directly from Temporian, you can always go back to a pandas DataFrame:"
"If you can't use the data directly from Temporian, you can always go back to a pandas `DataFrame`:"
]
},
{
@@ -347,7 +436,7 @@
"id": "de46f604-8d28-4d56-a83d-a13f1073b6b8",
"metadata": {},
"source": [
"## Part 3: Using an index\n",
"## Part 3: Using indexes\n",
"This is the final important concept to get from this introduction.\n",
"\n",
"Indexes are useful to handle multiple signals in parallel (as mentioned at the top of this notebook).\n",
@@ -454,34 +543,33 @@
"## Summary\n",
"\n",
"Congratulations! You now have the basic concepts needed to create a data preprocessing pipeline with Temporian:\n",
"- Defining an **EventSet** and using **operators** on it.\n",
"- Combine **features** using **select** and **glue**.\n",
"- Coverting data back and forth between Temporian's **EventSet** and pandas **DataFrames**.\n",
"- Visualizing input/output data using **EventSet.plot()**.\n",
"- Operating and plotting with an **index**.\n",
"- Defining an `EventSet` and using **operators** on it.\n",
"- Combine features using `select` and `glue`.\n",
"- Converting data back and forth between Temporian's `EventSet` and pandas `DataFrames`.\n",
"- Visualizing input/output data using `EventSet.plot()`.\n",
"- Operating and plotting with `indexes`.\n",
"\n",
"### Other important details\n",
"\n",
"To keep it short and concise, there are interesting concepts that were not mentioned above:\n",
"\n",
"- You might as well use **datetimes** to specify the timestamps. Learn more about it on the [**Time Units** section of the User Guide](https://temporian.readthedocs.io/en/latest/user_guide/#time-units). There are many [**calendar operators**](https://temporian.readthedocs.io/en/stable/reference/temporian/operators/calendar/calendar_day_of_month/) available when working with date timestamps.\n",
"- Temporian can handle **non-uniform samplings** just as easily (non-equal distance between event timestamps). Read more about the data representation on the **[User Guide's introduction](https://temporian.readthedocs.io/en/latest/user_guide/)** or check the [**sampling** section](https://temporian.readthedocs.io/en/latest/user_guide/#sampling).\n",
"- Temporian is **strict on the feature types** when applying operations, to avoid potentially silent errors or memory issues. Check the [User Guide's **casting** section](https://temporian.readthedocs.io/en/latest/user_guide/#casting) section to learn how to tackle those cases.\n",
"- We only used moving average here, but there are a bunch of other [**moving window**](https://temporian.readthedocs.io/en/stable/reference/temporian/operators/window/moving_count/) operators, frequently useful for time sequences manipulation.\n",
"- Check the [**Time Units** section of the User Guide](https://temporian.readthedocs.io/en/latest/user_guide/#time-units). There are many [**calendar operators**](https://temporian.readthedocs.io/en/stable/reference/temporian/operators/calendar/calendar_day_of_month/) available when working with datetimes.\n",
"- To combine or operate with events from different sampling sources (potentially non-uniform samplings) check the [**sampling** section of the User Guide](https://temporian.readthedocs.io/en/stable/user_guide/#sampling).\n",
"- Temporian is **strict on the feature data types** when applying operations, to avoid potentially silent errors or memory issues. Check the [User Guide's **casting** section](https://temporian.readthedocs.io/en/latest/user_guide/#casting) to learn how to tackle those cases.\n",
"\n",
"### Next Steps\n",
"- The [**Recipes**](https://temporian.readthedocs.io/en/latest/recipes/) are short and self-contained examples showing how to use Temporian in typical use cases.\n",
"- Try the more advanced [**tutorials**](https://temporian.readthedocs.io/en/latest/tutorials/) to continue learning by example about all these topics and more!\n",
"- The [**Recipes**](https://temporian.readthedocs.io/en/stable/recipes/) are short and self-contained examples showing how to use Temporian in typical use cases.\n",
"- Try the more advanced [**tutorials**](https://temporian.readthedocs.io/en/stable/tutorials/) to continue learning by example about all these topics and more!\n",
"- Learn how Temporian is **ready for production**, using [**graph mode**](https://temporian.readthedocs.io/en/stable/user_guide/#eager-mode-vs-graph-mode) or [Apache Beam](https://temporian.readthedocs.io/en/stable/tutorials/temporian_with_beam/).\n",
"\n",
"- We could only cover a small fraction of **[all available operators](https://temporian.readthedocs.io/en/stable/reference/temporian/operators/add_index/)**.\n",
"- We put a lot of ❤️ in the **[User Guide](https://temporian.readthedocs.io/en/latest/user_guide/)**, so make sure to check it out 🙂."
"- We put a lot of ❤️ in the **[User Guide](https://temporian.readthedocs.io/en/stable/user_guide/)**, so make sure to check it out 🙂."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "73b77f3e-dcdf-4be0-9c15-c625cb4e297e",
"cell_type": "markdown",
"id": "cebffed7",
"metadata": {},
"outputs": [],
"source": []
}
],
@@ -501,7 +589,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
"version": "3.10.13"
}
},
"nbformat": 4,
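
The new notebook text says that an indexed `EventSet` holds one multivariate time sequence per unique index value. A minimal plain-Python sketch of that grouping idea follows, reusing the rows from the notebook's indexed example. It is an illustration only, not Temporian's implementation: the helper `group_by_index` and the tuple layout are hypothetical and do not use the Temporian API.

```python
from collections import defaultdict

# Rows from the notebook's indexed example, laid out here (by assumption) as
# (timestamp, feature_1, feature_2, feature_3).
rows = [
    ("2023-02-04", 0.5, "red", 10.0),
    ("2023-02-06", 0.6, "blue", -1.0),
    ("2023-02-07", float("nan"), "red", 5.0),
    ("2023-02-07", 0.9, "blue", 5.0),
]


def group_by_index(rows, index_pos):
    """Hypothetical helper: split rows into one sequence per unique index value.

    The index column is removed from each row, and every resulting sequence
    is sorted by timestamp (ISO date strings sort chronologically).
    """
    sequences = defaultdict(list)
    for row in rows:
        key = row[index_pos]
        sequences[key].append(row[:index_pos] + row[index_pos + 1:])
    for seq in sequences.values():
        seq.sort(key=lambda r: r[0])
    return dict(sequences)


# Use feature_2 (position 2) as the index, as in the notebook example.
sequences = group_by_index(rows, index_pos=2)
for key, seq in sequences.items():
    print(key, [r[0] for r in seq])
```

Each key of `sequences` plays the role of one index value, and an operation applied per key would act on each sequence independently, which is the behaviour the notebook describes for operators.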