Commit

updated docs
CaptainDario committed May 24, 2021
1 parent f8ce192 commit 8e1ffdb
Showing 31 changed files with 198 additions and 112 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -9,6 +9,8 @@ fixed:
- loading always returns numpy arrays
- katakana encoded with "KE", etc. are now converted to ケ, etc.
- some of the empty images are not loaded anymore

------------------------------------------------------------
## v 2.0:
features:
- multi processed loading of the data is now possible
2 changes: 1 addition & 1 deletion docs/build/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: b4df5a0d0a262592f76b97dceec0e8f6
config: 810224c7089468ea62856d4685943570
tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file modified docs/build/.doctrees/CHANGELOG.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/README.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/build/.doctrees/etl_character_groups.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/etl_data_reader.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/getting_started.doctree
Binary file not shown.
Binary file modified docs/build/.doctrees/index.doctree
Binary file not shown.
6 changes: 3 additions & 3 deletions docs/build/API.html
@@ -5,7 +5,7 @@
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>API &#8212; ETL_data_reader 2.0 documentation</title>
<title>API &#8212; ETL_data_reader 2.1 documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/classic.css" type="text/css" />

@@ -34,7 +34,7 @@ <h3>Navigation</h3>
<li class="right" >
<a href="getting_started.html" title="Getting started"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">ETL_data_reader 2.0 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">ETL_data_reader 2.1 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">API</a></li>
</ul>
</div>
@@ -107,7 +107,7 @@ <h3>Navigation</h3>
<li class="right" >
<a href="getting_started.html" title="Getting started"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">ETL_data_reader 2.0 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">ETL_data_reader 2.1 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">API</a></li>
</ul>
</div>
21 changes: 18 additions & 3 deletions docs/build/CHANGELOG.html
@@ -5,7 +5,7 @@
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>ETL Data Reader : Changelog &#8212; ETL_data_reader 2.0 documentation</title>
<title>ETL Data Reader : Changelog &#8212; ETL_data_reader 2.1 documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/classic.css" type="text/css" />

@@ -30,7 +30,7 @@ <h3>Navigation</h3>
<li class="right" >
<a href="indices_and_tables.html" title="Indices and tables"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">ETL_data_reader 2.0 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">ETL_data_reader 2.1 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">ETL Data Reader : Changelog</a></li>
</ul>
</div>
@@ -42,6 +42,20 @@ <h3>Navigation</h3>

<div class="section" id="etl-data-reader-changelog">
<h1>ETL Data Reader : Changelog<a class="headerlink" href="#etl-data-reader-changelog" title="Permalink to this headline"></a></h1>
<div class="section" id="v-2-1">
<h2>v 2.1<a class="headerlink" href="#v-2-1" title="Permalink to this headline"></a></h2>
<p>features:</p>
<ul class="simple">
<li><p>parameter to save all images and labels to disk</p></li>
</ul>
<p>fixed:</p>
<ul class="simple">
<li><p>loading always returns numpy arrays</p></li>
<li><p>katakana encoded with “KE”, etc. are now converted to ケ, etc.</p></li>
<li><p>some of the empty images are not loaded anymore</p></li>
</ul>
</div>
<hr class="docutils" />
<div class="section" id="v-2-0">
<h2>v 2.0:<a class="headerlink" href="#v-2-0" title="Permalink to this headline"></a></h2>
<p>features:</p>
@@ -77,6 +91,7 @@ <h2>v 1.0:<a class="headerlink" href="#v-1-0" title="Permalink to this headline"
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">ETL Data Reader : Changelog</a><ul>
<li><a class="reference internal" href="#v-2-1">v 2.1</a></li>
<li><a class="reference internal" href="#v-2-0">v 2.0:</a></li>
<li><a class="reference internal" href="#v-1-0">v 1.0:</a></li>
</ul>
@@ -119,7 +134,7 @@ <h3>Navigation</h3>
<li class="right" >
<a href="indices_and_tables.html" title="Indices and tables"
>previous</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">ETL_data_reader 2.0 documentation</a> &#187;</li>
<li class="nav-item nav-item-0"><a href="index.html">ETL_data_reader 2.1 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">ETL Data Reader : Changelog</a></li>
</ul>
</div>
49 changes: 26 additions & 23 deletions docs/build/README.html
@@ -5,7 +5,7 @@
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>ETL_data_reader &#8212; ETL_data_reader 2.0 documentation</title>
<title>ETLCDB_data_reader &#8212; ETL_data_reader 2.1 documentation</title>
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/classic.css" type="text/css" />

@@ -26,8 +26,8 @@ <h3>Navigation</h3>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">ETL_data_reader 2.0 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">ETL_data_reader</a></li>
<li class="nav-item nav-item-0"><a href="index.html">ETL_data_reader 2.1 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">ETLCDB_data_reader</a></li>
</ul>
</div>

@@ -36,19 +36,19 @@ <h3>Navigation</h3>
<div class="bodywrapper">
<div class="body" role="main">

<div class="section" id="etl-data-reader">
<h1>ETL_data_reader<a class="headerlink" href="#etl-data-reader" title="Permalink to this headline"></a></h1>
<p>A python package for conveniently loading the ETL data set.
<div class="section" id="etlcdb-data-reader">
<h1>ETLCDB_data_reader<a class="headerlink" href="#etlcdb-data-reader" title="Permalink to this headline"></a></h1>
<p>A python package for conveniently loading the ETLCDB.
The complete documentation including the API can be found <a class="reference external" href="https://captaindario.github.io/ETL_data_reader/build/index.html">here</a>.</p>
<div class="section" id="intro">
<h2>Intro<a class="headerlink" href="#intro" title="Permalink to this headline"></a></h2>
<p>The ETL data set is a collection of roughly 1.600.000 handwritten characters.
<p>The ETLCDB is a collection of roughly 1.600.000 handwritten characters.
Notably it includes Japanese Kanji, Hiragana and Katakana.
The data set can be found <a class="reference external" href="http://etlcdb.db.aist.go.jp/">on the ETL website</a> (a registration is needed to download the data set).
The data set can be found <a class="reference external" href="http://etlcdb.db.aist.go.jp/">on the ETLCDB website</a> (a registration is needed to download the data set).
<span class="raw-html-m2r"><br/></span>
Because the data set is stored in a custom data structure it can be hard to load.
This python package provides an easy way to load this data set and filter entries.<span class="raw-html-m2r"><br/></span>
An example of using this package can be found in my application: <a class="reference external" href="https://github.com/CaptainDario/DaKanjiRecognizer">DaKanjiRecognizer</a>. There it was used for <a class="reference external" href="https://github.com/CaptainDario/DaKanjiRecognizer-ML">training an CNN to recognize Japanese kanji characters</a>.<span class="raw-html-m2r"><br/></span>
An example of using this package can be found in my application: <a class="reference external" href="https://github.com/CaptainDario/DaKanji-mobile">DaKanji</a>. There it was used for <a class="reference external" href="https://github.com/CaptainDario/DaKanjiRecognizer-ML">training an CNN to recognize hand written Japanese characters, numbers and roman letters</a>.<span class="raw-html-m2r"><br/></span>
General information about the data set can be found in the table below.</p>
<table class="docutils align-default">
<colgroup>
@@ -174,13 +174,16 @@ <h2>Intro<a class="headerlink" href="#intro" title="Permalink to this headline">
</tr>
</tbody>
</table>
<p><strong>Caution:</strong> <span class="raw-html-m2r"><br></span>
The ETL6 and ETL7 data set parts have labels which are saved in roman letters.
As an example: “け” is stored as “ke”.</p>
<p><strong>Note:</strong> <span class="raw-html-m2r"><br></span>
The ETL6 and ETL7 parts include half width katakana which are stored as roman letters.
As an example: “ケ” is stored as “ke”.
Those are automatically converted from this package.
Also full width numbers and letters are converted when using the package.
Example: ０ -&gt; 0 and Ａ -&gt; A</p>
</div>
<div class="section" id="setup">
<h2>Setup<a class="headerlink" href="#setup" title="Permalink to this headline"></a></h2>
<p>First download the wheel from the <a class="reference external" href="https://github.com/CaptainDario/ETL_data_reader/releases">releases page</a>.
<p>First download the wheel from the <a class="reference external" href="https://github.com/CaptainDario/ETLCDB_data_reader/releases">releases page</a>.
Now install the wheel with:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>pip install .<span class="se">\p</span>ath<span class="se">\t</span>o<span class="se">\e</span>tl_data_reader_CaptainDario-2.0-py3-none-any.whl
</pre></div>
@@ -189,7 +192,7 @@ <h2>Setup<a class="headerlink" href="#setup" title="Permalink to this headline">
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>pip install https://github.com/CaptainDario/ETL_data_reader/releases/download/2.0/etl_data_reader_CaptainDario-2.0-py3-none-any.whl
</pre></div>
</div>
<p>Assuming you already have <a class="reference external" href="http://etlcdb.db.aist.go.jp/obtaining-etl-character-database">downloaded the ETL data set</a>.
<p>Assuming you already have <a class="reference external" href="http://etlcdb.db.aist.go.jp/obtaining-etl-character-database">downloaded the ETLCDB</a>.
You have to do some renaming of the data set folders and files.
First rename the folders like this:</p>
<ul class="simple">
@@ -203,7 +206,7 @@ <h2>Setup<a class="headerlink" href="#setup" title="Permalink to this headline">
<li><p>ETL_data_setETLXETLX_Y <span class="raw-html-m2r"><br/></span>
(<em>X and Y are numbers</em>)</p></li>
</ul>
<p>On the <a class="reference external" href="http://etlcdb.db.aist.go.jp/file-formats-and-sample-unpacking-code">ETL website</a> is also a file called “euc_co59.dat” provided. This <strong>file should also be included in the “data set”-folder</strong> on the same level as the data set part folders.</p>
<p>On the <a class="reference external" href="http://etlcdb.db.aist.go.jp/file-formats-and-sample-unpacking-code">ETLCDB website</a> is also a file called “euc_co59.dat” provided. This <strong>file should also be included in the “data set”-folder</strong> on the same level as the data set part folders.</p>
<p>The folder structure should look like this now: <span class="raw-html-m2r"><br/></span></p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>ETL_data_set_folder <span class="o">(</span>main folder<span class="o">)</span>
<span class="p">|</span> euc_co59.dat
@@ -239,7 +242,7 @@ <h2>Usage<a class="headerlink" href="#usage" title="Permalink to this headline">
<p>To load the data set you need an <code class="docutils literal notranslate"><span class="pre">ETLDataReader</span></code>-instance.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">path_to_data_set</span> <span class="o">=</span> <span class="s2">&quot;the\path</span><span class="se">\t</span><span class="s2">o</span><span class="se">\t</span><span class="s2">he\data\set&quot;</span>

<span class="n">reader</span> <span class="o">=</span> <span class="n">etldr</span><span class="o">.</span><span class="n">ETLDataReader</span><span class="p">(</span><span class="n">path_to_data_set</span><span class="p">)</span>
<span class="n">reader</span> <span class="o">=</span> <span class="n">etldr</span><span class="o">.</span><span class="n">DataReader</span><span class="p">(</span><span class="n">path_to_data_set</span><span class="p">)</span>
</pre></div>
</div>
<p>where <code class="docutils literal notranslate"><span class="pre">path_to_data_set</span></code> should be the path to the main folder of your data set copy.<span class="raw-html-m2r"><br/></span>
@@ -283,7 +286,7 @@ <h3>Load the whole data set<a class="headerlink" href="#load-the-whole-data-set"
<span class="n">imgs</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">reader</span><span class="o">.</span><span class="n">read_dataset_whole</span><span class="p">(</span><span class="n">include</span><span class="p">)</span>
</pre></div>
</div>
<p>This will load all <em>roman</em> and <em>symbol</em> characters from the whole ETL data set.</p>
<p>This will load all <em>roman</em> and <em>symbol</em> characters from the whole ETLCDB.</p>
</div>
<div class="section" id="load-the-whole-data-set-using-multiple-processes">
<h3>Load the whole data set using multiple processes<a class="headerlink" href="#load-the-whole-data-set-using-multiple-processes" title="Permalink to this headline"></a></h3>
@@ -295,11 +298,11 @@ <h3>Load the whole data set using multiple processes<a class="headerlink" href="
<span class="n">imgs</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">reader</span><span class="o">.</span><span class="n">read_dataset_whole</span><span class="p">(</span><span class="n">include</span><span class="p">,</span> <span class="mi">16</span><span class="p">)</span>
</pre></div>
</div>
<p>This will load all <em>roman</em> and <em>symbol</em> characters from the whole ETL data set using 16 processes.</p>
<p>This will load all <em>roman</em> and <em>symbol</em> characters from the whole ETLCDB using 16 processes.</p>
<div class="section" id="note-filtering-data-set-entries">
<h4><strong>Note: filtering data set entries</strong><a class="headerlink" href="#note-filtering-data-set-entries" title="Permalink to this headline"></a></h4>
<p>As the examples above already showed the loading of data set entries can be restricted to certain groups.
Those groups can be seen in: <a class="reference external" href="https://captaindario.github.io/ETL_data_reader/build/etl_character_groups.html">etl_character_groups.py</a></p>
Those groups can be seen in: <a class="reference external" href="https://captaindario.github.io/ETLCDB_data_reader/build/etl_character_groups.html">etl_character_groups.py</a></p>
</div>
<div class="section" id="note-processing-the-images-while-loading">
<h4><strong>Note: processing the images while loading</strong><a class="headerlink" href="#note-processing-the-images-while-loading" title="Permalink to this headline"></a></h4>
@@ -323,7 +326,7 @@ <h2>Limitations<a class="headerlink" href="#limitations" title="Permalink to thi
<li><p>image</p></li>
<li><p>label of the image</p></li>
</ul>
<p>of every ETL data set entry.</p>
<p>of every ETLCDB entry.</p>
<p>However this package should be easily extendable to add support for accessing the other data.</p>
</div>
<div class="section" id="development-notes">
@@ -381,7 +384,7 @@ <h2>Additional Notes<a class="headerlink" href="#additional-notes" title="Permal
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">ETL_data_reader</a><ul>
<li><a class="reference internal" href="#">ETLCDB_data_reader</a><ul>
<li><a class="reference internal" href="#intro">Intro</a></li>
<li><a class="reference internal" href="#setup">Setup</a></li>
<li><a class="reference internal" href="#usage">Usage</a><ul>
@@ -438,8 +441,8 @@ <h3>Navigation</h3>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="nav-item nav-item-0"><a href="index.html">ETL_data_reader 2.0 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">ETL_data_reader</a></li>
<li class="nav-item nav-item-0"><a href="index.html">ETL_data_reader 2.1 documentation</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">ETLCDB_data_reader</a></li>
</ul>
</div>
<div class="footer" role="contentinfo">
17 changes: 17 additions & 0 deletions docs/build/_sources/CHANGELOG.rst.txt
@@ -2,6 +2,23 @@
ETL Data Reader : Changelog
===========================

v 2.1
-----

features:


* parameter to save all images and labels to disk

fixed:


* loading always returns numpy arrays
* katakana encoded with "KE", etc. are now converted to ケ, etc.
* some of the empty images are not loaded anymore

----

v 2.0:
------
