Quick start¶
Getting the code and installation¶
Get the source code from the Github repository with running git clone
.
[snaga@localhost tmp]$ git clone https://github.com/snaga/Hecatoncheir.git
Cloning into 'Hecatoncheir'...
remote: Counting objects: 3482, done.
remote: Compressing objects: 100% (237/237), done.
remote: Total 3482 (delta 176), reused 19 (delta 19), pack-reused 3226
Receiving objects: 100% (3482/3482), 864.87 KiB | 1.09 MiB/s, done.
Resolving deltas: 100% (2534/2534), done.
[snaga@localhost tmp]$ cd Hecatoncheir/
[snaga@localhost Hecatoncheir]$ ls
LICENSE build.bat dm-import-csv env.sh
QuickStart.md build.sh dm-import-datamapping requirements.txt
README.md demo dm-run-profiler setup.py
README.oracle dm-attach-file dm-run-server src
bin dm-export-repo dm-verify-results
[snaga@localhost Hecatoncheir]$
Then, install them with running pip install .
.
[snaga@localhost Hecatoncheir]$ sudo /usr/local/bin/pip install .
Processing /disk/disk1/snaga/Hecatoncheir/Hecatoncheir
Requirement already satisfied: jinja2==2.8 in /usr/local/lib/python2.7/site-packages (from hecatoncheir==0.8)
Requirement already satisfied: MarkupSafe in /usr/local/lib/python2.7/site-packages (from jinja2==2.8->hecatoncheir==0.8)
Installing collected packages: hecatoncheir
Running setup.py install for hecatoncheir ... done
Successfully installed hecatoncheir-0.8
[snaga@localhost Hecatoncheir]$
If the following commands get installed properly, the installation is succeeded.
[snaga@localhost tmp]$ ls /usr/local/bin/dm-*
/usr/local/bin/dm-attach-file /usr/local/bin/dm-import-datamapping
/usr/local/bin/dm-dump-xls /usr/local/bin/dm-run-profiler
/usr/local/bin/dm-export-repo /usr/local/bin/dm-run-server
/usr/local/bin/dm-import-csv /usr/local/bin/dm-verify-results
[snaga@localhost tmp]$
Collecting metadata and profiling data¶
Let’s try collecting metadata and profiling your tables. Here, for example, we are going to pick CUSTOMER
table on SCOTT
schema on our Oracle database.
To collect metadata and profile data, use the dm-run-profiler
command.
Run dm-run-profiler
command with specifying database type, tns name, user name, password and the target table name (SCOTT.CUSTOMER
).
[snaga@localhost tmp]$ dm-run-profiler --dbtype oracle --tnsname orcl --user scott --pass tiger SCOTT.CUSTOMER
[2017-05-09 12:38:07] INFO: TNS info: scott@orcl
[2017-05-09 12:38:07] INFO: Connecting the database.
[2017-05-09 12:38:07] INFO: Connected to the database.
[2017-05-09 12:38:07] INFO: The repository has been initialized.
[2017-05-09 12:38:07] INFO: The repository file `repo.db' has been opened.
[2017-05-09 12:38:07] INFO: ----------------------------------------------
[2017-05-09 12:38:07] INFO: Parallel degree for table scan: 0
[2017-05-09 12:38:07] INFO: Skipping table profiling: False
[2017-05-09 12:38:07] INFO: Row count profiling: True
[2017-05-09 12:38:07] INFO: Skippig column profiling: False
[2017-05-09 12:38:07] INFO: Maximum row count to enable column profiling: 100,000,000 rows
[2017-05-09 12:38:07] INFO: Min/Max values: True
[2017-05-09 12:38:07] INFO: Number of null values: True
[2017-05-09 12:38:07] INFO: Top-N most/least freq values: 10 values
[2017-05-09 12:38:07] INFO: Column cardinality: True
[2017-05-09 12:38:07] INFO: Data validation: False
[2017-05-09 12:38:07] INFO: Obtaining sample records: True
[2017-05-09 12:38:07] INFO: ----------------------------------------------
[2017-05-09 12:38:07] INFO: Profiling on 1 tables.
[2017-05-09 12:38:07] INFO: Profiling SCOTT.CUSTOMER: start
[2017-05-09 12:38:07] INFO: Data types: start
[2017-05-09 12:38:07] INFO: Data types: end
[2017-05-09 12:38:07] INFO: Row count: start
[2017-05-09 12:38:07] INFO: Row count: end (28)
[2017-05-09 12:38:07] INFO: Sample rows: start
[2017-05-09 12:38:07] INFO: Sample rows: end
[2017-05-09 12:38:07] INFO: Number of nulls: start
[2017-05-09 12:38:07] INFO: Number of nulls: end
[2017-05-09 12:38:07] INFO: Min/Max values: start
[2017-05-09 12:38:07] INFO: Min/Max values: end
[2017-05-09 12:38:07] INFO: Most/Least freq values(1/2): start
[2017-05-09 12:38:07] INFO: Most/Least freq values(2/2): start
[2017-05-09 12:38:07] INFO: Most/Least freq values: end
[2017-05-09 12:38:07] INFO: Column cardinality: start
[2017-05-09 12:38:07] INFO: Column cardinality: end
[2017-05-09 12:38:07] INFO: Record validation: start
[2017-05-09 12:38:07] INFO: Record validation: no validation rule
[2017-05-09 12:38:07] INFO: Profiling SCOTT.CUSTOMER: end
[2017-05-09 12:38:07] INFO: Profiling errors have occured on 1/0 tables.
[2017-05-09 12:38:07] INFO: Completed profiling 1 tables.
[snaga@localhost tmp]$
Once collecting metadata and profiling data get completed, these will be stored in the repository file. (by default, repo.db
)
[snaga@localhost tmp]$ ls -l repo.db
-rw-r--r-- 1 snaga snaga 35840 May 9 12:38 repo.db
[snaga@localhost tmp]$
Exporting to the HTML files¶
To export collected metadata and data profile to the HTML files, use dm-export-repo
command.
By running dm-export-repo
command with specifying the repository file and the output directory, it exports HTML files to the ouput directory. By default, it exports as HTML files.
[snaga@localhost tmp]$ dm-export-repo repo.db html
[2017-05-09 12:39:10] INFO: Created the output directory `html'.
[2017-05-09 12:39:10] INFO: The repository file `repo.db' has been opened.
[2017-05-09 12:39:10] INFO: Generated html/orcl.SCOTT.CUSTOMER.html.
[2017-05-09 12:39:10] INFO: Generated html/orcl.SCOTT.html.
[2017-05-09 12:39:10] INFO: Generated html/validation-valid.html.
[2017-05-09 12:39:10] INFO: Generated html/validation-invalid.html.
[2017-05-09 12:39:10] INFO: Generated html/index.html.
[2017-05-09 12:39:10] INFO: Generated html/index-tags.html.
[2017-05-09 12:39:10] INFO: Generated html/index-schemas.html.
[2017-05-09 12:39:10] INFO: Generated html/glossary.html.
[2017-05-09 12:39:10] INFO: Copied the static directory to `html'.
[snaga@localhost tmp]$ ls -l html
total 140
-rw-rw-r-- 1 snaga snaga 5111 May 9 12:39 glossary.html
-rw-rw-r-- 1 snaga snaga 6037 May 9 12:39 index-schemas.html
-rw-rw-r-- 1 snaga snaga 5612 May 9 12:39 index-tags.html
-rw-rw-r-- 1 snaga snaga 6037 May 9 12:39 index.html
-rw-rw-r-- 1 snaga snaga 79360 May 9 12:39 orcl.SCOTT.CUSTOMER.html
-rw-rw-r-- 1 snaga snaga 6040 May 9 12:39 orcl.SCOTT.html
drwxr-xr-x 4 snaga snaga 4096 May 6 17:39 static
-rw-rw-r-- 1 snaga snaga 4466 May 9 12:39 validation-invalid.html
-rw-rw-r-- 1 snaga snaga 4704 May 9 12:39 validation-valid.html
[snaga@localhost tmp]$
By opening those HTML files with your web browser, you can see those metadata coming from the data dictionaries and data profiles coming from your actual tables.