Command references
dm-attach-file
The dm-attach-file command attaches files to, or removes them from, data sets (tags/schemas) and tables. It can also list the files attached to a data set or table.
Usage: dm-attach-file [repo file] [target] [command]
Targets:
tag:[tag label]
schema:[schema name]
table:[table name]
Commands:
list
add [file]
rm [file]
Options:
--help Print this help.
repo file
should be a file name of the repository.
target
specifies the target data set or table as tag:, schema: or table: followed by its name.
command
specifies the action to perform on the data set or table: list, or add or rm followed by a file name.
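As a rough illustration of how the parts of the usage line fit together, the following Python sketch assembles an argument list for dm-attach-file. The helper name attach_file_argv and the repository, table and file names are made up for the example; only the argument order and the tag:/schema:/table: prefixes are taken from the usage text above.

```python
TARGET_KINDS = ("tag", "schema", "table")
COMMANDS = ("list", "add", "rm")


def attach_file_argv(repo_file, kind, name, command, file_name=None):
    """Build the argv for: dm-attach-file [repo file] [target] [command]."""
    if kind not in TARGET_KINDS:
        raise ValueError(f"unknown target kind: {kind}")
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    argv = ["dm-attach-file", repo_file, f"{kind}:{name}", command]
    if command in ("add", "rm"):
        # add and rm take the file name as a trailing argument.
        if not file_name:
            raise ValueError(f"{command} requires a file name")
        argv.append(file_name)
    return argv


# Hypothetical names, for illustration only.
print(attach_file_argv("repo.db", "table", "orders", "add", "orders_spec.pdf"))
```

The resulting list can be handed to subprocess.run as is, which avoids shell quoting problems when names contain spaces.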
dm-dump-xls
The dm-dump-xls command dumps an Excel sheet in CSV format.
It takes the file name of an Excel book and either the name or the index (1, 2, 3…) of the sheet to convert, and writes that sheet to STDOUT in CSV format.
Usage: dm-dump-xls <filename.xls> [<sheet name> | <sheet index>]
Options:
-e STRING Output encoding (default: utf-8)
--help Print this help.
-e
specifies the encoding of the output CSV data.
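Because the sheet is written to STDOUT, the output can be captured and parsed by another program. The Python sketch below shows one possible way to do that. The helper names are hypothetical, and placing -e before the file name is an assumption, since the usage line does not show where options go.

```python
import csv
import io
import subprocess


def parse_csv_bytes(data, encoding="utf-8"):
    """Decode CSV bytes and return the rows as lists of strings."""
    text = data.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline="")))


def dump_sheet(book, sheet, encoding="utf-8"):
    """Run dm-dump-xls and parse what it writes to STDOUT.

    Assumption: options are accepted before the positional arguments.
    """
    argv = ["dm-dump-xls", "-e", encoding, book, str(sheet)]
    result = subprocess.run(argv, capture_output=True, check=True)
    return parse_csv_bytes(result.stdout, encoding)
```

The encoding passed to -e and the one used for decoding have to match, which is why the sketch passes the same value to both.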
dm-export-repo
The dm-export-repo command exports the metadata, statistics and other supplemental data stored in the repository as a data catalog in a specific format (e.g. HTML).
The repository data can be exported in the following formats.
- HTML files
- CSV files
- JSON files
Usage: dm-export-repo [options...] [repo file] [output directory]
Options:
--format <STRING> Output format. (html, json or csv)
--help Print this help.
Options for HTML format:
--tags <TAG>[,<TAG>] Tag names to be shown on the top page.
--schemas <SCHEMA>[,<SCHEMA>] Schema names to be shown on the top page.
--template <STRING> Directory name for template files.
Options for CSV format:
--encoding <STRING> Character encoding for output files.
repo file
should be a file name of the repository.
output directory
should be a directory name for the output.
--format
specifies the output format. The default is html.
--tags
specifies the tag names that must appear on the home page of the data catalog. (By default, tag names are sorted by name.)
--schemas
specifies the schema names that must appear on the home page of the data catalog. (By default, schema names are sorted by name.)
--template
specifies the directory that contains the template files used to generate the HTML files of the data catalog.
--encoding
specifies the character encoding of the output CSV files. The default is utf-8.
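The --tags and --schemas options take several names joined by commas. A minimal Python sketch of building such a command line follows. The helper name export_repo_argv and every value in the example call are hypothetical, and passing each option value as a separate argument simply follows the spacing shown in the usage text.

```python
def export_repo_argv(repo_file, out_dir, fmt="html", tags=None, schemas=None):
    """Build the argv for: dm-export-repo [options...] [repo file] [output directory]."""
    argv = ["dm-export-repo", "--format", fmt]
    if tags:
        # Several tags are passed as one comma-separated value.
        argv += ["--tags", ",".join(tags)]
    if schemas:
        argv += ["--schemas", ",".join(schemas)]
    return argv + [repo_file, out_dir]


# Hypothetical repository, directory and tag names.
print(export_repo_argv("repo.db", "./catalog", tags=["sales", "master"]))
```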
dm-import-csv
The dm-import-csv command imports supplemental metadata for tables and columns, and other additional information, from CSV files.
It can import the following CSV files.
- Table Metadata CSV
- Column Metadata CSV
- Schema Comment CSV
- Tag Comment CSV
- Business Glossary CSV
- Data Validation Rule CSV
Usage: dm-import-csv [repo file] [csv file]
Options:
-E, --encoding=STRING Encoding of the CSV file (default: sjis)
--help Print this help.
repo file
should be a file name of the repository.
-E, --encoding
specifies the input encoding of the CSV file. The default is sjis.
See “CSV file format reference” for more information about each CSV format.
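Since the default input encoding is sjis, a CSV file saved as UTF-8 either has to be imported with a matching -E value or be converted first. The Python sketch below illustrates the conversion route. The function name reencode and the sample text are made up for the example, and the sample is not a real row of any of the CSV formats listed above.

```python
def reencode(data, src="utf-8", dst="shift_jis"):
    """Re-encode bytes from one character encoding to another."""
    return data.decode(src).encode(dst)


# Placeholder text only, not an actual metadata CSV row.
utf8_bytes = "受注".encode("utf-8")
sjis_bytes = reencode(utf8_bytes)
print(sjis_bytes.decode("shift_jis"))
```

Characters that do not exist in the destination encoding make encode raise UnicodeEncodeError, so importing the UTF-8 file directly with a matching -E value is usually the simpler choice.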
dm-import-datamapping
The dm-import-datamapping command imports data mapping information into the repository from a CSV file.
Usage: dm-import-datamapping [repo file] [csv file]
Options:
-E, --encoding=STRING Encoding of the CSV file (default: sjis)
--help Print this help.
repo file
should be a file name of the repository.
-E, --encoding
specifies the input encoding of the CSV file. The default is sjis.
See “CSV file format reference” for more information about the CSV format.
dm-repo-cmd
The dm-repo-cmd command lets you manipulate table data in the repository directly.
Usage: dm-repo-cmd [options...] [repo file | connection string] [cmd] [args...]
Commands:
init
ls
rm <db.schema.table>
Options:
--help Print this help.
init
initializes the repository.
ls
shows a list of table names (in db.schema.table form) in the repository.
rm
removes the data of the specified tables (given in db.schema.table form) from the repository.
dm-run-profiler
The dm-run-profiler command connects to the database, collects metadata and data profiles, and stores the results in the repository. It can also validate the data against predefined validation rules.
Usage: dm-run-profiler [option...] [schema[.table]] ...
Options:
--dbtype=TYPE Database type
--host=STRING Host name
--port=INTEGER Port number
--dbname=STRING Database name
--tnsname=STRING TNS name (Oracle only)
--user=STRING User name
--pass=STRING User password
--credential=STRING Credential file name (BigQuery only)
-P=INTEGER Parallel degree of table scan
-o=FILENAME Output file
--batch=FILENAME Batch execution
--enable-validation Enable record/column/SQL validations
--enable-sample-rows Enable collecting sample rows. (default)
--disable-sample-rows Disable collecting sample rows.
--skip-table-profiling Skip table (and column) profiling
--skip-column-profiling Skip column profiling
--column-profiling-threshold=INTEGER
Threshold number of rows to skip profiling
columns
--timeout=NUMBER Query timeout in seconds (default:no timeout)
--help Print this help.
--dbtype
specifies the database type. It should be oracle, mssql, pgsql or mysql. For Amazon Redshift, use pgsql and specify the port number.
--host
specifies a host name to connect to the database.
--port
specifies a port number to connect to the database.
--dbname
specifies the name of the database to connect to.
--tnsname
specifies the TNS name to use when connecting by TNS name. (Oracle only)
--user
specifies a user name to connect to the database.
--pass
specifies the password to connect to the database.
--credential
specifies a credential file (JSON) for connecting to the BigQuery database.
-P
specifies the degree of parallel scan.
-o
specifies a file name of the repository.
--batch
specifies a file name containing multiple table names for batch processing.
--enable-validation
enables the data validation.
--enable-sample-rows
enables collecting sample records (up to 10 records). (default)
--disable-sample-rows
disables collecting sample records.
--skip-table-profiling
disables profiling tables and columns.
--skip-column-profiling
disables profiling columns.
--column-profiling-threshold
specifies the maximum number of rows a table may have for column profiling to be performed. Column profiling is skipped for tables with more rows.
--timeout
specifies the query timeout in seconds. If a query runs longer than this, it is cancelled and profiling of the table fails.
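The long options above are printed in --name=value form, with switches such as --enable-validation taking no value. The following Python sketch turns keyword arguments into such a command line. The helper name run_profiler_argv, the host, database, user and table names are all hypothetical. The short options (-P, -o) and --pass are deliberately left out of the sketch.

```python
def run_profiler_argv(targets, **options):
    """Build the argv for: dm-run-profiler [option...] [schema[.table]] ..."""
    argv = ["dm-run-profiler"]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            # Switches such as --enable-validation take no value.
            argv.append(flag)
        else:
            argv.append(f"{flag}={value}")
    return argv + list(targets)


# Hypothetical connection settings and table names.
argv = run_profiler_argv(
    ["public.orders", "public.customers"],
    dbtype="pgsql", host="localhost", port=5432, dbname="sales",
    user="profiler", enable_validation=True, timeout=60,
)
print(" ".join(argv))
```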
dm-run-server
The dm-run-server command launches a web server that provides access to the repository data over the network.
It allows you to
- View the repository data without exporting as HTML files.
- View changes in the repository immediately.
- Edit some of the data in the repository (e.g. comments and tags) directly in the browser.
Usage: dm-run-server [repo file] [port]
Options:
--help Print this help.
repo file
should be a file name of the repository.
port
should be the port number used to connect to the server. The default is 8080.
dm-verify-results
The dm-verify-results command checks the condition of the data by scanning the validation results in the repository.
It exits with exit code 1 if there are invalid results, and 0 otherwise.
Usage: dm-verify-results [repo file]
Options:
--help Print this help.
repo file
should be a file name of the repository.
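The exit code makes the command usable as a gate in a batch job or CI step. A minimal Python sketch of interpreting it is shown below. The function names are hypothetical, and treating any code other than 0 or 1 as an error is an assumption of the sketch, since this reference describes only those two values.

```python
import subprocess


def is_valid(returncode):
    """Map the documented exit codes to a boolean: 0 is valid, 1 is invalid."""
    if returncode == 0:
        return True
    if returncode == 1:
        return False
    # Other codes are not described in this reference; treat them as errors.
    raise RuntimeError(f"unexpected exit code: {returncode}")


def verify(repo_file):
    """Run dm-verify-results and report whether all validation results are valid."""
    result = subprocess.run(["dm-verify-results", repo_file])
    return is_valid(result.returncode)
```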