Command references
dm-attach-file
The dm-attach-file command attaches files to, or removes files from, data sets (tags/schemas) and tables. It can also list the files attached to a data set or table.
Usage: dm-attach-file [repo file] [target] [command]
Targets:
tag:[tag label]
schema:[schema name]
table:[table name]
Commands:
list
add [file]
rm [file]
Options:
--help Print this help.
repo file should be a file name of the repository.
target specifies the target data set or table: the keyword tag, schema or table, followed by a colon and the name.
command specifies the action to take on the data set or table: list, add or rm. add and rm take a file name.
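The target and command syntax above can be assembled programmatically. The following is a minimal sketch, not part of the tool itself: the helper name attach_file_args and the sample names repo.db, TAG1 and spec.pdf are hypothetical, and the sketch only builds the argument list described in the usage line.

```python
import subprocess

TARGET_KINDS = ("tag", "schema", "table")
COMMANDS = ("list", "add", "rm")


def attach_file_args(repo_file, target_kind, target_name, command, file_name=None):
    """Build the argument list for dm-attach-file (hypothetical helper)."""
    if target_kind not in TARGET_KINDS:
        raise ValueError("target kind must be tag, schema or table")
    if command not in COMMANDS:
        raise ValueError("command must be list, add or rm")
    if command != "list" and file_name is None:
        raise ValueError(command + " needs a file name")
    # The target is written as kind:name, e.g. tag:TAG1.
    args = ["dm-attach-file", repo_file, target_kind + ":" + target_name, command]
    if command != "list":
        args.append(file_name)
    return args


def attach_file(repo_file, target_kind, target_name, command, file_name=None):
    """Run dm-attach-file; assumes the command is installed and on PATH."""
    args = attach_file_args(repo_file, target_kind, target_name, command, file_name)
    return subprocess.run(args, check=True)
```

For example, attach_file_args("repo.db", "tag", "TAG1", "add", "spec.pdf") yields the arguments for attaching spec.pdf to the tag TAG1.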
dm-dump-xls
dm-dump-xls command dumps Excel sheets in the CSV format.
It takes a file name of the Excel book and a name of the sheet or an index of the sheet (1,2,3…) to be converted, and dumps the sheet to STDOUT in the CSV format.
Usage: dm-dump-xls <filename.xls> [<sheet name> | <sheet index>]
Options:
-e STRING Output encoding (default: utf-8)
--help Print this help.
-e specifies the encoding of the output CSV data.
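Because the sheet is written to STDOUT as CSV, the output can be read directly by another program. The sketch below shows one way to do that from Python, assuming dm-dump-xls is on PATH; the helper names and the file name book.xls are hypothetical, and placing -e before the file name is an assumption about the option parser.

```python
import csv
import io
import subprocess


def dump_xls_args(xls_file, sheet=None, encoding=None):
    """Build the argument list for dm-dump-xls (hypothetical helper)."""
    args = ["dm-dump-xls"]
    if encoding is not None:
        args += ["-e", encoding]
    args.append(xls_file)
    if sheet is not None:
        # sheet may be a sheet name or a 1-based sheet index.
        args.append(str(sheet))
    return args


def parse_csv_output(raw_bytes, encoding="utf-8"):
    """Decode the bytes written to STDOUT and split them into CSV rows."""
    text = raw_bytes.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline="")))


def dump_sheet(xls_file, sheet=None, encoding="utf-8"):
    """Run dm-dump-xls and return the sheet as a list of rows."""
    result = subprocess.run(
        dump_xls_args(xls_file, sheet, encoding),
        stdout=subprocess.PIPE,
        check=True,
    )
    return parse_csv_output(result.stdout, encoding)
```

Calling dump_sheet("book.xls", 1) would return the first sheet as a list of lists of strings.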
dm-export-repo
dm-export-repo exports metadata, statistics and other supplemental data stored in the repository as a data catalog in a specific format (e.g., HTML).
The repository data can be exported in the following forms.
- HTML files
- CSV files
- JSON files
Usage: dm-export-repo [options...] [repo file] [output directory]
Options:
--format <STRING> Output format. (html, json or csv)
--help Print this help.
Options for HTML format:
--tags <TAG>[,<TAG>] Tag names to be shown on the top page.
--schemas <SCHEMA>[,<SCHEMA>] Schema names to be shown on the top page.
--template <STRING> Directory name for template files.
Options for CSV format:
--encoding <STRING> Character encoding for output files.
repo file should be a file name of the repository.
output directory should be a directory name for the output.
--format specifies the output format. The default is html.
--tags specifies the tag names that must appear on the home page of the data catalog. (By default, tag names are sorted by name.)
--schemas specifies the schema names that must appear on the home page of the data catalog. (By default, schema names are sorted by name.)
--template specifies the name of a directory that contains the template files for generating the HTML files of the data catalog.
--encoding specifies the character encoding for the output CSV files. The default is utf-8.
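The option groups above depend on the chosen format, which is easy to get wrong in scripts. The following sketch builds a dm-export-repo argument list and rejects options that do not belong to the selected format. The helper name export_repo_args is hypothetical, and rejecting mismatched options is a choice made for this sketch, not documented behavior of the command.

```python
def export_repo_args(repo_file, output_dir, fmt="html", tags=None,
                     schemas=None, template=None, encoding=None):
    """Build the argument list for dm-export-repo (hypothetical helper)."""
    if fmt not in ("html", "json", "csv"):
        raise ValueError("format must be html, json or csv")
    if fmt != "html" and (tags or schemas or template):
        raise ValueError("--tags, --schemas and --template apply to html only")
    if fmt != "csv" and encoding:
        raise ValueError("--encoding applies to csv only")
    args = ["dm-export-repo", "--format", fmt]
    if tags:
        # Multiple names are joined with commas, as in <TAG>[,<TAG>].
        args += ["--tags", ",".join(tags)]
    if schemas:
        args += ["--schemas", ",".join(schemas)]
    if template:
        args += ["--template", template]
    if encoding:
        args += ["--encoding", encoding]
    args += [repo_file, output_dir]
    return args
```

The returned list can be passed to subprocess.run; repo.db and the output directory are whatever names the caller supplies.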
dm-import-csv
The dm-import-csv command imports supplemental metadata for tables and columns, and other additional information, from CSV files.
It can import the following CSV files.
- Table Metadata CSV
- Column Metadata CSV
- Schema Comment CSV
- Tag Comment CSV
- Business Glossary CSV
- Data Validation Rule CSV
Usage: dm-import-csv [repo file] [csv file]
Options:
-E, --encoding=STRING Encoding of the CSV file (default: sjis)
--help Print this help.
repo file should be a file name of the repository.
-E, --encoding specifies the input encoding of the CSV files. By default, it is sjis.
See “CSV file format reference” for more information about each CSV format.
dm-import-datamapping
The dm-import-datamapping command imports data mapping information from a CSV file into the repository.
Usage: dm-import-datamapping [repo file] [csv file]
Options:
-E, --encoding=STRING Encoding of the CSV file (default: sjis)
--help Print this help.
repo file should be a file name of the repository.
-E, --encoding specifies the input encoding of the CSV files. By default, it is sjis.
See “CSV file format reference” for more information about the CSV format.
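Both import commands share the same usage, and both default to sjis, so CSV files saved in another encoding need the option set explicitly. The sketch below is one way to script that. The helper names are hypothetical, and passing utf-8 as the encoding name is an assumption about which encoding names the commands accept.

```python
import subprocess

IMPORT_COMMANDS = ("dm-import-csv", "dm-import-datamapping")


def import_args(command, repo_file, csv_file, encoding=None):
    """Build the argument list for one of the import commands (hypothetical helper)."""
    if command not in IMPORT_COMMANDS:
        raise ValueError("unknown import command: " + command)
    args = [command]
    if encoding is not None:
        # Written as --encoding=STRING, matching the help text.
        args.append("--encoding=" + encoding)
    args += [repo_file, csv_file]
    return args


def import_all(command, repo_file, csv_files, encoding=None):
    """Import several CSV files in turn, stopping at the first failure."""
    for csv_file in csv_files:
        subprocess.run(import_args(command, repo_file, csv_file, encoding), check=True)
```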
dm-repo-cmd
The dm-repo-cmd command allows you to manipulate table data in the repository directly.
Usage: dm-repo-cmd [options...] [repo file | connection string] [cmd] [args...]
Commands:
init
ls
rm <db.schema.table>
Options:
--help Print this help.
init initializes the repository.
ls shows a list of table names (in db.schema.table form) in the repository.
rm removes table data in the repository by supplying table names (in db.schema.table form).
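Since ls and rm both work with names in db.schema.table form, a script that removes tables may want to check the name before calling rm. A minimal sketch follows; the helper names are hypothetical, and it assumes that none of the three name parts contains a dot.

```python
from collections import namedtuple

TableName = namedtuple("TableName", ["db", "schema", "table"])


def parse_table_name(name):
    """Split a db.schema.table name into its three parts."""
    parts = name.strip().split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected db.schema.table, got: " + name)
    return TableName(*parts)


def repo_rm_args(repo, table_name):
    """Build the argument list for 'dm-repo-cmd ... rm' (hypothetical helper)."""
    parse_table_name(table_name)  # Raises ValueError on a malformed name.
    return ["dm-repo-cmd", repo, "rm", table_name.strip()]
```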
dm-run-profiler
The dm-run-profiler command connects to the database, collects metadata and data profiles, and stores the results in the repository. It can also validate the data against pre-defined validation rules.
Usage: dm-run-profiler [option...] [schema[.table]] ...
Options:
--dbtype=TYPE Database type
--host=STRING Host name
--port=INTEGER Port number
--dbname=STRING Database name
--tnsname=STRING TNS name (Oracle only)
--user=STRING User name
--pass=STRING User password
--credential=STRING Credential file name (BigQuery only)
-P=INTEGER Parallel degree of table scan
-o=FILENAME Output file
--batch=FILENAME Batch execution
--enable-validation Enable record/column/SQL validations
--enable-sample-rows Enable collecting sample rows. (default)
--disable-sample-rows Disable collecting sample rows.
--skip-table-profiling Skip table (and column) profiling
--skip-column-profiling Skip column profiling
--column-profiling-threshold=INTEGER Threshold number of rows to skip profiling columns
--timeout=NUMBER Query timeout in seconds (default: no timeout)
--help Print this help.
--dbtype specifies the database type. It should be oracle, mssql, pgsql or mysql. For Amazon Redshift, use pgsql and specify the port number explicitly.
--host specifies a host name to connect to the database.
--port specifies a port number to connect to the database.
--dbname specifies the name of the database to connect to.
--tnsname specifies the TNS name to use for the connection. (Oracle only)
--user specifies a user name to connect to the database.
--pass specifies the password to connect to the database.
--credential specifies a credential file (json) for connecting to the BigQuery database.
-P specifies the degree of parallel scan.
-o specifies a file name of the repository.
--batch specifies a file name containing multiple table names for batch processing.
--enable-validation enables the data validation.
--enable-sample-rows enables collecting sample records (up to 10 records). (default)
--disable-sample-rows disables collecting sample records.
--skip-table-profiling disables profiling tables and columns.
--skip-column-profiling disables profiling columns.
--column-profiling-threshold specifies the maximum number of table records for which column profiling is performed. Column profiling is skipped for tables with more records.
--timeout specifies the query timeout in seconds. If query execution exceeds this limit, the query is cancelled and profiling of the table fails.
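With this many options, it can help to build the command line from a small configuration mapping. The sketch below does that for the long options and flags listed above. The helper name and the sample values (localhost, repo.db, public.customer) are hypothetical. Long options are written as --name=value, following the help text; passing the repository file as a separate argument after -o is an assumption about the option parser, since the help text writes it as -o=FILENAME.

```python
VALUE_OPTIONS = (
    "dbtype", "host", "port", "dbname", "tnsname", "user", "pass",
    "credential", "batch", "column-profiling-threshold", "timeout",
)
FLAG_OPTIONS = (
    "enable-validation", "enable-sample-rows", "disable-sample-rows",
    "skip-table-profiling", "skip-column-profiling",
)


def run_profiler_args(options, flags=(), targets=(), repo_file=None):
    """Build the argument list for dm-run-profiler (hypothetical helper)."""
    args = ["dm-run-profiler"]
    for name, value in options.items():
        if name not in VALUE_OPTIONS:
            raise ValueError("unknown option: " + name)
        args.append("--%s=%s" % (name, value))
    if repo_file is not None:
        # Assumption: the value of -o is accepted as a separate argument.
        args += ["-o", repo_file]
    for flag in flags:
        if flag not in FLAG_OPTIONS:
            raise ValueError("unknown flag: " + flag)
        args.append("--" + flag)
    # Targets are given as schema or schema.table.
    args.extend(targets)
    return args
```

Keeping the password out of the mapping until the last moment (for instance, reading it from an environment variable) avoids storing it in scripts.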
dm-run-server
The dm-run-server command launches a web server that provides access to the repository data over the network.
It allows you to
- View the repository data without exporting as HTML files.
- View changes in the repository immediately.
- Edit some of the data in the repository (e.g., comments and tags) directly in the browser.
Usage: dm-run-server [repo file] [port]
Options:
--help Print this help.
repo file should be a file name of the repository.
port should be a port number to connect to the server. (by default, it is 8080.)
dm-verify-results
dm-verify-results verifies the data condition by scanning validation results in the repository.
It exits with the exit code 1 if there are invalid results, otherwise 0.
Usage: dm-verify-results [repo file]
Options:
--help Print this help.
repo file should be a file name of the repository.
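The exit code makes dm-verify-results usable as a gate in batch jobs. The following is a minimal sketch of reading it from Python, assuming the command is on PATH; the helper names are hypothetical, and treating any exit code other than 0 or 1 as an error is a choice made for this sketch.

```python
import subprocess


def results_are_valid(returncode):
    """Interpret the exit code: 0 means no invalid results, 1 means some were found."""
    if returncode == 0:
        return True
    if returncode == 1:
        return False
    raise RuntimeError("unexpected exit code: %d" % returncode)


def verify_results(repo_file):
    """Run dm-verify-results and report whether the validation results are clean."""
    completed = subprocess.run(["dm-verify-results", repo_file])
    return results_are_valid(completed.returncode)
```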