CSM-DATA¶
Objectives
- Understand what the csm-data CLI is and its capabilities
- Learn how to use the various command groups for different data management tasks
- Explore common use cases and workflows
- Master integration with CosmoTech platform services
What is csm-data?¶
`csm-data` is a powerful Command Line Interface (CLI) bundled with the CosmoTech Acceleration Library (CoAL). It provides a comprehensive set of commands designed to streamline interactions with the various services used within a CosmoTech platform.
The CLI is organized into several command groups, each focused on specific types of data operations:
- `api`: Commands for interacting with the CosmoTech API
- `store`: Commands for working with the CoAL datastore
- `s3-bucket-*`: Commands for S3 bucket operations (download, upload, delete)
- `adx-send-runnerdata`: Command for sending runner data to Azure Data Explorer
- `az-storage-upload`: Command for uploading to Azure Storage
Getting Help
You can get detailed help for any command using the `--help` flag:

```shell
csm-data --help
csm-data api --help
csm-data api run-load-data --help
```
Why use csm-data?¶
Standardized Interactions¶
The `csm-data` CLI provides tested, standardized interactions with the multiple services used in CosmoTech simulations. This eliminates the need to:
- Write custom code for common data operations
- Handle authentication and connection details for each service
- Manage error handling and retries
- Deal with format conversions between services
Environment Variable Support¶
Most commands support environment variables, making them ideal for:
- Integration with orchestration tools like `csm-orc`
- Use in Docker containers and cloud environments
- Secure handling of credentials and connection strings
- Consistent configuration across development and production
Workflow Automation¶
The commands are designed to work together in data processing pipelines, enabling you to:
- Download data from various sources
- Transform and process the data
- Store results in different storage systems
- Send data to visualization and analysis services
Command Groups and Use Cases¶
API Commands¶
The `api` command group facilitates interaction with the CosmoTech API, allowing you to work with scenarios, datasets, and other API resources.
Runner Data Management¶
**Download run data**
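As a hedged sketch of a typical invocation (the folder variables match the ones listed later on this page, while `CSM_RUNNER_ID` and the `--write-json`/`--write-csv` flags are illustrative assumptions that may differ between CoAL versions — confirm with `csm-data api run-load-data --help`):

```shell
# Context is typically injected by the platform when the container starts;
# CSM_RUNNER_ID is an illustrative variable name.
export CSM_ORGANIZATION_ID="o-organization"
export CSM_WORKSPACE_ID="w-workspace"
export CSM_RUNNER_ID="r-runner"

# Where the fetched data should be written
export CSM_DATASET_ABSOLUTE_PATH="/mnt/scenariorun-data/dataset"
export CSM_PARAMETERS_ABSOLUTE_PATH="/mnt/scenariorun-data/parameters"

# Fetch parameters (as JSON and/or CSV) plus the associated datasets
csm-data api run-load-data --write-json --write-csv
```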
This command:
- Downloads scenario parameters and datasets from the CosmoTech API
- Writes parameters as JSON and/or CSV files
- Fetches associated datasets
Common Use Case
This command is particularly useful in container environments where you need to initialize your simulation with data from the platform. The environment variables are typically set by the platform when launching the container.
Twin Data Layer Operations¶
**Load files from the Twin Data Layer**
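A hedged sketch of loading Twin Data Layer content into local files; the `--dir` option name is an assumption, so verify it with `csm-data api tdl-load-files --help`:

```shell
# Export the Twin Data Layer graph into local CSV files,
# typically one file per node/relationship type
csm-data api tdl-load-files --dir ./tdl-data
```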
**Send files to the Twin Data Layer**
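A matching sketch for the upload direction, with the same caveat that the `--dir` option name is an assumption (see `csm-data api tdl-send-files --help`):

```shell
# Push local CSV files (nodes and relationships) into the Twin Data Layer
csm-data api tdl-send-files --dir ./tdl-data
```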
These commands facilitate working with the Twin Data Layer, allowing you to:
- Load data from the Twin Data Layer into local files
- Send local files to the Twin Data Layer
Storage Commands¶
The `s3-bucket-*` commands provide a simple interface for working with S3-compatible storage:
**Download from S3 bucket**
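A sketch using the environment variables documented below; the `--target-folder` option name is an assumption (check `csm-data s3-bucket-download --help`):

```shell
export AWS_ENDPOINT_URL="https://s3.example.com"
export AWS_ACCESS_KEY_ID="access-key-id"
export AWS_SECRET_ACCESS_KEY="secret-access-key"
export CSM_DATA_BUCKET_NAME="my-bucket"

# Download the bucket contents into a local folder
csm-data s3-bucket-download --target-folder ./input-data
```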
**Upload to S3 bucket**
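A corresponding upload sketch; the `--source-folder` option name is an assumption (check `csm-data s3-bucket-upload --help`):

```shell
# Uses the same AWS_* / CSM_DATA_BUCKET_NAME variables as the download command
csm-data s3-bucket-upload --source-folder ./output-data
```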
**Delete from S3 bucket**
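A minimal sketch of the delete operation, again driven by the same environment variables:

```shell
# Removes objects from the bucket, e.g. to clean up
# before uploading a fresh set of results
csm-data s3-bucket-delete
```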
Environment Variables
All these commands support environment variables for credentials and connection details, making them secure and easy to use in automated workflows:
```shell
export AWS_ENDPOINT_URL="https://s3.example.com"
export AWS_ACCESS_KEY_ID="access-key-id"
export AWS_SECRET_ACCESS_KEY="secret-access-key"
export CSM_DATA_BUCKET_NAME="my-bucket"
```
Azure Data Explorer Integration¶
The `adx-send-runnerdata` command enables sending runner data to Azure Data Explorer for analysis and visualization:
**Send runner data to ADX**
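A hedged sketch combining the ADX environment variables listed later on this page; the `--dataset-absolute-path`, `--parameters-absolute-path`, and `--wait` option names are assumptions to verify with `csm-data adx-send-runnerdata --help`:

```shell
export AZURE_DATA_EXPLORER_RESOURCE_URI="https://adx.example.com"
export AZURE_DATA_EXPLORER_RESOURCE_INGEST_URI="https://ingest-adx.example.com"
export AZURE_DATA_EXPLORER_DATABASE_NAME="my-database"

# Ingest dataset and parameters CSVs into ADX tables,
# tagging every row with the runner ID; waiting blocks
# until ingestion completes
csm-data adx-send-runnerdata \
  --dataset-absolute-path "$CSM_DATASET_ABSOLUTE_PATH" \
  --parameters-absolute-path "$CSM_PARAMETERS_ABSOLUTE_PATH" \
  --wait
```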
This command:
- Creates tables in ADX based on CSV files in the dataset and/or parameters folders
- Ingests the data into those tables
- Adds a `run` column with the runner ID for tracking
- Optionally waits for ingestion to complete
Table Creation
This command will create tables in ADX based on the CSV file names and headers. Ensure your CSV files have appropriate headers and follow naming conventions suitable for ADX tables.
Datastore Commands¶
The `store` command group provides tools for working with the CoAL datastore:
**Load CSV folder into datastore**
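A sketch of the CSV ingestion step; the `--csv-folder` option name is an assumption (check `csm-data store load-csv-folder --help`):

```shell
# Each CSV file in the folder becomes a table in the datastore,
# named after the file
csm-data store load-csv-folder --csv-folder ./output-data
```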
**Dump datastore to S3**
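A minimal sketch of the dump step, assuming it reuses the S3 connection variables shown earlier:

```shell
# Uses the same AWS_* / CSM_DATA_BUCKET_NAME variables as the
# s3-bucket-* commands to push the datastore's tables to the bucket
csm-data store dump-to-s3
```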
These commands allow you to:
- Load data from CSV files into the datastore
- Dump datastore contents to various destinations (S3, Azure, PostgreSQL)
- List tables in the datastore
- Reset the datastore
Common Workflows and Integration Patterns¶
Runner Data Processing Pipeline¶
A common workflow combines multiple commands to create a complete data processing pipeline:
**Complete data processing pipeline**
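A sketch of such a pipeline as a shell script; the simulation step (`./run-simulation.sh`) is a placeholder for your project's own command, and the csm-data option names are illustrative:

```shell
#!/bin/bash
set -euo pipefail  # abort the pipeline on the first failing step

# 1. Initialize: pull parameters and datasets from the platform
csm-data api run-load-data

# 2. Run the project-specific simulation / transformation step
./run-simulation.sh

# 3. Collect the results into the datastore
csm-data store load-csv-folder --csv-folder ./output

# 4. Archive results to S3 and push them to ADX for dashboards
csm-data store dump-to-s3
csm-data adx-send-runnerdata --wait
```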
Integration with csm-orc¶
The `csm-data` commands integrate seamlessly with `csm-orc` for orchestration:
**run.json for csm-orc**
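A sketch of what such a template could look like, assuming a csm-orc step format with `id`, `command`, `arguments`, `environment`, and `precedents` fields; check the csm-orc documentation for the exact schema:

```json
{
  "steps": [
    {
      "id": "fetch-data",
      "command": "csm-data",
      "arguments": ["api", "run-load-data"],
      "environment": {
        "CSM_ORGANIZATION_ID": { "description": "Organization ID" },
        "CSM_WORKSPACE_ID": { "description": "Workspace ID" }
      }
    },
    {
      "id": "push-results",
      "command": "csm-data",
      "arguments": ["adx-send-runnerdata"],
      "precedents": ["fetch-data"]
    }
  ]
}
```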
Best Practices and Tips¶
Environment Variables
Use environment variables for sensitive information and configuration that might change between environments:
```shell
# API connection
export CSM_ORGANIZATION_ID="o-organization"
export CSM_WORKSPACE_ID="w-workspace"
export CSM_SCENARIO_ID="s-scenario"

# Paths
export CSM_DATASET_ABSOLUTE_PATH="/path/to/dataset"
export CSM_PARAMETERS_ABSOLUTE_PATH="/path/to/parameters"

# ADX connection
export AZURE_DATA_EXPLORER_RESOURCE_URI="https://adx.example.com"
export AZURE_DATA_EXPLORER_RESOURCE_INGEST_URI="https://ingest-adx.example.com"
export AZURE_DATA_EXPLORER_DATABASE_NAME="my-database"
```
Error Handling
Most commands will exit with a non-zero status code on failure, making them suitable for use in scripts and orchestration tools that check exit codes.
Logging
Control the verbosity of logging with the `--log-level` option:

```shell
csm-data --log-level debug api run-load-data ...
```
Extending csm-data¶
If the existing commands don't exactly match your needs, you have several options:
- Use as a basis: Examine the code of similar commands and use it as a starting point for your own scripts
- Combine commands: Use shell scripting to combine multiple commands into a custom workflow
- Environment variables: Customize behavior through environment variables without modifying the code
- Contribute: Consider contributing enhancements back to the CoAL project
Conclusion¶
The `csm-data` CLI provides a powerful set of tools for managing data in CosmoTech platform environments. By leveraging these commands, you can:
- Streamline interactions with platform services
- Automate data processing workflows
- Integrate with orchestration tools
- Focus on your simulation logic rather than data handling
Whether you're developing locally or deploying to production, `csm-data` offers a consistent interface for your data management needs.