.. _Configtoml:

The Config.toml File
====================

Rather than passing individual parameters when starting Driverless AI, admins can instead reference a config.toml file. This file includes all possible configuration options that would otherwise be specified in the ``nvidia-docker run`` command. Place this file in a folder on the container (for example, in /tmp), then set the desired environment variables and start Driverless AI using the following command:

::

  nvidia-docker run \
      --rm \
      -u `id -u`:`id -g` \
      -e DRIVERLESS_AI_CONFIG_FILE_PATH="/tmp/config.toml" \
      -v `pwd`/data:/data \
      -v `pwd`/log:/log \
      -v `pwd`/license:/license \
      -v `pwd`/tmp:/tmp \
      opsh2oai/h2oai-runtime
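Every setting in config.toml can also be overridden by a corresponding environment variable named ``DRIVERLESS_AI_*`` (for example, ``max_cores`` maps to ``DRIVERLESS_AI_MAX_CORES``, as noted in the header comments of the sample file below). As a minimal sketch, the command below caps each experiment at 10 CPU cores; the value ``10`` is only an illustration:

::

  nvidia-docker run \
      --rm \
      -u `id -u`:`id -g` \
      -e DRIVERLESS_AI_CONFIG_FILE_PATH="/tmp/config.toml" \
      -e DRIVERLESS_AI_MAX_CORES="10" \
      -v `pwd`/data:/data \
      -v `pwd`/log:/log \
      -v `pwd`/license:/license \
      -v `pwd`/tmp:/tmp \
      opsh2oai/h2oai-runtime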
Sample config.toml File
-----------------------

::

  # ----------------------------------------------------------------------------
  # DRIVERLESS AI CONFIGURATION FILE
  #
  # This file is authored in TOML (see https://github.com/toml-lang/toml)
  #
  # The variables in this file can be overridden by corresponding environment
  # variables named DRIVERLESS_AI_* (e.g. "max_cores" can be overridden by
  # the environment variable "DRIVERLESS_AI_MAX_CORES").
  # ----------------------------------------------------------------------------

  # IP address and port for the Driverless AI HTTP server.
  ip = "127.0.0.1"
  port = 12345

  # Maximum number of CPU cores to use per experiment. Set to <= 0 to use all cores.
  max_cores = 0

  # Number of GPUs to use per model training task. Set to -1 for all GPUs.
  # Currently, num_gpus != 1 disables GPU locking, so it is only recommended for
  # single experiments and single users.
  # Ignored if GPUs are disabled or no GPUs are present on the system.
  num_gpus = 1

  # Which gpu_id to start with.
  # If CUDA_VISIBLE_DEVICES=... is used to control GPUs, gpu_id_start=0 still refers
  # to the first device in that list.
  # E.g. if CUDA_VISIBLE_DEVICES="4,5", then gpu_id_start=0 refers to device #4.
  gpu_id_start = 0

  # Maximum number of workers for the Driverless AI server pool (only 1 needed currently)
  max_workers = 1

  # Minimum amount of disk space in GB needed to run experiments.
  # Experiments will fail if this limit is crossed.
  disk_limit_gb = 5

  # Minimum amount of system memory in GB needed to start experiments
  memory_limit_gb = 5

  # IP address and port of the process proxy.
  process_server_ip = "127.0.0.1"
  process_server_port = 8080

  # IP address and port of the H2O instance.
  h2o_ip = "127.0.0.1"
  h2o_port = 54321

  # Data directory. All application data and files related to datasets and experiments
  # are stored in this directory.
  data_directory = "./tmp"

  # Start the HTTP server in debug mode (DO NOT enable in production).
  debug = false

  # Whether to run a quick performance benchmark at the start of the application and of each experiment
  enable_benchmark = false

  # Minimum number of rows needed to run experiments (values lower than 100 might not work)
  min_num_rows = 100

  # Internal threshold for the number of rows that triggers certain statistical techniques to increase statistical fidelity
  statistical_threshold_num_rows_small = 10000

  # Internal threshold for the number of rows that triggers certain statistical techniques that can speed up modeling
  statistical_threshold_num_rows_large = 1000000

  # Maximum number of columns
  max_cols = 10000

  # Threshold of rows * columns below which GPUs are disabled, for speed purposes
  gpu_small_data_size = 100000

  # Maximum number of unique values allowed in the fold column
  max_fold_uniques = 100000

  # Maximum number of classes
  max_num_classes = 100

  # Minimum allowed seconds for a time column
  min_time_value = 5e8  # ~ > 1986

  # Minimum number of rows above which to try to detect time series
  min_rows_detected_time = 10000

  # Relative standard deviation of the hold-out score below which early stopping is triggered for accuracy ~5
  stop_early_rel_std = 1e-3

  # Variable importance below which a feature is dropped (with a possible replacement found that is better)
  # This also sets the overall scale for lower interpretability thresholds
  varimp_threshold_at_interpretability_10 = 0.05

  # Maximum number of GBM trees (early stopping usually chooses far fewer)
  max_ntrees = 2000

  # Authentication
  #  unvalidated : Accepts user id and password; does not validate the password
  #  none        : Does not ask for user id or password; authenticated as admin
  #  pam         : Accepts user id and password; validates the user against the operating system
  #  ldap        : Accepts user id and password; validates against an LDAP server (see the LDAP settings below)
  authentication_method = "unvalidated"

  # LDAP Settings
  ldap_server = ""
  ldap_port = ""
  ldap_dc = ""

  # Supported file formats (file name endings must match for files to show up in the file browser): a comma-separated list
  supported_file_types = "csv, tsv, txt, dat, tgz, zip, xz, xls, xlsx"

  # File System Support
  # Format: "file_system_1, file_system_2, file_system_3"
  # Allowed file systems:
  #  file : local file system/server file system
  #  hdfs : Hadoop file system; remember to configure the Hadoop core-site.xml and keytab below
  #  s3   : Amazon S3; optionally configure the secret and access key below
  enabled_file_systems = "file, hdfs, s3"

  # Configuration for an HDFS data source
  # Path of the HDFS core-site.xml
  core_site_xml_path = ""
  # Path of the principal keytab file
  key_tab_path = ""

  # HDFS connector
  # Specify the HDFS auth type; allowed options are:
  #  noauth              : No authentication needed
  #  principal           : Authenticate with HDFS with a principal user
  #  keytab              : Authenticate with a keytab (recommended)
  #  keytabimpersonation : Log in with impersonation using a keytab
  hdfs_auth_type = "noauth"

  # Kerberos app principal user (recommended)
  hdfs_app_principal_user = ""
  # Specify the user id of the current user here as user@realm
  hdfs_app_login_user = ""
  # hdfs_app_jvm_args = ""

  # AWS authentication settings
  #  True  : Authenticated connection
  #  False : Unverified connection
  aws_auth = "False"

  # S3 Connector credentials
  aws_access_key_id = ""
  aws_secret_access_key = ""
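As an illustration of how several of the HDFS connector options above fit together, the following sketch shows a keytab-based configuration; the paths and the principal are placeholders, not values shipped with Driverless AI:

::

  # Enable the HDFS connector alongside the local file system (example only)
  enabled_file_systems = "file, hdfs"

  # Placeholder paths -- substitute the locations used in your cluster
  core_site_xml_path = "/path/to/core-site.xml"
  key_tab_path = "/path/to/driverlessai.keytab"

  # Authenticate with the keytab (recommended); the principal below is a placeholder
  hdfs_auth_type = "keytab"
  hdfs_app_principal_user = "dai_user@EXAMPLE.COM"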