Version: devel

Configuration Reference

This page is a reference for most configuration options and objects available in dlt.
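
Every option listed below can be supplied in code, in config.toml / secrets.toml, or through environment variables, using the section and field names shown on this page. As a minimal sketch (field names taken from PostgresClientConfiguration and PostgresCredentials below; values are placeholders), the environment-variable form upper-cases sections and fields and joins them with double underscores:

  import os
  import dlt

  # Placeholder values; the section path destination.postgres.credentials
  # becomes DESTINATION__POSTGRES__CREDENTIALS as an environment variable.
  os.environ["DESTINATION__POSTGRES__CREATE_INDEXES"] = "false"
  os.environ["DESTINATION__POSTGRES__CREDENTIALS__HOST"] = "localhost"
  os.environ["DESTINATION__POSTGRES__CREDENTIALS__USERNAME"] = "loader"
  os.environ["DESTINATION__POSTGRES__CREDENTIALS__PASSWORD"] = "***"

  pipeline = dlt.pipeline("config_demo", destination="postgres", dataset_name="demo")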

Destination Configurations

AthenaClientConfiguration

Configuration for the Athena destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - AwsCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • staging_config - DestinationClientStagingConfiguration
    configuration of the staging, if present, injected at runtime
  • truncate_tables_on_staging_destination_before_load - bool
    If dlt should truncate the tables on staging destination before loading data.
  • query_result_bucket - str
  • athena_work_group - str
  • aws_data_catalog - str
  • connection_params - typing.Dict[str, typing.Any]
  • force_iceberg - bool
  • table_location_layout - str
  • table_properties - typing.Dict[str, str]
  • db_location - str
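
For example, the Athena-specific fields above could be set through environment variables (a sketch with placeholder values; the same keys work in config.toml / secrets.toml under the destination.athena section):

  import os

  # Placeholder values for the Athena fields listed above.
  os.environ["DESTINATION__ATHENA__QUERY_RESULT_BUCKET"] = "s3://my-athena-query-results"
  os.environ["DESTINATION__ATHENA__ATHENA_WORK_GROUP"] = "primary"
  os.environ["DESTINATION__ATHENA__AWS_DATA_CATALOG"] = "awsdatacatalog"
  os.environ["DESTINATION__ATHENA__CREDENTIALS__AWS_ACCESS_KEY_ID"] = "AKIA..."
  os.environ["DESTINATION__ATHENA__CREDENTIALS__AWS_SECRET_ACCESS_KEY"] = "***"
  os.environ["DESTINATION__ATHENA__CREDENTIALS__REGION_NAME"] = "eu-central-1"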

BigQueryClientConfiguration

Configuration for the BigQuery destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - GcpServiceAccountCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • staging_config - DestinationClientStagingConfiguration
    configuration of the staging, if present, injected at runtime
  • truncate_tables_on_staging_destination_before_load - bool
    If dlt should truncate the tables on staging destination before loading data.
  • location - str
  • project_id - str
    Note, that this is BigQuery project_id which could be different from credentials.project_id
  • has_case_sensitive_identifiers - bool
    If True then dlt expects to load data into case sensitive dataset
  • should_set_case_sensitivity_on_new_dataset - bool
    If True, dlt will set case sensitivity flag on created datasets that corresponds to naming convention
  • http_timeout - float
    connection timeout for http request to BigQuery api
  • file_upload_timeout - float
    a timeout for file upload when loading local files
  • retry_deadline - float
How long to retry the operation in case of error; the backoff deadline defaults to 60 s.
  • batch_size - int
    Number of rows in streaming insert batch
  • autodetect_schema - bool
    Allow BigQuery to autodetect schemas and create data tables
  • ignore_unknown_values - bool
    Ignore unknown values in the data

ClickHouseClientConfiguration

Configuration for the ClickHouse destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - ClickHouseCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • staging_config - DestinationClientStagingConfiguration
    configuration of the staging, if present, injected at runtime
  • truncate_tables_on_staging_destination_before_load - bool
    If dlt should truncate the tables on staging destination before loading data.
  • dataset_table_separator - str
    Separator for dataset table names, defaults to '___', i.e. 'database.dataset___table'.
  • table_engine_type - merge_tree | shared_merge_tree | replicated_merge_tree
    The default table engine to use. Defaults to merge_tree. Other implemented options are shared_merge_tree and replicated_merge_tree.
  • dataset_sentinel_table_name - str
    Special table to mark dataset as existing
  • staging_use_https - bool
    Connect to the staging buckets via https

CustomDestinationClientConfiguration

Configuration for a custom (user-defined) destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - CredentialsConfiguration
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • destination_callable - str | typing.Callable[[typing.Union[typing.Any, typing.List[typing.Any], str], dlt.common.schema.typing.TTableSchema], NoneType]
  • loader_file_format - jsonl | typed-jsonl | insert_values | parquet | csv | reference | model
  • batch_size - int
  • skip_dlt_columns_and_tables - bool
  • max_table_nesting - int
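
These fields correspond to the arguments of the @dlt.destination decorator; a minimal sketch of a custom destination (the printing sink below is illustrative, not part of dlt):

  import dlt
  from dlt.common.typing import TDataItems
  from dlt.common.schema.typing import TTableSchema

  # batch_size, loader_file_format and skip_dlt_columns_and_tables map to the
  # configuration fields listed above; the decorated function is the destination_callable.
  @dlt.destination(batch_size=10, loader_file_format="jsonl", skip_dlt_columns_and_tables=True)
  def print_sink(items: TDataItems, table: TTableSchema) -> None:
      print(f"{table['name']}: received {len(items)} items")

  pipeline = dlt.pipeline("custom_destination_demo", destination=print_sink)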

DatabricksClientConfiguration

Configuration for the Databricks destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - DatabricksCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • staging_config - DestinationClientStagingConfiguration
    configuration of the staging, if present, injected at runtime
  • truncate_tables_on_staging_destination_before_load - bool
    If dlt should truncate the tables on staging destination before loading data.
  • staging_credentials_name - str
  • is_staging_external_location - bool
    If true, the temporary credentials are not propagated to the COPY command
  • staging_volume_name - str
    Name of the Databricks managed volume for temporary storage, e.g., catalog_name.database_name.volume_name. Defaults to '_dlt_temp_load_volume' if not set.
  • keep_staged_files - bool
    Tells if to keep the files in internal (volume) stage

DestinationClientConfiguration

Base configuration shared by all destinations

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - CredentialsConfiguration
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod

DestinationClientDwhConfiguration

Configuration of a destination that supports datasets/schemas

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - CredentialsConfiguration
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.

DestinationClientDwhWithStagingConfiguration

Configuration of a destination that can take data from staging destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - CredentialsConfiguration
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • staging_config - DestinationClientStagingConfiguration
    configuration of the staging, if present, injected at runtime
  • truncate_tables_on_staging_destination_before_load - bool
    If dlt should truncate the tables on staging destination before loading data.

DestinationClientStagingConfiguration

Configuration of a staging destination, able to store files with desired layout at bucket_url.

Also supports datasets and can act as standalone destination.

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - CredentialsConfiguration
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • as_staging_destination - bool
  • bucket_url - str
  • layout - str

DremioClientConfiguration

Configuration for the Dremio destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - DremioCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • staging_config - DestinationClientStagingConfiguration
    configuration of the staging, if present, injected at runtime
  • truncate_tables_on_staging_destination_before_load - bool
    If dlt should truncate the tables on staging destination before loading data.
  • staging_data_source - str
    The name of the staging data source

DuckDbClientConfiguration

Configuration for the DuckDB destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - DuckDbCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • staging_config - DestinationClientStagingConfiguration
    configuration of the staging, if present, injected at runtime
  • truncate_tables_on_staging_destination_before_load - bool
    If dlt should truncate the tables on staging destination before loading data.
  • local_dir - str
  • pipeline_name - str
  • pipeline_working_dir - str
  • legacy_db_path - str
  • create_indexes - bool

DummyClientConfiguration

Configuration for the dummy destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - DummyClientCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • loader_file_format - jsonl | typed-jsonl | insert_values | parquet | csv | reference | model
  • fail_schema_update - bool
  • fail_prob - float
    probability of a terminal failure
  • retry_prob - float
    probability of job retry
  • completed_prob - float
    probability of successful job completion
  • exception_prob - float
    probability of a transient exception when running a job
  • timeout - float
    timeout time
  • fail_terminally_in_init - bool
    raise terminal exception in job init
  • fail_transiently_in_init - bool
    raise transient exception in job init
  • truncate_tables_on_staging_destination_before_load - bool
    truncate tables on staging destination
  • create_followup_jobs - bool
    create followup job for individual jobs
  • fail_followup_job_creation - bool
    Raise a generic exception during followup job creation
  • fail_table_chain_followup_job_creation - bool
    Raise a generic exception during table chain followup job creation
  • create_followup_table_chain_sql_jobs - bool
    create a table chain merge job which is guaranteed to fail
  • create_followup_table_chain_reference_jobs - bool
    create table chain jobs which succeed

FilesystemConfigurationWithLocalFiles

Filesystem configuration extended with local directory settings

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - AwsCredentials | GcpServiceAccountCredentials | AzureCredentialsWithoutDefaults | AzureServicePrincipalCredentialsWithoutDefaults | AzureCredentials | AzureServicePrincipalCredentials | GcpOAuthCredentials | SFTPCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • local_dir - str
  • pipeline_name - str
  • pipeline_working_dir - str
  • legacy_db_path - str
  • bucket_url - str
  • read_only - bool
    Indicates read only filesystem access. Will enable caching
  • kwargs - typing.Dict[str, typing.Any]
    Additional arguments passed to the fsspec constructor, e.g. dict(use_ssl=True) for s3fs
  • client_kwargs - typing.Dict[str, typing.Any]
    Additional arguments passed to the underlying fsspec native client, e.g. dict(verify="public.crt") for botocore
  • deltalake_storage_options - typing.Dict[str, typing.Any]
  • deltalake_configuration - typing.Dict[str, typing.Optional[str]]

FilesystemDestinationClientConfiguration

Configuration for the filesystem destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - AwsCredentials | GcpServiceAccountCredentials | AzureCredentialsWithoutDefaults | AzureServicePrincipalCredentialsWithoutDefaults | AzureCredentials | AzureServicePrincipalCredentials | GcpOAuthCredentials | SFTPCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • as_staging_destination - bool
  • bucket_url - str
  • layout - str
  • local_dir - str
  • pipeline_name - str
  • pipeline_working_dir - str
  • legacy_db_path - str
  • read_only - bool
    Indicates read only filesystem access. Will enable caching
  • kwargs - typing.Dict[str, typing.Any]
    Additional arguments passed to the fsspec constructor, e.g. dict(use_ssl=True) for s3fs
  • client_kwargs - typing.Dict[str, typing.Any]
    Additional arguments passed to the underlying fsspec native client, e.g. dict(verify="public.crt") for botocore
  • deltalake_storage_options - typing.Dict[str, typing.Any]
  • deltalake_configuration - typing.Dict[str, typing.Optional[str]]
  • current_datetime - pendulum.datetime.DateTime | typing.Callable[[], pendulum.datetime.DateTime]
  • extra_placeholders - typing.Dict[str, typing.Union[str, int, pendulum.datetime.DateTime, typing.Callable[[str, str, str, str, str], str]]]
  • max_state_files - int
    Maximum number of pipeline state files to keep; 0 or negative value disables cleanup.
  • always_refresh_views - bool
    Always refresh table scanner views by setting the newest table metadata or globbing table files
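
The layout and placeholder fields above control file naming on the bucket. A sketch passing them directly to the filesystem factory (assuming, as with other factories, that configuration fields are accepted as keyword arguments; they can equally be set in config.toml or via environment variables):

  import dlt

  # "{table_name}/{load_id}.{file_id}.{ext}" uses built-in placeholders;
  # "owner" is an illustrative custom placeholder supplied via extra_placeholders.
  fs = dlt.destinations.filesystem(
      bucket_url="s3://my-bucket/raw",
      layout="{owner}/{table_name}/{load_id}.{file_id}.{ext}",
      extra_placeholders={"owner": "analytics"},
  )

  pipeline = dlt.pipeline("filesystem_demo", destination=fs, dataset_name="raw")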

LanceDBClientConfiguration

Configuration for the LanceDB destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - LanceDBCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • local_dir - str
  • pipeline_name - str
  • pipeline_working_dir - str
  • legacy_db_path - str
  • lance_uri - str
    LanceDB database URI. Defaults to local, on-disk instance.
  • dataset_separator - str
    Character for the dataset separator.
  • options - LanceDBClientOptions
    LanceDB client options.
  • embedding_model_provider - gemini-text | bedrock-text | cohere | gte-text | imagebind | instructor | open-clip | openai | sentence-transformers | huggingface | colbert | ollama
    Embedding provider used for generating embeddings. Default is "cohere". You can find the full list of supported providers in the LanceDB documentation.
  • embedding_model_provider_host - str
    Full host URL with protocol and port (e.g. 'http://localhost:11434'). Uses LanceDB's default if not specified, assuming the provider accepts this parameter.
  • embedding_model - str
    The model used by the embedding provider for generating embeddings.
  • embedding_model_dimensions - int
    The dimensions of the generated embeddings. In most cases this is inferred automatically by LanceDB.
  • vector_field_name - str
    Name of the special field to store the vector embeddings.
  • sentinel_table_name - str
    Name of the sentinel table that encapsulates datasets. Since LanceDB has no concept of datasets, this table is used to mark a dataset as existing.

MotherDuckClientConfiguration

Configuration for the MotherDuck destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - MotherDuckCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • staging_config - DestinationClientStagingConfiguration
    configuration of the staging, if present, injected at runtime
  • truncate_tables_on_staging_destination_before_load - bool
    If dlt should truncate the tables on staging destination before loading data.
  • create_indexes - bool

MsSqlClientConfiguration

Configuration for the Microsoft SQL Server destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - MsSqlCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • staging_config - DestinationClientStagingConfiguration
    configuration of the staging, if present, injected at runtime
  • truncate_tables_on_staging_destination_before_load - bool
    If dlt should truncate the tables on staging destination before loading data.
  • create_indexes - bool
  • has_case_sensitive_identifiers - bool

PostgresClientConfiguration

Configuration for the Postgres destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - PostgresCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • staging_config - DestinationClientStagingConfiguration
    configuration of the staging, if present, injected at runtime
  • truncate_tables_on_staging_destination_before_load - bool
    If dlt should truncate the tables on staging destination before loading data.
  • create_indexes - bool
  • csv_format - CsvFormatConfiguration
    Optional csv format configuration

QdrantClientConfiguration

Configuration for the Qdrant destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - QdrantCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • local_dir - str
  • pipeline_name - str
  • pipeline_working_dir - str
  • legacy_db_path - str
  • qd_location - str
  • qd_path - str
    Persistence path for QdrantLocal. Default: None
  • dataset_separator - str
  • embedding_batch_size - int
  • embedding_parallelism - int
  • upload_batch_size - int
  • upload_parallelism - int
  • upload_max_retries - int
  • options - QdrantClientOptions
  • model - str

RedshiftClientConfiguration

Configuration for the Amazon Redshift destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - RedshiftCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • staging_config - DestinationClientStagingConfiguration
    configuration of the staging, if present, injected at runtime
  • truncate_tables_on_staging_destination_before_load - bool
    If dlt should truncate the tables on staging destination before loading data.
  • create_indexes - bool
  • csv_format - CsvFormatConfiguration
    Optional csv format configuration
  • staging_iam_role - str
  • has_case_sensitive_identifiers - bool

SnowflakeClientConfiguration

Configuration for the Snowflake destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - SnowflakeCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • staging_config - DestinationClientStagingConfiguration
    configuration of the staging, if present, injected at runtime
  • truncate_tables_on_staging_destination_before_load - bool
    If dlt should truncate the tables on staging destination before loading data.
  • stage_name - str
    Use an existing named stage instead of the default. Default uses the implicit table stage per table
  • keep_staged_files - bool
    Whether to keep or delete the staged files after COPY INTO succeeds
  • csv_format - CsvFormatConfiguration
    Optional csv format configuration
  • query_tag - str
    A tag with placeholders to tag sessions executing jobs
  • create_indexes - bool
    Whether UNIQUE or PRIMARY KEY constrains should be created
  • use_vectorized_scanner - bool
    Whether to use or not use the vectorized scanner in COPY INTO

SqlalchemyClientConfiguration

Configuration for the SQLAlchemy destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - SqlalchemyCredentials
    SQLAlchemy connection string
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • create_unique_indexes - bool
    Whether UNIQUE constrains should be created
  • create_primary_keys - bool
    Whether PRIMARY KEY constrains should be created
  • engine_args - typing.Dict[str, typing.Any]
    Additional arguments passed to sqlalchemy.create_engine

SynapseClientConfiguration

Configuration for the Azure Synapse destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - SynapseCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • staging_config - DestinationClientStagingConfiguration
    configuration of the staging, if present, injected at runtime
  • truncate_tables_on_staging_destination_before_load - bool
    If dlt should truncate the tables on staging destination before loading data.
  • create_indexes - bool
    Whether primary_key and unique column hints are applied.
  • has_case_sensitive_identifiers - bool
  • default_table_index_type - heap | clustered_columnstore_index
  • staging_use_msi - bool
    Whether the managed identity of the Synapse workspace is used to authorize access to the staging Storage Account.

WeaviateClientConfiguration

Configuration for the Weaviate destination

  • destination_type - str
    Type of this destination, e.g. postgres or duckdb
  • credentials - WeaviateCredentials
    Credentials for this destination
  • destination_name - str
    Name of the destination, e.g. my_postgres or my_duckdb, will be the same as destination_type if not set
  • environment - str
    Environment of the destination, e.g. dev or prod
  • dataset_name - str
    dataset name in the destination to load data to, for schemas that are not default schema, it is used as dataset prefix
  • default_schema_name - str
    name of default schema to be used to name effective dataset to load data to
  • replace_strategy - truncate-and-insert | insert-from-staging | staging-optimized
    How to handle replace disposition for this destination, uses first strategy from caps if not declared
  • staging_dataset_name_layout - str
    Layout for staging dataset, where %s is replaced with dataset name. placeholder is optional
  • enable_dataset_name_normalization - bool
    Whether to normalize the dataset name. Affects staging dataset as well.
  • info_tables_query_threshold - int
    Threshold for information schema tables query, if exceeded tables will be filtered in code.
  • batch_size - int
  • batch_workers - int
  • batch_consistency - ONE | QUORUM | ALL
  • batch_retries - int
  • conn_timeout - float
  • read_timeout - float
  • startup_period - int
  • dataset_separator - str
  • vectorizer - str
  • module_config - typing.Dict[str, typing.Dict[str, str]]

Credential Configurations

AwsCredentials

AWS credentials that can also be resolved from the default AWS credential chain

  • aws_access_key_id - str
  • aws_secret_access_key - str
  • aws_session_token - str
  • profile_name - str
  • region_name - str
  • endpoint_url - str
  • s3_url_style - str
    Only needed for duckdb sql_client S3 access; for MinIO, for example, this needs to be set to 'path'.

AwsCredentialsWithoutDefaults

AWS credentials resolved only from explicitly provided values, without falling back to machine defaults

  • aws_access_key_id - str
  • aws_secret_access_key - str
  • aws_session_token - str
  • profile_name - str
  • region_name - str
  • endpoint_url - str
  • s3_url_style - str
    Only needed for duckdb sql_client S3 access; for MinIO, for example, this needs to be set to 'path'.

AzureCredentials

Credentials for Azure Blob Storage that can also be resolved from default Azure credentials

  • azure_storage_account_name - str
  • azure_account_host - str
    Alternative host when accessing the blob storage endpoint, e.g. my_account.dfs.core.windows.net
  • azure_storage_account_key - str
  • azure_storage_sas_token - str
  • azure_sas_token_permissions - str
    Permissions to use when generating a SAS token. Ignored when sas token is provided directly

AzureCredentialsBase

Base class for Azure Blob Storage credentials

  • azure_storage_account_name - str
  • azure_account_host - str
    Alternative host when accessing the blob storage endpoint, e.g. my_account.dfs.core.windows.net

AzureCredentialsWithoutDefaults

Credentials for Azure Blob Storage, compatible with adlfs

  • azure_storage_account_name - str
  • azure_account_host - str
    Alternative host when accessing the blob storage endpoint, e.g. my_account.dfs.core.windows.net
  • azure_storage_account_key - str
  • azure_storage_sas_token - str
  • azure_sas_token_permissions - str
    Permissions to use when generating a SAS token. Ignored when sas token is provided directly

AzureServicePrincipalCredentials

Azure service principal credentials (tenant ID, client ID and client secret)

  • azure_storage_account_name - str
  • azure_account_host - str
    Alternative host when accessing the blob storage endpoint, e.g. my_account.dfs.core.windows.net
  • azure_tenant_id - str
  • azure_client_id - str
  • azure_client_secret - str

AzureServicePrincipalCredentialsWithoutDefaults

Azure service principal credentials resolved only from explicitly provided values

  • azure_storage_account_name - str
  • azure_account_host - str
    Alternative host when accessing the blob storage endpoint, e.g. my_account.dfs.core.windows.net
  • azure_tenant_id - str
  • azure_client_id - str
  • azure_client_secret - str

ClickHouseCredentials

Credentials for ClickHouse

  • drivername - str
  • database - str
    Database to connect to. Defaults to 'default'.
  • password - str
  • username - str
    Database user. Defaults to 'default'.
  • host - str
    Host with running ClickHouse server.
  • port - int
    Native port ClickHouse server is bound to. Defaults to 9440.
  • query - typing.Dict[str, typing.Any]
  • http_port - int
    HTTP Port to connect to ClickHouse server's HTTP interface.
  • secure - 0 | 1
    Enables TLS encryption when connecting to ClickHouse Server. 0 means no encryption, 1 means encrypted.
  • connect_timeout - int
    Timeout for establishing connection. Defaults to 10 seconds.
  • send_receive_timeout - int
    Timeout for sending and receiving data. Defaults to 300 seconds.

ConnectionStringCredentials

Credentials based on a database connection string (drivername, username, password, host, port, database)

  • drivername - str
  • database - str
  • password - str
  • username - str
  • host - str
  • port - int
  • query - typing.Dict[str, typing.Any]
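
Connection-string based credentials (this class and subclasses such as PostgresCredentials or SqlalchemyCredentials) can typically be supplied either as one native connection string or field by field; a sketch with placeholder values:

  import os

  # One native value for the whole credentials object ...
  os.environ["DESTINATION__SQLALCHEMY__CREDENTIALS"] = (
      "mysql+pymysql://loader:***@db.example.com:3306/dlt_data"
  )

  # ... or the individual fields listed above.
  os.environ["DESTINATION__POSTGRES__CREDENTIALS__DRIVERNAME"] = "postgresql"
  os.environ["DESTINATION__POSTGRES__CREDENTIALS__HOST"] = "localhost"
  os.environ["DESTINATION__POSTGRES__CREDENTIALS__PORT"] = "5432"
  os.environ["DESTINATION__POSTGRES__CREDENTIALS__DATABASE"] = "dlt_data"
  os.environ["DESTINATION__POSTGRES__CREDENTIALS__USERNAME"] = "loader"
  os.environ["DESTINATION__POSTGRES__CREDENTIALS__PASSWORD"] = "***"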

CredentialsConfiguration

Base class for all credentials. Credentials are configurations that may be stored only by providers supporting secrets.

DatabricksCredentials

Credentials for Databricks

  • catalog - str
  • server_hostname - str
  • http_path - str
  • access_token - str
  • client_id - str
  • client_secret - str
  • http_headers - typing.Dict[str, str]
  • session_configuration - typing.Dict[str, typing.Any]
    Dict of session parameters that will be passed to databricks.sql.connect
  • connection_parameters - typing.Dict[str, typing.Any]
    Additional keyword arguments that are passed to databricks.sql.connect
  • socket_timeout - int
  • user_agent_entry - str

DremioCredentials

Credentials for Dremio

  • drivername - str
  • database - str
  • password - str
  • username - str
  • host - str
  • port - int
  • query - typing.Dict[str, typing.Any]

DuckDbBaseCredentials

Base credentials for DuckDB-based destinations

  • drivername - str
  • database - str
  • password - str
  • username - str
  • host - str
  • port - int
  • query - typing.Dict[str, typing.Any]
  • read_only - bool

DuckDbCredentials

Credentials for DuckDB

  • drivername - str
  • database - str
  • password - str
  • username - str
  • host - str
  • port - int
  • query - typing.Dict[str, typing.Any]
  • read_only - bool

DummyClientCredentials

Credentials for the dummy destination

GcpCredentials

Base class for Google Cloud credentials

  • token_uri - str
  • auth_uri - str
  • project_id - str

GcpDefaultCredentials

Google Cloud credentials resolved from the default application credentials

  • token_uri - str
  • auth_uri - str
  • project_id - str

GcpOAuthCredentials

Google Cloud OAuth 2.0 credentials

  • client_id - str
  • client_secret - str
  • refresh_token - str
  • scopes - typing.List[str]
  • token - str
    Access token
  • token_uri - str
  • auth_uri - str
  • project_id - str
  • client_type - str

GcpOAuthCredentialsWithoutDefaults

Google Cloud OAuth 2.0 credentials resolved only from explicitly provided values

  • client_id - str
  • client_secret - str
  • refresh_token - str
  • scopes - typing.List[str]
  • token - str
    Access token
  • token_uri - str
  • auth_uri - str
  • project_id - str
  • client_type - str

GcpServiceAccountCredentials

Google Cloud service account credentials

  • token_uri - str
  • auth_uri - str
  • project_id - str
  • private_key - str
  • private_key_id - str
  • client_email - str
  • type - str

GcpServiceAccountCredentialsWithoutDefaults

Google Cloud service account credentials resolved only from explicitly provided values

  • token_uri - str
  • auth_uri - str
  • project_id - str
  • private_key - str
  • private_key_id - str
  • client_email - str
  • type - str

LanceDBCredentials

Credentials for LanceDB

  • uri - str
  • api_key - str
    API key for the remote connections (LanceDB cloud).
  • embedding_model_provider_api_key - str
    API key for the embedding model provider.

MotherDuckCredentials

Credentials for MotherDuck

  • drivername - str
  • database - str
  • password - str
  • username - str
  • host - str
  • port - int
  • query - typing.Dict[str, typing.Any]
  • read_only - bool
  • custom_user_agent - str

MsSqlCredentials

Credentials for Microsoft SQL Server

  • drivername - str
  • database - str
  • password - str
  • username - str
  • host - str
  • port - int
  • query - typing.Dict[str, typing.Any]
  • connect_timeout - int
  • driver - str

OAuth2Credentials

Generic OAuth 2.0 credentials

  • client_id - str
  • client_secret - str
  • refresh_token - str
  • scopes - typing.List[str]
  • token - str
    Access token

PostgresCredentials

Credentials for Postgres

  • drivername - str
  • database - str
  • password - str
  • username - str
  • host - str
  • port - int
  • query - typing.Dict[str, typing.Any]
  • connect_timeout - int
  • client_encoding - str

QdrantCredentials

Credentials for Qdrant

  • location - str
  • api_key - str
    API key for authentication in Qdrant Cloud. Default: None
  • path - str

RedshiftCredentials

Credentials for Amazon Redshift

  • drivername - str
  • database - str
  • password - str
  • username - str
  • host - str
  • port - int
  • query - typing.Dict[str, typing.Any]
  • connect_timeout - int
  • client_encoding - str

SFTPCredentials

Credentials for SFTP filesystem, compatible with fsspec SFTP protocol.

Authentication is attempted in the following order of priority:

  • key_filename may contain OpenSSH public certificate paths as well as regular private-key paths; when files ending in -cert.pub are found, they are assumed to match a private key, and both components will be loaded.

  • Any key found through an SSH agent: any “id_rsa”, “id_dsa”, or “id_ecdsa” key discoverable in ~/.ssh/.

  • Plain username/password authentication, if a password was provided.

  • If a private key requires a password to unlock it, and a password is provided, that password will be used to attempt to unlock the key.

For more information about parameters: https://docs.paramiko.org/en/3.3/api/client.html#paramiko.client.SSHClient.connect

  • sftp_port - int
  • sftp_username - str
  • sftp_password - str
  • sftp_key_filename - str
  • sftp_key_passphrase - str
  • sftp_timeout - float
  • sftp_banner_timeout - float
  • sftp_auth_timeout - float
  • sftp_channel_timeout - float
  • sftp_allow_agent - bool
  • sftp_look_for_keys - bool
  • sftp_compress - bool
  • sftp_gss_auth - bool
  • sftp_gss_kex - bool
  • sftp_gss_deleg_creds - bool
  • sftp_gss_host - str
  • sftp_gss_trust_dns - bool
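
These fields are used together with an sftp:// bucket_url on the filesystem destination; a sketch with placeholder values, using key-based authentication:

  import os

  os.environ["DESTINATION__FILESYSTEM__BUCKET_URL"] = "sftp://sftp.example.com/data"
  os.environ["DESTINATION__FILESYSTEM__CREDENTIALS__SFTP_USERNAME"] = "loader"
  os.environ["DESTINATION__FILESYSTEM__CREDENTIALS__SFTP_KEY_FILENAME"] = "/home/loader/.ssh/id_rsa"
  os.environ["DESTINATION__FILESYSTEM__CREDENTIALS__SFTP_KEY_PASSPHRASE"] = "***"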

SnowflakeCredentials

Credentials for Snowflake

  • drivername - str
  • database - str
  • password - str
  • username - str
  • host - str
  • port - int
  • query - typing.Dict[str, typing.Any]
  • warehouse - str
  • role - str
  • authenticator - str
  • token - str
  • private_key - str
  • private_key_path - str
  • private_key_passphrase - str
  • application - str

SqlalchemyCredentials

Credentials for the SQLAlchemy destination

  • drivername - str
  • database - str
  • password - str
  • username - str
  • host - str
  • port - int
  • query - typing.Dict[str, typing.Any]
  • engine_args - typing.Dict[str, typing.Any]
    Additional arguments passed to sqlalchemy.create_engine

SynapseCredentials

Credentials for Azure Synapse

  • drivername - str
  • database - str
  • password - str
  • username - str
  • host - str
  • port - int
  • query - typing.Dict[str, typing.Any]
  • connect_timeout - int
  • driver - str

WeaviateCredentials

Credentials for Weaviate

  • url - str
  • api_key - str
  • additional_headers - typing.Dict[str, str]

All other Configurations

BaseConfiguration

Base class for all configurations

ConfigProvidersConfiguration

Configuration of the configuration providers

ConfigSectionContext

Injectable context that defines the configuration sections used to resolve values

  • in_container - bool
    Current container, if None then not injected
  • extras_added - bool
    Tells if extras were already added to this context
  • pipeline_name - str
  • sections - typing.Tuple[str, ...]
  • merge_style - typing.Callable[[dlt.common.configuration.specs.config_section_context.ConfigSectionContext, dlt.common.configuration.specs.config_section_context.ConfigSectionContext], NoneType]
  • source_state_key - str

ContainerInjectableContext

Base class for all configurations that may be injected from a Container. Injectable configuration is called a context

  • in_container - bool
    Current container, if None then not injected
  • extras_added - bool
    Tells if extras were already added to this context

CsvFormatConfiguration

Configuration of the csv file format

  • delimiter - str
  • include_header - bool
  • quoting - quote_all | quote_needed
  • on_error_continue - bool
  • encoding - str
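
Destinations that expose a csv_format field (postgres, redshift and snowflake above) take this object as nested configuration, so each field becomes one more double-underscore segment; a sketch with illustrative values:

  import os

  os.environ["DESTINATION__POSTGRES__CSV_FORMAT__DELIMITER"] = "|"
  os.environ["DESTINATION__POSTGRES__CSV_FORMAT__INCLUDE_HEADER"] = "false"
  os.environ["DESTINATION__POSTGRES__CSV_FORMAT__ENCODING"] = "utf-8"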

DBTRunnerConfiguration

Configuration of the dbt runner

  • package_location - str
  • package_repository_branch - str
  • package_repository_ssh_key - str
  • package_profiles_dir - str
  • package_profile_name - str
  • auto_full_refresh_when_out_of_sync - bool
  • package_additional_vars - typing.Mapping[str, typing.Any]
  • runtime - RuntimeConfiguration

DestinationCapabilitiesContext

Injectable destination capabilities required by many pipeline stages, e.g. normalize

  • in_container - bool
    Current container, if None then not injected
  • extras_added - bool
    Tells if extras were already added to this context
  • preferred_loader_file_format - jsonl | typed-jsonl | insert_values | parquet | csv | reference | model
  • supported_loader_file_formats - typing.Sequence[typing.Literal['jsonl', 'typed-jsonl', 'insert_values', 'parquet', 'csv', 'reference', 'model']]
  • loader_file_format_selector - dlt.common.destination.capabilities.LoaderFileFormatSelector
    Callable that adapts preferred_loader_file_format and supported_loader_file_formats at runtime.
  • preferred_table_format - iceberg | delta | hive | native
  • supported_table_formats - typing.Sequence[typing.Literal['iceberg', 'delta', 'hive', 'native']]
  • type_mapper - typing.Type[dlt.common.destination.capabilities.DataTypeMapper]
  • recommended_file_size - int
    Recommended file size in bytes when writing extract/load files
  • preferred_staging_file_format - jsonl | typed-jsonl | insert_values | parquet | csv | reference | model
  • supported_staging_file_formats - typing.Sequence[typing.Literal['jsonl', 'typed-jsonl', 'insert_values', 'parquet', 'csv', 'reference', 'model']]
  • format_datetime_literal - typing.Callable[..., str]
  • escape_identifier - typing.Callable[[str], str]
  • escape_literal - typing.Callable[[typing.Any], typing.Any]
  • casefold_identifier - typing.Callable[[str], str]
    Casing function applied by destination to represent case insensitive identifiers.
  • has_case_sensitive_identifiers - bool
    Tells if destination supports case sensitive identifiers
  • decimal_precision - typing.Tuple[int, int]
  • wei_precision - typing.Tuple[int, int]
  • max_identifier_length - int
  • max_column_identifier_length - int
  • max_query_length - int
  • is_max_query_length_in_bytes - bool
  • max_text_data_type_length - int
  • is_max_text_data_type_length_in_bytes - bool
  • supports_transactions - bool
  • supports_ddl_transactions - bool
  • naming_convention - str | typing.Type[dlt.common.normalizers.naming.naming.NamingConvention] | module
  • alter_add_multi_column - bool
  • supports_create_table_if_not_exists - bool
  • supports_truncate_command - bool
  • schema_supports_numeric_precision - bool
  • timestamp_precision - int
  • max_rows_per_insert - int
  • insert_values_writer_type - str
  • supports_multiple_statements - bool
  • supports_clone_table - bool
    Destination supports CREATE TABLE ... CLONE ... statements
  • max_table_nesting - int
    Allows a destination to overwrite max_table_nesting from source
  • supported_merge_strategies - typing.Sequence[typing.Literal['delete-insert', 'scd2', 'upsert']]
  • merge_strategies_selector - dlt.common.destination.capabilities.MergeStrategySelector
  • supported_replace_strategies - typing.Sequence[typing.Literal['truncate-and-insert', 'insert-from-staging', 'staging-optimized']]
  • replace_strategies_selector - dlt.common.destination.capabilities.ReplaceStrategySelector
  • max_parallel_load_jobs - int
    The destination can set the maximum amount of parallel load jobs being executed
  • loader_parallelism_strategy - parallel | table-sequential | sequential
    The destination can override the parallelism strategy
  • max_query_parameters - int
    The maximum number of parameters that can be supplied in a single parametrized query
  • supports_native_boolean - bool
    The destination supports a native boolean type, otherwise bool columns are usually stored as integers
  • supports_nested_types - bool
    Tells if destination can write nested types, currently only destinations storing parquet are supported
  • enforces_nulls_on_alter - bool
    Tells if destination enforces null constraints when adding NOT NULL columns to existing tables
  • sqlglot_dialect - str
    The SQL dialect used by sqlglot to transpile a query to match the destination syntax.
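
Capabilities are injected by dlt itself, but they can also be inspected from a destination factory; a sketch, assuming the capabilities() accessor on destination factories:

  import dlt

  caps = dlt.destinations.duckdb().capabilities()
  print(caps.preferred_loader_file_format)
  print(caps.supported_loader_file_formats)
  print(caps.max_identifier_length)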

FilesystemConfiguration

A configuration defining filesystem location and access credentials.

When the configuration is resolved, bucket_url is used to extract the protocol and request the corresponding credentials class.

Incremental

Adds incremental extraction for a resource by storing a cursor value in persistent state.

The cursor could, for example, be a timestamp of when the record was created; you can use it to load only records created since the last run of the pipeline.

To use this, the resource function should have an argument that is either type-annotated with Incremental or has a default Incremental instance. For example:

  @dlt.resource(primary_key='id')
  def some_data(created_at=dlt.sources.incremental('created_at', '2023-01-01T00:00:00Z')):
      yield from request_data(created_after=created_at.last_value)

When the resource has a primary_key specified this is used to deduplicate overlapping items with the same cursor value.

Alternatively, you can use this class as a transform step and add it to any resource. For example:

  @dlt.resource
  def some_data():
      last_value = dlt.sources.incremental.from_existing_state("some_data", "item.ts")
      ...

  r = some_data().add_step(dlt.sources.incremental("item.ts", initial_value=now, primary_key="delta"))
  info = p.run(r, destination="duckdb")

Args:

  • cursor_path: The name or a JSON path to a cursor field. Uses the same field names as in your JSON document, before they are normalized for storage in the database.
  • initial_value: Optional value used for last_value when no state is available, e.g. on the first run of the pipeline. If not provided, last_value will be None on the first run.
  • last_value_func: Callable used to determine which cursor value to save in state. It is called with a list of the stored state value and all cursor values from the currently processed items. Default is max.
  • primary_key: Optional primary key used to deduplicate data. If not provided, the primary key defined by the resource will be used. Pass a tuple to define a compound key. Pass an empty tuple to disable unique checks.
  • end_value: Optional value used to load a limited range of records between initial_value and end_value. Use in conjunction with initial_value, e.g. to load records from a given month: incremental(initial_value="2022-01-01T00:00:00Z", end_value="2022-02-01T00:00:00Z"). Note that when this is set, incremental filtering is stateless and initial_value always supersedes any previous incremental value in state.
  • row_order: Declares that the data source returns rows in descending (desc) or ascending (asc) order as defined by last_value_func. If the row order is known, the Incremental class is able to stop requesting new rows by closing the pipe generator, which prevents getting more data from the source. Defaults to None, which means that the row order is not known.
  • allow_external_schedulers: If set to True, allows dlt to look for external schedulers from which it will take initial_value and end_value, resulting in loading only the specified range of data. Currently the Airflow scheduler is detected: data_interval_start and data_interval_end are taken from the context and passed to the Incremental class. Values passed explicitly to Incremental will be ignored. Note that if a logical end date is present, end_value will also be set, which means that resource state is not used and exactly this range of dates will be loaded.
  • on_cursor_value_missing: Specify what happens when cursor_path does not exist in a record or a record has None at cursor_path: raise, include, exclude.
  • lag: Optional value used to define a lag or attribution window. For datetime cursors, this is interpreted as seconds. For other types, it uses the + or - operator depending on last_value_func.
  • range_start: Decide whether the incremental filtering range is open or closed on the start value side. Default is closed. Setting this to open means that items with the same cursor value as the last value from the previous run (or initial_value) are excluded from the result. The open range disables deduplication logic, so it can serve as an optimization when you know cursors don't overlap between pipeline runs.
  • range_end: Decide whether the incremental filtering range is open or closed on the end value side. Default is open (the exact end_value is excluded). Setting this to closed means that items with exactly the same cursor value as end_value are included in the result.

  • cursor_path - str
  • initial_value - typing.Any
  • end_value - typing.Any
  • row_order - asc | desc
  • allow_external_schedulers - bool
  • on_cursor_value_missing - raise | include | exclude
  • lag - float
  • range_start - open | closed
  • range_end - open | closed
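
For orientation, here is a minimal sketch of the decorator-style usage; the cursor field name (updated_at), the bounded backfill range, and the helper _fetch_rows are illustrative assumptions rather than part of this reference:

    import dlt

    @dlt.resource(primary_key="id")
    def some_data(
        # dlt injects the stored cursor state into this default argument on each run
        updated_at=dlt.sources.incremental(
            "updated_at",
            initial_value="2022-01-01T00:00:00Z",
            end_value="2022-02-01T00:00:00Z",  # stateless, bounded backfill range
        )
    ):
        # _fetch_rows is a hypothetical stand-in for your own API or database call
        yield from _fetch_rows(since=updated_at.last_value)

    pipeline = dlt.pipeline(pipeline_name="demo", destination="duckdb")
    info = pipeline.run(some_data())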

ItemsNormalizerConfiguration

None

  • add_dlt_id - bool
    When true, items to be normalized will have _dlt_id column added with a unique ID for each row.
  • add_dlt_load_id - bool
    When true, items to be normalized will have _dlt_load_id column added with the current load ID.
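
As a hedged illustration only: dlt configuration fields typically resolve from SECTION__OPTION environment variables (or the matching config.toml section). The section name used below for the items normalizer is an assumption and may differ between dlt versions:

    import os

    # Assumed section "normalize.parquet_normalizer"; verify against your dlt version.
    os.environ["NORMALIZE__PARQUET_NORMALIZER__ADD_DLT_LOAD_ID"] = "true"
    os.environ["NORMALIZE__PARQUET_NORMALIZER__ADD_DLT_ID"] = "true"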

LanceDBClientOptions

None

  • max_retries - int
    Maximum number of retries for embedding calls; LanceDB's EmbeddingFunction class wraps the calls for source and query embedding

LoadPackageStateInjectableContext

None

  • in_container - bool
    Tells if this context is currently injected in the container
  • extras_added - bool
    Tells if extras were already added to this context
  • storage - dlt.common.storages.load_package.PackageStorage
  • load_id - str

LoadStorageConfiguration

None

  • load_volume_path - str
  • delete_completed_jobs - bool

LoaderConfiguration

None

  • pool_type - process | thread | none
    type of pool to run, must be set in derived configs
  • start_method - str
    start method for the pool (typically process). None is system default
  • workers - int
    how many parallel loads can be executed
  • run_sleep - float
    how long to sleep between runs with workload, seconds
  • parallelism_strategy - parallel | table-sequential | sequential
    Which parallelism strategy to use at load time
  • raise_on_failed_jobs - bool
    when True, raises on terminally failed jobs immediately
  • raise_on_max_retries - int
    When greater than 0, raises when a job reaches this number of retries
  • truncate_staging_dataset - bool
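
A minimal sketch of tuning these loader options, assuming they live under the load config section (so they resolve from LOAD__* environment variables or a [load] block in config.toml):

    import os

    # Option names mirror the fields above; the "load" section name is an assumption.
    os.environ["LOAD__WORKERS"] = "4"
    os.environ["LOAD__PARALLELISM_STRATEGY"] = "table-sequential"
    os.environ["LOAD__RAISE_ON_FAILED_JOBS"] = "true"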

NormalizeConfiguration

None

NormalizeStorageConfiguration

None

  • normalize_volume_path - str

ParquetFormatConfiguration

None

  • flavor - str
  • version - str
  • data_page_size - int
  • timestamp_timezone - str
  • row_group_size - int
  • coerce_timestamps - s | ms | us | ns
  • allow_truncated_timestamps - bool
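
A hedged sketch of adjusting the parquet writer options above, assuming they resolve under a data-writer section of the normalize stage (the section name is an assumption to verify for your dlt version):

    import os

    # Assumed section "normalize.data_writer" (NORMALIZE__DATA_WRITER__* env vars).
    os.environ["NORMALIZE__DATA_WRITER__VERSION"] = "2.6"
    os.environ["NORMALIZE__DATA_WRITER__ROW_GROUP_SIZE"] = "100000"
    os.environ["NORMALIZE__DATA_WRITER__TIMESTAMP_TIMEZONE"] = "UTC"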

PipelineContext

None

  • in_container - bool
    Tells if this context is currently injected in the container
  • extras_added - bool
    Tells if extras were already added to this context

PoolRunnerConfiguration

None

  • pool_type - process | thread | none
    type of pool to run, must be set in derived configs
  • start_method - str
    start method for the pool (typically process). None is system default
  • workers - int
    how many threads/processes in the pool
  • run_sleep - float
    how long to sleep between runs with workload, seconds

QdrantClientOptions

None

  • port - int
  • grpc_port - int
  • prefer_grpc - bool
  • https - bool
  • prefix - str
  • timeout - int
  • host - str

RuntimeConfiguration

None

  • pipeline_name - str
  • sentry_dsn - str
  • slack_incoming_hook - str
  • dlthub_telemetry - bool
  • dlthub_telemetry_endpoint - str
  • dlthub_telemetry_segment_write_key - str
  • log_format - str
  • log_level - str
  • request_timeout - float
    Timeout for http requests
  • request_max_attempts - int
    Max retry attempts for http clients
  • request_backoff_factor - float
    Multiplier applied to exponential retry delay for http requests
  • request_max_retry_delay - float
    Maximum delay between http request retries
  • config_files_storage_path - str
  • dlthub_dsn - str
    DSN for the dltHub platform connection
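
A minimal sketch of overriding the logging and HTTP retry options above, assuming the standard runtime section (RUNTIME__* environment variables or a [runtime] block in config.toml):

    import os

    # Option names mirror the fields above; values are illustrative.
    os.environ["RUNTIME__LOG_LEVEL"] = "INFO"
    os.environ["RUNTIME__REQUEST_TIMEOUT"] = "120"
    os.environ["RUNTIME__REQUEST_MAX_ATTEMPTS"] = "10"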

SchemaConfiguration

None

  • naming - str | typing.Type[dlt.common.normalizers.naming.naming.NamingConvention] | module
  • json_normalizer - typing.Dict[str, typing.Any]
  • allow_identifier_change_on_table_with_data - bool
  • use_break_path_on_normalize - bool
    Post-1.4.0 option that allows table and column names containing table separators

SchemaStorageConfiguration

None

  • schema_volume_path - str
  • import_schema_path - str
  • export_schema_path - str
  • external_schema_format - json | yaml
  • external_schema_format_remove_defaults - bool
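
For context, the import/export schema paths above can also be passed directly when creating a pipeline; a minimal sketch with illustrative names and paths:

    import dlt

    # import_schema_path / export_schema_path correspond to the fields above.
    pipeline = dlt.pipeline(
        pipeline_name="demo",
        destination="duckdb",
        import_schema_path="schemas/import",
        export_schema_path="schemas/export",
    )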

SourceInjectableContext

A context containing the source, present when a dlt.resource decorated function is executed

  • in_container - bool
    Tells if this context is currently injected in the container
  • extras_added - bool
    Tells if extras were already added to this context
  • source - dlt.extract.source.DltSource

SourceSchemaInjectableContext

A context containing the source schema, present when a dlt.source/resource decorated function is executed

  • in_container - bool
    Tells if this context is currently injected in the container
  • extras_added - bool
    Tells if extras were already added to this context
  • schema - dlt.common.schema.schema.Schema

StateInjectableContext

None

  • in_container - bool
    Tells if this context is currently injected in the container
  • extras_added - bool
    Tells if extras were already added to this context
  • state - dlt.common.pipeline.TPipelineState

TransformationConfiguration

Configuration for a transformation

  • buffer_max_items - int

VaultProviderConfiguration

None

  • only_secrets - bool
  • only_toml_fragments - bool
  • list_secrets - bool
