Configuration Reference
This page is a reference for most of the configuration options and objects available in dlt.
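The options listed below are resolved by dlt from its configuration providers (TOML files, environment variables) or can be passed explicitly in code. Below is a minimal sketch of both styles; it assumes dlt's standard environment-variable naming with `__` section separators and a destination factory that accepts configuration fields as keyword arguments (duckdb and `create_indexes` are only illustrative choices).

```python
import os
import dlt

# 1. Environment variables: the key path mirrors the config sections and field names below
os.environ["DESTINATION__DUCKDB__CREATE_INDEXES"] = "true"

# 2. Explicit arguments passed to a destination factory
destination = dlt.destinations.duckdb(create_indexes=True)

pipeline = dlt.pipeline(
    pipeline_name="config_demo",
    destination=destination,
    dataset_name="demo_dataset",
)
```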
Destination Configurations
AthenaClientConfiguration
Configuration for the Athena destination.
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (AwsCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- staging_config (DestinationClientStagingConfiguration): Configuration of the staging destination; if present, injected at runtime
- truncate_tables_on_staging_destination_before_load (bool): If dlt should truncate the tables on the staging destination before loading data
- query_result_bucket (str)
- athena_work_group (str)
- aws_data_catalog (str)
- connection_params (typing.Dict[str, typing.Any])
- force_iceberg (bool)
- table_location_layout (str)
- table_properties (typing.Dict[str, str])
- db_location (str)
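A sketch of setting a few of the Athena-specific fields above; the bucket and work group names are hypothetical, and it assumes the athena destination factory accepts these configuration fields as keyword arguments.

```python
import dlt

athena = dlt.destinations.athena(
    query_result_bucket="s3://my-athena-results",  # hypothetical bucket
    athena_work_group="primary",
    aws_data_catalog="awsdatacatalog",
)

pipeline = dlt.pipeline(
    pipeline_name="athena_demo",
    destination=athena,
    staging="filesystem",  # Athena typically reads files staged on S3
    dataset_name="analytics",
)
```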
BigQueryClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (GcpServiceAccountCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- staging_config (DestinationClientStagingConfiguration): Configuration of the staging destination; if present, injected at runtime
- truncate_tables_on_staging_destination_before_load (bool): If dlt should truncate the tables on the staging destination before loading data
- location (str)
- project_id (str): Note that this is the BigQuery project_id, which could be different from credentials.project_id
- has_case_sensitive_identifiers (bool): If True, dlt expects to load data into a case-sensitive dataset
- should_set_case_sensitivity_on_new_dataset (bool): If True, dlt will set the case sensitivity flag on created datasets so that it corresponds to the naming convention
- http_timeout (float): Connection timeout for http requests to the BigQuery API
- file_upload_timeout (float): A timeout for file upload when loading local files
- retry_deadline (float): How long to retry the operation in case of error; the backoff is 60 s
- batch_size (int): Number of rows in a streaming insert batch
- autodetect_schema (bool): Allow BigQuery to autodetect schemas and create data tables
- ignore_unknown_values (bool): Ignore unknown values in the data
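For example, the BigQuery-specific options can be set on the destination factory while the service account credentials are resolved from secrets; this is a sketch and assumes the bigquery factory accepts these fields as keyword arguments.

```python
import dlt

# credentials are resolved from secrets, e.g. DESTINATION__BIGQUERY__CREDENTIALS__PRIVATE_KEY etc.
bigquery = dlt.destinations.bigquery(
    location="EU",
    autodetect_schema=False,
    http_timeout=30.0,
)

pipeline = dlt.pipeline(pipeline_name="bq_demo", destination=bigquery, dataset_name="reports")
```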
ClickHouseClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (ClickHouseCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- staging_config (DestinationClientStagingConfiguration): Configuration of the staging destination; if present, injected at runtime
- truncate_tables_on_staging_destination_before_load (bool): If dlt should truncate the tables on the staging destination before loading data
- dataset_table_separator (str): Separator for dataset table names, defaults to '___', i.e. 'database.dataset___table'
- table_engine_type (merge_tree | shared_merge_tree | replicated_merge_tree): The default table engine to use. Defaults to merge_tree. Other implemented options are shared_merge_tree and replicated_merge_tree
- dataset_sentinel_table_name (str): Special table to mark a dataset as existing
- staging_use_https (bool): Connect to the staging buckets via https
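The same fields can be supplied through environment variables whose names mirror the sections above; the host and password values here are hypothetical.

```python
import os

os.environ["DESTINATION__CLICKHOUSE__TABLE_ENGINE_TYPE"] = "replicated_merge_tree"
os.environ["DESTINATION__CLICKHOUSE__DATASET_TABLE_SEPARATOR"] = "___"
os.environ["DESTINATION__CLICKHOUSE__CREDENTIALS__HOST"] = "clickhouse.example.com"  # hypothetical
os.environ["DESTINATION__CLICKHOUSE__CREDENTIALS__PASSWORD"] = "secret"              # hypothetical
```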
CustomDestinationClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (CredentialsConfiguration): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- destination_callable (str | typing.Callable[[typing.Union[typing.Any, typing.List[typing.Any], str], dlt.common.schema.typing.TTableSchema], NoneType])
- loader_file_format (jsonl | typed-jsonl | insert_values | parquet | csv | reference | model)
- batch_size (int)
- skip_dlt_columns_and_tables (bool)
- max_table_nesting (int)
DatabricksClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (DatabricksCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- staging_config (DestinationClientStagingConfiguration): Configuration of the staging destination; if present, injected at runtime
- truncate_tables_on_staging_destination_before_load (bool): If dlt should truncate the tables on the staging destination before loading data
- staging_credentials_name (str)
- is_staging_external_location (bool): If true, the temporary credentials are not propagated to the COPY command
- staging_volume_name (str): Name of the Databricks managed volume for temporary storage, e.g. catalog_name.database_name.volume_name. Defaults to '_dlt_temp_load_volume' if not set
- keep_staged_files (bool): Whether to keep the files in the internal (volume) stage
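A sketch of setting the Databricks staging-volume options; the volume name is hypothetical and it assumes the databricks factory accepts these fields as keyword arguments.

```python
import dlt

databricks = dlt.destinations.databricks(
    staging_volume_name="main.dlt_staging.load_volume",  # hypothetical catalog.schema.volume
    keep_staged_files=False,
)
```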
DestinationClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (CredentialsConfiguration): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
DestinationClientDwhConfiguration
Configuration of a destination that supports datasets/schemas
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (CredentialsConfiguration): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
DestinationClientDwhWithStagingConfiguration
Configuration of a destination that can take data from staging destination
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (CredentialsConfiguration): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- staging_config (DestinationClientStagingConfiguration): Configuration of the staging destination; if present, injected at runtime
- truncate_tables_on_staging_destination_before_load (bool): If dlt should truncate the tables on the staging destination before loading data
DestinationClientStagingConfiguration
Configuration of a staging destination, able to store files with the desired layout at bucket_url. Also supports datasets and can act as a standalone destination.
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (CredentialsConfiguration): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- as_staging_destination (bool)
- bucket_url (str)
- layout (str)
DremioClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (DremioCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- staging_config (DestinationClientStagingConfiguration): Configuration of the staging destination; if present, injected at runtime
- truncate_tables_on_staging_destination_before_load (bool): If dlt should truncate the tables on the staging destination before loading data
- staging_data_source (str): The name of the staging data source
DuckDbClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (DuckDbCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- staging_config (DestinationClientStagingConfiguration): Configuration of the staging destination; if present, injected at runtime
- truncate_tables_on_staging_destination_before_load (bool): If dlt should truncate the tables on the staging destination before loading data
- local_dir (str)
- pipeline_name (str)
- pipeline_working_dir (str)
- legacy_db_path (str)
- create_indexes (bool)
DummyClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (DummyClientCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- loader_file_format (jsonl | typed-jsonl | insert_values | parquet | csv | reference | model)
- fail_schema_update (bool)
- fail_prob (float): Probability of a terminal failure
- retry_prob (float): Probability of a job retry
- completed_prob (float): Probability of successful job completion
- exception_prob (float): Probability of a transient exception when running a job
- timeout (float): Timeout time
- fail_terminally_in_init (bool): Raise a terminal exception in job init
- fail_transiently_in_init (bool): Raise a transient exception in job init
- truncate_tables_on_staging_destination_before_load (bool): Truncate tables on the staging destination
- create_followup_jobs (bool): Create a followup job for individual jobs
- fail_followup_job_creation (bool): Raise a generic exception during followup job creation
- fail_table_chain_followup_job_creation (bool): Raise a generic exception during table chain followup job creation
- create_followup_table_chain_sql_jobs (bool): Create a table chain merge job which is guaranteed to fail
- create_followup_table_chain_reference_jobs (bool): Create table chain jobs which succeed
FilesystemConfigurationWithLocalFiles
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (AwsCredentials | GcpServiceAccountCredentials | AzureCredentialsWithoutDefaults | AzureServicePrincipalCredentialsWithoutDefaults | AzureCredentials | AzureServicePrincipalCredentials | GcpOAuthCredentials | SFTPCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- local_dir (str)
- pipeline_name (str)
- pipeline_working_dir (str)
- legacy_db_path (str)
- bucket_url (str)
- read_only (bool): Indicates read-only filesystem access. Will enable caching
- kwargs (typing.Dict[str, typing.Any]): Additional arguments passed to the fsspec constructor, e.g. dict(use_ssl=True) for s3fs
- client_kwargs (typing.Dict[str, typing.Any]): Additional arguments passed to the underlying fsspec native client, e.g. dict(verify="public.crt") for botocore
- deltalake_storage_options (typing.Dict[str, typing.Any])
- deltalake_configuration (typing.Dict[str, typing.Optional[str]])
FilesystemDestinationClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (AwsCredentials | GcpServiceAccountCredentials | AzureCredentialsWithoutDefaults | AzureServicePrincipalCredentialsWithoutDefaults | AzureCredentials | AzureServicePrincipalCredentials | GcpOAuthCredentials | SFTPCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- as_staging_destination (bool)
- bucket_url (str)
- layout (str)
- local_dir (str)
- pipeline_name (str)
- pipeline_working_dir (str)
- legacy_db_path (str)
- read_only (bool): Indicates read-only filesystem access. Will enable caching
- kwargs (typing.Dict[str, typing.Any]): Additional arguments passed to the fsspec constructor, e.g. dict(use_ssl=True) for s3fs
- client_kwargs (typing.Dict[str, typing.Any]): Additional arguments passed to the underlying fsspec native client, e.g. dict(verify="public.crt") for botocore
- deltalake_storage_options (typing.Dict[str, typing.Any])
- deltalake_configuration (typing.Dict[str, typing.Optional[str]])
- current_datetime (pendulum.datetime.DateTime | typing.Callable[[], pendulum.datetime.DateTime])
- extra_placeholders (typing.Dict[str, typing.Union[str, int, pendulum.datetime.DateTime, typing.Callable[[str, str, str, str, str], str]]])
- max_state_files (int): Maximum number of pipeline state files to keep; 0 or a negative value disables cleanup
- always_refresh_views (bool): Always refresh table scanner views by setting the newest table metadata or globbing table files
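A sketch of the options that control where and how files are written by the filesystem destination; the bucket is hypothetical and the layout placeholders are the ones shown in the layout string itself.

```python
import dlt

filesystem = dlt.destinations.filesystem(
    bucket_url="s3://my-bucket/pipeline-data",  # hypothetical bucket
    layout="{env}/{schema_name}/{table_name}/{load_id}.{file_id}.{ext}",
    extra_placeholders={"env": "dev"},  # substituted as {env} in the layout
)
```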
LanceDBClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (LanceDBCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- local_dir (str)
- pipeline_name (str)
- pipeline_working_dir (str)
- legacy_db_path (str)
- lance_uri (str): LanceDB database URI. Defaults to a local, on-disk instance
- dataset_separator (str): Character for the dataset separator
- options (LanceDBClientOptions): LanceDB client options
- embedding_model_provider (gemini-text | bedrock-text | cohere | gte-text | imagebind | instructor | open-clip | openai | sentence-transformers | huggingface | colbert | ollama): Embedding provider used for generating embeddings. Default is "cohere". You can find the full list of ...
- embedding_model_provider_host (str): Full host URL with protocol and port (e.g. 'http://localhost:11434'). Uses LanceDB's default if not specified, assuming the provider accepts this parameter
- embedding_model (str): The model used by the embedding provider for generating embeddings
- embedding_model_dimensions (int): The dimensions of the embeddings generated. In most cases it will be automatically inferred by LanceDB ...
- vector_field_name (str): Name of the special field to store the vector embeddings
- sentinel_table_name (str): Name of the sentinel table that encapsulates datasets. Since LanceDB has no ...
MotherDuckClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (MotherDuckCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- staging_config (DestinationClientStagingConfiguration): Configuration of the staging destination; if present, injected at runtime
- truncate_tables_on_staging_destination_before_load (bool): If dlt should truncate the tables on the staging destination before loading data
- create_indexes (bool)
MsSqlClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (MsSqlCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- staging_config (DestinationClientStagingConfiguration): Configuration of the staging destination; if present, injected at runtime
- truncate_tables_on_staging_destination_before_load (bool): If dlt should truncate the tables on the staging destination before loading data
- create_indexes (bool)
- has_case_sensitive_identifiers (bool)
PostgresClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (PostgresCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- staging_config (DestinationClientStagingConfiguration): Configuration of the staging destination; if present, injected at runtime
- truncate_tables_on_staging_destination_before_load (bool): If dlt should truncate the tables on the staging destination before loading data
- create_indexes (bool)
- csv_format (CsvFormatConfiguration): Optional csv format configuration
QdrantClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (QdrantCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- local_dir (str)
- pipeline_name (str)
- pipeline_working_dir (str)
- legacy_db_path (str)
- qd_location (str)
- qd_path (str): Persistence path for QdrantLocal. Default: None
- dataset_separator (str)
- embedding_batch_size (int)
- embedding_parallelism (int)
- upload_batch_size (int)
- upload_parallelism (int)
- upload_max_retries (int)
- options (QdrantClientOptions)
- model (str)
RedshiftClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (RedshiftCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- staging_config (DestinationClientStagingConfiguration): Configuration of the staging destination; if present, injected at runtime
- truncate_tables_on_staging_destination_before_load (bool): If dlt should truncate the tables on the staging destination before loading data
- create_indexes (bool)
- csv_format (CsvFormatConfiguration): Optional csv format configuration
- staging_iam_role (str)
- has_case_sensitive_identifiers (bool)
SnowflakeClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (SnowflakeCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- staging_config (DestinationClientStagingConfiguration): Configuration of the staging destination; if present, injected at runtime
- truncate_tables_on_staging_destination_before_load (bool): If dlt should truncate the tables on the staging destination before loading data
- stage_name (str): Use an existing named stage instead of the default. The default uses the implicit table stage per table
- keep_staged_files (bool): Whether to keep or delete the staged files after COPY INTO succeeds
- csv_format (CsvFormatConfiguration): Optional csv format configuration
- query_tag (str): A tag with placeholders used to tag sessions executing jobs
- create_indexes (bool): Whether UNIQUE or PRIMARY KEY constraints should be created
- use_vectorized_scanner (bool): Whether to use the vectorized scanner in COPY INTO
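A sketch of the Snowflake-specific options above; the stage name is hypothetical and it assumes the snowflake factory accepts these fields as keyword arguments.

```python
import dlt

snowflake = dlt.destinations.snowflake(
    stage_name="DLT_STAGE",       # hypothetical named stage
    keep_staged_files=False,      # delete files after COPY INTO succeeds
    use_vectorized_scanner=True,
)
```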
SqlalchemyClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (SqlalchemyCredentials): SQLAlchemy connection string
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- create_unique_indexes (bool): Whether UNIQUE constraints should be created
- create_primary_keys (bool): Whether PRIMARY KEY constraints should be created
- engine_args (typing.Dict[str, typing.Any]): Additional arguments passed to sqlalchemy.create_engine
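A sketch of passing a connection URL together with extra sqlalchemy.create_engine arguments; the URL is hypothetical.

```python
import dlt

sqlalchemy_destination = dlt.destinations.sqlalchemy(
    credentials="mysql+pymysql://loader:secret@localhost:3306/dlt_data",  # hypothetical URL
    engine_args={"pool_pre_ping": True, "echo": False},
)
```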
SynapseClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (SynapseCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- staging_config (DestinationClientStagingConfiguration): Configuration of the staging destination; if present, injected at runtime
- truncate_tables_on_staging_destination_before_load (bool): If dlt should truncate the tables on the staging destination before loading data
- create_indexes (bool): Whether primary_key and unique column hints are applied
- has_case_sensitive_identifiers (bool)
- default_table_index_type (heap | clustered_columnstore_index)
- staging_use_msi (bool): Whether the managed identity of the Synapse workspace is used to authorize access to the staging Storage Account
WeaviateClientConfiguration
- destination_type (str): Type of this destination, e.g. postgres or duckdb
- credentials (WeaviateCredentials): Credentials for this destination
- destination_name (str): Name of the destination, e.g. my_postgres or my_duckdb; will be the same as destination_type if not set
- environment (str): Environment of the destination, e.g. dev or prod
- dataset_name (str): Dataset name in the destination to load data to; for schemas that are not the default schema, it is used as a dataset prefix
- default_schema_name (str): Name of the default schema to be used to name the effective dataset to load data to
- replace_strategy (truncate-and-insert | insert-from-staging | staging-optimized): How to handle the replace disposition for this destination; uses the first strategy from caps if not declared
- staging_dataset_name_layout (str): Layout for the staging dataset, where %s is replaced with the dataset name; the placeholder is optional
- enable_dataset_name_normalization (bool): Whether to normalize the dataset name; affects the staging dataset as well
- info_tables_query_threshold (int): Threshold for the information schema tables query; if exceeded, tables will be filtered in code
- batch_size (int)
- batch_workers (int)
- batch_consistency (ONE | QUORUM | ALL)
- batch_retries (int)
- conn_timeout (float)
- read_timeout (float)
- startup_period (int)
- dataset_separator (str)
- vectorizer (str)
- module_config (typing.Dict[str, typing.Dict[str, str]])
Credential Configurations
AwsCredentials
- aws_access_key_id (str)
- aws_secret_access_key (str)
- aws_session_token (str)
- profile_name (str)
- region_name (str)
- endpoint_url (str)
- s3_url_style (str): Only needed for duckdb sql_client S3 access; for MinIO this needs to be set to 'path', for example
AwsCredentialsWithoutDefaults
- aws_access_key_id (str)
- aws_secret_access_key (str)
- aws_session_token (str)
- profile_name (str)
- region_name (str)
- endpoint_url (str)
- s3_url_style (str): Only needed for duckdb sql_client S3 access; for MinIO this needs to be set to 'path', for example
AzureCredentials
- azure_storage_account_name (str)
- azure_account_host (str): Alternative host when accessing the blob storage endpoint, e.g. my_account.dfs.core.windows.net
- azure_storage_account_key (str)
- azure_storage_sas_token (str)
- azure_sas_token_permissions (str): Permissions to use when generating a SAS token. Ignored when the SAS token is provided directly
AzureCredentialsBase
- azure_storage_account_name (str)
- azure_account_host (str): Alternative host when accessing the blob storage endpoint, e.g. my_account.dfs.core.windows.net
AzureCredentialsWithoutDefaults
Credentials for Azure Blob Storage, compatible with adlfs
- azure_storage_account_name (str)
- azure_account_host (str): Alternative host when accessing the blob storage endpoint, e.g. my_account.dfs.core.windows.net
- azure_storage_account_key (str)
- azure_storage_sas_token (str)
- azure_sas_token_permissions (str): Permissions to use when generating a SAS token. Ignored when the SAS token is provided directly
AzureServicePrincipalCredentials
- azure_storage_account_name (str)
- azure_account_host (str): Alternative host when accessing the blob storage endpoint, e.g. my_account.dfs.core.windows.net
- azure_tenant_id (str)
- azure_client_id (str)
- azure_client_secret (str)
AzureServicePrincipalCredentialsWithoutDefaults
- azure_storage_account_name (str)
- azure_account_host (str): Alternative host when accessing the blob storage endpoint, e.g. my_account.dfs.core.windows.net
- azure_tenant_id (str)
- azure_client_id (str)
- azure_client_secret (str)
ClickHouseCredentials
- drivername (str)
- database (str): Database to connect to. Defaults to 'default'
- password (str)
- username (str): Database user. Defaults to 'default'
- host (str): Host with a running ClickHouse server
- port (int): Native port the ClickHouse server is bound to. Defaults to 9440
- query (typing.Dict[str, typing.Any])
- http_port (int): HTTP port to connect to the ClickHouse server's HTTP interface
- secure (0 | 1): Enables TLS encryption when connecting to the ClickHouse server. 0 means no encryption, 1 means encrypted
- connect_timeout (int): Timeout for establishing a connection. Defaults to 10 seconds
- send_receive_timeout (int): Timeout for sending and receiving data. Defaults to 300 seconds
ConnectionStringCredentials
- drivername (str)
- database (str)
- password (str)
- username (str)
- host (str)
- port (int)
- query (typing.Dict[str, typing.Any])
CredentialsConfiguration
Base class for all credentials. Credentials are configurations that may be stored only by providers supporting secrets.
DatabricksCredentials
- catalog (str)
- server_hostname (str)
- http_path (str)
- access_token (str)
- client_id (str)
- client_secret (str)
- http_headers (typing.Dict[str, str])
- session_configuration (typing.Dict[str, typing.Any]): Dict of session parameters that will be passed to databricks.sql.connect
- connection_parameters (typing.Dict[str, typing.Any]): Additional keyword arguments that are passed to databricks.sql.connect
- socket_timeout (int)
- user_agent_entry (str)
DremioCredentials
- drivername (str)
- database (str)
- password (str)
- username (str)
- host (str)
- port (int)
- query (typing.Dict[str, typing.Any])
DuckDbBaseCredentials
- drivername (str)
- database (str)
- password (str)
- username (str)
- host (str)
- port (int)
- query (typing.Dict[str, typing.Any])
- read_only (bool)
DuckDbCredentials
- drivername (str)
- database (str)
- password (str)
- username (str)
- host (str)
- port (int)
- query (typing.Dict[str, typing.Any])
- read_only (bool)
DummyClientCredentials
GcpCredentials
- token_uri (str)
- auth_uri (str)
- project_id (str)
GcpDefaultCredentials
- token_uri (str)
- auth_uri (str)
- project_id (str)
GcpOAuthCredentials
- client_id (str)
- client_secret (str)
- refresh_token (str)
- scopes (typing.List[str])
- token (str): Access token
- token_uri (str)
- auth_uri (str)
- project_id (str)
- client_type (str)
GcpOAuthCredentialsWithoutDefaults
- client_id (str)
- client_secret (str)
- refresh_token (str)
- scopes (typing.List[str])
- token (str): Access token
- token_uri (str)
- auth_uri (str)
- project_id (str)
- client_type (str)
GcpServiceAccountCredentials
- token_uri (str)
- auth_uri (str)
- project_id (str)
- private_key (str)
- private_key_id (str)
- client_email (str)
- type (str)
GcpServiceAccountCredentialsWithoutDefaults
- token_uri (str)
- auth_uri (str)
- project_id (str)
- private_key (str)
- private_key_id (str)
- client_email (str)
- type (str)
LanceDBCredentials
- uri (str)
- api_key (str): API key for remote connections (LanceDB Cloud)
- embedding_model_provider_api_key (str): API key for the embedding model provider
MotherDuckCredentials
- drivername (str)
- database (str)
- password (str)
- username (str)
- host (str)
- port (int)
- query (typing.Dict[str, typing.Any])
- read_only (bool)
- custom_user_agent (str)
MsSqlCredentials
- drivername (str)
- database (str)
- password (str)
- username (str)
- host (str)
- port (int)
- query (typing.Dict[str, typing.Any])
- connect_timeout (int)
- driver (str)
OAuth2Credentials
- client_id (str)
- client_secret (str)
- refresh_token (str)
- scopes (typing.List[str])
- token (str): Access token
PostgresCredentials
- drivername (str)
- database (str)
- password (str)
- username (str)
- host (str)
- port (int)
- query (typing.Dict[str, typing.Any])
- connect_timeout (int)
- client_encoding (str)
QdrantCredentials
- location (str)
- api_key (str): API key for authentication in Qdrant Cloud. Default: None
- path (str)
RedshiftCredentials
- drivername (str)
- database (str)
- password (str)
- username (str)
- host (str)
- port (int)
- query (typing.Dict[str, typing.Any])
- connect_timeout (int)
- client_encoding (str)
SFTPCredentials
Credentials for SFTP filesystem, compatible with fsspec SFTP protocol.
Authentication is attempted in the following order of priority:
- key_filename may contain OpenSSH public certificate paths as well as regular private-key paths; when files ending in -cert.pub are found, they are assumed to match a private key, and both components will be loaded.
- Any key found through an SSH agent: any "id_rsa", "id_dsa", or "id_ecdsa" key discoverable in ~/.ssh/.
- Plain username/password authentication, if a password was provided.
- If a private key requires a password to unlock it, and a password is provided, that password will be used to attempt to unlock the key.

For more information about the parameters, see https://docs.paramiko.org/en/3.3/api/client.html#paramiko.client.SSHClient.connect

- sftp_port (int)
- sftp_username (str)
- sftp_password (str)
- sftp_key_filename (str)
- sftp_key_passphrase (str)
- sftp_timeout (float)
- sftp_banner_timeout (float)
- sftp_auth_timeout (float)
- sftp_channel_timeout (float)
- sftp_allow_agent (bool)
- sftp_look_for_keys (bool)
- sftp_compress (bool)
- sftp_gss_auth (bool)
- sftp_gss_kex (bool)
- sftp_gss_deleg_creds (bool)
- sftp_gss_host (str)
- sftp_gss_trust_dns (bool)
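When the filesystem destination points at an sftp:// bucket_url, these fields are typically provided as secrets; below is a sketch with hypothetical values, using the environment-variable naming assumed elsewhere on this page.

```python
import os

os.environ["DESTINATION__FILESYSTEM__BUCKET_URL"] = "sftp://sftp.example.com/data"  # hypothetical
os.environ["DESTINATION__FILESYSTEM__CREDENTIALS__SFTP_USERNAME"] = "loader"        # hypothetical
os.environ["DESTINATION__FILESYSTEM__CREDENTIALS__SFTP_KEY_FILENAME"] = "/home/loader/.ssh/id_rsa"
os.environ["DESTINATION__FILESYSTEM__CREDENTIALS__SFTP_KEY_PASSPHRASE"] = "secret"
```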
SnowflakeCredentials
- drivername (str)
- database (str)
- password (str)
- username (str)
- host (str)
- port (int)
- query (typing.Dict[str, typing.Any])
- warehouse (str)
- role (str)
- authenticator (str)
- token (str)
- private_key (str)
- private_key_path (str)
- private_key_passphrase (str)
- application (str)
SqlalchemyCredentials
- drivername (str)
- database (str)
- password (str)
- username (str)
- host (str)
- port (int)
- query (typing.Dict[str, typing.Any])
- engine_args (typing.Dict[str, typing.Any]): Additional arguments passed to sqlalchemy.create_engine
SynapseCredentials
- drivername (str)
- database (str)
- password (str)
- username (str)
- host (str)
- port (int)
- query (typing.Dict[str, typing.Any])
- connect_timeout (int)
- driver (str)
WeaviateCredentials
- url (str)
- api_key (str)
- additional_headers (typing.Dict[str, str])
All other Configurations
BaseConfiguration
ConfigProvidersConfiguration
- enable_airflow_secrets (bool)
- enable_google_secrets (bool)
- airflow_secrets (VaultProviderConfiguration)
- google_secrets (VaultProviderConfiguration)
ConfigSectionContext
- in_container (bool): Current container; if None then not injected
- extras_added (bool): Tells if extras were already added to this context
- pipeline_name (str)
- sections (typing.Tuple[str, ...])
- merge_style (typing.Callable[[dlt.common.configuration.specs.config_section_context.ConfigSectionContext, dlt.common.configuration.specs.config_section_context.ConfigSectionContext], NoneType])
- source_state_key (str)
ContainerInjectableContext
Base class for all configurations that may be injected from a Container. Injectable configuration is called a context
- in_container (bool): Current container; if None then not injected
- extras_added (bool): Tells if extras were already added to this context
CsvFormatConfiguration
- delimiter (str)
- include_header (bool)
- quoting (quote_all | quote_needed)
- on_error_continue (bool)
- encoding (str)
DBTRunnerConfiguration
- package_location (str)
- package_repository_branch (str)
- package_repository_ssh_key (str)
- package_profiles_dir (str)
- package_profile_name (str)
- auto_full_refresh_when_out_of_sync (bool)
- package_additional_vars (typing.Mapping[str, typing.Any])
- runtime (RuntimeConfiguration)
DestinationCapabilitiesContext
Injectable destination capabilities required for many pipeline stages, e.g. normalize.
- in_container (bool): Current container; if None then not injected
- extras_added (bool): Tells if extras were already added to this context
- preferred_loader_file_format (jsonl | typed-jsonl | insert_values | parquet | csv | reference | model)
- supported_loader_file_formats (typing.Sequence[typing.Literal['jsonl', 'typed-jsonl', 'insert_values', 'parquet', 'csv', 'reference', 'model']])
- loader_file_format_selector (dlt.common.destination.capabilities.LoaderFileFormatSelector): Callable that adapts preferred_loader_file_format and supported_loader_file_formats at runtime
- preferred_table_format (iceberg | delta | hive | native)
- supported_table_formats (typing.Sequence[typing.Literal['iceberg', 'delta', 'hive', 'native']])
- type_mapper (typing.Type[dlt.common.destination.capabilities.DataTypeMapper])
- recommended_file_size (int): Recommended file size in bytes when writing extract/load files
- preferred_staging_file_format (jsonl | typed-jsonl | insert_values | parquet | csv | reference | model)
- supported_staging_file_formats (typing.Sequence[typing.Literal['jsonl', 'typed-jsonl', 'insert_values', 'parquet', 'csv', 'reference', 'model']])
- format_datetime_literal (typing.Callable[..., str])
- escape_identifier (typing.Callable[[str], str])
- escape_literal (typing.Callable[[typing.Any], typing.Any])
- casefold_identifier (typing.Callable[[str], str]): Casing function applied by the destination to represent case-insensitive identifiers
- has_case_sensitive_identifiers (bool): Tells if the destination supports case-sensitive identifiers
- decimal_precision (typing.Tuple[int, int])
- wei_precision (typing.Tuple[int, int])
- max_identifier_length (int)
- max_column_identifier_length (int)
- max_query_length (int)
- is_max_query_length_in_bytes (bool)
- max_text_data_type_length (int)
- is_max_text_data_type_length_in_bytes (bool)
- supports_transactions (bool)
- supports_ddl_transactions (bool)
- naming_convention (str | typing.Type[dlt.common.normalizers.naming.naming.NamingConvention] | module)
- alter_add_multi_column (bool)
- supports_create_table_if_not_exists (bool)
- supports_truncate_command (bool)
- schema_supports_numeric_precision (bool)
- timestamp_precision (int)
- max_rows_per_insert (int)
- insert_values_writer_type (str)
- supports_multiple_statements (bool)
- supports_clone_table (bool): Destination supports CREATE TABLE ... CLONE ... statements
- max_table_nesting (int): Allows a destination to overwrite max_table_nesting from the source
- supported_merge_strategies (typing.Sequence[typing.Literal['delete-insert', 'scd2', 'upsert']])
- merge_strategies_selector (dlt.common.destination.capabilities.MergeStrategySelector)
- supported_replace_strategies (typing.Sequence[typing.Literal['truncate-and-insert', 'insert-from-staging', 'staging-optimized']])
- replace_strategies_selector (dlt.common.destination.capabilities.ReplaceStrategySelector)
- max_parallel_load_jobs (int): The destination can set the maximum number of parallel load jobs being executed
- loader_parallelism_strategy (parallel | table-sequential | sequential): The destination can override the parallelism strategy
- max_query_parameters (int): The maximum number of parameters that can be supplied in a single parametrized query
- supports_native_boolean (bool): The destination supports a native boolean type; otherwise bool columns are usually stored as integers
- supports_nested_types (bool): Tells if the destination can write nested types; currently only destinations storing parquet are supported
- enforces_nulls_on_alter (bool): Tells if the destination enforces null constraints when adding NOT NULL columns to existing tables
- sqlglot_dialect (str): The SQL dialect used by sqlglot to transpile a query to match the destination syntax
FilesystemConfiguration
A configuration defining the filesystem location and access credentials.
When the configuration is resolved, bucket_url is used to extract a protocol and request the corresponding credentials class. Supported protocols:
- s3
- gs, gcs
- az, abfs, adl, abfss, azure
- file, memory
- gdrive
- sftp

- bucket_url (str)
- credentials (AwsCredentials | GcpServiceAccountCredentials | AzureCredentialsWithoutDefaults | AzureServicePrincipalCredentialsWithoutDefaults | AzureCredentials | AzureServicePrincipalCredentials | GcpOAuthCredentials | SFTPCredentials)
- read_only (bool): Indicates read-only filesystem access. Will enable caching
- kwargs (typing.Dict[str, typing.Any]): Additional arguments passed to the fsspec constructor, e.g. dict(use_ssl=True) for s3fs
- client_kwargs (typing.Dict[str, typing.Any]): Additional arguments passed to the underlying fsspec native client, e.g. dict(verify="public.crt") for botocore
- deltalake_storage_options (typing.Dict[str, typing.Any])
- deltalake_configuration (typing.Dict[str, typing.Optional[str]])
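A sketch of passing fsspec arguments through kwargs and client_kwargs; it assumes dict-valued fields can be provided as JSON when set via environment variables, and the bucket is hypothetical.

```python
import json
import os

os.environ["DESTINATION__FILESYSTEM__BUCKET_URL"] = "s3://my-bucket/data"  # hypothetical
os.environ["DESTINATION__FILESYSTEM__KWARGS"] = json.dumps({"use_ssl": True})
os.environ["DESTINATION__FILESYSTEM__CLIENT_KWARGS"] = json.dumps({"verify": "public.crt"})
```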
Incremental
Adds incremental extraction for a resource by storing a cursor value in persistent state.
The cursor could, for example, be a timestamp for when the record was created, and you can use this to load only new records created since the last run of the pipeline.
To use this, the resource function should have an argument either type-annotated with Incremental or with a default Incremental instance. For example:

    @dlt.resource(primary_key='id')
    def some_data(created_at=dlt.sources.incremental('created_at', '2023-01-01T00:00:00Z')):
        yield from request_data(created_after=created_at.last_value)

When the resource has a primary_key specified, it is used to deduplicate overlapping items with the same cursor value.
Alternatively, you can use this class as a transform step and add it to any resource. For example:

    @dlt.resource
    def some_data():
        last_value = dlt.sources.incremental.from_existing_state("some_data", "item.ts")
        ...

    r = some_data().add_step(dlt.sources.incremental("item.ts", initial_value=now, primary_key="delta"))
    info = p.run(r, destination="duckdb")
Args:
- cursor_path: The name or a JSON path to a cursor field. Uses the same names of fields as in your JSON document, before they are normalized to be stored in the database.
- initial_value: Optional value used for last_value when no state is available, e.g. on the first run of the pipeline. If not provided, last_value will be None on the first run.
- last_value_func: Callable used to determine which cursor value to save in state. It is called with a list of the stored state value and all cursor values from the currently processed items. Default is max.
- primary_key: Optional primary key used to deduplicate data. If not provided, a primary key defined by the resource will be used. Pass a tuple to define a compound key. Pass an empty tuple to disable unique checks.
- end_value: Optional value used to load a limited range of records between initial_value and end_value. Use in conjunction with initial_value, e.g. to load records from a given month: incremental(initial_value="2022-01-01T00:00:00Z", end_value="2022-02-01T00:00:00Z"). Note that when this is set, the incremental filtering is stateless and initial_value always supersedes any previous incremental value in state.
- row_order: Declares that the data source returns rows in descending (desc) or ascending (asc) order as defined by last_value_func. If the row order is known, the Incremental class is able to stop requesting new rows by closing the pipe generator, which prevents getting more data from the source. Defaults to None, which means the row order is not known.
- allow_external_schedulers: If set to True, allows dlt to look for external schedulers from which it will take "initial_value" and "end_value", resulting in loading only the specified range of data. Currently the Airflow scheduler is detected: "data_interval_start" and "data_interval_end" are taken from the context and passed to the Incremental class. The values passed explicitly to Incremental will be ignored. Note that if a logical "end date" is present, "end_value" will also be set, which means that resource state is not used and exactly this range of dates will be loaded.
- on_cursor_value_missing: Specify what happens when cursor_path does not exist in a record or a record has None at the cursor_path: raise, include, or exclude.
- lag: Optional value used to define a lag or attribution window. For datetime cursors, this is interpreted as seconds. For other types, it uses the + or - operator depending on last_value_func.
- range_start: Decide whether the incremental filtering range is open or closed on the start value side. Default is closed. Setting this to open means that items with the same cursor value as the last value from the previous run (or initial_value) are excluded from the result. The open range disables deduplication logic, so it can serve as an optimization when you know that cursors don't overlap between pipeline runs.
- range_end: Decide whether the incremental filtering range is open or closed on the end value side. Default is open (the exact end_value is excluded). Setting this to closed means that items with exactly the same cursor value as end_value are included in the result.

Fields:
- cursor_path (str)
- initial_value (typing.Any)
- end_value (typing.Any)
- row_order (asc | desc)
- allow_external_schedulers (bool)
- on_cursor_value_missing (raise | include | exclude)
- lag (float)
- range_start (open | closed)
- range_end (open | closed)
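A self-contained sketch that combines the arguments above into a runnable pipeline; `fetch_events` is a hypothetical stand-in for an API call.

```python
import dlt

def fetch_events(created_after):
    # stub standing in for an API call; yields records newer than the cursor value
    yield {"id": 1, "created_at": "2023-06-01T00:00:00Z"}

@dlt.resource(primary_key="id")
def events(
    created_at=dlt.sources.incremental(
        "created_at",
        initial_value="2023-01-01T00:00:00Z",
        on_cursor_value_missing="exclude",  # drop items without a created_at value
    )
):
    yield from fetch_events(created_after=created_at.last_value)

pipeline = dlt.pipeline(pipeline_name="events_demo", destination="duckdb", dataset_name="events")
pipeline.run(events)
```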
ItemsNormalizerConfiguration
- add_dlt_id (bool): When true, items to be normalized will have a _dlt_id column added with a unique ID for each row
- add_dlt_load_id (bool): When true, items to be normalized will have a _dlt_load_id column added with the current load ID
LanceDBClientOptions
- max_retries (int): EmbeddingFunction class wraps the calls for source and query embedding
LoadPackageStateInjectableContext
- in_container (bool): Current container; if None then not injected
- extras_added (bool): Tells if extras were already added to this context
- storage (dlt.common.storages.load_package.PackageStorage)
- load_id (str)
LoadStorageConfiguration
- load_volume_path (str)
- delete_completed_jobs (bool)
LoaderConfiguration
- pool_type (process | thread | none): Type of pool to run; must be set in derived configs
- start_method (str): Start method for the pool (typically process). None is the system default
- workers (int): How many parallel loads can be executed
- run_sleep (float): How long to sleep between runs with workload, in seconds
- parallelism_strategy (parallel | table-sequential | sequential): Which parallelism strategy to use at load time
- raise_on_failed_jobs (bool): When True, raises on terminally failed jobs immediately
- raise_on_max_retries (int): When greater than 0, raises when a job reaches raise_on_max_retries
- truncate_staging_dataset (bool)
NormalizeConfiguration
- pool_type (process | thread | none): Type of pool to run; must be set in derived configs
- start_method (str): Start method for the pool (typically process). None is the system default
- workers (int): How many threads/processes in the pool
- run_sleep (float): How long to sleep between runs with workload, in seconds
- destination_capabilities (DestinationCapabilitiesContext)
- json_normalizer (ItemsNormalizerConfiguration)
- parquet_normalizer (ItemsNormalizerConfiguration)
- model_normalizer (ItemsNormalizerConfiguration)
NormalizeStorageConfiguration
- normalize_volume_path (str)
ParquetFormatConfiguration
- flavor (str)
- version (str)
- data_page_size (int)
- timestamp_timezone (str)
- row_group_size (int)
- coerce_timestamps (s | ms | us | ns)
- allow_truncated_timestamps (bool)
PipelineContext
- in_container (bool): Current container; if None then not injected
- extras_added (bool): Tells if extras were already added to this context
PoolRunnerConfiguration
- pool_type (process | thread | none): Type of pool to run; must be set in derived configs
- start_method (str): Start method for the pool (typically process). None is the system default
- workers (int): How many threads/processes in the pool
- run_sleep (float): How long to sleep between runs with workload, in seconds
QdrantClientOptions
- port (int)
- grpc_port (int)
- prefer_grpc (bool)
- https (bool)
- prefix (str)
- timeout (int)
- host (str)
RuntimeConfiguration
- pipeline_name (str)
- sentry_dsn (str)
- slack_incoming_hook (str)
- dlthub_telemetry (bool)
- dlthub_telemetry_endpoint (str)
- dlthub_telemetry_segment_write_key (str)
- log_format (str)
- log_level (str)
- request_timeout (float): Timeout for http requests
- request_max_attempts (int): Max retry attempts for http clients
- request_backoff_factor (float): Multiplier applied to the exponential retry delay for http requests
- request_max_retry_delay (float): Maximum delay between http request retries
- config_files_storage_path (str): Platform connection
- dlthub_dsn (str)
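Runtime options live under the `runtime` section; below is a sketch using the environment-variable naming assumed elsewhere on this page.

```python
import os

os.environ["RUNTIME__LOG_LEVEL"] = "INFO"
os.environ["RUNTIME__REQUEST_TIMEOUT"] = "30"
os.environ["RUNTIME__REQUEST_MAX_ATTEMPTS"] = "3"
os.environ["RUNTIME__DLTHUB_TELEMETRY"] = "false"
```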
SchemaConfiguration
- naming (str | typing.Type[dlt.common.normalizers.naming.naming.NamingConvention] | module)
- json_normalizer (typing.Dict[str, typing.Any])
- allow_identifier_change_on_table_with_data (bool)
- use_break_path_on_normalize (bool): Post 1.4.0, allows table and column names that contain table separators
SchemaStorageConfiguration
- schema_volume_path (str)
- import_schema_path (str)
- export_schema_path (str)
- external_schema_format (json | yaml)
- external_schema_format_remove_defaults (bool)
SourceInjectableContext
A context containing the source schema, present when dlt.resource decorated function is executed
- in_container (bool): Current container; if None then not injected
- extras_added (bool): Tells if extras were already added to this context
- source (dlt.extract.source.DltSource)
SourceSchemaInjectableContext
A context containing the source schema, present when dlt.source/resource decorated function is executed
- in_container (bool): Current container; if None then not injected
- extras_added (bool): Tells if extras were already added to this context
- schema (dlt.common.schema.schema.Schema)
StateInjectableContext
- in_container (bool): Current container; if None then not injected
- extras_added (bool): Tells if extras were already added to this context
- state (dlt.common.pipeline.TPipelineState)
TransformationConfiguration
Configuration for a transformation
- buffer_max_items (int)
VaultProviderConfiguration
- only_secrets (bool)
- only_toml_fragments (bool)
- list_secrets (bool)