- [dagit] The run timeline now shows all future schedule ticks for the visible time window, not just the next ten ticks.
- [dagit] Asset graph views in Dagit refresh as materialization events arrive, making it easier to watch your assets update in real-time.
- [dagster-airbyte] Added support for basic auth login to the Airbyte resource.
- Configuring a Python Log Level will now also apply to system logs created by Dagster during a run.
- Fixed a bug that broke asset partition mappings when using the `key_prefix` with methods like `load_assets_from_modules`.
- [dagster-dbt] When running dbt Cloud jobs with the dbt_cloud_run_op, the op would emit a failure if the targeted job did not create a run_results.json artifact, even if this was the expected behavior. This has been fixed.
- Improved performance by adding database indexes which should speed up the run view as well as a range of asset-based queries. These migrations can be applied by running `dagster instance migrate`.
- An issue that would cause schedule/sensor latency in the daemon during workspace refreshes has been resolved.
- [dagit] Shift-clicking Materialize for partitioned assets now shows the asset launchpad, allowing you to launch execution of a partition with config.
- Fixed a bug where asset keys with `-` were not being properly sanitized in some situations. Thanks @peay!
- [dagster-airbyte] A list of connection directories can now be specified in `load_assets_from_airbyte_project`. Thanks @adam-bloom!
- [dagster-gcp] Dagster will now retry connecting to GCS if it gets a `ServiceUnavailable` error. Thanks @cavila-evoliq!
- [dagster-postgres] Use of SQLAlchemy engine instead of psycopg2 when subscribing to PostgreSQL events. Thanks @peay!
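
The GCS retry behavior noted above can be pictured with a small, library-independent sketch. Everything here is an assumption made for illustration: the `ServiceUnavailable` class is a local stand-in rather than the Google client exception, and `call_with_retries` and `FlakyClient` are hypothetical names, not part of dagster-gcp.

```python
import time


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for a transient 'service unavailable' error."""


def call_with_retries(fn, max_attempts=3, base_delay=0.0):
    """Call fn(), retrying on ServiceUnavailable with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ServiceUnavailable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
            time.sleep(base_delay * 2 ** (attempt - 1))


class FlakyClient:
    """Fails a fixed number of times, then succeeds (demonstration only)."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def connect(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ServiceUnavailable("try again later")
        return "connected"
```

Wrapping `FlakyClient(failures=2).connect` in `call_with_retries` succeeds on the third attempt, while a client that keeps failing re-raises the error once the attempt budget is spent.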
- [dagster-dbt] Added a `display_raw_sql` flag to the dbt asset loading functions. If set to False, this will remove the raw sql blobs from the asset descriptions. For large dbt projects, this can significantly reduce the size of the generated workspace snapshots.
- [dagit] A “New asset detail pages” feature flag available in Dagit’s settings allows you to preview some upcoming changes to the way historical materializations and partitions are viewed.
- Tags can now be provided to an asset reconciliation sensor and will be applied to all RunRequests returned by the sensor.
- If you don’t explicitly specify a DagsterType on a graph input, but all the inner inputs that the graph input maps to have the same DagsterType, the graph input’s DagsterType will be set to the DagsterType of the inner inputs.
- [dagster-airbyte] `load_assets_from_airbyte_project` now caches the project data generated at repo load time so it does not have to be regenerated in subprocesses.
- [dagster-airbyte] Output table schema metadata is now generated at asset definition time when using `load_assets_from_airbyte_instance` or `load_assets_from_airbyte_project`.
- [dagit] The run timeline now groups all jobs by repository. You can collapse or expand each repository in this view by clicking the repository name. This state will be preserved locally. You can also hold `Shift` while clicking the repository name, and all repository groups will be collapsed or expanded accordingly.
- [dagit] In the launchpad view, a “Remove all” button is now available once you have accrued three or more tabs for that job, to make it easier to clear stale configuration tabs from view.
- [dagit] When scrolling through the asset catalog, the toolbar is now sticky. This makes it simpler to select multiple assets and materialize them without requiring you to scroll back to the top of the page.
- [dagit] A “Materialize” option has been added to the action menu on individual rows in the asset catalog view.
- [dagster-aws] The `EcsRunLauncher` now allows you to pass in a dictionary in the `task_definition` config field that specifies configuration for the task definition of the launched run, including role ARNs and a list of sidecar containers to include. Previously, the task definition could only be configured by passing in a task definition ARN or by basing the task definition off of the task definition of the ECS task launching the run. See the docs for the full set of available config.
- Previously, yielding a `SkipReason` within a multi-asset sensor (experimental) would raise an error. This has been fixed.
- [dagit] Previously, if you had a partitioned asset job and supplied a hardcoded dictionary of config to `define_asset_job`, you would run into a `CheckError` when launching the job from Dagit. This has been fixed.
- [dagit] When viewing the Runs section of Dagit, the counts displayed in the tabs (e.g. “In progress”, “Queued”, etc.) were not updating on a poll interval. This has been fixed.
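
The graph-input typing rule described in the list above (an untyped graph input takes on the type shared by all of its inner inputs) can be sketched in plain Python. This is only an illustration of the rule, not Dagster's implementation: ordinary Python types stand in for DagsterTypes, `infer_graph_input_type` is a hypothetical helper, and the fallback to `typing.Any` when the inner inputs disagree is an assumption.

```python
from typing import Any, Optional, Sequence


def infer_graph_input_type(explicit: Optional[type], inner_types: Sequence[type]) -> object:
    """Pick a type for a graph input from the inner inputs it maps to."""
    if explicit is not None:
        # An explicitly specified type always wins.
        return explicit
    unique = set(inner_types)
    if len(unique) == 1:
        # Every inner input agrees, so the graph input adopts that type.
        return unique.pop()
    # Assumption: with no agreement (or no inner inputs), stay untyped.
    return Any
```

For instance, `infer_graph_input_type(None, [int, int])` yields `int`, whereas mixing `int` and `str` leaves the input untyped.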
- `AssetMaterialization` now has a `metadata` property, which allows accessing the materialization’s metadata as a dictionary.
- `DagsterInstance` now has a `get_latest_materialization_event` method, which allows fetching the most recent materialization event for a particular asset key.
- `RepositoryDefinition.load_asset_value` and `AssetValueLoader.load_asset_value` now work with IO managers whose `load_input` implementation accesses the `op_def` and `name` attributes on the `InputContext`.
- `RepositoryDefinition.load_asset_value` and `AssetValueLoader.load_asset_value` now respect the `DAGSTER_HOME` environment variable.
- `InMemoryIOManager`, the `IOManager` that backs `mem_io_manager`, has been added to the public API.
- The `multi_asset_sensor` (experimental) now supports marking individual partitioned materializations as “consumed”. Unconsumed materializations will appear in future calls to partitioned context methods.
- The `build_multi_asset_sensor_context` testing method (experimental) now contains a flag to set the cursor to the newest events in the Dagster instance.
- `TableSchema` now has a static constructor that enables building it from a dictionary of column names to column types.
- Added a new CLI command `dagster run migrate-repository` which lets you migrate the run history for a given job from one repository to another. This is useful to preserve run history for a job when you have renamed a repository, for example.
- [dagit] The run timeline view now shows jobs grouped by repository, with each repository section collapsible. This feature was previously gated by a feature flag, and is now turned on for everyone.
- [dagster-airbyte] Added option to specify custom request params to the Airbyte resource, which can be used for auth purposes.
- [dagster-airbyte] When loading Airbyte assets from an instance or from YAML, a filter function can be specified to ignore certain connections.
- [dagster-airflow] `DagsterCloudOperator` and `DagsterOperator` now support Airflow 2. Previously, installing the library on Airflow 2 would break due to an import error.
- [dagster-duckdb] A new integration with DuckDB allows you to store op outputs and assets in an in-process database.
- Previously, if retries were exceeded when running with `execute_in_process`, no error would be raised. Now, a `DagsterMaxRetriesExceededError` will be raised.
- [dagster-airbyte] Fixed generating assets for Airbyte normalization tables corresponding with nested union types.
- [dagster-dbt] When running assets with `load_assets_from_...(..., use_build=True)`, AssetObservation events would be emitted for each test. These events would have metadata fields which shared names with the fields added to the AssetMaterialization events, causing confusing historical graphs for fields such as Compilation Time. This has been fixed.
- [dagster-dbt] The name for the underlying op for `load_assets_from_...` was generated in a way which was non-deterministic for dbt projects which pulled in external packages, leading to errors when executing across multiple processes. This has been fixed.
- [dagster-dbt] The package no longer depends on pandas and dagster-pandas.
- [dagster-airbyte] Added possibility to change request timeout value when calling Airbyte. Thanks @FransDel!
- [dagster-airflow] Fixed an import error in `dagster_airflow.hooks`. Thanks @bollwyvl!
- [dagster-gcp] Unpin Google dependencies. `dagster-gcp` now supports google-api-python-client 2.x. Thanks @amarrella!
- [dagstermill] Fixed an issue where DagsterTranslator was missing an argument required by newer versions of papermill. Thanks @tizz98!
- Added an example, underneath examples/assets_smoke_test, that shows how to write a smoke test that feeds empty data to all the transformations in a data pipeline.
- Added documentation for `build_asset_reconciliation_sensor`.
- Added documentation for monitoring partitioned materializations using the `multi_asset_sensor` and kicking off subsequent partitioned runs.
- [dagster-cloud] Added documentation for running the Dagster Cloud Docker agent with Docker credential helpers.
- [dagster-dbt] The class methods of the dbt_cli_resource are now visible in the API docs for the dagster-dbt library.
- [dagster-dbt] Added a step-by-step tutorial for using dbt models with Dagster software-defined assets.
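
The smoke-test idea mentioned in the list above (feeding empty data to every transformation in a pipeline) can be sketched without any framework. This is not the content of examples/assets_smoke_test. The transformation functions and the `smoke_test` helper below are hypothetical, and rows are modeled as plain lists of dicts.

```python
def clean_orders(rows):
    """Drop rows that have no order id."""
    return [r for r in rows if r.get("order_id") is not None]


def total_revenue(rows):
    """Sum the 'amount' column."""
    return sum(r["amount"] for r in rows)


def average_order_value(rows):
    """Mean amount per order, guarding against empty input."""
    return total_revenue(rows) / len(rows) if rows else 0.0


def naive_average(rows):
    """Deliberately fragile: divides by zero on empty input."""
    return total_revenue(rows) / len(rows)


def smoke_test(transforms):
    """Run each transformation on empty input and return the names that raise."""
    failures = []
    for transform in transforms:
        try:
            transform([])
        except Exception:
            failures.append(transform.__name__)
    return failures
```

Calling `smoke_test([clean_orders, total_revenue, average_order_value, naive_average])` flags only `naive_average`, which is the kind of structural bug a smoke test is meant to surface before any real data is processed.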
- The `multi_asset_sensor` (experimental) now accepts an `AssetSelection` of assets to monitor. There are also minor API updates for the multi-asset sensor context.
- `AssetValueLoader`, the type returned by `RepositoryDefinition.get_asset_value_loader`, is now part of Dagster’s public API.
- `RepositoryDefinition.load_asset_value` and `AssetValueLoader.load_asset_value` now support a `partition_key` argument.
- `RepositoryDefinition.load_asset_value` and `AssetValueLoader.load_asset_value` now work with I/O managers that invoke `context.upstream_output.asset_key`.
- When running Dagster locally, the default amount of time that the system waits when importing user code has been increased from 60 seconds to 180 seconds, to avoid false positives when importing code with heavy dependencies or large numbers of assets. This timeout can be configured in `dagster.yaml` as follows:
  code_servers:
    local_startup_timeout: 120
- [dagit] The “Status” section has been renamed to “Deployment”, to better reflect that this section of the app shows deployment-wide information.
- [dagit] When viewing the compute logs for a run and choosing a step to filter on, there is now a search input to make it easier to find the step you’re looking for.
- [dagster-aws] The EcsRunLauncher can now launch runs in ECS clusters using both Fargate and EC2 capacity providers. See the Deploying to ECS docs for more information.
- [dagster-airbyte] Added the `load_assets_from_airbyte_instance` function which automatically generates asset definitions from an Airbyte instance. For more details, see the new Airbyte integration guide.
- [dagster-airflow] Added the `DagsterCloudOperator` and `DagsterOperator`, which are Airflow operators that enable orchestrating Dagster jobs, running on either cloud or OSS Dagit instances, from Apache Airflow.
- Fixed a bug where if resource initialization failed for a dynamic op, causing other dynamic steps to be skipped, those skipped dynamic steps would be ignored when retrying from failure.
- Previously, some invocations within the Dagster framework would result in warnings about deprecated metadata APIs. Now, users should only see warnings if their code uses deprecated metadata APIs.
- How the daemon process manages its understanding of user code artifacts has been reworked to improve memory consumption.
- [dagit] The partition selection UI in the Asset Materialization modal now allows for mouse selection and matches the UI used for partitioned op jobs.
- [dagit] Sidebars in Dagit shrink more gracefully on small screens where headers and labels need to be truncated.
- [dagit] Improved performance for loading runs with >10,000 logs
- [dagster-airbyte] Previously, the `port` configuration in the `airbyte_resource` was marked as not required, but if it was not supplied, an error would occur. It is now marked as required.
- [dagster-dbt] A change made to the manifest.json schema in dbt 1.3 would result in an error when using `load_assets_from_dbt_project` or `load_assets_from_manifest_json`. This has been fixed.
- [dagster-postgres] Connections that fail due to `sqlalchemy.exc.TimeoutError` now retry.
- [dagster-aws] The `redshift_resource` no longer accepts a `schema` configuration parameter. Previously, this parameter would error whenever used, because Redshift connections do not support this parameter.
- We now reference the correct method in the "loading asset values outside of Dagster runs" example (thank you Peter A. I. Forsyth!)
- We now reference the correct test directory in the “Create a New Project” documentation (thank you Peter A. I. Forsyth!)
- [dagster-pyspark] dagster-pyspark now contains a `LazyPysparkResource` that only initializes a spark session once it’s accessed (thank you @zyd14!)
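
The lazy-initialization pattern behind a resource that "only initializes a spark session once it’s accessed" can be sketched with the standard library alone. This is an illustration of the general pattern, not the code of `LazyPysparkResource`. `ExpensiveSession` and `LazyResource` are hypothetical names, and no Spark dependency is involved.

```python
from functools import cached_property


class ExpensiveSession:
    """Stand-in for a costly object such as a Spark session."""

    created = 0  # counts how many sessions have been constructed

    def __init__(self):
        ExpensiveSession.created += 1


class LazyResource:
    """Defers building the session until the attribute is first read."""

    @cached_property
    def session(self):
        # Runs once per LazyResource instance; the result is then cached
        # on the instance and reused on every later access.
        return ExpensiveSession()
```

Constructing `LazyResource()` creates no session. The first read of `.session` builds one, and subsequent reads return the same object.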
- The new `build_asset_reconciliation_sensor` function accepts a set of software-defined assets and returns a sensor that automatically materializes those assets after their parents are materialized.
- [dagit] A new "groups-only" asset graph feature flag allows you to zoom way out on the global asset graph, collapsing asset groups into smaller nodes you can double-click to expand.
- `RepositoryDefinition` now exposes a `load_asset_value` method, which accepts an asset key and invokes the asset’s I/O manager’s `load_input` function to load the asset as a Python object. This can be used in notebooks to do exploratory data analysis on assets.
- Methods to fetch a list of partition keys from an input/output `PartitionKeyRange` now exist on the op execution context and input/output context.
- [dagit] On the Instance Overview page, batched runs in the run timeline view will now proportionally reflect the status of the runs in the batch instead of reducing all run statuses to a single color.
- [dagster-dbt][dagster-snowflake] You can now use the Snowflake IO manager with dbt assets, which allows them to be loaded from Snowflake into Pandas DataFrames in downstream steps.
- The dagster package’s pin of the alembic package is now much less restrictive.
- The sensor daemon when using threads will no longer evaluate the next tick for a sensor if the previous one is still in flight. This resolves a memory leak in the daemon process.
- The scheduler will no longer remove tracked state for automatically running schedules when they are absent due to a workspace load error.
- The way user code servers manage repository definitions has been changed to more efficiently serve requests.
- The `@multi_asset` decorator now respects its `config_schema` parameter.
- [dagit] Config supplied to `define_asset_job` is now prefilled in the modal that pops up when you click the Materialize button on an asset job page, so you can quickly adjust the defaults.
- [dagster-dbt] Previously, `DagsterDbtCliError`s produced from the dagster-dbt library would contain large serialized objects representing the raw unparsed logs from the relevant cli command. Now, these messages will contain only the parsed version of these messages.
- Fixed an issue where the `deploy_ecs` example didn’t work when built and deployed on an M1 Mac.
- [dagster-fivetran] The `resync_parameters` configuration on the `fivetran_resync_op` is now optional, enabling triggering historical resyncs for connectors. Thanks @dwallace0723!
- Improved API documentation for the Snowflake resource.
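
The sensor daemon fix listed above (not evaluating a sensor's next tick while the previous one is still in flight) can be pictured with a small guard object. This is a simplified sketch under stated assumptions, not the daemon's actual code: `TickGuard` and its methods are hypothetical, and sensors are identified by plain string names.

```python
import threading


class TickGuard:
    """Tracks which sensors currently have a tick being evaluated."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = set()

    def try_start(self, sensor_name):
        """Return True and mark the sensor busy, or False if a tick is already running."""
        with self._lock:
            if sensor_name in self._in_flight:
                return False
            self._in_flight.add(sensor_name)
            return True

    def finish(self, sensor_name):
        """Mark the sensor's tick as done so the next one may start."""
        with self._lock:
            self._in_flight.discard(sensor_name)
```

A worker thread would call `try_start` before submitting a tick and `finish` in a `finally` block afterwards, so pending evaluations for a slow sensor cannot pile up.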