Changelog#

1.1.8 (core) / 0.17.8 (libraries)#

New#

  • Asset backfills launched from the asset graph now respect partition mappings. For example, if partition N of asset2 depends on partition N-1 of asset1, and both of those partitions are included in a backfill, asset2’s partition N won’t be backfilled until asset1’s partition N-1 has been materialized.
  • Asset backfills launched from the asset graph will now only materialize each non-partitioned asset once - after all upstream partitions within the backfill have been materialized.
  • Executors can now be configured with a tag_concurrency_limits key that allows you to specify limits on the number of ops with certain tags that can be executing at once within a single run. See the docs for more information.
  • ExecuteInProcessResult, the type returned by materialize, materialize_to_memory, and execute_in_process, now has an asset_value method that allows you to fetch output values by asset key.
  • AssetIns can now accept Nothing for their dagster_type, which allows omitting the input from the parameters of the @asset- or @multi_asset-decorated function. This is useful when you want to specify a partition mapping or metadata for a non-managed input.
  • The start_offset and end_offset arguments of TimeWindowPartitionMapping now work across TimeWindowPartitionsDefinitions with different start dates and times.
  • If add_output_metadata is called multiple times within an op, asset, or IO manager handle_output, the values will now be merged, instead of later dictionaries overwriting earlier ones.
  • materialize and materialize_to_memory now both accept a tags argument.
  • Added SingleDimensionDependencyMapping, a PartitionMapping object that defines a correspondence between an upstream single-dimensional partitions definition and a downstream MultiPartitionsDefinition.
  • The RUN_DEQUEUED event has been removed from the event log, since it was duplicative with the RUN_STARTING event.
  • When an Exception is raised during the execution of an op or asset, Dagit will now include the original Exception that was raised, even if it was caught and another Exception was raised instead. Previously, Dagit would only show exception chains if the Exception was included using the raise Exception() from e syntax.
  • [dagit] The Asset Catalog table in Dagit is now a virtualized infinite-scroll table. It is searchable and filterable just as before, and you can now choose assets for bulk materialization without having to select across pages.
  • [dagit] Restored some metadata to the Code Locations table, including image, python file, and module name.
  • [dagit] Viewing a partition on the asset details page now shows both the latest materialization and also all observations about that materialization.
  • [dagit] Improved loading performance of the backfills page.
  • [dagit] Improved performance when materializing assets with very large partition sets.
  • [dagit] Moving around asset and op graphs while selecting nodes is easier - drag gestures no longer clear your selection.
  • [dagster-k8s] The Dagster Helm chart now allows you to set an arbitrary kubernetes config dictionary to be included in the launched job and pod for each run, using the runK8sConfig key in the k8sRunLauncher section. See the docs for more information.
  • [dagster-k8s] securityContext can now be set in the k8sRunLauncher section of the Dagster Helm chart.
  • [dagster-aws] The EcsRunLauncher can now be configured with cpu and memory resources for each launched job. Previously, individual jobs needed to be tagged with CPU and memory resources. See the docs for more information.
  • [dagster-aws] The S3ComputeLogManager now takes in an argument upload_extra_args which are passed through as the ExtraArgs parameter to the file upload call.
  • [dagster-airflow] Added make_dagster_definitions_from_airflow_dags_path and make_dagster_definitions_from_airflow_dag_bag for creating Dagster definitions from a given Airflow DAG file path or DagBag.

Bugfixes#

  • Fixed a bug where ad-hoc materializations of assets were not correctly retrieving metadata of upstream assets.
  • Fixed a bug that caused ExperimentalWarnings related to LogicalVersions to appear even when version-based staleness was not in use.
  • Fixed a bug in the asset reconciliation sensor that caused multi-assets to be reconciled when some, but not all, of the assets they depend on had been reconciled.
  • Fixed a bug in the asset reconciliation sensor that caused it to only act on one materialization per asset per tick, even when multiple partitions of an asset were materialized.
  • Fixed a bug in the asset reconciliation sensor that caused it to never attempt to rematerialize assets which failed in their last execution. Now, it will launch the next materialization for a given asset at the same time that it would have if the original run had completed successfully.
  • The load_assets_from_modules and load_assets_from_package_module utilities will now also load cacheable assets from the specified modules.
  • The dequeue_num_workers config setting on QueuedRunCoordinator is now respected.
  • [dagit] Fixed a bug that caused a “Maximum recursion depth exceeded” error when viewing partitioned assets with self-dependencies.
  • [dagit] Fixed a bug where “Definitions loaded” notifications would constantly show up in cases where there were multiple dagit hosts running.
  • [dagit] Assets that are partitioned no longer erroneously appear "Stale" in the asset graph.
  • [dagit] Assets with a freshness policy no longer appear stale when they are still meeting their freshness policy.
  • [dagit] Viewing Dagit in Firefox no longer results in erroneous truncation of labels in the left sidebar.
  • [dagit] Timestamps on the asset graph are smaller and have an appropriate click target.
  • [dagster-databricks] The databricks_pyspark_step_launcher will now cancel the relevant databricks job if the Dagster step execution is interrupted.
  • [dagster-databricks] Previously, the databricks_pyspark_step_launcher could exit with an unhelpful error after receiving an HTTPError from databricks with an empty message. This has been fixed.
  • [dagster-snowflake] Fixed a bug where calling execute_queries or execute_query on a snowflake_resource would raise an error unless the parameters argument was explicitly set.
  • [dagster-aws] Fixed a bug in the EcsRunLauncher when launching many runs in parallel. Previously, each run risked hitting a ClientError in AWS for registering too many concurrent changes to the same task definition family. Now, the EcsRunLauncher recovers gracefully from this error by retrying it with backoff.
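
For reference, the dequeue settings mentioned above are configured on the instance. A minimal dagster.yaml sketch (the values are illustrative):

```yaml
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    dequeue_use_threads: true
    dequeue_num_workers: 4
```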

Community Contributions#

  • Fixed a metadata loading error in UPathIOManager, thanks @danielgafni!
  • [dagster-aws] FakeS3Session now includes additional functions and improvements to align with the boto3 S3 client API, thanks @asharov!
  • Typo fix from @vpicavet, thank you!
  • Repository license file year and company update, thanks @vwbusguy!

Experimental#

  • Added an experimental BranchingIOManager to model the use case where you want to read upstream assets from a production environment and write them into a development environment.
  • Added create_repository_using_definitions_args to allow for the creation of named repositories.
  • Added the ability to use Python 3 typing to define and access op and asset config.
  • [dagster-dbt] Added DbtManifestAssetSelection, which allows you to define selections of assets loaded from a dbt manifest using dbt selection syntax (e.g. tag:foo,path:marts/finance).

Documentation#

  • There’s now only one Dagster Cloud Getting Started guide, which includes instructions for both Hybrid and Serverless deployment setups.
  • Lots of updates throughout the docs to clean up remaining references to @repository, replacing them with Definitions.
  • Lots of updates to the dagster-airflow documentation, including a tutorial for getting started with Dagster from an Airflow background, a migration guide for moving from Airflow to Dagster, and a terminology/concept map from Airflow onto Dagster.

1.1.7 (core) / 0.17.7 (libraries)#

New#

  • Definitions is no longer marked as experimental and is the preferred API over @repository for new users of Dagster. Examples, tutorials, and documentation have largely been ported to this new API. No migration is needed. Please see the GitHub discussion for more details.
  • The “Workspace” section of Dagit has been removed. All definitions for your code locations can be accessed via the “Deployment” section of the app. Just as in the old Workspace summary page, each code location will show counts of its available jobs, assets, schedules, and sensors. Additionally, the code locations page is now available at /locations.
  • Lagged / rolling window partition mappings: TimeWindowPartitionMapping now accepts start_offset and end_offset arguments that allow specifying that time partitions depend on earlier or later time partitions of upstream assets.
  • Asset partitions can now depend on earlier time partitions of the same asset. The asset reconciliation sensor will respect these dependencies when requesting runs.
  • dagit can now accept multiple arguments for the -m and -f flags. For each argument a new code location is loaded.
  • Schedules created by build_schedule_from_partitioned_job now execute more performantly - in constant time, rather than linear in the number of partitions.
  • The QueuedRunCoordinator now supports dequeue_use_threads and dequeue_num_workers options to enable concurrent run dequeue operations for greater throughput.
  • [dagster-dbt] load_assets_from_dbt_project, load_assets_from_dbt_manifest, and load_assets_from_dbt_cloud_job now support applying freshness policies to loaded nodes. To do so, you can apply dagster_freshness_policy config directly in your dbt project, i.e. config(dagster_freshness_policy={"maximum_lag_minutes": 60}) would result in the corresponding asset being assigned a FreshnessPolicy(maximum_lag_minutes=60).
  • The DAGSTER_RUN_JOB_NAME environment variable is now set in containerized environments spun up by our run launchers and executor.
  • [dagster-airflow] make_dagster_repo_from_airflow_dags_path, make_dagster_job_from_airflow_dag, and make_dagster_repo_from_airflow_dag_bag have a new connections parameter which allows configuring the Airflow connections used by migrated DAGs.

Bugfixes#

  • Fixed a bug where the log property was not available on the RunStatusSensorContext context object provided for run status sensors for sensor logging.

  • Fixed a bug where the re-execute button on runs of asset jobs would incorrectly show a warning icon, indicating that the pipeline code may have changed since the last run.

  • Fixed an issue which would cause metadata supplied to graph-backed assets to not be viewable in the UI.

  • Fixed an issue where schedules often took up to 5 seconds to start after their tick time.

  • Fixed an issue where Dagster failed to load a dagster.yaml file that specified the folder to use for sqlite storage in the dagster.yaml file using an environment variable.

  • Fixed an issue which would cause the k8s/docker executors to unnecessarily reload CacheableAssetsDefinitions (such as those created when using load_assets_from_dbt_cloud_job) on each step execution.

  • [dagster-airbyte] Fixed an issue where Python-defined Airbyte sources and destinations were occasionally recreated unnecessarily.

  • Fixed an issue with build_asset_reconciliation_sensor that would cause it to ignore in-progress runs in some cases.

  • Fixed a bug where GQL errors would be thrown in the asset explorer when a previously materialized asset had its dependencies changed.

  • [dagster-airbyte] Fixed an error when generating assets for normalization tables for connections with non-object streams.

  • [dagster-dbt] Fixed an error where dbt Cloud jobs with dbt run and dbt run-operation were incorrectly validated.

  • [dagster-airflow] use_ephemeral_airflow_db now works when running within a PEX deployment artifact.

Documentation#

  • New documentation for Code locations and how to define one using Definitions
  • Lots of updates throughout the docs to reflect the recommended usage of Definitions. Any content not ported to Definitions in this release is in the process of being updated.
  • New documentation for dagster-airflow on how to start writing Dagster code from an Airflow background.

1.1.6 (core) / 0.17.6 (libraries)#

New#

  • [dagit] Throughout Dagit, when the default repository name __repository__ is used for a repo, only the code location name will be shown. This change also applies to URL paths.
  • [dagster-dbt] When attempting to generate software-defined assets from a dbt Cloud job, an error is now raised if none are created.
  • [dagster-dbt] Software-defined assets can now be generated for dbt Cloud jobs that execute multiple commands.

Bugfixes#

  • Fixed a bug that caused load_asset_value to error with the default IO manager when a partition_key argument was provided.
  • Previously, trying to access context.partition_key or context.asset_partition_key_for_output when invoking an asset directly (e.g. in a unit test) would result in an error. This has been fixed.
  • Failure hooks now receive the original exception instead of RetryRequested when using a retry policy.
  • The LocationStateChange GraphQL subscription has been fixed (thanks @roeij!)
  • Fixed a bug where a sqlite3.ProgrammingError error was raised when creating an ephemeral DagsterInstance, most commonly when build_resources was called without passing in an instance parameter.
  • [dagstermill] Jupyter notebooks now correctly render in Dagit on Windows machines.
  • [dagster-duckdb-pyspark] New duckdb_pyspark_io_manager helper to automatically create a DuckDB I/O manager that can store and load PySpark DataFrames.
  • [dagster-mysql] Fixed a bug where versions of mysql < 8.0.31 would raise an error on some run queries.
  • [dagster-postgres] The “options” connection URL param is no longer overwritten in Dagit.
  • [dagit] Dagit now allows backfills to be launched for asset jobs that have partitions and required config.
  • [dagit] Dagit no longer renders the "Job in repo@location" label incorrectly in Chrome v109.
  • [dagit] Dagit's run list now shows improved labels on asset group runs of more than three assets.
  • [dagit] Dagit's run gantt chart now renders per-step resource initialization markers correctly.
  • [dagit] In op and asset descriptions in Dagit, rendered markdown no longer includes extraneous escape slashes.
  • Assorted typos and omissions fixed in the docs — thanks @C0DK and @akan72!

Experimental#

  • As an optional replacement of the workspace/repository concepts, a new Definitions entrypoint for tools and the UI has been added. A single Definitions object per code location may be instantiated, and accepts typed, named arguments, rather than the heterogenous list of definitions returned from an @repository-decorated function. To learn more about this feature, and provide feedback, please refer to the Github Discussion.
  • [dagster-slack] A new make_slack_on_freshness_policy_status_change_sensor allows you to create a sensor to alert you when an asset is out of date with respect to its freshness policy (and when it’s back on time!)

1.1.5 (core) / 0.17.5 (libraries)#

Bugfixes#

  • [dagit] Fixed an issue where the Partitions tab sometimes failed to load for asset jobs.

1.1.4 (core) / 0.17.4 (libraries)#

Community Contributions#

  • Fixed a typo in the GCSComputeLogManager docstring (thanks reidab!)
  • [dagster-airbyte] Job cancellation on run termination is now optional (thanks adam-bloom!)
  • [dagster-snowflake] A Snowflake role can now be specified in the config for the Snowflake I/O manager (thanks binhnefits!)
  • [dagster-aws] A new AWS Systems Manager resource (thanks zyd14!)
  • [dagstermill] Retry policies can now be set on dagstermill assets (thanks nickvazz!)
  • Corrected a typo in the docs on metadata (thanks C0DK!)

New#

  • Added a job_name parameter to InputContext
  • Fixed inconsistent io manager behavior when using execute_in_process on a GraphDefinition (it would use the fs_io_manager instead of the in-memory io manager)
  • Compute logs will now load in Dagit even when websocket connections are not supported.
  • [dagit] A handful of changes have been made to our URLs:
    • The /instance URL path prefix has been removed. E.g. /instance/runs can now be found at /runs.
    • The /workspace URL path prefix has been changed to /locations. E.g. the URL for job my_job in repository foo@bar can now be found at /locations/foo@bar/jobs/my_job.
  • [dagit] The “Workspace” navigation item in the top nav has been moved to be a tab under the “Deployment” section of the app, and is renamed to “Definitions”.
  • [dagstermill] Dagster events can now be yielded from asset notebooks using dagstermill.yield_event.
  • [dagstermill] Failed notebooks can be saved for inspection and debugging using the new save_on_notebook_failure parameter.
  • [dagster-airflow] Added a new option use_ephemeral_airflow_db which creates a job-run-scoped Airflow db for Airflow DAGs running in Dagster.
  • [dagster-dbt] Materializing software-defined assets using dbt Cloud jobs now supports partitions.
  • [dagster-dbt] Materializing software-defined assets using dbt Cloud jobs now supports subsetting. Individual dbt Cloud models can be materialized, and the proper filters will be passed down to the dbt Cloud job.
  • [dagster-dbt] Software-defined assets from dbt Cloud jobs now support configurable group names.
  • [dagster-dbt] Software-defined assets from dbt Cloud jobs now support configurable AssetKeys.

Bugfixes#

  • Fixed regression starting in 1.0.16 for some compute log managers where an exception in the compute log manager setup/teardown would cause runs to fail.
  • The S3 / GCS / Azure compute log managers now sanitize the optional prefix argument to prevent badly constructed paths.
  • [dagit] The run filter typeahead no longer surfaces key-value pairs when searching for tag:. This resolves an issue where retrieving the available tags could cause significant performance problems. Tags can still be searched with freeform text, and by adding them via click on individual run rows.
  • [dagit] Fixed an issue in the Runs tab for job snapshots, where the query would fail and no runs were shown.
  • [dagit] Schedules defined with cron unions displayed “Invalid cron string” in Dagit. This has been resolved, and human-readable versions of all members of the union will now be shown.

Breaking Changes#

  • You can no longer set an output’s asset key by overriding get_output_asset_key on the IOManager handling the output. Previously, this was experimental and undocumented.

Experimental#

  • Sensor and schedule evaluation contexts now have an experimental log property, which logs events that can later be viewed in Dagit. To enable these log views in Dagit, navigate to the user settings and enable the “Experimental schedule/sensor logging view” option. Log links will now be available for sensor/schedule ticks where logs were emitted. Note: this feature is not available for users using the NoOpComputeLogManager.