Changelog#

0.14.3#

New#

  • When using an executor that runs each op in its own process, exceptions in the Dagster system code that result in the op process failing will now be surfaced in the event log.
  • Introduced new SecretsManager resources to the dagster-aws package to enable loading secrets into Jobs more easily. For more information, seethe documentation.
  • Daemon heartbeats are now processed in a batch request to the database.
  • Job definitions now contain a method called run_request_for_partition, which returns a RunRequest that can be returned in a sensor or schedule evaluation function to launch a run for a particular partition for that job. See our documentation for more information.
  • Renamed the filter class from PipelineRunsFilter => RunsFilter.
  • Assets can now be directly invoked for unit testing.
  • [dagster-dbt] load_assets_from_dbt_project will now attach schema information to the generated assets if it is available in the dbt project (schema.yml).
  • [examples] Added an example that demonstrates using Software Defined Assets with Airbyte, dbt, and custom Python.
  • The default io manager used in the AssetGroup api is now the fs_asset_io_manager.
  • It's now possible to build a job where partitioned assets depend on partitioned assets that are maintained outside the job, and for those upstream partitions to show up on the context in the op and IOManager load_input function.
  • SourceAssets can now be partitioned, by setting the partitions_def argument.

Bugfixes#

  • Fixed an issue where run status sensors would sometimes fire multiple times for the same run if the sensor function raised an error.
  • [ECS] Previously, setting cpu/memory tags on a job would override the ECS task’s cpu/memory, but not individual containers. If you were using a custom task definition that explicitly sets a container’s cpu/memory, the container would not resize even if you resized the task. Now, setting cpu/memory tags on a job overrides both the ECS task’s cpu/memory and the container's cpu/memory.
  • [ECS] Previously, if the EcsRunLauncher launched a run from a task with multiple containers - for example if both dagit and daemon were running in the same task - then the run would be launched with too many containers. Now, the EcsRunLauncher only launches tasks with a single container.
  • Fixed an issue where the run status of job invoked through execute_in_process was not updated properly.
  • Fixed some storage queries that were incompatible with versions of SQLAlchemy<1.4.0.
  • [dagster-dbt] Fixed issue where load_assets_from_dbt_project would fail if models were organized into subdirectories.
  • [dagster-dbt] Fixed issue where load_assets_from_dbt_project would fail if seeds or snapshots were present in the project.

Community Contributions#

  • [dagster-fivetran] A new fivetran_resync_op (along with a corresponding resync_and_poll method on the fivetran_resource) allows you to kick off Fivetran resyncs using Dagster (thanks @dwallace0723!)

  • [dagster-shell] Fixed an issue where large log output could cause operations to hang (thanks @kbd!)

  • [documentation] Fixed export message with dagster home path (thanks @proteusiq)!

  • [documentation] Remove duplicate entries under integrations (thanks @kahnwong)!

UI#

  • Added a small toggle to the right of each graph on the asset details page, allowing them to be toggled on and off.
  • Full asset paths are now displayed on the asset details page.

Documentation#

  • Added API doc entries for validate_run_config.
  • Fixed the example code for the reexecute_pipeline API.
  • TableRecord, TableSchema and its constituents are now documented in the API docs.
  • Docs now correctly use new metadata names MetadataEntry and MetadataValue instead of old ones.

0.14.2#

New#

  • Run status sensors can now be invoked in unit tests. Added build_run_status_sensor_context to help build context objects for run status sensors

Bugfixes#

  • An issue preventing the use of default_value on inputs has been resolved. Previously, a defensive error that did not take default_value in to account was thrown.
  • [dagster-aws] Fixed issue where re-emitting log records from the pyspark_step_launcher would occasionally cause a failure.
  • [dagit] The asset catalog now displays entries for materialized assets when only a subset of repositories were selected. Previously, it only showed the software-defined assets unless all repositories were selected in Dagit.

Community Contributions#

  • Fixed an invariant check in the databricks step launcher that was causing failures when setting the local_dagster_job_package_path config option (Thanks Iswariya Manivannan!)

Documentation#

  • Fixed the example code in the reconstructable API docs.

0.14.1#

New#

  • [dagit] The sensor tick timeline now shows cursor values in the tick tooltip if they exist.

Bugfixes#

  • Pinned dependency on markupsafe to function with existing Jinja2 pin.
  • Sensors that have a default status can now be manually started. Previously, this would fail with an invariant exception.

0.14.0 “Never Felt Like This Before”#

Major Changes#

  • Software-defined assets, which offer a declarative approach to data orchestration on top of Dagster’s core job/op/graph APIs, have matured significantly. Improvements include partitioned assets, a revamped asset details page in Dagit, a cross-repository asset graph view in Dagit, Dagster types on assets, structured metadata on assets, and the ability to materialize ad-hoc selections of assets without defining jobs. Users can expect the APIs to only undergo minor changes before being declared fully stable in Dagster’s next major release. For more information, view the software-defined assets concepts page here.

  • We’ve made it easier to define a set of software-defined assets where each Dagster asset maps to a dbt model. All of the dependency information between the dbt models will be reflected in the Dagster asset graph, while still running your dbt project in a single step.

  • Dagit has a new homepage, dubbed the “factory floor” view, that provides an overview of recent runs of all the jobs. From it, you can monitor the status of each job’s latest run or quickly re-execute a job. The new timeline view reports the status of all recent runs in a convenient gantt chart.

  • You can now write schedules and sensors that default to running as soon as they are loaded in your workspace, without needing to be started manually in Dagit. For example, you can create a sensor like this:

    from dagster import sensor, DefaultSensorStatus
    
    @sensor(job=my_job, default_status=DefaultSensorStatus.RUNNING)
    def my_running_sensor():
        ...
    

    or a schedule like this:

    from dagster import schedule, DefaultScheduleStatus, ScheduleEvaluationContext
    
    @schedule(job=my_job, cron_schedule="0 0 * * *", default_status=DefaultScheduleStatus.RUNNING)
    def my_running_schedule(context: ScheduleEvaluationContext):
        ...
    

    As soon as schedules or sensors with the default_status field set to RUNNING are included in the workspace loaded by your Dagster Daemon, they will begin creating ticks and submitting runs.

  • Op selection now supports selecting ops inside subgraphs. For example, to select an op my_op inside a subgraph my_graph, you can now specify the query as my_graph.my_op. This is supported in both Dagit and Python APIs.

  • Dagster Types can now have attached metadata. This allows TableSchema objects to be attached to Dagster Types via TableSchemaMetadata. A Dagster Type with a TableSchema will have the schema rendered in Dagit.

  • A new Pandera integration (dagster-pandera) allows you to use Pandera’s dataframe validation library to wrap dataframe schemas in Dagster types. This provides two main benefits: (1) Pandera’s rich schema validation can be used for runtime data validation of Pandas dataframes in Dagster ops/assets; (2) Pandera schema information is displayed in Dagit using a new TableSchema API for representing arbitrary table schemas.

  • The new AssetObservation event enables recording metadata about an asset without indicating that the asset has been updated.

  • AssetMaterializations, ExpectationResults, and AssetObservations can be logged via the context of an op using the OpExecutionContext.log_event method. Output metadata can also be logged using the OpExecutionContext.add_output_metadata method. Previously, Dagster expected these events to be yielded within the body of an op, which caused lint errors for many users, made it difficult to add mypy types to ops, and also forced usage of the verbose Output API. Here’s an example of the new invocations:

    from dagster import op, AssetMaterialization
    
    @op
    def the_op(context):
        context.log_event(AssetMaterialization(...))
        context.add_output_metadata({"foo": "bar"})
        ...
    
  • A new Airbyte integration (dagster-airbyte) allows you to kick off and monitor Airbyte syncs from within Dagster. The original contribution from @airbytehq’s own @marcosmarxm includes a resource implementation as well as a pre-built op for this purpose, and we’ve extended this library to support software-defined asset use cases as well. Regardless of which interface you use, Dagster will automatically capture the Airbyte log output (in the compute logs for the relevant steps) and track the created tables over time (via AssetMaterializations).

  • The ECSRunLauncher (introduced in Dagster 0.11.15) is no longer considered experimental. You can bootstrap your own Dagster deployment on ECS using our docker compose example or you can use it in conjunction with a managed Dagster Cloud deployment. Since its introduction, we’ve added the ability to customize Fargate container memory and CPU, mount secrets from AWS SecretsManager, and run with a variety of AWS networking configurations. Join us in #dagster-ecs in Slack!

  • [Helm] The default liveness and startup probes for Dagit and user deployments have been replaced with readiness probes. The liveness and startup probe for the Daemon has been removed. We observed and heard from users that under load, Dagit could fail the liveness probe which would result in the pod restarting. With the new readiness probe, the pod will not restart but will stop serving new traffic until it recovers. If you experience issues with any of the probe changes, you can revert to the old behavior by specifying liveness and startup probes in your Helm values (and reach out via an issue or Slack).

Breaking Changes and Deprecations#

  • The Dagster Daemon now uses the same workspace.yaml file as Dagit to locate your Dagster code. You should ensure that if you make any changes to your workspace.yaml file, they are included in both Dagit’s copy and the Dagster Daemon’s copy. When you make changes to the workspace.yaml file, you don’t need to restart either Dagit or the Dagster Daemon - in Dagit, you can reload the workspace from the Workspace tab, and the Dagster Daemon will periodically check the workspace.yaml file for changes every 60 seconds. If you are using the Dagster Helm chart, no changes are required to include the workspace in the Dagster Daemon.

  • Dagster’s metadata API has undergone a signficant overhaul. Changes include:

    • To reflect the fact that metadata can be specified on definitions in addition to events, the following names are changing. The old names are deprecated, and will function as aliases for the new names until 0.15.0:
      • EventMetadata > MetadataValue
      • EventMetadataEntry > MetadataEntry
      • XMetadataEntryData > XMetadataValue (e.g. TextMetadataEntryData > TextMetadataValue)
    • The metadata_entries keyword argument to events and Dagster types is deprecated. Instead, users should use the metadata keyword argument, which takes a dictionary mapping string labels to MetadataValues.
    • Arbitrary metadata on In/InputDefinition and Out/OutputDefinition is deprecated. In 0.15.0, metadata passed for these classes will need to be resolvable to MetadataValue (i.e. function like metadata everywhere else in Dagster).
    • The description attribute of EventMetadataEntry is deprecated.
    • The static API of EventMetadataEntry (e.g. EventMetadataEntry.text) is deprecated. In 0.15.0, users should avoid constructing EventMetadataEntry objects directly, instead utilizing the metadata dictionary keyword argument, which maps string labels to MetadataValues.
  • In previous releases, it was possible to supply either an AssetKey, or a function that produced an AssetKey from an OutputContext as the asset_key argument to an Out/OutputDefinition. The latter behavior makes it impossible to gain information about these relationships without running a job, and has been deprecated. However, we still support supplying a static AssetKey as an argument.

  • We have renamed many of the core APIs that interact with ScheduleStorage, which keeps track of sensor/schedule state and ticks. The old term for the generic schedule/sensor “job” has been replaced by the term “instigator” in order to avoid confusion with the execution API introduced in 0.12.0. If you have implemented your own schedule storage, you may need to change your method signatures appropriately.

  • Dagit is now powered by Starlette instead of Flask. If you have implemented a custom run coordinator, you may need to make the following change:

    from flask import has_request_context, request
    
    def submit_run(self, context: SubmitRunContext) -> PipelineRun:
        jwt_claims_header = (
            request.headers.get("X-Amzn-Oidc-Data", None) if has_request_context() else None
        )
    

    Should be replaced by:

    def submit_run(self, context: SubmitRunContext) -> PipelineRun:
        jwt_claims_header = context.get_request_header("X-Amzn-Oidc-Data")
    
  • Dagit

    • Dagit no longer allows non-software-defined asset materializations to be be graphed or grouped by partition. This feature could render in incorrect / incomplete ways because no partition space was defined for the asset.
    • Dagit’s “Jobs” sidebar now collapses by default on Instance, Job, and Asset pages. To show the left sidebar, click the “hamburger” icon in the upper left.
    • “Step Execution Time” is no longer graphed on the asset details page in Dagit, which significantly improves page load time. To view this graph, go to the asset graph for the job, uncheck “View as Asset Graph” and click the step to view its details.
    • The “experimental asset UI” feature flag has been removed from Dagit, this feature is shipped in 0.14.0!
  • The Dagster Daemon now requires a workspace.yaml file, much like Dagit.

  • Ellipsis (“...”) is now an invalid substring of a partition key. This is because Dagit accepts an ellipsis to specify partition ranges.

  • [Helm] The Dagster Helm chart now only supported Kubernetes clusters above version 1.18.

New since 0.13.19#

  • Software Defined Assets:

    • In Dagit, the Asset Catalog now offers a third display mode - a global graph of your software-defined assets.

    • The Asset Catalog now allows you to filter by repository to see a subset of your assets, and offers a “View in Asset Graph” button for quickly seeing software-defined assets in context.

    • The Asset page in Dagit has been split into two tabs, “Activity” and “Definition”.

    • Dagit now displays a warning on the Asset page if the most recent run including the asset’s step key failed without yielding a materialization, making it easier to jump to error logs.

    • Dagit now gives you the option to view jobs with software-defined assets as an Asset Graph (default) or as an Op Graph, and displays asset <-> op relationships more prominently when a single op yields multiple assets.

    • You can now include your assets in a repository with the use of an AssetGroup. Each repository can only have one AssetGroup, and it can provide a jumping off point for creating the jobs you plan on using from your assets.

      from dagster import AssetGroup, repository, asset
      
      @asset(required_resource_keys={"foo"})
      def asset1():
          ...
      
      @asset
      def asset2():
          ...
      
      @repository
      def the_repo():
          asset_group = AssetGroup(assets=[asset1, asset2], resource_defs={"foo": ...})
          return [asset_group, asset_group.build_job(selection="asset1-")]
      
    • AssetGroup.build_job supports a selection syntax similar to that found in op selection.

  • Asset Observations:

    • You can now yield AssetObservations to log metadata about a particular asset from beyond its materialization site. AssetObservations appear on the asset details page alongside materializations and numerical metadata is graphed. For assets with software-defined partitions, materialized and observed metadata about each partition is rolled up and presented together. For more information, view the docs page here.
    • Added an asset_observations_for_node method to ExecuteInProcessResult for fetching the AssetObservations from an in-process execution.
  • Dagster Types with an attached TableSchemaMetadataValue now render the schema in Dagit UI.

  • [dagster-pandera] New integration library dagster-pandera provides runtime validation from the Pandera dataframe validation library and renders table schema information in Dagit.

  • OpExecutionContext.log_event provides a way to log AssetMaterializations, ExpectationResults, and AssetObservations from the body of an op without having to yield anything. Likewise, you can use OpExecutionContext.add_output_metadata to attach metadata to an output without having to explicitly use the Output object.

  • OutputContext.log_event provides a way to log AssetMaterializations from within the handle_output method of an IO manager without yielding. Likewise, output metadata can be added using OutputContext.add_output_metadata.

  • [dagster-dbt] The load_assets_from_dbt_project function now returns a set of assets that map to a single dbt run command (rather than compiling each dbt model into a separate step). It also supports a new node_info_to_asset_key argument which allows you to customize the asset key that will be used for each dbt node.

  • [dagster-airbyte] The dagster-airbyte integration now collects the Airbyte log output for each run as compute logs, and generates AssetMaterializations for each table that Airbyte updates or creates.

  • [dagster-airbyte] The dagster-airbyte integration now supports the creation of software-defined assets, with the build_airbyte_assets function.

  • [dagster-fivetran] The dagster-fivetran integration now supports the creation of software-defined assets with the build_fivetran_assets function.

  • The multiprocess executor now supports choosing between spawn or forkserver for how its subprocesses are created. When using forkserver we attempt to intelligently preload modules to reduce the per-op overhead.

  • [Helm] Labels can now be set on the Dagit and daemon deployments.

  • [Helm] The default liveness and startup probes for Dagit and user deployments have been replaced with readiness probes. The liveness and startup probe for the Daemon has been removed. We observed and heard from users that under load, Dagit could fail the liveness probe which would result in the pod restarting. With the new readiness probe, the pod will not restart but will stop serving new traffic until it recovers. If you experience issues with any of the probe changes, you can revert to the old behavior by specifying liveness and startup probes in your Helm values (and reach out via an issue or Slack).

  • [Helm] The Ingress v1 is now supported.

Community Contributions#

  • Typo fix from @jiafi, thank you!

Bugfixes#

  • Fixed an issue where long job names were truncated prematurely in the Jobs page in Dagit.
  • Fixed an issue where loading the sensor timeline would sometimes load slowly or fail with a timeout error.
  • Fixed an issue where the first time a run_status_sensor executed, it would sometimes run very slowly or time out.
  • Fixed an issue where Launchpad mistakenly defaulted with invalid subset error in Dagit.
  • Multi-component asset keys can now be used in the asset graph filter bar.
  • Increased the storage query statement timeout to better handle more complex batch queries.
  • Added fallback support for older versions of sqlite to service top-level repository views in Dagit (e.g. the top-level jobs, schedule, and sensor pages).

Documentation#

  • Images in the documentation now enlarge when clicked.
  • New example in examples/bollinger demonstrates dagster-pandera and TableSchema , and software-defined assets in the context of analyzing stock price data.

0.13.19#

New#

  • [dagit] Various performance improvements for asset graph views.
  • [dagster-aws] The EcsRunLauncher can now override the secrets_tag parameter to None, which will cause it to not look for any secrets to be included in the tasks for the run. This can be useful in situations where the run launcher does not have permissions to query AWS Secretsmanager.

Bugfixes#

  • [dagster-mysql] For instances using MySQL for their run storage, runs created using dagster versions 0.13.17 / 0.13.18 might display an incorrect timestamp for its start time on the Runs page. Running the dagster instance migrate CLI command should resolve the issue.