Changelog

0.9.8

New

Support for the Dagster step selection DSL: reexecute_pipeline now takes step_selection, which accepts queries like *solid_a.compute++ (i.e., solid_a.compute, all of its ancestors, its immediate descendants, and their immediate descendants). steps_to_execute is deprecated and will be removed in 0.10.0.
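
The selection semantics in this entry can be illustrated with a small, self-contained Python model. This is a hypothetical sketch, not Dagster's actual parser. The step graph, select_steps, and traverse are illustrative names. Only two operators are modeled: a leading or trailing * (all ancestors or all descendants) and a leading or trailing + (one level per +).

```python
import re
from collections import deque

# Hypothetical step graph: step key -> downstream step keys (illustrative only).
DOWNSTREAM = {
    "config.compute": ["load.compute"],
    "load.compute": ["solid_a.compute"],
    "solid_a.compute": ["solid_b.compute"],
    "solid_b.compute": ["solid_c.compute"],
    "solid_c.compute": ["solid_d.compute"],
    "solid_d.compute": [],
}


def invert(graph):
    """Build the upstream (parent) adjacency map from a downstream map."""
    upstream = {key: [] for key in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            upstream[dst].append(src)
    return upstream


def traverse(graph, start, depth):
    """Breadth-first walk from start, up to `depth` levels (None = unlimited)."""
    seen = set()
    frontier = deque([(start, 0)])
    while frontier:
        key, level = frontier.popleft()
        if depth is not None and level >= depth:
            continue  # do not expand past the requested number of levels
        for nxt in graph[key]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, level + 1))
    return seen


def select_steps(query, downstream):
    """Resolve a query such as '*solid_a.compute++' to a set of step keys."""
    match = re.fullmatch(r"(\*|\+*)([\w.]+)(\*|\+*)", query)
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    up, key, down = match.groups()
    up_depth = None if up == "*" else len(up)        # '*' = all ancestors
    down_depth = None if down == "*" else len(down)  # '*' = all descendants
    selected = {key}
    selected |= traverse(invert(downstream), key, up_depth)
    selected |= traverse(downstream, key, down_depth)
    return selected


if __name__ == "__main__":
    print(sorted(select_steps("*solid_a.compute++", DOWNSTREAM)))
```

With this toy graph, *solid_a.compute++ selects solid_a.compute, both of its ancestors, and the two levels of descendants below it. It does not select solid_d.compute, which is three levels down.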

Community contributions

Bugfixes

Fixed a bug where pipeline-level hooks were not correctly applied to a pipeline subset.
Improved error messages when the execute command can't load a code pointer.
Fixed a bug that prevented serializing Spark intermediates with configured intermediate storages.

Dagit

Enabled subset reexecution via Dagit when part of the pipeline is still running.
Made Schedules clickable, linking to the View All page in the schedule section.
Various Dagit UI improvements.

Experimental

[lakehouse] Added a CLI command for building and executing a pipeline that updates a given set of assets: house update --module package.module --assets my_asset*

Documentation

Bugfixes

Fixed an issue in the dagstermill library that caused fetching solid config to be non-deterministic.
Fixed an issue in the K8sScheduler where multiple pipeline runs were kicked off for each scheduled execution.

New

Added an ADLS2 storage plugin for Spark DataFrames (Thanks @sd2k!)
Added a feature to the Dagit Playground that automatically removes extra configuration that does not conform to a pipeline’s config schema.
[Dagster-Celery/Celery-K8s/Celery-Docker] Added Celery worker names and pods to the logs for each step execution

Community contributions

Re-enabled dagster-azure integration tests in dagster-databricks tests (Thanks @sd2k!)
Moved dict_without_keys from dagster-pandas into dagster.utils (Thanks @DavidKatz-il)
Moved Dask DataFrame read/to options under read/to keys (Thanks @kinghuang)

Bugfixes

Fixed the helper for importing data from GCS paths into BigQuery (Thanks @grabangomb (https://github.com/grabangomb)!)
Postgres event storage now defers opening a thread to watch runs until that thread is needed.

Experimental

Added a version computation function for DagsterTypeLoader (actual versioning will be supported in 0.10.0).
Added a version attribute to solid and SolidDefinition (actual versioning will be supported in 0.10.0).

New

Bugfixes

[dagstermill] Fixed an issue with output notebooks and S3 storage.
[dagster_celery] Fixed a bug in PYTHONPATH calculation (Thanks @enima2648!)
[dagster_pandas] Marked create_structured_dataframe_type and ConstraintWithMetadata as experimental APIs.
[dagster_k8s] Reduced the default job backoff limit to 0.

Docs

Breaking Changes

When using the configured API on a solid or composite solid, a new solid name must be provided.
The image used by the K8sScheduler to launch scheduled executions is now specified under the “scheduler” section of the Helm chart (previously it was under the “pipeline_run” section).

New

Added an experimental mode that speeds up interactions in Dagit by launching a gRPC server on startup for each repository location in your workspace. To enable it, add the following to your dagster.yaml:

```yaml
opt_in:
  local_servers: true
```

Intermediate Storage and System Storage now default to the first provided storage definition when no configuration is provided. Previously, it was necessary to provide run config for storage whenever custom storage definitions were provided, even if that storage required no run configuration. Now, if the first provided storage definition requires no run configuration, the system will default to using it.
Added a timezone picker to Dagit, and made all timestamps timezone-aware
Added solid_config to the hook context, which provides access to the config of the corresponding solid.
Hooks can now be set directly on PipelineDefinition or @pipeline, e.g. @pipeline(hook_defs={hook_a}), which applies the hooks to every solid instance within the pipeline.
Added Partitions tab for partitioned pipelines, with new backfill selector.
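
The storage defaulting rule described in this release can be sketched as a small decision function. Everything here is hypothetical: StorageDef, resolve_storage, and the requires_config flag are illustrative names, not Dagster APIs. The sketch also assumes that when no config is given and the first definition does require config, resolution still fails, as it did before this change.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StorageDef:
    """Hypothetical stand-in for a storage definition."""
    name: str
    requires_config: bool


def resolve_storage(defs: Sequence[StorageDef], run_config: Optional[dict]) -> StorageDef:
    """Pick a storage definition using the defaulting rule (assumed model)."""
    # Explicit run config wins: it names exactly one storage to use.
    if run_config:
        (name,) = run_config.keys()
        for storage in defs:
            if storage.name == name:
                return storage
        raise KeyError(f"no storage definition named {name!r}")
    # No config: default to the first definition if it needs no config.
    first = defs[0]
    if not first.requires_config:
        return first
    # Assumption: a config-requiring first definition still needs explicit config.
    raise ValueError(f"storage {first.name!r} requires run config")


if __name__ == "__main__":
    defs = [StorageDef("in_memory", False), StorageDef("s3", True)]
    print(resolve_storage(defs, None).name)
    print(resolve_storage(defs, {"s3": {"s3_bucket": "my-bucket"}}).name)
```

Ordering matters in this model: listing a config-free definition first is what lets a run omit storage config entirely.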