Changelog

0.7.12

Bugfix

  • We now only render the subset of an execution plan that has actually executed, and persist that subset information along with the snapshot.
  • @pipeline and @composite_solid now correctly capture __doc__ from the function they decorate.
  • Fixed a bug with using solid subsets in the Dagit playground.
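
The __doc__ fix above concerns a general decorator pitfall: when a decorator returns a definition object instead of the original function, the docstring is lost unless it is copied across. The sketch below illustrates the idea with the standard library only. ToyPipelineDefinition and toy_pipeline are hypothetical names for illustration; this is not Dagster's implementation.

```python
import functools


class ToyPipelineDefinition:
    """Hypothetical stand-in for a definition object; not a Dagster class."""

    def __init__(self, fn):
        self.fn = fn
        # Copy __doc__, __name__, and related metadata from fn onto this object.
        functools.update_wrapper(self, fn)

    def __call__(self, *args, **kwargs):
        return self.fn(*args, **kwargs)


def toy_pipeline(fn):
    """Hypothetical decorator that wraps fn in a definition object."""
    return ToyPipelineDefinition(fn)


@toy_pipeline
def hello_pipeline():
    """Says hello."""
    return "hello"
```

With the metadata copied, hello_pipeline.__doc__ is the docstring of the decorated function rather than the docstring of the wrapper class.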

0.7.11

Bugfix

  • Fixed an issue with strict snapshot ID matching when loading historical snapshots, which caused errors on the Runs page when viewing historical runs.
  • Fixed an issue where dagster_celery had introduced a spurious dependency on dagster_k8s (#2435).
  • Fixed an issue where our Airflow, Celery, and Dask integrations required S3 or GCS storage and prevented use of filesystem storage. Filesystem storage is now also permitted, to enable use of these integrations with distributed filesystems like NFS (#2436).

0.7.10

New

  • RepositoryDefinition now takes schedule_defs and partition_set_defs directly. The loading scheme for these definitions via repository.yaml under the scheduler: and partitions: keys is deprecated and expected to be removed in 0.8.0.
  • Published modules are now marked as Python 3.8 compatible.
  • The dagster-airflow package supports loading all Airflow DAGs within a directory path, file path, or Airflow DagBag.
  • The dagster-airflow package supports loading all 23 DAGs in the Airflow example_dags folder and executing 17 of them (see: make_dagster_repo_from_airflow_example_dags).
  • The dagster-celery CLI tools now allow you to pass additional arguments through to the underlying celery CLI, e.g., running dagster-celery worker start -n my-worker -- --uid=42 will pass the --uid flag to celery.
  • It is now possible to create a PresetDefinition that has no environment defined.
  • Added dagster schedule debug command to help debug scheduler state.
  • The SystemCronScheduler now verifies that a cron job has been successfully added to the crontab when turning a schedule on, and shows an error message if unsuccessful.
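
The argument pass-through described above for the dagster-celery CLI follows the common convention of treating everything after a bare -- as belonging to the underlying tool. Below is a minimal sketch of that convention using argparse. The function names and the toy-worker-start program name are hypothetical, and this is not how dagster-celery itself is implemented.

```python
import argparse


def split_passthrough(argv):
    """Split argv at the first bare "--" into (own_args, passthrough_args)."""
    if "--" in argv:
        idx = argv.index("--")
        return argv[:idx], argv[idx + 1:]
    return list(argv), []


def parse_worker_start(argv):
    """Parse our own options and return the untouched pass-through arguments."""
    own, extra = split_passthrough(argv)
    parser = argparse.ArgumentParser(prog="toy-worker-start")  # hypothetical CLI
    parser.add_argument("-n", "--name", default=None)
    args = parser.parse_args(own)
    return args.name, extra


# Mirrors: worker start -n my-worker -- --uid=42
name, extra = parse_worker_start(["-n", "my-worker", "--", "--uid=42"])
```

Here name holds the wrapper's own option and extra holds the arguments that would be forwarded unchanged to the underlying command.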

Breaking Changes

  • A dagster instance migrate is required for this release to support the new experimental assets view.
  • Runs created prior to 0.7.8 will no longer render their execution plans as DAGs. We are only rendering execution plans that have been persisted. Logs are still available.
  • Path is no longer valid in config schemas. Use str or dagster.String instead.
  • Removed the @pyspark_solid decorator - its functionality, which was experimental, is subsumed by requiring a StepLauncher resource (e.g. emr_pyspark_step_launcher) on the solid.

Dagit

  • Merged "re-execute", "single-step re-execute", "resume/retry" buttons into one "re-execute" button with three dropdown selections on the Run page.

Experimental

  • Added a new asset_key string parameter to Materializations and created a new “Assets” tab in Dagit to view pipelines and runs associated with these keys. The API and UI of these asset-based features are likely to change, but feedback is welcome and will be used to inform these changes.
  • Added an emr_pyspark_step_launcher that enables launching PySpark solids in EMR. The "simple_pyspark" example demonstrates how it’s used.

Bugfix

  • Fixed an issue when running Jupyter notebooks in a Python 2 kernel through dagstermill with Dagster running in Python 3.
  • Improved error messages produced when dagstermill spins up an in-notebook context.
  • Fixed an issue with retrieving step events from CompositeSolidResult objects.

0.7.9

Breaking Changes

  • If you are launching runs using DagsterInstance.launch_run, this method now takes a run id instead of an instance of PipelineRun. Additionally, DagsterInstance.create_run and DagsterInstance.create_empty_run have been replaced by DagsterInstance.get_or_create_run and DagsterInstance.create_run_for_pipeline.
  • If you have implemented your own RunLauncher, there are two required changes:
    • RunLauncher.launch_run takes a pipeline run that has already been created. You should remove any calls to instance.create_run in this method.
    • Instead of calling startPipelineExecution (defined in dagster_graphql.client.query.START_PIPELINE_EXECUTION_MUTATION) in the run launcher, you should call startPipelineExecutionForCreatedRun (defined in dagster_graphql.client.query.START_PIPELINE_EXECUTION_FOR_CREATED_RUN_MUTATION).
    • Refer to the RemoteDagitRunLauncher for an example implementation.
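
The breaking changes above separate run creation from run launching: the instance creates and stores the run, launch_run is given a run id, and the launcher receives a run that already exists. The toy model below sketches that division of responsibility. ToyRun, ToyInstance, and ToyRunLauncher are hypothetical classes with simplified signatures, not the Dagster classes; the RemoteDagitRunLauncher remains the reference for a real implementation.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyRun:
    run_id: str
    pipeline_name: str
    status: str = "NOT_STARTED"


class ToyRunLauncher:
    """Receives a run that already exists; it must not create one itself."""

    def __init__(self):
        self.launched: List[str] = []

    def launch_run(self, instance, run):
        # No run creation here: only start the run that was handed in.
        self.launched.append(run.run_id)
        run.status = "STARTED"
        return run


class ToyInstance:
    """Owns run creation and storage, then delegates launching."""

    def __init__(self, launcher):
        self._runs: Dict[str, ToyRun] = {}
        self._launcher = launcher

    def create_run_for_pipeline(self, pipeline_name):
        run = ToyRun(run_id=str(uuid.uuid4()), pipeline_name=pipeline_name)
        self._runs[run.run_id] = run
        return run

    def launch_run(self, run_id):
        # Takes a run id, looks up the stored run, and passes it to the launcher.
        return self._launcher.launch_run(self, self._runs[run_id])


launcher = ToyRunLauncher()
instance = ToyInstance(launcher)
created = instance.create_run_for_pipeline("my_pipeline")
launched = instance.launch_run(created.run_id)
```

Because the run is stored before it is launched, the launcher never needs to call a creation method, which is the change custom RunLauncher implementations have to make.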

New

  • Improved preset and solid subselection in the playground: subselection now shows an inline preview of the pipeline instead of a modal, and selecting a preset now chooses the correct subselection.
  • Improved log searching: added tokenization and autocompletion for searching by message type and for specific steps.
  • You can now view the structure of pipelines from historical runs, even if that pipeline no longer exists in the loaded repository or has changed structure.
  • Historical execution plans are now viewable, even if the pipeline has changed structure.
  • Added metadata link to raw compute logs for all StepStart events in PipelineRun view and Step view.
  • Improved error handling for the scheduler. If a scheduled run has config errors, the errors are persisted to the event log for the run and can be viewed in Dagit.

Bugfix

  • The sqlalchemy engine is no longer manually disposed in dagster-postgres.
  • Made the boto3 dependency in dagster-aws more flexible (#2418).
  • Fixed tooltip UI cleanup in the partitioned schedule view.

Documentation

  • Brand new documentation site, available at https://docs.dagster.io.
  • The tutorial has been restructured into multiple sections, and the examples in intro_tutorial have been rearranged into separate folders to reflect this.

0.7.8

Breaking Changes

  • The execute_pipeline_with_mode and execute_pipeline_with_preset APIs have been dropped in favor of new top level arguments to execute_pipeline, mode and preset.
  • The use of RunConfig to pass options to execute_pipeline has been deprecated, and RunConfig will be removed in 0.8.0.
  • The execute_solid_within_pipeline and execute_solids_within_pipeline APIs, intended to support tests, now take new top level arguments mode and preset.

New

  • The dagster-aws Redshift resource now supports providing an error callback to debug failed queries.
  • We now persist serialized execution plans for historical runs. They will render correctly even if the pipeline structure has changed or if it does not exist in the current loaded repository.
  • Clicking on a pipeline tag in the Runs view will apply that tag as a filter.

Bugfix

  • Fixed a bug where the telemetry logger would create a log file (but not write any logs) even when telemetry was disabled.
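
The telemetry bug above is an instance of a common logging pitfall: constructing a file handler eagerly creates the file even if nothing is ever written. A minimal sketch of one way to avoid this with the standard logging module follows. The make_toy_telemetry_logger function and the file names are hypothetical, and this is not Dagster's telemetry code.

```python
import logging
import os
import tempfile


def make_toy_telemetry_logger(log_path, enabled):
    """Build a logger that only touches the filesystem when enabled."""
    logger = logging.getLogger("toy_telemetry.%s" % os.path.basename(log_path))
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if enabled:
        # delay=True defers opening the file until the first record is emitted.
        logger.addHandler(logging.FileHandler(log_path, delay=True))
    else:
        # No file handler at all when disabled, so no file is ever created.
        logger.addHandler(logging.NullHandler())
    return logger


tmp_dir = tempfile.mkdtemp()
disabled_path = os.path.join(tmp_dir, "disabled.log")
enabled_path = os.path.join(tmp_dir, "enabled.log")

make_toy_telemetry_logger(disabled_path, enabled=False).info("ignored")
make_toy_telemetry_logger(enabled_path, enabled=True).info("recorded")

disabled_file_exists = os.path.exists(disabled_path)
enabled_file_exists = os.path.exists(enabled_path)
```

Only the enabled logger leaves a file behind, and it does so only once a record has actually been written.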

Experimental

  • The dagster-airflow package supports ingesting Airflow DAGs and running them as Dagster pipelines (see: make_dagster_pipeline_from_airflow_dag). This is in the early experimentation phase.
  • Improved the layout of the experimental partition runs table on the Schedules detailed view.

Documentation

  • Fixed a grammatical error (Thanks @flowersw!)