This topic lists all the metrics that Airflow can generate over StatsD.
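StatsD metrics are sent as small plain-text datagrams, usually over UDP, with one `<name>:<value>|<type>` line per metric. The sketch below is not Airflow code. It sends one datagram using only the Python standard library and reads it back from a local socket, to show what a StatsD server receives. The `send_metric` and `receive_one` helpers, the host, and the ephemeral port are assumptions for illustration; a real StatsD daemon conventionally listens on UDP port 8125.

```python
import socket


def send_metric(line: str, host: str, port: int) -> None:
    """Send a single StatsD line as one UDP datagram (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


def receive_one(sock: socket.socket) -> str:
    """Read one datagram from a bound UDP socket and decode it."""
    data, _addr = sock.recvfrom(1024)
    return data.decode("utf-8")


if __name__ == "__main__":
    # Stand-in for a StatsD server: bind to an ephemeral port on localhost.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server:
        server.bind(("127.0.0.1", 0))
        server.settimeout(2.0)
        port = server.getsockname()[1]
        send_metric("scheduler_heartbeat:1|c", "127.0.0.1", port)
        print(receive_one(server))  # scheduler_heartbeat:1|c
```

Because UDP is fire-and-forget, the sender never blocks on the metrics server, which is why StatsD is a common choice for emitting metrics from busy processes such as a scheduler.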
The following table lists the metrics for Airflow gauges.
Name | Description |
---|---|
dagbag_size | Number of DAGs found when the scheduler ran a scan based on its configuration. |
dag_processing.import_errors | Number of errors found while trying to parse DAG files. |
dag_processing.total_parse_time | Seconds taken to scan and import all DAG files together. |
dag_processing.last_run.seconds_ago.<dag_file> | Seconds since <dag_file> was last processed. |
scheduler.tasks.running | Number of tasks running in an executor. |
scheduler.tasks.starving | Number of tasks that cannot be scheduled because there are no open slots in the pool. |
scheduler.tasks.executable | Number of tasks that are ready for execution (set to queued) with respect to pool limits, DAG concurrency, executor state, and priority. |
executor.open_slots | Number of open slots on the executor. |
executor.queued_tasks | Number of queued tasks on the executor. |
executor.running_tasks | Number of running tasks on the executor. |
pool.open_slots.<pool_name> | Number of open slots in the pool. |
pool.queued_slots.<pool_name> | Number of queued slots in the pool. |
pool.running_slots.<pool_name> | Number of running slots in the pool. |
pool.starving_tasks.<pool_name> | Number of starving tasks in the pool. |
triggers.running | Number of triggers currently running (per triggerer). |
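Several gauge names above are templates whose placeholder is filled in when the metric is emitted, such as `pool.open_slots.<pool_name>`. The following is a minimal sketch of expanding such a template and rendering it as a StatsD gauge line (type `g`). The `expand_template` and `gauge_line` helpers and the pool name `default_pool` are illustrative assumptions, not Airflow APIs.

```python
import re

# Matches placeholders such as <pool_name> or <dag_file> in a metric template.
_PLACEHOLDER = re.compile(r"<([a-z_]+)>")


def expand_template(template: str, **values: str) -> str:
    """Replace each <placeholder> with the matching keyword argument."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for <{key}>")
        return str(values[key])

    return _PLACEHOLDER.sub(substitute, template)


def gauge_line(name: str, value: float) -> str:
    """Render a StatsD gauge; the server keeps the last value it receives."""
    return f"{name}:{value}|g"


name = expand_template("pool.open_slots.<pool_name>", pool_name="default_pool")
print(gauge_line(name, 128))  # pool.open_slots.default_pool:128|g
```

A gauge reports a current level rather than a count of events, so each emission overwrites the previous reading for that name.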
The following table lists the metrics for Airflow counters.
Name | Description |
---|---|
<job_name>_start | Number of started <job_name> jobs, for example SchedulerJob or LocalTaskJob. |
<job_name>_end | Number of ended <job_name> jobs, for example SchedulerJob or LocalTaskJob. |
<job_name>_heartbeat_failure | Number of failed heartbeats for a <job_name> job, for example SchedulerJob or LocalTaskJob. |
local_task_job.task_exit.<job_id>.<dag_id>.<task_id>.<return_code> | Number of LocalTaskJob terminations with return code <return_code> while running task <task_id> of DAG <dag_id>. |
operator_failures_<operator_name> | Operator <operator_name> failures. |
operator_successes_<operator_name> | Operator <operator_name> successes. |
ti_failures | Overall task instance failures. |
ti_successes | Overall task instance successes. |
previously_succeeded | Number of previously succeeded task instances. |
zombies_killed | Zombie tasks killed. |
scheduler_heartbeat | Scheduler heartbeats. |
dag_processing.processes | Number of currently running DAG parsing processes. |
dag_processing.processor_timeouts | Number of file processors that were killed for taking too long. |
dag_file_processor_timeouts | Number of file processors that were killed for taking too long (same as dag_processing.processor_timeouts). |
dag_processing.manager_stalls | Number of times the DagFileProcessorManager stalled. |
dag_file_refresh_error | Number of failures loading any DAG files. |
scheduler.tasks.killed_externally | Number of tasks killed externally. |
scheduler.orphaned_tasks.cleared | Number of orphaned tasks cleared by the scheduler. |
scheduler.orphaned_tasks.adopted | Number of orphaned tasks adopted by the scheduler. |
scheduler.critical_section_busy | Count of times a scheduler process tried to get a lock on the critical section (needed to send tasks to the executor) and found it locked by another process. |
sla_missed | Number of SLA misses. |
sla_callback_notification_failure | Number of failed SLA miss callback notification attempts. |
sla_email_notification_failure | Number of failed SLA miss email notification attempts. |
ti.start.<dag_id>.<task_id> | Number of started instances of task <task_id> in DAG <dag_id>. Similar to <job_name>_start, but for tasks. |
ti.finish.<dag_id>.<task_id>.<state> | Number of instances of task <task_id> in DAG <dag_id> that finished in state <state>. Similar to <job_name>_end, but for tasks. |
dag.callback_exceptions | Number of exceptions raised from DAG callbacks. Each exception indicates a DAG callback that is not working. |
celery.task_timeout_error | Number of AirflowTaskTimeout errors raised when publishing a task to the Celery broker. |
celery.execute_command.failure | Number of non-zero exit codes returned by Celery tasks. |
task_removed_from_dag.<dag_id> | Number of tasks removed from a given DAG (that is, tasks that no longer exist in the DAG). |
task_restored_to_dag.<dag_id> | Number of tasks restored to a given DAG (that is, task instances that were previously in the REMOVED state in the database and have been added back to the DAG file). |
task_instance_created-<operator_name> | Number of task instances created for a given operator. |
triggers.blocked_main_thread | Number of triggers that blocked the main thread (likely due to not being fully asynchronous). |
triggers.failed | Number of triggers that errored before they could fire an event. |
triggers.succeeded | Number of triggers that have fired at least one event. |
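Counters are incremented once per event, and a StatsD server typically sums the increments it receives over each flush interval. The templates in this table join their parts differently: `<job_name>_start` uses an underscore, `ti.start.<dag_id>.<task_id>` uses dots, and `task_instance_created-<operator_name>` uses a hyphen, so a name has to be built exactly as listed. The following is a minimal sketch that renders counter lines (type `c`), optionally with a StatsD sample rate. The `counter_line` helper and the example DAG, task, and operator values are hypothetical.

```python
def counter_line(name: str, count: int = 1, sample_rate: float = 1.0) -> str:
    """Render a StatsD counter increment; '|@rate' is added only when sampling."""
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    line = f"{name}:{count}|c"
    if sample_rate < 1:
        line += f"|@{sample_rate}"
    return line


# Templates from the table, filled with hypothetical values.
job_name = "SchedulerJob"
dag_id, task_id, operator_name = "example_dag", "extract", "PythonOperator"

print(counter_line(f"{job_name}_start"))  # SchedulerJob_start:1|c
print(counter_line(f"ti.start.{dag_id}.{task_id}"))  # ti.start.example_dag.extract:1|c
print(counter_line(f"task_instance_created-{operator_name}", sample_rate=0.5))
# task_instance_created-PythonOperator:1|c|@0.5
```

With a sample rate below 1, the server scales the received increments back up (here by a factor of 2) to estimate the true count.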
The following table lists the metrics for Airflow timers.
Name | Description |
---|---|
dagrun.dependency-check.<dag_id> | Milliseconds taken to check DAG dependencies. |
dag.<dag_id>.<task_id>.duration | Milliseconds taken to finish a task. |
dag_processing.last_duration.<dag_file> | Milliseconds taken to load the given DAG file. |
dagrun.duration.success.<dag_id> | Milliseconds taken for a DagRun to reach success state. |
dagrun.duration.failed.<dag_id> | Milliseconds taken for a DagRun to reach failed state. |
dagrun.schedule_delay.<dag_id> | Milliseconds of delay between the scheduled DagRun start date and the actual DagRun start date. |
scheduler.critical_section_duration | Milliseconds spent in the critical section of the scheduler loop. Only a single scheduler can enter this section at a time. |
scheduler.critical_section_query_duration | Milliseconds spent running the critical section task instance query. |
scheduler.scheduler_loop_duration | Milliseconds spent running one scheduler loop. |
dagrun.<dag_id>.first_task_scheduling_delay | Milliseconds elapsed between the first task's start_date and the DagRun's expected start. |
collect_db_dags | Milliseconds taken to fetch all serialized DAGs from the database. |
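StatsD timers carry a duration in milliseconds (type `ms`). The `statsd` Python client, for example, accepts either a number of milliseconds or a `datetime.timedelta` in its `timing` call and converts the timedelta to milliseconds, so a duration measured as a timedelta is reported in milliseconds. The following is a minimal sketch of that conversion. The `to_milliseconds` and `timer_line` helpers and the DAG id are hypothetical, and rounding to whole milliseconds is a choice made only for this sketch.

```python
from datetime import datetime, timedelta
from typing import Union


def to_milliseconds(duration: Union[timedelta, float]) -> float:
    """Convert a timedelta to milliseconds; plain numbers are already milliseconds."""
    if isinstance(duration, timedelta):
        return duration.total_seconds() * 1000.0
    return float(duration)


def timer_line(name: str, duration: Union[timedelta, float]) -> str:
    """Render a StatsD timer line, rounded to whole milliseconds (sketch choice)."""
    return f"{name}:{round(to_milliseconds(duration))}|ms"


# Hypothetical DagRun that started and finished 90.25 seconds apart.
start = datetime(2024, 1, 1, 0, 0, 0)
end = start + timedelta(seconds=90, milliseconds=250)
print(timer_line("dagrun.duration.success.example_dag", end - start))
# dagrun.duration.success.example_dag:90250|ms
```

Converting at the point of emission keeps a single unit on the wire, so dashboards built on these timers should treat every value as milliseconds.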