Add OpenTelemetry instrumentation to Databricks Notebooks with Azure Application Insights integration #21
Conversation
…ly describe the notebook
```python
try:
    # Execute Child Notebook 2
    print("Executing Child Notebook 2...")
    child2_result_json = dbutils.notebook.run("./child_notebook_2", timeout_seconds=600)
```
I wonder if there's a way we could wrap dbutils.notebook.run so we don't have to do the manual .start_tracing()/.end_tracing()?
Maybe something like:
```python
# This is a member function on "workflow_otel_helper"
def manually_instrument_fn(self, function, trace_name=None):
    if trace_name is None:
        trace_name = function.__name__
    outer_self = self

    def trace_wrapper(*args, **kwargs):
        nonlocal outer_self
        # Maybe capture the arguments here?
        outer_self.start_tracing(trace_name)
        try:
            # Could be worth injecting the current trace_id into kwargs
            # so traces can continue across notebooks
            return function(*args, **kwargs)
        finally:
            outer_self.end_tracing(trace_name)

    return trace_wrapper
```

Then you could run other notebooks like this without needing to manually include the start_tracing and end_tracing calls:

```python
run_traced_notebook = workflow_otel_helper.manually_instrument_fn(dbutils.notebook.run)
child2_result_json = run_traced_notebook("./child_notebook_2", timeout_seconds=600)
```
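The wrapper pattern above can be demonstrated without Databricks at all. The sketch below uses a hypothetical `DemoHelper` class whose `start_tracing`/`end_tracing` methods simply record events in a list, standing in for the real helper and `dbutils.notebook.run`:

```python
class DemoHelper:
    """Hypothetical stand-in for workflow_otel_helper: records span events in a list."""

    def __init__(self):
        self.events = []

    def start_tracing(self, name):
        self.events.append(("start", name))

    def end_tracing(self, name):
        self.events.append(("end", name))

    def manually_instrument_fn(self, function, trace_name=None):
        if trace_name is None:
            trace_name = function.__name__

        def trace_wrapper(*args, **kwargs):
            self.start_tracing(trace_name)
            try:
                return function(*args, **kwargs)
            finally:
                # Runs even if the wrapped function raises, so spans always close
                self.end_tracing(trace_name)

        return trace_wrapper


helper = DemoHelper()

def run_notebook(path, timeout_seconds=600):
    # Stand-in for dbutils.notebook.run
    return f"ran {path}"

traced_run = helper.manually_instrument_fn(run_notebook)
result = traced_run("./child_notebook_2", timeout_seconds=600)
print(result)         # ran ./child_notebook_2
print(helper.events)  # [('start', 'run_notebook'), ('end', 'run_notebook')]
```

Because `end_tracing` sits in a `finally` block, the span is closed even when the wrapped call raises, which matters for failed notebook runs.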
This is derived from a design philosophy of mine I like to call "make the easiest way to do something the right way"
@emanguy Thank you for the improvement advice and good philosophy!
I've just updated the project accordingly. Let me know if it looks good to you.
Here's a quick summary:
- Added helper methods `_trace_execution()` (private), `instrument_function()`, and `run_notebook_with_tracing()` to the helper class.
- Updated the example notebooks to use these.
- Updated the documentation.
- Tested the notebook changes via Azure Databricks.
- Tested the related KQL queries in Azure Application Insights.
So the code related to the Child2 notebook call now looks like this:
```python
# Execute Child Notebook 2 with automatic tracing
print("Executing Child Notebook 2 with automatic tracing...")
child2_result = workflow_otel_helper.run_notebook_with_tracing(
    notebook_path="./child_notebook_2",
    span_name="Child_Notebook_2",
    timeout_seconds=600,
    etl_pipeline_id=workflow_id,
    notebook_type="aggregation"
)
print(f"Child Notebook 2 completed with status code: {child2_result['status_code']}")
```
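For readers curious how such a method might hang together, here is one plausible shape for `run_notebook_with_tracing` as a sketch, not the project's actual implementation. `SketchHelper` and `fake_notebook_run` are stand-ins for the real helper and `dbutils.notebook.run`; real spans would go through the OpenTelemetry SDK rather than a list:

```python
import json

class SketchHelper:
    """Stand-in helper: records span names/attributes instead of exporting real OTel spans."""

    def __init__(self, notebook_runner):
        self._run = notebook_runner  # e.g. dbutils.notebook.run on Databricks
        self.spans = []

    def run_notebook_with_tracing(self, notebook_path, span_name=None,
                                  timeout_seconds=600, **span_attributes):
        # Extra keyword arguments (etl_pipeline_id, notebook_type, ...) become span attributes
        name = span_name or notebook_path
        self.spans.append({"name": name, "attributes": dict(span_attributes)})
        # Child notebooks conventionally return a JSON string via dbutils.notebook.exit(...)
        raw = self._run(notebook_path, timeout_seconds=timeout_seconds)
        return json.loads(raw)


def fake_notebook_run(path, timeout_seconds=600):
    # Simulates a child notebook exiting with a JSON payload
    return json.dumps({"status_code": 200, "notebook": path})


helper = SketchHelper(fake_notebook_run)
result = helper.run_notebook_with_tracing(
    notebook_path="./child_notebook_2",
    span_name="Child_Notebook_2",
    etl_pipeline_id="wf-001",
    notebook_type="aggregation",
)
print(result["status_code"])  # 200
```

Funneling the extra keyword arguments into span attributes is what lets the KQL queries in Application Insights filter child-notebook spans by pipeline ID or notebook type.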
Nice, looks good to me!
- Expand README with business value, visual diagrams, and documentation guide
- Update existing documentation files with more detailed information
- Reorganize and improve documentation structure
- Add glossary.md with definitions of technical terms
- Add quick_start.md with streamlined setup instructions
- Improve onboarding experience for new users
- Add architecture_diagram.md showing overall system architecture
- Add etl_pipeline_visualization.md illustrating data flow through ETL pipeline
- Add parent_child_workflow_diagram.md showing span correlation in notebook workflows
- Enhance documentation with visual representations
Summary
This PR implements a comprehensive framework for instrumenting Databricks notebooks with OpenTelemetry and integrating with Azure Application Insights for monitoring and observability. The implementation focuses on notebook monitoring, providing detailed tracing and metrics collection across extraction, transformation, and loading stages, with enhanced documentation and visual representations to improve usability.
Key Features
Implementation Details
- `otel_helper.py` module, which provides a reusable helper class
- `run_notebook_with_tracing` for automatic tracing of child notebook executions
- `instrument_function` for wrapping any function with OpenTelemetry tracing
- `trace_function` decorator for automatic tracing of function execution

Business Value
Learning Context
I worked on this project as part of my personal study to deepen my understanding of:
This learning exercise has helped me gain hands-on experience with modern observability techniques and how they can be applied to data engineering workflows.
Testing
Documentation
AI Assistance Disclosure
This contribution utilized AI tools (e.g., ChatGPT, Claude 3.7 Sonnet via VSCode Cline) for development assistance. All outputs were manually reviewed and tested to ensure adherence to project standards.