How to Automate Incident Response: How Q Developer Helped Me Automate a Daily Pain Point


Introduction

As a DevOps engineer, I spend countless hours documenting incidents. Every time something goes wrong, I have to extract information from Slack conversations by hand, format it into a report, and make sure every stakeholder gets the details they need. This typically takes 30-60 minutes per incident, and with multiple incidents per week it was eating up a significant portion of my time.

When I discovered Amazon Q Developer, I saw an opportunity to automate this tedious process. In this article, I’ll share how I built a CLI automation tool that transforms Slack conversations into professional incident reports in under 30 seconds.

The Problem: Manual Incident Documentation

Before Automation

Time-consuming: 30-60 minutes per incident report
Error-prone: Missing critical details due to human oversight
Inconsistent: Different team members used different formats
Delayed: Reports weren’t available when stakeholders needed them
Repetitive: Same manual process for every incident

The Impact

This manual process was affecting our team’s productivity and incident response effectiveness. We were spending more time documenting incidents than actually learning from them and implementing improvements.

The Solution: AI-Powered Incident Reporter

I built a Python CLI tool that:

Fetches Slack conversations using the Slack API
Analyzes content using the Llama 3 70B model served through Groq’s API
Generates structured reports in both HTML and Markdown formats
Extracts exact timestamps from messages for accurate timelines
Provides consistent formatting every time

Key Features

🔍 Smart Analysis: AI extracts incident details, root cause, and impact
📊 Dual Output: Both HTML and Markdown reports for flexibility
⏱️ Time Tracking: Automatic timestamp extraction and formatting
🎯 Simple Interface: Just provide a Slack message ID
📁 Local Storage: Organized file management with timestamped names

How Q Developer Assisted the Development Process

1. Problem Analysis and Architecture Design

Q Developer helped me identify the core requirements:

The tool needed to be simple enough for non-technical users
It should integrate with existing Slack workflows
Output should be professional and consistent
Processing should be fast and reliable

Architecture decisions Q Developer suggested:

Use Groq instead of slower AI alternatives for better performance
Implement both HTML and Markdown outputs for maximum flexibility
Create a simple CLI interface that only requires a message ID
Use environment variables for secure configuration management
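
The last point, configuration through environment variables, can be sketched as follows. This is a minimal illustration rather than the tool's actual code: SLACK_BOT_TOKEN and SLACK_CHANNEL_ID are names used in this article, while GROQ_API_KEY and the load_config helper are assumptions made for the example.

```python
import os

# SLACK_BOT_TOKEN and SLACK_CHANNEL_ID appear in the article;
# GROQ_API_KEY is an assumed variable name.
REQUIRED_VARS = ("SLACK_BOT_TOKEN", "SLACK_CHANNEL_ID", "GROQ_API_KEY")


def load_config(env=None):
    """Return the required settings, failing fast if any are missing (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing at startup with the full list of missing names tends to be friendlier for non-technical users than an authentication error halfway through a run.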

2. Code Implementation and Integration

Slack API Integration: Q Developer helped me understand the Slack API structure and implement:

Thread message retrieval with proper error handling
User information extraction for attribution
Message formatting cleanup (removing Slack-specific formatting)
Timestamp conversion from Unix timestamps to readable format
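
The last two items can be illustrated with a short sketch. It assumes Slack's usual conventions (timestamps as strings of Unix seconds with a microsecond suffix, mentions written as `<@USERID>`, links as `<url|label>`); the function names and the user_names mapping are hypothetical and not taken from the tool.

```python
import re
from datetime import datetime, timezone


def slack_ts_to_text(ts, tz=timezone.utc):
    """Convert a Slack timestamp string such as '1753168536.411769' to readable text."""
    return datetime.fromtimestamp(float(ts), tz=tz).strftime("%Y-%m-%d %H:%M:%S")


def clean_slack_text(text, user_names=None):
    """Replace Slack markup (<@U123>, <url|label>, <url>) with plain text."""
    user_names = user_names or {}
    # User mentions: <@U123ABC> -> @display-name (falls back to the raw ID).
    text = re.sub(
        r"<@([A-Z0-9]+)>",
        lambda m: "@" + user_names.get(m.group(1), m.group(1)),
        text,
    )
    # Labelled links: <https://example.com|label> -> label
    text = re.sub(r"<([^<>|]+)\|([^<>]+)>", r"\2", text)
    # Bare links: <https://example.com> -> https://example.com
    text = re.sub(r"<([^<>|]+)>", r"\1", text)
    # Slack HTML-escapes &, < and > in message text; undo that last.
    return text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")
```

The example converts to UTC by default; passing a different tz would render the team's local time instead.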

AI Analysis Setup: Q Developer guided me through:

Groq API integration and authentication
Prompt engineering for structured JSON output
Error handling for API rate limits and failures
Response parsing and validation
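
One common way to handle rate limits and transient failures is retrying with exponential backoff. The sketch below is a generic illustration under that assumption, not the tool's actual error handling; call_with_retries and its parameters are hypothetical, and in practice retry_on would be narrowed to the client library's rate-limit exception.

```python
import time


def call_with_retries(func, max_attempts=4, base_delay=1.0, sleep=time.sleep,
                      retry_on=(Exception,)):
    """Call func(), retrying with exponential backoff on the given exceptions."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Wait base_delay, then 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

A call site might look like `call_with_retries(lambda: reporter.analyze_with_groq(conversation))`, where `reporter` stands for whatever object holds the analysis method.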

Report Generation: Q Developer assisted with:

HTML template creation with professional styling
Markdown formatting for documentation systems
File naming conventions with timestamps
Browser auto-opening for immediate review

3. Testing and Refinement

Debugging Challenges: Q Developer helped me solve several technical issues:

Timestamp extraction: Initially the AI wasn’t using the exact timestamps from Slack messages
JSON parsing: Ensuring the AI returned properly formatted JSON
Error handling: Making the tool robust for various failure scenarios
User experience: Improving error messages and feedback
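
For the JSON parsing issue, a defensive parser is one possible safeguard. The sketch below assumes the model may occasionally wrap its JSON in a Markdown code fence or surround it with prose; parse_json_reply is a hypothetical helper, not a function from the tool.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker
FENCED_REPLY = re.compile(
    r"^" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"$", re.DOTALL
)


def parse_json_reply(reply):
    """Extract a JSON object from a model reply that may include fences or prose."""
    text = reply.strip()
    fenced = FENCED_REPLY.match(text)
    if fenced:
        text = fenced.group(1)  # keep only what is inside the fence
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span in the reply.
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in model reply")
        data = json.loads(text[start:end + 1])
    if not isinstance(data, dict):
        raise ValueError("model reply is not a JSON object")
    return data
```

Requesting `response_format={"type": "json_object"}` already reduces the need for this, so the fallback is mainly a safety net.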

Performance Optimization:

Optimized the AI prompt for better analysis quality
Improved error messages for better debugging
Added markdown output alongside HTML for flexibility

Technical Implementation

Technology Stack

Python 3.8+: Core automation logic
Groq API: Fast AI inference (Llama3-70B model)
Slack API: Message retrieval and user management
Markdown/HTML: Report formatting and styling
Environment Variables: Secure configuration management

Key Components

1. Slack Integration

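def fetch_slack_thread(self, channel_id, thread_ts):
    """Fetches all messages in a Slack thread."""
    # Requires `import requests`; SLACK_BOT_TOKEN is read from the environment.
    response = requests.get(
        "https://slack.com/api/conversations.replies",
        headers={"Authorization": f"Bearer {SLACK_BOT_TOKEN}"},
        params={"channel": channel_id, "ts": thread_ts},
        timeout=30,
    )
    response.raise_for_status()
    data = response.json()
    # Slack reports API-level failures in the response body, not the HTTP status.
    if not data.get("ok"):
        raise RuntimeError(f"Slack API error: {data.get('error')}")
    return data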
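
Slack's conversations.replies method is cursor-paginated, so a long thread may span several responses. The sketch below shows one way to collect every page; collect_thread_messages and the fetch_page callback are hypothetical names, and the callback is assumed to wrap a request like the one above, passing the cursor as the `cursor` parameter.

```python
def collect_thread_messages(fetch_page):
    """Gather every message in a thread by following Slack's pagination cursor.

    fetch_page(cursor) must return one parsed conversations.replies response;
    it is called with None for the first page.
    """
    messages, cursor = [], None
    while True:
        page = fetch_page(cursor)
        if not page.get("ok"):
            raise RuntimeError(f"Slack API error: {page.get('error')}")
        messages.extend(page.get("messages", []))
        # An empty or missing next_cursor means there are no more pages.
        cursor = page.get("response_metadata", {}).get("next_cursor")
        if not cursor:
            return messages
```

Passing the page fetcher in as a function keeps the loop testable without network access.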

2. AI Analysis

def analyze_with_groq(self, conversation):
    """Sends conversation to Groq for analysis."""
    # Requires `import json`; system_prompt holds the analysis instructions
    # and must be defined before this call.
    chat_completion = self.groq_client.chat.completions.create(
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Here is the Slack thread transcript:\n\n{conversation}"},
        ],
        model="llama3-70b-8192",
        temperature=0.2,
        response_format={"type": "json_object"},
    )
    return json.loads(chat_completion.choices[0].message.content)

3. Report Generation

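def create_local_report(self, title, content, incident_date):
    """Creates both HTML and Markdown reports."""
    # Requires `import os, re, markdown`. The sanitizing rule, header text and
    # unstyled HTML conversion below are assumptions standing in for the full script.
    safe_title = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")
    os.makedirs("incident_reports", exist_ok=True)

    # Generate filenames with timestamps
    html_filepath = os.path.join("incident_reports", f"{incident_date}_{safe_title}.html")
    md_filepath = os.path.join("incident_reports", f"{incident_date}_{safe_title}.md")

    # Save markdown version
    md_header = f"# {title}\n\n**Incident Date:** {incident_date}\n\n"
    with open(md_filepath, 'w', encoding='utf-8') as f:
        f.write(md_header + content)

    # Save HTML version (the full tool adds styling around this)
    html_document = markdown.markdown(md_header + content)
    with open(html_filepath, 'w', encoding='utf-8') as f:
        f.write(html_document)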

Setup and Usage

Prompt used

I need to create a Python script that automatically generates incident reports from Slack conversations using Groq. Here are the requirements:

**Core Functionality:**
1. Fetch Slack conversation threads using Slack API
2. Use Groq to analyze the conversation
3. Generate structured incident reports as local HTML files
4. Use the exact same AI prompt structure as my existing incident handler

**Technical Requirements:**
- Python 3.7+
- Use boto3 for AWS Bedrock integration
- Use requests for Slack API calls
- Use python-dotenv for environment variables
- Use markdown library for HTML generation
- Generate beautiful, styled HTML reports that open in browser

**Environment Variables Needed:**
- SLACK_BOT_TOKEN (xoxb-...)
- SLACK_USER_TOKEN (xoxp-...)
- SLACK_CHANNEL_ID

**AI Prompt Structure (MUST USE THIS EXACT PROMPT):**  Analyze the conversation to identify key events, user actions, and outcomes.
The JSON object must contain these exact keys:
* "summary"
* "author" (Identify the primary person who resolved the issue)
* "priority" (Value must be one of: "High", "Medium", "Low")
* "status" (Value must be one of: "Resolved", "Investigating", "Monitoring")
* "description"
* "category" (e.g., "Infrastructure", "Application", "Database")
* "environment" (e.g., "Production", "Staging")
* "affected_resources" (Comma-separated list, e.g., "Jenkins, API Server")
* "root_cause_text"
* "remediation_steps_text"
* "impact_text"
* "timeline_markdown": "A detailed, multi-line timeline as a single JSON string. Each event must be a Markdown bullet point on a new line, attributing the action to the user who performed it. All newlines MUST be escaped as \\n. Example format: '- [10:09 PM] - Ajay noticed that storage is full.\\n- [10:11 PM] - Yuga cleaned up storage to free space.'"
* "what_went_well_markdown": "A Markdown numbered list as a single JSON string, describing what went well during the incident response. All newlines MUST be escaped as \\n."
* "what_went_wrong_markdown": "A Markdown numbered list as a single JSON string, describing what could be improved. All newlines MUST be escaped as \\n."
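
The keys and allowed values listed in the prompt lend themselves to a simple check on the parsed response. The following is a minimal sketch of such a check; the key names and allowed values are copied from the prompt above, while validate_report is a hypothetical helper rather than part of the tool.

```python
REQUIRED_KEYS = (
    "summary", "author", "priority", "status", "description", "category",
    "environment", "affected_resources", "root_cause_text",
    "remediation_steps_text", "impact_text", "timeline_markdown",
    "what_went_well_markdown", "what_went_wrong_markdown",
)

ALLOWED_VALUES = {
    "priority": {"High", "Medium", "Low"},
    "status": {"Resolved", "Investigating", "Monitoring"},
}


def validate_report(report):
    """Return a list of problems found in the parsed report (empty if it is valid)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in report]
    for key, allowed in ALLOWED_VALUES.items():
        if key in report and report[key] not in allowed:
            problems.append(f"invalid {key}: {report[key]!r}")
    return problems
```

Returning a list instead of raising on the first problem makes it easy to show the user everything that is wrong with a response at once.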

Prerequisites

Python 3.8 or higher
Slack workspace with admin access
Groq API key (free tier available)

Quick Setup

Install dependencies: pip install -r requirements.txt
Configure a Slack app with the required scopes (for example, channels:history to read thread replies and users:read to look up user names)
Get Groq API key from console.groq.com
Set environment variables in .env file
Run the tool: python slack_incident_reporter.py <message-id>

Example Usage

# Generate report from a Slack thread
python slack_incident_reporter.py p1753168536411769

# Output files created:
# - incident_reports/2025-01-22_Incident_Report_Storage_Issue.html
# - incident_reports/2025-01-22_Incident_Report_Storage_Issue.md
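
The message ID in this example has the shape Slack uses in message permalinks: the letter p followed by the message timestamp with its decimal point removed. A conversion to the `ts` value expected by the Slack API could look like the sketch below; message_id_to_ts is a hypothetical helper, and the tool's own parsing may differ.

```python
def message_id_to_ts(message_id):
    """Convert a permalink-style ID ('p' + digits) to a Slack 'ts' value."""
    digits = message_id[1:] if message_id.startswith("p") else message_id
    if not digits.isdigit() or len(digits) <= 6:
        raise ValueError(f"unexpected message ID: {message_id!r}")
    # The last six digits are the microsecond part of the timestamp.
    return f"{digits[:-6]}.{digits[-6:]}"


print(message_id_to_ts("p1753168536411769"))  # prints 1753168536.411769
```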

Results and Impact

Time Savings

Before: 30-60 minutes per incident report
After: 30 seconds per incident report
Improvement: roughly 98-99% reduction in documentation time
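
The reduction implied by the before and after figures can be checked with a quick calculation. The snippet below uses only the numbers quoted above (30-60 minutes before, 30 seconds after); reduction_percent is a helper name chosen for this example.

```python
def reduction_percent(before_seconds, after_seconds):
    """Percentage of time saved when a task drops from before_seconds to after_seconds."""
    return (before_seconds - after_seconds) / before_seconds * 100


low = reduction_percent(30 * 60, 30)   # compared with a 30-minute report
high = reduction_percent(60 * 60, 30)  # compared with a 60-minute report
print(f"{low:.1f}% to {high:.1f}%")  # prints 98.3% to 99.2%
```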

Quality Improvements

Consistent formatting across all incidents
Complete timeline extraction with exact timestamps
Structured analysis including root cause and impact
Professional presentation for stakeholders

Team Adoption

Reduced documentation burden on incident responders
Faster availability of reports for post-mortem analysis
Improved compliance with incident documentation requirements
Better knowledge sharing across the organization

Sample Output

Generated Report Structure

# Incident Report: Storage Issue on Shared Runner

**Generated on:** 2025-01-22 15:45:30  
**Incident Date:** 2025-01-22

## Summary
Resolved storage issue on shared runner causing GitLab pipeline failures

## Timeline
- [2025-01-22 15:30:45] - Ajay noticed that storage is full
- [2025-01-22 15:32:12] - Ajay checked pipeline logs and found storage issue
- [2025-01-22 15:35:20] - Ajay SSH into shared runner and ran df -h command
- [2025-01-22 15:38:15] - Ajay found docker images and cache taking more space
- [2025-01-22 15:40:30] - Ajay ran docker system prune -af
- [2025-01-22 15:42:45] - Ajay created automation to remove unused images

## Impact
- GitLab pipelines were failing due to insufficient storage
- Development workflow was blocked for 2 hours
- Affected all projects using the shared runner

## Root Cause
Docker images and cache accumulated over time, consuming 95% of available storage space

Lessons Learned

What Went Well

Q Developer’s guidance was invaluable for architecture decisions
Groq’s speed made the tool feel instant and responsive
Simple CLI interface made adoption easy for the team
Dual output format (HTML + Markdown) provided flexibility

Challenges Overcome

Timestamp extraction: Required careful prompt engineering
API integration: Needed robust error handling
User experience: Iterative improvements based on feedback
Documentation: Comprehensive setup instructions were crucial

Future Improvements

Web interface for non-technical users
Integration with Jira/ServiceNow for ticket creation
Email notifications for completed reports
Team dashboard for incident analytics

Conclusion

This project demonstrates how Amazon Q Developer can help transform daily productivity pain points into automated solutions. What started as a manual 30-60 minute process is now a 30-second automated workflow.

The key to success was:

Identifying a real pain point that affects daily work
Leveraging Q Developer’s expertise for architecture and implementation
Focusing on user experience and simplicity
Iterating based on feedback and testing

The result is a tool that not only saves time but also improves the quality and consistency of our incident documentation, making our team more effective and our processes more reliable.

Resources

GitHub Repository: https://github.com/Aj7Ay/Amazon-Q.git
Groq API Documentation: https://console.groq.com/docs
Slack API Documentation: https://api.slack.com/
Amazon Q Developer: https://aws.amazon.com/q/

This project was built as part of the #q-developer-challenge-1. The automation addresses a real productivity pain point in incident response workflows, demonstrating how AI can transform manual processes into efficient, automated solutions.

Tags: #q-developer-challenge-1 #automation #incident-response #slack #groq #python #cli #ai
