ConvOps

Documentation

What ConvOps monitors, how investigation works, and exactly what gets deployed in your AWS account.

On this page
  1. What ConvOps Monitors
  2. Investigation Capabilities (Reply "4")
  3. Alert Flow
  4. Customer Stack: What Gets Deployed
  5. Security Model
  6. Actions Available
  7. Configurable Settings

What ConvOps Monitors

ConvOps connects to your AWS account via a read-only IAM role and monitors CloudWatch alarms in real time across CPU, memory, error rate, latency, and any custom metric you've configured.

Alarm Types

CPU Utilization

Threshold alarms on EC2, ECS tasks, RDS, and Lambda concurrency.

Memory Usage

Container-level and instance-level memory alarms.

Error Rate

5xx rates, Lambda errors, application error metrics.

Latency

API Gateway response times, Lambda duration, ALB target response time.

Custom Metrics

Any CloudWatch alarm you've created. ConvOps handles it automatically.

Supported AWS Resources

ECS, EC2, RDS, Lambda, ALB, API Gateway, ElastiCache, SQS, DynamoDB, S3

Alert Channels

WhatsApp, Slack

Choose one or both. Alerts land wherever your team already is, with no new tools to adopt.

Alert Conditions

ALARM state → Immediate notification
When a CloudWatch alarm transitions to ALARM, ConvOps ingests it immediately, runs AI analysis, and delivers a notification that includes a likely root cause, not just a raw metric value.
OK state → Resolved notification
When the alarm returns to OK, you receive a resolved message so you know the incident has closed without checking CloudWatch manually.
Repeat-alert deduplication
If the same alarm fires repeatedly (flapping), ConvOps batches repeat notifications and pages you again only after the alarm has fired repeat_alert_threshold times (default: 5). This prevents alert fatigue from noisy alarms. Configurable per workspace.
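
The deduplication rule can be pictured with a small counter per alarm. This is a minimal sketch, not ConvOps's implementation: the class name `RepeatAlertFilter` and the exact reset behaviour are assumptions, and only `repeat_alert_threshold` and its default of 5 come from the text above.

```python
from collections import defaultdict


class RepeatAlertFilter:
    """Hypothetical helper: page on the first fire, then once per `threshold` repeats."""

    def __init__(self, repeat_alert_threshold: int = 5):
        if repeat_alert_threshold < 1:
            raise ValueError("repeat_alert_threshold must be at least 1")
        self.threshold = repeat_alert_threshold
        self._fires = defaultdict(int)  # alarm name -> number of fires seen so far

    def should_notify(self, alarm_name: str) -> bool:
        seen = self._fires[alarm_name]
        self._fires[alarm_name] = seen + 1
        # Fire 0 pages immediately; afterwards only every `threshold`-th fire pages.
        return seen % self.threshold == 0
```

With the default threshold, an alarm that fires eleven times in a row would page on the first, sixth, and eleventh fire under this sketch.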

Investigation Capabilities

Reply 4 to any alert to trigger a deep investigation. ConvOps pulls data from multiple AWS sources simultaneously, correlates it, and delivers an AI root cause analysis within seconds, without opening the AWS console.

Note: The first alert already includes AI analysis. Reply "4" when you want the full deep-dive across all data sources: metrics, logs, CloudTrail, cost, security groups, and more.
CloudWatch Metrics (last 30 min)
Avg + max values for the relevant metric. Shows the trend leading up to the alarm threshold breach so you can see if it's a spike or a gradual increase.
Application Logs (last 15 min)
Scans CloudWatch Logs filtered for: ERROR, WARN, Exception, timeout, OOM. Surfaces the most relevant log lines without you having to search.
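
As an illustration of the keyword scan, the sketch below filters a list of log lines for the five terms listed above. The function name, the case-sensitive matching, and the `limit` parameter are assumptions for the example. The filter-pattern string uses CloudWatch Logs' `?term` syntax for "match any of these terms"; whether ConvOps uses that pattern is not stated in this document.

```python
import re

# The five keywords named in the documentation.
KEYWORDS = ("ERROR", "WARN", "Exception", "timeout", "OOM")

# Equivalent CloudWatch Logs filter pattern (matches lines containing any term).
CLOUDWATCH_FILTER_PATTERN = " ".join(f"?{k}" for k in KEYWORDS)

_KEYWORD_RE = re.compile("|".join(re.escape(k) for k in KEYWORDS))


def relevant_lines(lines, limit=20):
    """Return up to `limit` log lines that contain at least one keyword."""
    hits = [line for line in lines if _KEYWORD_RE.search(line)]
    return hits[:limit]
```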
ECS Service State
Running / desired / pending task counts. Deployment status. Failed task details. Stopped task reasons + exit codes: the exact message that tells you why a container stopped.
EC2 Instance State
Instance state, instance type, CPU credit mode (for burstable T-series). Confirms whether the instance is running or in a degraded/impaired state.
RDS Database Status
DB status, engine version, connectivity analysis including security group rules and VPC subnet configuration. Identifies whether the database is reachable from your application layer.
Security Group Audit
Inbound rules check. Missing port detection: flags if a required port (e.g., 5432 for PostgreSQL, 3306 for MySQL, 6379 for Redis) is not open to expected sources.
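
A missing-port check of this kind can be sketched over the inbound rules of a security group. The example assumes the rule shape returned by EC2's DescribeSecurityGroups (`IpProtocol`, `FromPort`, `ToPort`); the function names and the `REQUIRED_PORTS` table are hypothetical. It deliberately ignores the "expected sources" part and only checks whether any inbound rule covers the port.

```python
# Default ports named in the documentation above.
REQUIRED_PORTS = {"postgres": 5432, "mysql": 3306, "redis": 6379}


def port_is_open(ip_permissions, port):
    """True if any inbound rule allows TCP traffic on `port`."""
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":  # "-1" means all protocols and all ports
            return True
        if protocol != "tcp":
            continue
        if rule.get("FromPort", -1) <= port <= rule.get("ToPort", -1):
            return True
    return False


def missing_ports(ip_permissions, engines):
    """Map each engine whose default port is not covered to that port."""
    return {
        engine: REQUIRED_PORTS[engine]
        for engine in engines
        if not port_is_open(ip_permissions, REQUIRED_PORTS[engine])
    }
```

A fuller check would also compare each rule's `IpRanges` and `UserIdGroupPairs` against the application's subnets or security groups.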
API Gateway Diagnostics
5xx and 4xx error rates. CORS configuration status. Stage info (deployment ID, throttling limits). Distinguishes client errors from server-side issues.
ALB Target Health
Per-target group breakdown: healthy / unhealthy / draining target counts. Identifies which backend instances have dropped out of rotation.
Cost Analysis (last 7 days)
Per-service spend breakdown. Spend spike detection: flags any service with cost >20% above its 7-day rolling average. Useful for spotting runaway resources during incidents.
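
The 20% rule can be made concrete with a short calculation. This sketch assumes each service has a list of daily costs, oldest first, where the last value is the day being checked and the seven values before it form the rolling average. That data layout and the function name are assumptions, not something this document specifies.

```python
from statistics import mean


def spend_spikes(daily_costs, threshold=0.20):
    """Return {service: percent above baseline} for services over the threshold.

    `daily_costs` maps a service name to daily spend, oldest first. The latest
    day is compared against the mean of the seven days before it.
    """
    spikes = {}
    for service, costs in daily_costs.items():
        if len(costs) < 8:
            continue  # not enough history for a 7-day baseline
        baseline = mean(costs[-8:-1])
        latest = costs[-1]
        if baseline > 0 and latest > baseline * (1 + threshold):
            spikes[service] = round((latest / baseline - 1) * 100, 1)
    return spikes
```

For example, a service that averaged 10 per day over the previous week and cost 13 on the latest day is 30% above its baseline and would be flagged.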
CloudTrail: last 5 changes (past 2h)
The last 5 changes to the affected resource in the past 2 hours. Shows who changed what: deploys, config changes, SG modifications, IAM updates. Correlates changes to incidents.
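
Selecting "the last 5 changes in the past 2 hours" amounts to a time-window filter plus a sort. The sketch below works on event records shaped like CloudTrail LookupEvents results (`EventTime`, `EventName`, `Username`, and `ReadOnly` as the string "true" or "false"). The function name and the choice to drop read-only events are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def recent_changes(events, now, window=timedelta(hours=2), limit=5):
    """Return up to `limit` (time, user, event name) tuples, newest first."""
    cutoff = now - window
    changes = [
        e for e in events
        # Keep events inside the window; skip read-only calls such as Describe*.
        if e["EventTime"] >= cutoff and e.get("ReadOnly") != "true"
    ]
    changes.sort(key=lambda e: e["EventTime"], reverse=True)
    return [
        (e["EventTime"].isoformat(), e.get("Username", "unknown"), e["EventName"])
        for e in changes[:limit]
    ]
```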
Claude AI: Root Cause Analysis
All collected data is fed to Claude AI, which produces a root cause analysis of 3 to 5 bullets plus recommended actions. It correlates metrics, logs, config changes, and resource state into a single coherent diagnosis with suggested next steps.

Alert Flow

End-to-end: from CloudWatch alarm to resolved incident.

CloudWatch Alarm → SNS Topic → ConvOps Ingest → AI Filter → WhatsApp / Slack
↓ customer replies
Customer Replies → Confirm + Audit Log → ConvOps Acts
1. CloudWatch Alarm
Alarm state changes (ALARM or OK) are published to your SNS topic.
2. ConvOps Ingest
SNS delivers to ConvOps via HTTPS. AI immediately pulls metrics, logs, and resource state.
3. Notification Sent
You receive a WhatsApp/Slack message with root cause already analysed.
4. You Reply
Reply 1 to act, 4 to investigate more, 2 to snooze, 5 to escalate.
5. Confirmed Action
ConvOps asks for YES confirmation. All actions logged with timestamp + approver.
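
The ingest and reply steps above can be sketched as two small functions. The SNS envelope fields (`Type`, `Message`) and the alarm fields (`AlarmName`, `NewStateValue`, `NewStateReason`) are those AWS uses for CloudWatch alarm notifications. The function names and the returned dictionary shape are assumptions, and only the four reply codes named in step 4 are mapped.

```python
import json

# Reply codes listed in step 4; any other input is treated as unknown.
REPLY_INTENTS = {"1": "act", "2": "snooze", "4": "investigate", "5": "escalate"}


def parse_sns_alarm(body: str) -> dict:
    """Extract the alarm name and new state from an SNS HTTPS delivery body."""
    envelope = json.loads(body)
    if envelope.get("Type") != "Notification":
        # SNS also sends SubscriptionConfirmation messages to HTTPS endpoints.
        raise ValueError(f"not an alarm notification: {envelope.get('Type')}")
    alarm = json.loads(envelope["Message"])  # the alarm payload is a JSON string
    return {
        "alarm": alarm["AlarmName"],
        "state": alarm["NewStateValue"],
        "reason": alarm.get("NewStateReason", ""),
    }


def interpret_reply(text: str) -> str:
    """Map a user's reply to an intent name."""
    return REPLY_INTENTS.get(text.strip(), "unknown")
```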

Customer Stack: What Gets Deployed

ConvOps deploys a single CloudFormation stack into your AWS account. Here is exactly what it creates, with no surprises.

One-click setup. Go to app.convops.io, enter your AWS Account ID, and click Launch Stack. CloudFormation handles everything below automatically.

Free Tier IAM Role: ConvOpsAccessRole

  • Trusted by: ConvOps AWS account 009001720832. Only principals in that account can assume this role.
  • ExternalId: A unique token generated per customer, stored in Secrets Manager. It prevents confused-deputy attacks: even if someone knows your role ARN, they cannot assume the role without the ExternalId.
  • Managed policy attached: arn:aws:iam::aws:policy/ReadOnlyAccess, the AWS-managed read-only policy covering all services. ConvOps can read metrics, logs, and resource state but cannot modify anything.

What you get: Real-time alerts + AI root cause analysis + full investigation. No write permissions.

Pro / Enterprise: Custom Action Policy

On Pro/Enterprise plans, you select which resources ConvOps can act on during onboarding. We generate a custom IAM policy scoped to only those resources. You review and approve before deployment.

Available resource types:
EC2, ECS, RDS, ElastiCache (Redis), Lambda, SSM

How it works:

  1. During onboarding, select resource types (EC2, ECS, RDS, Redis, Lambda, SSM)
  2. ConvOps generates a custom IAM policy scoped to those resources
  3. You review the policy JSON before deployment
  4. Every action still requires "YES" confirmation before execution

Sample policy (EC2, ECS, RDS, Redis selected):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ConvOpsEC2Actions",
      "Effect": "Allow",
      "Action": ["ec2:RebootInstances", "ec2:StopInstances", "ec2:StartInstances"],
      "Resource": "arn:aws:ec2:*:*:instance/*"
    },
    {
      "Sid": "ConvOpsECSActions",
      "Effect": "Allow",
      "Action": ["ecs:UpdateService", "ecs:DescribeServices"],
      "Resource": ["arn:aws:ecs:*:*:service/*/*", "arn:aws:ecs:*:*:task-definition/*"]
    },
    {
      "Sid": "ConvOpsRDSActions",
      "Effect": "Allow",
      "Action": ["rds:RebootDBInstance", "rds:ModifyDBInstance"],
      "Resource": "arn:aws:rds:*:*:db:*"
    },
    {
      "Sid": "ConvOpsElastiCacheActions",
      "Effect": "Allow",
      "Action": ["elasticache:RebootCacheCluster", "elasticache:ModifyCacheCluster"],
      "Resource": "arn:aws:elasticache:*:*:cluster:*"
    }
  ]
}

Tag-based scoping (optional):

{
  "Sid": "ConvOpsECSActionsProductionOnly",
  "Effect": "Allow",
  "Action": ["ecs:UpdateService"],
  "Resource": "arn:aws:ecs:*:*:service/*/*",
  "Condition": {
    "StringEquals": {
      "aws:ResourceTag/Environment": "production"
    }
  }
}

The statement above limits ConvOps to acting only on ECS services tagged Environment=production.
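
To show how a scoped policy could be assembled from a resource selection, here is a minimal sketch. It reuses the actions and resource ARNs from the sample policy above; the function name `build_action_policy` and the `environment_tag` parameter are hypothetical. Lambda and SSM are left out because this document does not list their actions. Not every IAM action honours the `aws:ResourceTag` condition key, so a real generator would need per-action handling.

```python
import json

# Actions and resources copied from the sample policy in this document.
SAMPLE_ACTIONS = {
    "EC2": (["ec2:RebootInstances", "ec2:StopInstances", "ec2:StartInstances"],
            ["arn:aws:ec2:*:*:instance/*"]),
    "ECS": (["ecs:UpdateService", "ecs:DescribeServices"],
            ["arn:aws:ecs:*:*:service/*/*", "arn:aws:ecs:*:*:task-definition/*"]),
    "RDS": (["rds:RebootDBInstance", "rds:ModifyDBInstance"],
            ["arn:aws:rds:*:*:db:*"]),
    "ElastiCache": (["elasticache:RebootCacheCluster", "elasticache:ModifyCacheCluster"],
                    ["arn:aws:elasticache:*:*:cluster:*"]),
}


def build_action_policy(selected, environment_tag=None):
    """Build an IAM policy document for the selected resource types."""
    statements = []
    for name in selected:
        actions, resources = SAMPLE_ACTIONS[name]
        statement = {
            "Sid": f"ConvOps{name}Actions",
            "Effect": "Allow",
            "Action": actions,
            "Resource": resources,
        }
        if environment_tag is not None:
            # Optional tag-based scoping, as in the example above.
            statement["Condition"] = {
                "StringEquals": {"aws:ResourceTag/Environment": environment_tag}
            }
        statements.append(statement)
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    print(json.dumps(build_action_policy(["ECS"], "production"), indent=2))
```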

SNS Topic: ConvOpsAlertTopic

  • Receives CloudWatch alarm state change notifications.
  • Subscribed to the ConvOps ingestion endpoint via HTTPS: https://ewvdzp6c79.execute-api.eu-central-1.amazonaws.com/prod/ingest
  • You attach this SNS topic as an action on your CloudWatch alarms. That's the only wiring needed.

EventBridge Rule (optional)

  • Catches CloudWatch alarm state changes automatically, without needing SNS actions on each alarm individually.
  • Useful for accounts with many alarms: set-and-forget routing.
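
For reference, an EventBridge rule of this kind typically matches on the `aws.cloudwatch` source and the `CloudWatch Alarm State Change` detail type. The sketch below shows such a pattern as a Python dictionary, together with a deliberately simplified matcher that only supports exact-value lists and nested objects. The restriction to ALARM and OK states and the `matches` helper are assumptions; the pattern the ConvOps stack actually deploys is not shown in this document.

```python
# Event pattern for CloudWatch alarm state changes (state filter is an assumption).
ALARM_STATE_CHANGE_PATTERN = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {"state": {"value": ["ALARM", "OK"]}},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: lists mean 'any of these values', dicts recurse."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True
```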

Trust Policy (applies to both Free and Pro/Enterprise)

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::009001720832:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<your-unique-registration-token>"
        }
      }
    }
  ]
}

Security Model

ConvOps has cross-account access to your AWS environment. Here's exactly what that means, with no vague claims.

No stored AWS credentials

ConvOps never stores your AWS credentials. Access works via STS AssumeRole, which issues short-lived tokens with a 15-minute TTL. Tokens expire automatically.
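
A cross-account call of this shape could look like the sketch below. The role name `ConvOpsAccessRole`, the ExternalId requirement, and the 15-minute lifetime come from this document; the function names and the session name are hypothetical. `boto3` is a third-party dependency and is imported lazily so the request builder can be used on its own. 900 seconds is also the minimum session duration STS accepts.

```python
def build_assume_role_request(customer_account_id: str, external_id: str,
                              session_name: str = "convops-investigation") -> dict:
    """Build the keyword arguments for an STS AssumeRole call."""
    return {
        "RoleArn": f"arn:aws:iam::{customer_account_id}:role/ConvOpsAccessRole",
        "RoleSessionName": session_name,  # hypothetical session name
        "ExternalId": external_id,        # must match the role's trust policy
        "DurationSeconds": 900,           # 15 minutes
    }


def assume_convops_role(customer_account_id: str, external_id: str) -> dict:
    """Return temporary credentials (AccessKeyId, SecretAccessKey, SessionToken, Expiration)."""
    import boto3  # third-party; requires `pip install boto3` and caller credentials

    sts = boto3.client("sts")
    response = sts.assume_role(
        **build_assume_role_request(customer_account_id, external_id)
    )
    return response["Credentials"]
```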

ExternalId protection

Your IAM role requires a unique ExternalId (generated per customer, stored in Secrets Manager). This prevents confused-deputy attacks: without the ExternalId, nobody else can assume your role.

Read-only by default

The ReadOnlyAccess managed policy covers all reads. Write actions (ecs:UpdateService, ec2:RebootInstances, rds:RebootDBInstance) require your explicit confirmation for every individual action.

Phone binding

Only the person who received the alert on their registered number/Slack account can confirm actions. Confirmation expires after 5 minutes.
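
The binding and expiry rules can be expressed as a single check. This is a sketch under assumptions: the `PendingAction` type and `is_confirmed` function are hypothetical, and the case-insensitive match on "YES" is a guess. The 5-minute window and the rule that only the alert's recipient can confirm come from the paragraph above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

CONFIRMATION_TTL = timedelta(minutes=5)


@dataclass(frozen=True)
class PendingAction:
    action: str             # e.g. "restart-ecs-service"
    recipient: str          # phone number or Slack user the alert was sent to
    requested_at: datetime  # when ConvOps asked for confirmation


def is_confirmed(pending: PendingAction, sender: str, reply: str, now: datetime) -> bool:
    """True only if the original recipient replies YES within the TTL."""
    if sender != pending.recipient:
        return False  # someone else answered
    if now - pending.requested_at > CONFIRMATION_TTL:
        return False  # confirmation window has expired
    return reply.strip().upper() == "YES"
```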

Audit log: 90 days

Every investigation and action is logged: timestamp, resource ARN, who approved, outcome. Exportable on request. 90-day retention.

Minimal data storage

We store only operational metadata: alarm events and conversation state. We never store raw application logs, log content, secrets, or business data.

EU data residency

All ConvOps infrastructure runs in eu-central-1 (Frankfurt). Data does not leave the EU. GDPR compliant; see convops.io/privacy.

Revoke instantly

Delete the ConvOpsAccessRole from your AWS console. ConvOps loses all access immediately, with no support ticket and no waiting period.

Actions Available

Note: Action execution is coming in the Pro plan. The current free plan includes alerts + full AI investigation. Actions (restart, scale, reboot) are in active development.

When you confirm an action (reply 1 + YES), ConvOps can execute the following remediation actions on your behalf. Every action requires two-step confirmation and is logged.

Restart ECS service
Forces a new deployment, which stops and replaces all running tasks. Clears hung containers, OOM state, connection pool saturation.
Scale up ECS (+1 task)
Increments desired count by 1. Absorbs traffic spikes without a full restart.
Scale down ECS (-1 task)
Decrements desired count by 1. Useful for cost reduction or isolating a bad task.
Rollback ECS
Redeploys the previous task definition revision. Reverts a bad deploy without touching code.
Reboot EC2 instance
Issues a soft reboot via AWS API. Useful for clearing transient kernel/OS issues without data loss.
Reboot RDS instance
Restarts the database engine. Clears connection pool saturation and recovers from transient DB failures.
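
The four ECS actions all map naturally onto the ECS UpdateService API, which accepts `cluster`, `service`, `desiredCount`, `taskDefinition`, and `forceNewDeployment`. The sketch below builds those parameters for each action. The action names and helper functions are hypothetical, and since action execution is still in development, this is not a description of the shipped behaviour. The rollback helper assumes the previous revision number is still registered.

```python
def previous_revision(task_definition: str) -> str:
    """'web:42' or a full task-definition ARN ending in ':42' -> same family, revision 41."""
    family, _, revision = task_definition.rpartition(":")
    number = int(revision)
    if not family or number <= 1:
        raise ValueError("no earlier revision to roll back to")
    return f"{family}:{number - 1}"


def update_service_kwargs(action, cluster, service, desired_count, task_definition):
    """Build keyword arguments for an ECS UpdateService call."""
    base = {"cluster": cluster, "service": service}
    if action == "restart":
        return {**base, "forceNewDeployment": True}
    if action == "scale_up":
        return {**base, "desiredCount": desired_count + 1}
    if action == "scale_down":
        return {**base, "desiredCount": max(desired_count - 1, 0)}  # never below zero
    if action == "rollback":
        return {**base, "taskDefinition": previous_revision(task_definition)}
    raise ValueError(f"unsupported action: {action}")
```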

Configurable Settings

Adjust ConvOps behaviour per workspace from the app dashboard.

repeat_alert_threshold
How many times the same alarm must fire before ConvOps sends a repeat notification. Default: 5. Set lower for critical alarms, higher for noisy ones. Configured per workspace.
Alert channels
Choose WhatsApp, Slack, or both. You can route different alarm types to different channels (e.g., P1 to WhatsApp, P2/P3 to Slack).
On-call routing
Route alerts to specific users. Configure who gets action buttons (can confirm remediation) vs who gets read-only notifications (informed only). Useful for on-call rotations.

Ready to try ConvOps?

Connect your AWS account in 2 minutes. Free plan includes alerts + full AI investigation. No credit card.

Get started for free, or email info@convops.io with questions.