Overview
An agent is a lightweight in-process HTTP server embedded inside a target application. External clients — CLIs, test drivers, AI tools, IDE extensions — connect to it to:
- Inspect the visual tree and element properties
- Perform UI actions (tap, fill, scroll, gesture, navigate)
- Capture screenshots
- Monitor network traffic, logs, and performance
- Interact with embedded web content (WebViews)
- Read and write application storage
The protocol is framework-agnostic. The same client code works whether the target app is built with .NET MAUI, native Android, React Native, Flutter, WinUI, or anything else. Each framework provides its own agent implementation that translates the standard protocol into framework-specific operations.
Reference docs & downloads
The human-readable protocol page is paired with generated reference docs and downloadable bundles produced from the canonical spec during the website build. The canonical authoring files stay in docs/openapi.yaml, docs/asyncapi.yaml, docs/schemas/*.json, and docs/examples/*.json; generated bundles are published for consumers and code generators.
OpenAPI reference
Generated HTTP API docs with request and response schemas.
WebSocket reference
Generated AsyncAPI docs for streaming channels and message envelopes.
| Artifact | Download |
|---|---|
| Complete spec package | ailoha-devflow-agent-protocol-v1.zip |
| OpenAPI 3.1 YAML bundle | openapi.yaml |
| OpenAPI 3.1 JSON bundle | openapi.json |
| OpenAPI 3.0 client compatibility JSON | openapi-client-3.0.json |
| AsyncAPI YAML and JSON bundles | asyncapi.yaml · asyncapi.json |
Architecture
The agent runs in-process inside the target application, giving it direct access to the UI framework's APIs, the visual tree, and application state. All communication is local — the agent binds to localhost:{port} over HTTP and WebSocket.
- Broker handles discovery. A separate broker daemon manages port assignment and agent registration. Clients ask the broker "where is my app?" and get back a port number.
- Any client language works. Standard HTTP + JSON. Anything that can make HTTP requests can be a client.
For Ailoha's CLI and MCP host, the practical resolution order is:
explicit command args → environment or local project selection (AILOHA_AGENT_HOST, AILOHA_AGENT_PORT, .devflow) → broker discovery → default localhost port.
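That resolution order can be sketched as a small helper. This is a hedged illustration: the default port value (9222), the `broker_lookup` callable, and the omission of `.devflow` file parsing are all assumptions for the sketch, not part of the spec.

```python
import os

DEFAULT_HOST, DEFAULT_PORT = "localhost", 9222  # default port is an assumption


def resolve_agent(cli_host=None, cli_port=None, broker_lookup=None):
    """Resolve the agent address in the documented order:
    explicit args -> environment -> broker discovery -> default."""
    # 1. Explicit command arguments win outright.
    if cli_host and cli_port:
        return cli_host, int(cli_port)
    # 2. Environment (a .devflow project file would be consulted here too).
    host = os.environ.get("AILOHA_AGENT_HOST")
    port = os.environ.get("AILOHA_AGENT_PORT")
    if host and port:
        return host, int(port)
    # 3. Ask the broker daemon, if a lookup callable was provided.
    if broker_lookup is not None:
        found = broker_lookup()
        if found:
            return found
    # 4. Fall back to the default local port.
    return DEFAULT_HOST, DEFAULT_PORT
```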
URL structure & versioning
All HTTP routes share a common shape:
http://localhost:{port}/api/v1/{namespace}/{resource}
ws://localhost:{port}/ws/v1/{channel}
Routes are organized under versioned namespaces:
| Namespace | Purpose | Example |
|---|---|---|
| agent | Identity & capabilities | /api/v1/agent/status |
| ui | Visual tree, actions, screenshots | /api/v1/ui/tree |
| webview | Embedded web content | /api/v1/webview/contexts |
| profiler | Performance monitoring | /api/v1/profiler/sessions |
| network | HTTP traffic monitoring | /api/v1/network/requests |
| logs | Application log access | /api/v1/logs |
| device | Hardware, sensors, permissions | /api/v1/device/info |
| storage | Preferences & secure storage | /api/v1/storage/preferences |
| ext | Third-party extensions | /api/v1/ext/{namespace}/... |
URL path versioning is used everywhere. The version segment (v1) appears in both HTTP and WebSocket URLs. Future breaking changes increment the version (v2, v3) while maintaining backward compatibility on previous versions.
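Because every route shares this shape, clients can centralize URL construction in two small helpers. A minimal sketch; the port value is whatever discovery returned:

```python
def api_url(port: int, namespace: str, resource: str, version: int = 1) -> str:
    """Build a versioned HTTP route of the documented shape."""
    return f"http://localhost:{port}/api/v{version}/{namespace}/{resource}"


def ws_url(port: int, channel: str, version: int = 1) -> str:
    """Build a versioned WebSocket channel URL."""
    return f"ws://localhost:{port}/ws/v{version}/{channel}"
```

Centralizing the version segment here means a future v2 migration touches one place in the client, not every call site.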
Quick start
1. Discover the agent
Verify the agent is running and inspect its identity:
GET /api/v1/agent/status
{
"agent": {
"name": "devflow-maui",
"version": "1.0.0",
"framework": "maui",
"frameworkVersion": "10.0"
},
"platform": "ios",
"device": { "model": "iPhone 15 Pro", "manufacturer": "Apple", "osVersion": "18.0" },
"app": { "name": "MyApp", "packageId": "com.example.myapp", "version": "2.1.0" },
"running": true,
"uptime": 42.5
}
2. Check capabilities
Find out what this agent supports before calling feature endpoints:
GET /api/v1/agent/capabilities
{
"capabilities": {
"ui.tree": { "version": 1, "features": ["css-selectors", "hit-test", "native-layer"] },
"ui.actions": { "version": 1, "features": ["tap", "fill", "scroll", "navigate", "batch"] },
"ui.screenshot": { "version": 1, "features": ["element-capture", "window-capture"] },
"webview": { "version": 1, "features": ["evaluate", "dom-query", "source"] },
"profiler": { "version": 1, "features": ["samples", "markers", "spans"] },
"network": { "version": 1, "features": ["capture", "stream"] },
"logs": { "version": 1, "features": ["query", "stream"] }
}
}
3. Inspect → act → verify
The most common workflow is three steps: get the tree, perform an action, and check the result. Bundle them in two HTTP calls using the include parameter:
GET /api/v1/ui/tree?depth=5
POST /api/v1/ui/actions/tap
Content-Type: application/json
{ "elementId": "btn_submit", "include": ["screenshot", "tree"] }
The action response now contains the success flag, the resulting screenshot, and the new visual tree — no follow-up calls needed.
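The two-call loop can be sketched against any HTTP client that returns parsed JSON. The `client` object, its `.get`/`.post` interface, and the pick-by-text target selection are illustrative assumptions, not part of the protocol:

```python
def iter_elements(nodes):
    """Depth-first walk over a list of ElementInfo dicts."""
    for node in nodes:
        yield node
        yield from iter_elements(node.get("children", []))


def inspect_act_verify(client, target_text="Submit"):
    """One pass of the inspect -> act -> verify loop in two HTTP calls.
    `client` is any object with .get(path) and .post(path, body) -> dict."""
    # 1. Inspect: fetch the visual tree (depth-limited to keep payloads small).
    tree = client.get("/api/v1/ui/tree?depth=5")["tree"]
    # 2. Pick a target -- here, the first element whose visible text matches.
    target = next(e for e in iter_elements(tree) if e.get("text") == target_text)
    # 3. Act, bundling the verification data into the same round-trip.
    return client.post("/api/v1/ui/actions/tap",
                       {"elementId": target["id"], "include": ["screenshot", "tree"]})
```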
Capability discovery
Before calling any feature endpoint, clients should check what the agent supports. Not every agent implements every capability: a Flutter agent may not expose `storage.secure`, and the native Android agent currently returns explicit unsupported responses for secure storage, BLE, jobs, and WebSocket streams.
Capabilities are organized into namespaces with a version number and a list of supported features:
{
"capabilities": {
"ui.tree": {
"version": 1,
"features": ["css-selectors", "hit-test", "native-layer"]
},
"ui.actions": {
"version": 1,
"features": ["tap", "fill", "clear", "scroll", "focus", "navigate", "resize", "back", "key", "gesture", "batch"]
},
"device.sensors": {
"version": 1,
"features": ["accelerometer", "gyroscope", "compass"]
}
}
}
Client usage pattern
caps = client.get("/api/v1/agent/capabilities")

if "ui.actions" in caps["capabilities"]:
    actions = caps["capabilities"]["ui.actions"]
    if "batch" in actions["features"]:
        client.post("/api/v1/ui/actions/batch", { ... })
    else:
        client.post("/api/v1/ui/actions/tap", { ... })

if "webview" not in caps["capabilities"]:
    print("WebView features not available for this framework")
Extension marker on status
GET /api/v1/agent/status includes a lightweight extension marker so clients can skip the heavier capabilities request when nothing's registered:
{ "extensions": { "count": 2, "hash": "a1b2c3d4..." } }
Cache extension metadata by hash and only re-fetch GET /api/v1/agent/capabilities when the hash changes.
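A minimal caching sketch, assuming a `client` with a `.get(path)` method returning parsed JSON; the module-level cache dict is just for illustration:

```python
# Cache keyed by the extension hash reported on /agent/status.
_ext_cache = {"hash": None, "capabilities": None}


def capabilities_with_cache(client):
    """Re-fetch /agent/capabilities only when the status extension hash changes."""
    status = client.get("/api/v1/agent/status")
    ext_hash = status.get("extensions", {}).get("hash")
    if _ext_cache["capabilities"] is None or _ext_cache["hash"] != ext_hash:
        _ext_cache["capabilities"] = client.get("/api/v1/agent/capabilities")
        _ext_cache["hash"] = ext_hash
    return _ext_cache["capabilities"]
```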
Element queries & locator strategies
Elements in the visual tree are located using named locator strategies, inspired by the W3C WebDriver specification.
| Strategy | Description | Example |
|---|---|---|
| accessibility-id | Match by automation/accessibility ID | ?strategy=accessibility-id&value=submit-btn |
| css-selector | CSS selector adapted for native UI trees | ?strategy=css-selector&value=Button:visible.primary |
| type | Match by element type name | ?strategy=type&value=Entry |
| text | Match by visible text | ?strategy=text&value=Submit |
| xpath | XPath expression over the tree | ?strategy=xpath&value=//Button[@text='Submit'] |
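Selector values often contain characters that are unsafe in URLs (quotes, brackets, slashes), so clients should percent-encode them. A small sketch; the strategy whitelist mirrors the table above:

```python
from urllib.parse import urlencode

LOCATOR_STRATEGIES = {"accessibility-id", "css-selector", "type", "text", "xpath"}


def element_query(strategy: str, value: str) -> str:
    """Build the path + query string for GET /api/v1/ui/elements,
    percent-encoding the selector value."""
    if strategy not in LOCATOR_STRATEGIES:
        raise ValueError(f"unsupported locator strategy: {strategy}")
    return "/api/v1/ui/elements?" + urlencode({"strategy": strategy, "value": value})
```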
Element model
Every element in the tree conforms to the ElementInfo schema — a cross-framework model with consistent fields regardless of the underlying UI framework:
{
"id": "elem_abc123",
"parentId": "window_0",
"type": "Button",
"fullType": "Microsoft.Maui.Controls.Button",
"framework": "maui",
"automationId": "submit-btn",
"text": "Submit",
"role": "button",
"traits": ["interactive", "focusable"],
"state": {
"displayed": true, "enabled": true, "selected": false, "focused": false, "opacity": 1.0
},
"bounds": { "x": 10, "y": 100, "width": 200, "height": 50, "coordinate": "window" },
"gestures": ["tap"],
"nativeView": {
"type": "android.widget.Button",
"properties": { "elevation": "4.0" }
},
"frameworkProperties": {
"maui:bindingContext": "SubmitViewModel"
},
"children": []
}
| Field | Purpose |
|---|---|
| id | Globally unique identifier — use this to target actions |
| type / fullType | Short and fully-qualified type name |
| role | Semantic role (button, textbox, checkbox, list, window, …) |
| traits | Behavioral traits (interactive, focusable, scrollable, adjustable) |
| state | Current interactive state — displayed, enabled, selected, focused |
| bounds | Bounding rectangle in window or screen coordinates |
| nativeView | Underlying platform view type and properties |
| frameworkProperties | Framework-specific data not captured by standard fields |
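Clients typically fetch the tree once and resolve elements locally. A sketch of a depth-first lookup by `automationId` over the ElementInfo shape above:

```python
def find_by_automation_id(nodes, automation_id):
    """Depth-first search of an ElementInfo tree (a list of dicts with
    optional `children`) for a matching automationId. Returns the element
    dict, or None when nothing matches."""
    for node in nodes:
        if node.get("automationId") == automation_id:
            return node
        hit = find_by_automation_id(node.get("children", []), automation_id)
        if hit is not None:
            return hit
    return None
```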
Tree layers
Agents may support multiple tree representations:
GET /api/v1/ui/tree?layer=framework # Default: framework component tree
GET /api/v1/ui/tree?layer=native # Underlying platform views
GET /api/v1/ui/tree?layer=render # Render objects (Flutter-specific)
Windows as tree nodes
Windows are root-level elements in the visual tree, not a separate API resource. There is no ?window=N parameter — windows appear naturally as top-level nodes:
{
"tree": [
{ "id": "window_0", "type": "Window", "role": "window",
"state": { "focused": true, "displayed": true, "enabled": true, "selected": false, "opacity": 1.0 },
"children": [ { "id": "page_main", "type": "NavigationPage", "children": [ ... ] } ] },
{ "id": "window_1", "type": "Window", "role": "window",
"state": { "focused": false, "displayed": true, "enabled": true, "selected": false, "opacity": 1.0 },
"children": [ ... ] }
]
}
- One call shows everything. The full tree includes all windows.
- Element IDs are globally unique. Tap `elem_abc123` and it targets the correct window automatically.
- CSS selectors work across windows. `Window:focused Button.submit` naturally scopes to the focused window.
- Filter when needed. `GET /api/v1/ui/tree?rootId=window_0` returns just one window's subtree.
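Since windows are just root nodes, scoping to the active window is a plain filter over the tree, with no separate window API involved:

```python
def focused_window(tree):
    """Return the root-level window node whose state.focused is true,
    or None when no window reports focus."""
    return next((w for w in tree if w.get("state", {}).get("focused")), None)
```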
AI-optimized patterns
The protocol is designed for AI agents that loop inspect → act → verify. Every unnecessary HTTP round-trip slows the AI down, so two patterns minimize calls.
Composite responses
Every action endpoint accepts an include parameter to bundle additional data in the response:
POST /api/v1/ui/actions/tap
Content-Type: application/json
{ "elementId": "btn_login", "include": ["screenshot", "tree"] }
{
"success": true,
"screenshot": "iVBORw0KGgoAAAANSUhEUgAA...",
"tree": [ { "id": "window_0", "type": "Window", "children": [ ... ] } ]
}
screenshot and tree are only populated when requested — no wasted bandwidth if you don't need them.
Batch actions
Execute multiple actions in a single request with POST /api/v1/ui/actions/batch:
POST /api/v1/ui/actions/batch
Content-Type: application/json
{
"actions": [
{ "action": "fill", "elementId": "txt_email", "text": "user@example.com" },
{ "action": "fill", "elementId": "txt_password", "text": "s3cret!" },
{ "action": "tap", "elementId": "btn_login" }
],
"include": ["screenshot", "tree"],
"continueOnError": false
}
One request fills two fields, taps a button, and returns the resulting screenshot and tree. Two calls (tree + batch) instead of five.
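A client can combine batching with capability discovery: use the batch endpoint when the agent advertises the `batch` feature, and degrade to one call per action otherwise. The `client.post(path, body)` interface and the assumption that each action name maps to its convenience route are illustrative:

```python
def run_actions(client, caps, actions):
    """Execute a list of action dicts ({"action": ..., "elementId": ..., ...}),
    batching when supported and falling back to per-action calls otherwise."""
    features = caps.get("capabilities", {}).get("ui.actions", {}).get("features", [])
    if "batch" in features:
        # One round-trip for the whole sequence.
        return [client.post("/api/v1/ui/actions/batch",
                            {"actions": actions, "continueOnError": False})]
    # Fallback: one round-trip per action, via the convenience endpoints.
    return [client.post(f"/api/v1/ui/actions/{a['action']}",
                        {k: v for k, v in a.items() if k != "action"})
            for a in actions]
```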
WebSocket channels
Real-time streaming uses WebSocket connections. All channels share a standard message envelope.
{
"type": "event_type",
"timestamp": "2024-01-15T10:30:00.123Z",
"data": { ... }
}
| Channel | Content | Event types |
|---|---|---|
| /ws/v1/network | HTTP traffic stream | replay, request |
| /ws/v1/logs | Application log stream | replay, log |
| /ws/v1/device/sensors | Sensor readings | subscribed, reading |
| /ws/v1/profiler | Performance data | samples, marker, span |
| /ws/v1/ui/events | UI lifecycle events | navigation, lifecycle, treeChange |
Subscription model
After connecting, send a subscribe message to configure filtering:
// Client → Agent
{ "type": "subscribe", "filter": { "sensor": "accelerometer", "throttleMs": 100 } }
// Agent → Client (confirmation)
{ "type": "subscribed", "timestamp": "2024-01-15T10:30:00.123Z",
"sensor": { "name": "accelerometer", "type": "accelerometer", "available": true } }
// Agent → Client (live data)
{ "type": "reading", "timestamp": "2024-01-15T10:30:00.223Z",
"reading": { "sensor": "accelerometer", "x": 0.02, "y": -9.81, "z": 0.01 } }
Channels that support historical replay (network, logs) send a replay batch of existing entries immediately after subscription, followed by live events.
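Because every channel shares the same envelope, one dispatcher can serve all of them. The handler signature used here, a callable taking the timestamp and the remaining payload fields, is an illustrative choice, not part of the spec:

```python
import json


def dispatch(raw: str, handlers: dict) -> bool:
    """Route one WebSocket envelope by its `type` field. `handlers` maps
    event types ('replay', 'log', 'reading', ...) to callables taking
    (timestamp, payload). Unknown types are ignored so a client stays
    forward-compatible when new event types appear."""
    msg = json.loads(raw)
    handler = handlers.get(msg.get("type"))
    if handler is None:
        return False
    payload = {k: v for k, v in msg.items() if k not in ("type", "timestamp")}
    handler(msg.get("timestamp"), payload)
    return True
```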
Error handling
All errors use the RFC 7807 Problem Details format with a machine-readable errorCode field.
{
"type": "https://devflow.dev/errors/element-not-found",
"title": "Element Not Found",
"status": 404,
"detail": "No element found with id 'btn_submit'. The element may have been removed from the tree.",
"errorCode": "element-not-found",
"instance": "/api/v1/ui/actions/tap"
}
Standard error codes
| Error code | HTTP status | Description |
|---|---|---|
| element-not-found | 404 | No element matches the given ID or selector |
| stale-element-reference | 404 | Element ID was valid but the UI has changed and it no longer exists |
| element-not-interactable | 400 | Element exists but cannot be acted on (hidden, disabled, obscured) |
| invalid-selector | 400 | Malformed selector or unsupported locator strategy |
| timeout | 408 | Operation did not complete within the allowed time |
| unknown-command | 404 | Route or action type not recognized |
| unsupported-capability | 501 | Agent does not implement the requested capability |
| internal-error | 500 | Unexpected agent-side failure |
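A client-side sketch that converts Problem Details bodies into typed exceptions. Which codes are worth retrying is a judgment call shown here as an example, not something the spec mandates:

```python
class AgentError(Exception):
    """Raised for any RFC 7807 error response from the agent."""
    def __init__(self, problem: dict):
        super().__init__(problem.get("detail") or problem.get("title", "agent error"))
        self.error_code = problem.get("errorCode", "internal-error")
        self.status = problem.get("status", 500)


# Codes that usually succeed on retry after re-fetching the tree (an assumption).
RETRYABLE = {"stale-element-reference", "timeout"}


def raise_for_problem(status: int, body: dict):
    """Raise AgentError for any 4xx/5xx response body; no-op otherwise."""
    if status >= 400:
        raise AgentError(body)
```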
Cross-framework element mapping
The ElementInfo schema provides a unified model across frameworks. Here's how key concepts map:
| Concept | .NET MAUI | React Native | Flutter | Android | iOS |
|---|---|---|---|---|---|
| Element type | Button | TouchableOpacity | ElevatedButton | Button | UIButton |
| Automation ID | AutomationId | testID | ValueKey<String> | resource ID, setAilohaAutomationId(...), or Compose Modifier.testTag(...) | accessibilityIdentifier |
| Role | Inferred from type | accessibilityRole | Semantics role | Accessibility role | accessibilityTraits |
| Native view | Platform handler view | Host view | RenderObject / PlatformView | Self | Self |
Framework-specific data that doesn't fit the standard model goes in frameworkProperties:
{
"id": "elem_abc",
"type": "Button",
"framework": "maui",
"frameworkProperties": {
"maui:bindingContext": "LoginViewModel",
"maui:visualStateGroup": "CommonStates",
"maui:currentVisualState": "Normal"
}
}
Extensions
Third-party libraries can register their own tools without forking the protocol. Extensions own a reverse-domain namespace (e.g. com.acme.featureflags) and declare self-describing tool descriptors that mirror MCP tool metadata, so CLI and MCP clients can discover and invoke them with no extension-specific code.
GET /api/v1/ext/com.example.analytics/events
POST /api/v1/ext/com.example.analytics/flush
GET /api/v1/ext/io.sentry.devflow/breadcrumbs
{
"extensions": {
"com.example.analytics": {
"version": "1.0.0",
"description": "Analytics event inspector and replayer",
"tools": [
{
"name": "list_events",
"description": "List recent analytics events captured in-app",
"method": "GET",
"path": "/api/v1/ext/com.example.analytics/events",
"parameters": {
"type": "object",
"properties": {
"limit": { "type": "integer", "default": 50 }
}
},
"annotations": { "readOnly": true, "idempotent": true, "category": "analytics" }
}
]
}
}
}
Each tool descriptor carries name, description, method, path, optional parameters/returns JSON schemas, and behavioral annotations (readOnly, idempotent, destructive, category). Extension namespaces must use reverse-domain notation to prevent collisions.
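An MCP host can project these descriptors into its own tool list mechanically. The dot-to-underscore name mangling shown here is an assumed convention for producing a flat, collision-free tool name, not something the protocol mandates:

```python
def to_mcp_tool(namespace: str, tool: dict) -> dict:
    """Project one extension tool descriptor into an MCP-style listing:
    a namespaced name, description, input schema, and annotations."""
    return {
        "name": f"{namespace.replace('.', '_')}_{tool['name']}",
        "description": tool.get("description", ""),
        "inputSchema": tool.get("parameters", {"type": "object", "properties": {}}),
        "annotations": tool.get("annotations", {}),
    }
```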
Implementation checklist
MUST implement (core)
| Endpoint | Purpose |
|---|---|
| GET /api/v1/agent/status | Identity, platform, app info, uptime |
| GET /api/v1/agent/capabilities | Declare supported capabilities and features |
Without these, clients cannot discover or interact with the agent.
SHOULD implement (UI inspection & interaction)
| Endpoint | Purpose |
|---|---|
| GET /api/v1/ui/tree | Visual tree with windows as root nodes |
| GET /api/v1/ui/elements/{id} | Single element lookup by ID |
| GET /api/v1/ui/elements?strategy=… | Element query with locator strategies |
| GET /api/v1/ui/screenshot | Capture screenshot |
| POST /api/v1/ui/actions/tap | Tap element or coordinates |
| POST /api/v1/ui/actions/fill | Fill text input |
| POST /api/v1/ui/actions/scroll | Scroll |
| POST /api/v1/ui/actions/batch | Batch actions |
MAY implement (extended capabilities)
Implement as appropriate for the framework and use case: profiler, network, logs, device info & sensors, preference & secure storage, WebView, all WebSocket channels, and extensions.
Complete route reference
Agent
GET /api/v1/agent/status
GET /api/v1/agent/capabilities
UI inspection
GET /api/v1/ui/tree
GET /api/v1/ui/elements/{id}
GET /api/v1/ui/elements?strategy={strategy}&value={value}
GET /api/v1/ui/hit-test?x={x}&y={y}
GET /api/v1/ui/screenshot
UI actions
POST /api/v1/ui/actions/tap
POST /api/v1/ui/actions/fill
POST /api/v1/ui/actions/clear
POST /api/v1/ui/actions/focus
POST /api/v1/ui/actions/scroll
POST /api/v1/ui/actions/navigate
POST /api/v1/ui/actions/resize
POST /api/v1/ui/actions/back
POST /api/v1/ui/actions/key
POST /api/v1/ui/actions/gesture
POST /api/v1/ui/actions/batch
UI element properties
GET /api/v1/ui/elements/{id}/properties/{name}
PUT /api/v1/ui/elements/{id}/properties/{name}
WebView
GET /api/v1/webview/contexts
POST /api/v1/webview/evaluate
GET /api/v1/webview/dom
POST /api/v1/webview/dom/query
GET /api/v1/webview/source
POST /api/v1/webview/navigate
POST /api/v1/webview/input/click
POST /api/v1/webview/input/fill
POST /api/v1/webview/input/text
GET /api/v1/webview/screenshot
Profiler
GET /api/v1/profiler/capabilities
POST /api/v1/profiler/sessions
DELETE /api/v1/profiler/sessions/{id}
GET /api/v1/profiler/sessions/{id}/samples
POST /api/v1/profiler/markers
POST /api/v1/profiler/spans
GET /api/v1/profiler/hotspots
Network
GET /api/v1/network/requests
GET /api/v1/network/requests/{id}
DELETE /api/v1/network/requests
Logs
GET /api/v1/logs
Device
GET /api/v1/device/info
GET /api/v1/device/display
GET /api/v1/device/battery
GET /api/v1/device/connectivity
GET /api/v1/device/app
GET /api/v1/device/sensors
POST /api/v1/device/sensors/{name}/start
POST /api/v1/device/sensors/{name}/stop
GET /api/v1/device/permissions
GET /api/v1/device/permissions/{name}
GET /api/v1/device/geolocation
Storage
GET /api/v1/storage/preferences
GET /api/v1/storage/preferences/{key}
PUT /api/v1/storage/preferences/{key}
DELETE /api/v1/storage/preferences/{key}
DELETE /api/v1/storage/preferences
GET /api/v1/storage/secure/{key}
PUT /api/v1/storage/secure/{key}
DELETE /api/v1/storage/secure/{key}
DELETE /api/v1/storage/secure
Extensions
GET /api/v1/ext/{namespace}/{path}
POST /api/v1/ext/{namespace}/{path}
PUT /api/v1/ext/{namespace}/{path}
DELETE /api/v1/ext/{namespace}/{path}
Extension routes are defined by registered extensions using reverse-domain namespaces. Use GET /api/v1/agent/capabilities to discover available extensions and their tools.
WebSocket channels
WS /ws/v1/network
WS /ws/v1/logs
WS /ws/v1/device/sensors
WS /ws/v1/profiler
WS /ws/v1/ui/events
Inspired by WebDriver & Appium
The protocol incorporates battle-tested patterns from the W3C WebDriver specification and the Appium ecosystem:
- Formalized locator strategies instead of ad-hoc query parameters — unambiguous and extensible.
- Standard error codes familiar to anyone using Selenium or Appium (`element-not-found`, `stale-element-reference`, `element-not-interactable`).
- W3C Actions for gestures — a simplified pointer-action model lets you express any gesture (swipe, long-press, drag-and-drop, pinch-zoom) without a dedicated endpoint per gesture. Simple actions (tap, fill, scroll) remain as convenience endpoints.
- WebDriver state model — every element exposes `displayed`, `enabled`, `selected`, `focused` consistently across frameworks.
What's next
- Pick a framework and add the agent to your app.
- Build a custom extension to expose your library's tools to AI agents.
- Skip manual setup and let your AI coding agent run `ailoha init`.