The contract

DevFlow Agent Protocol

A framework-agnostic HTTP + WebSocket contract for inspecting and interacting with running applications. Every Ailoha agent - MAUI, Android, Flutter, React Native, WinUI, and platforms beyond - implements the same surface, so one client speaks to all of them.

OpenAPI 3.1 · AsyncAPI 3.0 · JSON Schema 2020-12 · RFC 7807 errors · v1

Overview

An agent is a lightweight in-process HTTP server embedded inside a target application. External clients — CLIs, test drivers, AI tools, IDE extensions — connect to it to:

  • Inspect the visual tree and element properties
  • Perform UI actions (tap, fill, scroll, gesture, navigate)
  • Capture screenshots
  • Monitor network traffic, logs, and performance
  • Interact with embedded web content (WebViews)
  • Read and write application storage

The protocol is framework-agnostic. The same client code works whether the target app is built with .NET MAUI, native Android, React Native, Flutter, WinUI, or anything else. Each framework provides its own agent implementation that translates the standard protocol into framework-specific operations.

Adding the agent to your app? Skip ahead to a platform guide: .NET MAUI, Native Android, Flutter, React Native, Expo, or WinUI.

Reference docs & downloads

The human-readable protocol page is paired with generated reference docs and downloadable bundles produced from the canonical spec during the website build. The canonical authoring files stay in docs/openapi.yaml, docs/asyncapi.yaml, docs/schemas/*.json, and docs/examples/*.json; generated bundles are published for consumers and code generators.

Artifact | Download
Complete spec package | ailoha-devflow-agent-protocol-v1.zip
OpenAPI 3.1 YAML bundle | openapi.yaml
OpenAPI 3.1 JSON bundle | openapi.json
OpenAPI 3.0 client compatibility JSON | openapi-client-3.0.json
AsyncAPI YAML and JSON bundles | asyncapi.yaml · asyncapi.json

Architecture

The agent runs in-process inside the target application, giving it direct access to the UI framework's APIs, the visual tree, and application state. All communication is local — the agent binds to localhost:{port} over HTTP and WebSocket.

  • Broker handles discovery. A separate broker daemon manages port assignment and agent registration. Clients ask the broker "where is my app?" and get back a port number.
  • Any client language works. Standard HTTP + JSON. Anything that can make HTTP requests can be a client.

For Ailoha's CLI and MCP host, the practical resolution order is: explicit command args → environment or local project selection (AILOHA_AGENT_HOST, AILOHA_AGENT_PORT, .devflow) → broker discovery → default localhost port.
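The resolution order can be sketched as a small fallback chain. This is an illustration only, not the CLI's actual code: the function name, the `broker_lookup` callable, and the fallback port value are hypothetical, and parsing of the .devflow file is left out because its format is not described here.

```python
import os
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical fallback value; this page does not state the real default port.
ASSUMED_DEFAULT_PORT = 9223


def resolve_agent_endpoint(
    cli_host: Optional[str] = None,
    cli_port: Optional[int] = None,
    env: Optional[Mapping[str, str]] = None,
    broker_lookup: Optional[Callable[[], Optional[int]]] = None,
    default_port: int = ASSUMED_DEFAULT_PORT,
) -> Tuple[str, int]:
    """Return (host, port) from the first source that yields a port."""
    env = os.environ if env is None else env
    host = cli_host or env.get("AILOHA_AGENT_HOST") or "localhost"

    # 1. Explicit command arguments win.
    if cli_port is not None:
        return host, cli_port
    # 2. Environment selection (.devflow parsing is omitted in this sketch).
    if env.get("AILOHA_AGENT_PORT"):
        return host, int(env["AILOHA_AGENT_PORT"])
    # 3. Ask the broker, if a lookup callable was supplied.
    if broker_lookup is not None:
        port = broker_lookup()
        if port is not None:
            return host, port
    # 4. Fall back to the default localhost port.
    return host, default_port
```

Passing `env` and `broker_lookup` as arguments keeps the function easy to test without touching the real environment or a running broker.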

URL structure & versioning

All HTTP routes share a common shape:

http://localhost:{port}/api/v1/{namespace}/{resource}
ws://localhost:{port}/ws/v1/{channel}

Routes are organized under versioned namespaces:

Namespace | Purpose | Example
agent | Identity & capabilities | /api/v1/agent/status
ui | Visual tree, actions, screenshots | /api/v1/ui/tree
webview | Embedded web content | /api/v1/webview/contexts
profiler | Performance monitoring | /api/v1/profiler/sessions
network | HTTP traffic monitoring | /api/v1/network/requests
logs | Application log access | /api/v1/logs
device | Hardware, sensors, permissions | /api/v1/device/info
storage | Preferences & secure storage | /api/v1/storage/preferences
ext | Third-party extensions | /api/v1/ext/{namespace}/...

URL path versioning is used everywhere. The version segment (v1) appears in both HTTP and WebSocket URLs. Future breaking changes increment the version (v2, v3) while maintaining backward compatibility on previous versions.
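As a rough sketch, a client might centralize the URL shape in two helpers so the version segment lives in one place. The helper names and the port used in the usage note are illustrative assumptions, not part of the protocol.

```python
def http_url(port: int, namespace: str, resource: str = "",
             version: int = 1, host: str = "localhost") -> str:
    """Build http://{host}:{port}/api/v{version}/{namespace}/{resource}."""
    path = f"/api/v{version}/{namespace}"
    if resource:
        # Some namespaces (for example logs) have no resource segment.
        path += "/" + resource.strip("/")
    return f"http://{host}:{port}{path}"


def ws_url(port: int, channel: str,
           version: int = 1, host: str = "localhost") -> str:
    """Build ws://{host}:{port}/ws/v{version}/{channel}."""
    return f"ws://{host}:{port}/ws/v{version}/{channel.strip('/')}"
```

For example, `http_url(9223, "agent", "status")` yields `http://localhost:9223/api/v1/agent/status`, where 9223 stands in for whatever port the broker reports.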

Quick start

1. Discover the agent

Verify the agent is running and inspect its identity:

GET /api/v1/agent/status
{
  "agent": {
    "name": "devflow-maui",
    "version": "1.0.0",
    "framework": "maui",
    "frameworkVersion": "10.0"
  },
  "platform": "ios",
  "device": { "model": "iPhone 15 Pro", "manufacturer": "Apple", "osVersion": "18.0" },
  "app": { "name": "MyApp", "packageId": "com.example.myapp", "version": "2.1.0" },
  "running": true,
  "uptime": 42.5
}

2. Check capabilities

Find out what this agent supports before calling feature endpoints:

GET /api/v1/agent/capabilities
{
  "capabilities": {
    "ui.tree":       { "version": 1, "features": ["css-selectors", "hit-test", "native-layer"] },
    "ui.actions":    { "version": 1, "features": ["tap", "fill", "scroll", "navigate", "batch"] },
    "ui.screenshot": { "version": 1, "features": ["element-capture", "window-capture"] },
    "webview":       { "version": 1, "features": ["evaluate", "dom-query", "source"] },
    "profiler":      { "version": 1, "features": ["samples", "markers", "spans"] },
    "network":       { "version": 1, "features": ["capture", "stream"] },
    "logs":          { "version": 1, "features": ["query", "stream"] }
  }
}

3. Inspect → act → verify

The most common workflow is three steps: get the tree, perform an action, and check the result. Bundle them in two HTTP calls using the include parameter:

GET /api/v1/ui/tree?depth=5

POST /api/v1/ui/actions/tap
Content-Type: application/json

{ "elementId": "btn_submit", "include": ["screenshot", "tree"] }

The action response now contains the success flag, the resulting screenshot, and the new visual tree — no follow-up calls needed.
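A minimal sketch of the tap call using only the Python standard library is shown here. The function names and the base URL in the usage note are assumptions for illustration; any HTTP client works equally well.

```python
import json
import urllib.request
from typing import Any, Dict, Sequence


def build_tap_request(base_url: str, element_id: str,
                      include: Sequence[str] = ("screenshot", "tree")
                      ) -> urllib.request.Request:
    """Prepare POST /api/v1/ui/actions/tap with the include parameter set."""
    body = {"elementId": element_id, "include": list(include)}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/ui/actions/tap",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> Dict[str, Any]:
    """Send the request to a running agent and return the parsed JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With an agent listening on a hypothetical port, `send(build_tap_request("http://localhost:9223", "btn_submit"))` would return the action result together with the requested screenshot and tree.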

Capability discovery

Before calling any feature endpoint, clients should check what the agent supports. Not every agent implements every capability - a Flutter agent may not expose storage.secure; the native Android agent currently returns explicit unsupported responses for secure storage, BLE, jobs, and WebSocket streams.

Capabilities are organized into namespaces with a version number and a list of supported features:

{
  "capabilities": {
    "ui.tree": {
      "version": 1,
      "features": ["css-selectors", "hit-test", "native-layer"]
    },
    "ui.actions": {
      "version": 1,
      "features": ["tap", "fill", "clear", "scroll", "focus", "navigate", "resize", "back", "key", "gesture", "batch"]
    },
    "device.sensors": {
      "version": 1,
      "features": ["accelerometer", "gyroscope", "compass"]
    }
  }
}

Client usage pattern

# `client` is any HTTP helper whose get/post return parsed JSON.
caps = client.get("/api/v1/agent/capabilities")["capabilities"]

# Prefer the batch endpoint when the agent advertises it; request bodies are elided.
if "ui.actions" in caps:
    if "batch" in caps["ui.actions"]["features"]:
        client.post("/api/v1/ui/actions/batch", { ... })
    else:
        client.post("/api/v1/ui/actions/tap", { ... })

# A missing namespace means the agent does not support that capability at all.
if "webview" not in caps:
    print("WebView features not available for this framework")

Extension marker on status

GET /api/v1/agent/status includes a lightweight extension marker so clients can skip the heavier capabilities request when nothing's registered:

{ "extensions": { "count": 2, "hash": "a1b2c3d4..." } }

Cache extension metadata by hash and only re-fetch GET /api/v1/agent/capabilities when the hash changes.
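One way to apply this caching rule is a small wrapper that remembers the last hash it saw. The class name and the `fetch_capabilities` callable are hypothetical; the callable stands in for whatever performs GET /api/v1/agent/capabilities in your client.

```python
from typing import Any, Callable, Dict, Optional


class CapabilitiesCache:
    """Re-fetch capabilities only when the extension hash on status changes."""

    def __init__(self, fetch_capabilities: Callable[[], Dict[str, Any]]):
        self._fetch = fetch_capabilities
        self._hash: Optional[str] = None
        self._cached: Optional[Dict[str, Any]] = None

    def get(self, status: Dict[str, Any]) -> Dict[str, Any]:
        """Return capabilities for the given /agent/status payload."""
        current_hash = status.get("extensions", {}).get("hash")
        if self._cached is None or current_hash != self._hash:
            # First call, or the hash moved: refresh and remember the new hash.
            self._cached = self._fetch()
            self._hash = current_hash
        return self._cached
```

The status payload is passed in rather than fetched inside the class, so the same status response can also drive other checks without an extra request.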

Element queries & locator strategies

Elements in the visual tree are located using named locator strategies, inspired by the W3C WebDriver specification.

Strategy | Description | Example
accessibility-id | Match by automation/accessibility ID | ?strategy=accessibility-id&value=submit-btn
css-selector | CSS selector adapted for native UI trees | ?strategy=css-selector&value=Button:visible.primary
type | Match by element type name | ?strategy=type&value=Entry
text | Match by visible text | ?strategy=text&value=Submit
xpath | XPath expression over the tree | ?strategy=xpath&value=//Button[@text='Submit']
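Selector values often contain characters that need percent-encoding in a query string, so it helps to build element queries with a URL encoder instead of string concatenation. The sketch below assumes the five strategies listed above are the full set; the function name is illustrative.

```python
from urllib.parse import urlencode

# Strategy names taken from the locator table; agents may support fewer.
LOCATOR_STRATEGIES = {"accessibility-id", "css-selector", "type", "text", "xpath"}


def element_query_path(strategy: str, value: str) -> str:
    """Return the path and query for GET /api/v1/ui/elements."""
    if strategy not in LOCATOR_STRATEGIES:
        raise ValueError(f"unknown locator strategy: {strategy}")
    # urlencode percent-encodes reserved characters such as ':' and '/'.
    return "/api/v1/ui/elements?" + urlencode({"strategy": strategy, "value": value})
```

Rejecting unknown strategies on the client side is a convenience; the agent still reports invalid-selector for anything it cannot handle.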

Element model

Every element in the tree conforms to the ElementInfo schema — a cross-framework model with consistent fields regardless of the underlying UI framework:

{
  "id": "elem_abc123",
  "parentId": "window_0",
  "type": "Button",
  "fullType": "Microsoft.Maui.Controls.Button",
  "framework": "maui",
  "automationId": "submit-btn",
  "text": "Submit",
  "role": "button",
  "traits": ["interactive", "focusable"],
  "state": {
    "displayed": true, "enabled": true, "selected": false, "focused": false, "opacity": 1.0
  },
  "bounds": { "x": 10, "y": 100, "width": 200, "height": 50, "coordinate": "window" },
  "gestures": ["tap"],
  "nativeView": {
    "type": "android.widget.Button",
    "properties": { "elevation": "4.0" }
  },
  "frameworkProperties": {
    "maui:bindingContext": "SubmitViewModel"
  },
  "children": []
}

Field | Purpose
id | Globally unique identifier; use this to target actions
type / fullType | Short and fully-qualified type name
role | Semantic role (button, textbox, checkbox, list, window, …)
traits | Behavioral traits (interactive, focusable, scrollable, adjustable)
state | Current interactive state: displayed, enabled, selected, focused
bounds | Bounding rectangle in window or screen coordinates
nativeView | Underlying platform view type and properties
frameworkProperties | Framework-specific data not captured by standard fields
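Because every element shares these fields, generic tree helpers work across frameworks. The following is a sketch under one stated assumption: treating an element as "actionable" when it is displayed, enabled, and carries the interactive trait is a client-side heuristic, not a rule defined by the protocol.

```python
from typing import Any, Dict, Iterable, Iterator, Optional

Element = Dict[str, Any]


def iter_elements(nodes: Iterable[Element]) -> Iterator[Element]:
    """Yield every element in depth-first, parent-before-children order."""
    for node in nodes:
        yield node
        yield from iter_elements(node.get("children", []))


def is_actionable(element: Element) -> bool:
    """Heuristic: displayed, enabled, and marked with the interactive trait."""
    state = element.get("state", {})
    return (bool(state.get("displayed"))
            and bool(state.get("enabled"))
            and "interactive" in element.get("traits", []))


def find_by_automation_id(nodes: Iterable[Element], automation_id: str) -> Optional[Element]:
    """Return the first element whose automationId matches, or None."""
    return next((e for e in iter_elements(nodes)
                 if e.get("automationId") == automation_id), None)
```

The `id` of a match can then be passed as `elementId` to any action endpoint.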

Tree layers

Agents may support multiple tree representations:

GET /api/v1/ui/tree?layer=framework    # Default: framework component tree
GET /api/v1/ui/tree?layer=native       # Underlying platform views
GET /api/v1/ui/tree?layer=render       # Render objects (Flutter-specific)

Windows as tree nodes

Windows are root-level elements in the visual tree, not a separate API resource. There is no ?window=N parameter — windows appear naturally as top-level nodes:

{
  "tree": [
    { "id": "window_0", "type": "Window", "role": "window",
      "state": { "focused": true, "displayed": true, "enabled": true, "selected": false, "opacity": 1.0 },
      "children": [ { "id": "page_main", "type": "NavigationPage", "children": [ ... ] } ] },
    { "id": "window_1", "type": "Window", "role": "window",
      "state": { "focused": false, "displayed": true, "enabled": true, "selected": false, "opacity": 1.0 },
      "children": [ ... ] }
  ]
}
  • One call shows everything. The full tree includes all windows.
  • Element IDs are globally unique. Tap elem_abc123 and it targets the correct window automatically.
  • CSS selectors work across windows. Window:focused Button.submit naturally scopes to the focused window.
  • Filter when needed. GET /api/v1/ui/tree?rootId=window_0 returns just one window's subtree.
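Since windows are ordinary root nodes, picking the focused one is a scan over the top level of the tree. This sketch uses illustrative function names and relies only on the `role`, `state.focused`, and `rootId` details shown above.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode


def focused_window(tree: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the first root node that is a focused window, or None."""
    for node in tree:
        if node.get("role") == "window" and node.get("state", {}).get("focused"):
            return node
    return None


def subtree_path(root_id: str) -> str:
    """Path that limits GET /api/v1/ui/tree to one window's subtree."""
    return "/api/v1/ui/tree?" + urlencode({"rootId": root_id})
```

A client could call `focused_window` on a full tree once, then poll `subtree_path(window["id"])` to keep later responses small.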

AI-optimized patterns

The protocol is designed for AI agents that loop inspect → act → verify. Every unnecessary HTTP round-trip slows the AI down, so two patterns minimize calls.

Composite responses

Every action endpoint accepts an include parameter to bundle additional data in the response:

POST /api/v1/ui/actions/tap
Content-Type: application/json

{ "elementId": "btn_login", "include": ["screenshot", "tree"] }
{
  "success": true,
  "screenshot": "iVBORw0KGgoAAAANSUhEUgAA...",
  "tree": [ { "id": "window_0", "type": "Window", "children": [ ... ] } ]
}

screenshot and tree are only populated when requested — no wasted bandwidth if you don't need them.
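The sample screenshot value begins with the characters that a base64-encoded PNG starts with, so the sketch below assumes the field is a base64 PNG. That encoding is an inference from the example, not something this page states, and the function name is illustrative.

```python
import base64
from typing import Any, Dict, Optional

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def extract_screenshot(response: Dict[str, Any]) -> Optional[bytes]:
    """Decode the screenshot field if present; return None when it was not requested."""
    encoded = response.get("screenshot")
    if not encoded:
        return None
    data = base64.b64decode(encoded)
    if not data.startswith(PNG_SIGNATURE):
        # Assumption check: fail loudly if the agent returns another format.
        raise ValueError("screenshot is not PNG data")
    return data
```

The returned bytes can be written straight to a `.png` file or handed to an image library.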

Batch actions

Execute multiple actions in a single request with POST /api/v1/ui/actions/batch:

POST /api/v1/ui/actions/batch
Content-Type: application/json

{
  "actions": [
    { "action": "fill", "elementId": "txt_email",    "text": "user@example.com" },
    { "action": "fill", "elementId": "txt_password", "text": "s3cret!" },
    { "action": "tap",  "elementId": "btn_login" }
  ],
  "include": ["screenshot", "tree"],
  "continueOnError": false
}

One request fills two fields, taps a button, and returns the resulting screenshot and tree. Two calls (tree + batch) instead of five.
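Batch payloads are plain JSON, so small builder functions keep call sites readable. The helper names here are hypothetical; the field names (`action`, `elementId`, `text`, `include`, `continueOnError`) come from the request shown above.

```python
from typing import Any, Dict, Iterable, Sequence


def fill(element_id: str, text: str) -> Dict[str, Any]:
    """One fill step for a batch."""
    return {"action": "fill", "elementId": element_id, "text": text}


def tap(element_id: str) -> Dict[str, Any]:
    """One tap step for a batch."""
    return {"action": "tap", "elementId": element_id}


def batch(actions: Iterable[Dict[str, Any]],
          include: Sequence[str] = (),
          continue_on_error: bool = False) -> Dict[str, Any]:
    """Body for POST /api/v1/ui/actions/batch."""
    payload: Dict[str, Any] = {
        "actions": list(actions),
        "continueOnError": continue_on_error,
    }
    if include:
        # Only ask for extra data when the caller wants it.
        payload["include"] = list(include)
    return payload
```

The login flow above then reads as `batch([fill("txt_email", ...), fill("txt_password", ...), tap("btn_login")], include=["screenshot", "tree"])`.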

WebSocket channels

Real-time streaming uses WebSocket connections. All channels share a standard message envelope.

{
  "type": "event_type",
  "timestamp": "2024-01-15T10:30:00.123Z",
  "data": { ... }
}

Channel | Content | Event types
/ws/v1/network | HTTP traffic stream | replay, request
/ws/v1/logs | Application log stream | replay, log
/ws/v1/device/sensors | Sensor readings | subscribed, reading
/ws/v1/profiler | Performance data | samples, marker, span
/ws/v1/ui/events | UI lifecycle events | navigation, lifecycle, treeChange

Subscription model

After connecting, send a subscribe message to configure filtering:

// Client → Agent
{ "type": "subscribe", "filter": { "sensor": "accelerometer", "throttleMs": 100 } }

// Agent → Client (confirmation)
{ "type": "subscribed", "timestamp": "2024-01-15T10:30:00.123Z",
  "sensor": { "name": "accelerometer", "type": "accelerometer", "available": true } }

// Agent → Client (live data)
{ "type": "reading", "timestamp": "2024-01-15T10:30:00.223Z",
  "reading": { "sensor": "accelerometer", "x": 0.02, "y": -9.81, "z": 0.01 } }

Channels that support historical replay (network, logs) send a replay batch of existing entries immediately after subscription, followed by live events.
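The message handling on the sensors channel can be sketched independently of any WebSocket library. In this illustration the class and function names are hypothetical, and the transport (connecting, sending, receiving text frames) is left to whichever WebSocket client you use.

```python
import json
from typing import Any, Dict, List


def subscribe_message(sensor: str, throttle_ms: int) -> str:
    """Serialize the client-to-agent subscribe message."""
    return json.dumps({"type": "subscribe",
                       "filter": {"sensor": sensor, "throttleMs": throttle_ms}})


class SensorStream:
    """Tracks subscription state and collects readings from incoming frames."""

    def __init__(self) -> None:
        self.subscribed = False
        self.readings: List[Dict[str, Any]] = []

    def handle(self, raw: str) -> None:
        """Dispatch one text frame by its envelope type."""
        message = json.loads(raw)
        kind = message.get("type")
        if kind == "subscribed":
            self.subscribed = True
        elif kind == "reading":
            self.readings.append(message["reading"])
        # Other types are ignored so newer agents do not break older clients.
```

A receive loop would send `subscribe_message("accelerometer", 100)` once after connecting, then pass each incoming frame to `handle`.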

Error handling

All errors use the RFC 7807 Problem Details format with a machine-readable errorCode field.

{
  "type": "https://devflow.dev/errors/element-not-found",
  "title": "Element Not Found",
  "status": 404,
  "detail": "No element found with id 'btn_submit'. The element may have been removed from the tree.",
  "errorCode": "element-not-found",
  "instance": "/api/v1/ui/actions/tap"
}

Standard error codes

Error code | HTTP status | Description
element-not-found | 404 | No element matches the given ID or selector
stale-element-reference | 404 | Element ID was valid but the UI has changed and it no longer exists
element-not-interactable | 400 | Element exists but cannot be acted on (hidden, disabled, obscured)
invalid-selector | 400 | Malformed selector or unsupported locator strategy
timeout | 408 | Operation did not complete within the allowed time
unknown-command | 404 | Route or action type not recognized
unsupported-capability | 501 | Agent does not implement the requested capability
internal-error | 500 | Unexpected agent-side failure
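Clients can turn a Problem Details body into a typed exception and branch on `errorCode` rather than on the HTTP status, since several codes share 404 and 400. This is a sketch with hypothetical names; the choice to re-fetch the tree after element-not-found or stale-element-reference is a client policy assumed here, not a protocol requirement.

```python
import json
from typing import Any, Dict


class AgentError(Exception):
    """Exception built from an RFC 7807 problem document."""

    def __init__(self, problem: Dict[str, Any]):
        self.error_code = problem.get("errorCode", "internal-error")
        self.status = problem.get("status")
        self.detail = problem.get("detail", "")
        super().__init__(f"{self.error_code} ({self.status}): {self.detail}")


def raise_for_problem(status_code: int, body: str) -> None:
    """Raise AgentError when the HTTP status signals a failure."""
    if status_code >= 400:
        raise AgentError(json.loads(body))


def needs_fresh_tree(error: AgentError) -> bool:
    """Assumed policy: these codes suggest the cached tree is out of date."""
    return error.error_code in {"element-not-found", "stale-element-reference"}
```

A retry loop might catch `AgentError`, call `needs_fresh_tree`, and fetch GET /api/v1/ui/tree again before repeating the action.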

Cross-framework element mapping

The ElementInfo schema provides a unified model across frameworks. Here's how key concepts map:

Concept | .NET MAUI | React Native | Flutter | Android | iOS
Element type | Button | TouchableOpacity | ElevatedButton | Button | UIButton
Automation ID | AutomationId | testID | ValueKey<String> | resource ID, setAilohaAutomationId(...), or Compose Modifier.testTag(...) | accessibilityIdentifier
Role | Inferred from type | accessibilityRole | Semantics role | Accessibility role | accessibilityTraits
Native view | Platform handler view | Host view | RenderObject / PlatformView | Self | Self

Framework-specific data that doesn't fit the standard model goes in frameworkProperties:

{
  "id": "elem_abc",
  "type": "Button",
  "framework": "maui",
  "frameworkProperties": {
    "maui:bindingContext": "LoginViewModel",
    "maui:visualStateGroup": "CommonStates",
    "maui:currentVisualState": "Normal"
  }
}

Extensions

Third-party libraries can register their own tools without forking the protocol. Extensions own a reverse-domain namespace (e.g. com.acme.featureflags) and declare self-describing tool descriptors that mirror MCP tool metadata, so CLI and MCP clients can discover and invoke them with no extension-specific code.

GET  /api/v1/ext/com.example.analytics/events
POST /api/v1/ext/com.example.analytics/flush
GET  /api/v1/ext/io.sentry.devflow/breadcrumbs
{
  "extensions": {
    "com.example.analytics": {
      "version": "1.0.0",
      "description": "Analytics event inspector and replayer",
      "tools": [
        {
          "name": "list_events",
          "description": "List recent analytics events captured in-app",
          "method": "GET",
          "path": "/api/v1/ext/com.example.analytics/events",
          "parameters": {
            "type": "object",
            "properties": {
              "limit": { "type": "integer", "default": 50 }
            }
          },
          "annotations": { "readOnly": true, "idempotent": true, "category": "analytics" }
        }
      ]
    }
  }
}

Each tool descriptor carries name, description, method, path, optional parameters/returns JSON schemas, and behavioral annotations (readOnly, idempotent, destructive, category). Extension namespaces must use reverse-domain notation to prevent collisions.
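Because descriptors are self-describing, a generic client can turn one into a request without extension-specific code. The sketch below makes two assumptions that this page does not confirm: arguments for GET and DELETE tools travel in the query string, and arguments for other methods travel as a JSON body. The function name is illustrative.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import urlencode


def tool_request(tool: Dict[str, Any],
                 arguments: Optional[Dict[str, Any]] = None
                 ) -> Tuple[str, str, Optional[Dict[str, Any]]]:
    """Return (method, path, json_body) for invoking an extension tool."""
    args = dict(arguments or {})
    # Fill in defaults declared in the tool's JSON schema.
    properties = tool.get("parameters", {}).get("properties", {})
    for name, schema in properties.items():
        if name not in args and "default" in schema:
            args[name] = schema["default"]

    method = tool["method"].upper()
    if method in ("GET", "DELETE"):
        # Assumption: body-less methods carry arguments in the query string.
        query = "?" + urlencode(args) if args else ""
        return method, tool["path"] + query, None
    # Assumption: other methods carry arguments as a JSON body.
    return method, tool["path"], args
```

Applied to the `list_events` descriptor with no arguments, this produces a GET to `/api/v1/ext/com.example.analytics/events?limit=50`.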

For the practical authoring surface per platform, see the Building extensions guide.

Implementation checklist

MUST implement (core)

Endpoint | Purpose
GET /api/v1/agent/status | Identity, platform, app info, uptime
GET /api/v1/agent/capabilities | Declare supported capabilities and features

Without these, clients cannot discover or interact with the agent.

SHOULD implement (UI inspection & interaction)

Endpoint | Purpose
GET /api/v1/ui/tree | Visual tree with windows as root nodes
GET /api/v1/ui/elements/{id} | Single element lookup by ID
GET /api/v1/ui/elements?strategy=… | Element query with locator strategies
GET /api/v1/ui/screenshot | Capture screenshot
POST /api/v1/ui/actions/tap | Tap element or coordinates
POST /api/v1/ui/actions/fill | Fill text input
POST /api/v1/ui/actions/scroll | Scroll
POST /api/v1/ui/actions/batch | Batch actions

MAY implement (extended capabilities)

Implement as appropriate for the framework and use case: profiler, network, logs, device info & sensors, preference & secure storage, WebView, all WebSocket channels, and extensions.

Complete route reference

Agent

GET    /api/v1/agent/status
GET    /api/v1/agent/capabilities

UI inspection

GET    /api/v1/ui/tree
GET    /api/v1/ui/elements/{id}
GET    /api/v1/ui/elements?strategy={strategy}&value={value}
GET    /api/v1/ui/hit-test?x={x}&y={y}
GET    /api/v1/ui/screenshot

UI actions

POST   /api/v1/ui/actions/tap
POST   /api/v1/ui/actions/fill
POST   /api/v1/ui/actions/clear
POST   /api/v1/ui/actions/focus
POST   /api/v1/ui/actions/scroll
POST   /api/v1/ui/actions/navigate
POST   /api/v1/ui/actions/resize
POST   /api/v1/ui/actions/back
POST   /api/v1/ui/actions/key
POST   /api/v1/ui/actions/gesture
POST   /api/v1/ui/actions/batch

UI element properties

GET    /api/v1/ui/elements/{id}/properties/{name}
PUT    /api/v1/ui/elements/{id}/properties/{name}

WebView

GET    /api/v1/webview/contexts
POST   /api/v1/webview/evaluate
GET    /api/v1/webview/dom
POST   /api/v1/webview/dom/query
GET    /api/v1/webview/source
POST   /api/v1/webview/navigate
POST   /api/v1/webview/input/click
POST   /api/v1/webview/input/fill
POST   /api/v1/webview/input/text
GET    /api/v1/webview/screenshot

Profiler

GET    /api/v1/profiler/capabilities
POST   /api/v1/profiler/sessions
DELETE /api/v1/profiler/sessions/{id}
GET    /api/v1/profiler/sessions/{id}/samples
POST   /api/v1/profiler/markers
POST   /api/v1/profiler/spans
GET    /api/v1/profiler/hotspots

Network

GET    /api/v1/network/requests
GET    /api/v1/network/requests/{id}
DELETE /api/v1/network/requests

Logs

GET    /api/v1/logs

Device

GET    /api/v1/device/info
GET    /api/v1/device/display
GET    /api/v1/device/battery
GET    /api/v1/device/connectivity
GET    /api/v1/device/app
GET    /api/v1/device/sensors
POST   /api/v1/device/sensors/{name}/start
POST   /api/v1/device/sensors/{name}/stop
GET    /api/v1/device/permissions
GET    /api/v1/device/permissions/{name}
GET    /api/v1/device/geolocation

Storage

GET    /api/v1/storage/preferences
GET    /api/v1/storage/preferences/{key}
PUT    /api/v1/storage/preferences/{key}
DELETE /api/v1/storage/preferences/{key}
DELETE /api/v1/storage/preferences
GET    /api/v1/storage/secure/{key}
PUT    /api/v1/storage/secure/{key}
DELETE /api/v1/storage/secure/{key}
DELETE /api/v1/storage/secure

Extensions

GET    /api/v1/ext/{namespace}/{path}
POST   /api/v1/ext/{namespace}/{path}
PUT    /api/v1/ext/{namespace}/{path}
DELETE /api/v1/ext/{namespace}/{path}

Extension routes are defined by registered extensions using reverse-domain namespaces. Use GET /api/v1/agent/capabilities to discover available extensions and their tools.

WebSocket channels

WS     /ws/v1/network
WS     /ws/v1/logs
WS     /ws/v1/device/sensors
WS     /ws/v1/profiler
WS     /ws/v1/ui/events

Inspired by WebDriver & Appium

The protocol incorporates battle-tested patterns from the W3C WebDriver specification and the Appium ecosystem:

  • Formalized locator strategies instead of ad-hoc query parameters — unambiguous and extensible.
  • Standard error codes familiar to anyone using Selenium or Appium (element-not-found, stale-element-reference, element-not-interactable).
  • W3C Actions for gestures — a simplified pointer-action model lets you express any gesture (swipe, long-press, drag-and-drop, pinch-zoom) without a dedicated endpoint per gesture. Simple actions (tap, fill, scroll) remain as convenience endpoints.
  • WebDriver state model — every element exposes displayed, enabled, selected, focused consistently across frameworks.
