experimental

turn old phones into ai agents

give it a goal in plain english. it reads the screen, thinks about what to do, taps and types via adb, and repeats until the job is done.

droidclaw
$ bun run src/kernel.ts
enter your goal: open youtube and search for "lofi hip hop"

--- step 1/30 ---
think: i'm on the home screen. launching youtube.
action: launch (842ms)

--- step 2/30 ---
think: youtube is open. tapping search icon.
action: tap (623ms)

--- step 3/30 ---
think: search field focused.
action: type "lofi hip hop" (501ms)

--- step 4/30 ---
action: enter (389ms)

--- step 5/30 ---
think: search results showing. done.
action: done (412ms)

perceive, reason, act, adapt

every step is one pass through the same loop: dump the accessibility tree, filter it down to interactive elements, send it to an llm, execute the returned action, repeat.

1. perceive

captures the screen via uiautomator dump and parses the accessibility xml into tappable elements with coordinates and state.
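
a sketch of what that can look like using bun's shell api - dumpScreen and UiElement are illustrative names, not the repo's actual parser:

// perceive (sketch): dump the accessibility tree over adb and keep tappable nodes
import { $ } from "bun";

export interface UiElement {
  text: string;
  resourceId: string;
  clickable: boolean;
  center: { x: number; y: number };
}

export async function dumpScreen(): Promise<UiElement[]> {
  // uiautomator writes the tree to a file on the device; read it back over adb
  await $`adb shell uiautomator dump /sdcard/window_dump.xml`.quiet();
  const xml = await $`adb shell cat /sdcard/window_dump.xml`.text();

  // each <node> carries text, resource-id, clickable and bounds="[x1,y1][x2,y2]"
  const nodeRe =
    /text="([^"]*)"[^>]*resource-id="([^"]*)"[^>]*clickable="(true|false)"[^>]*bounds="\[(\d+),(\d+)\]\[(\d+),(\d+)\]"/g;

  const elements: UiElement[] = [];
  for (const m of xml.matchAll(nodeRe)) {
    const [, text, resourceId, clickable, x1, y1, x2, y2] = m;
    if (clickable !== "true" && !text) continue; // drop nodes the llm can't use
    elements.push({
      text,
      resourceId,
      clickable: clickable === "true",
      center: { x: (Number(x1) + Number(x2)) / 2, y: (Number(y1) + Number(y2)) / 2 },
    });
  }
  return elements;
}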

2. reason

sends screen state + goal to an llm. the model returns think, plan, action - it explains its reasoning before acting.
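
the contract with the model might look like this - a sketch reusing the UiElement shape from the perceive step; the env names (LLM_BASE_URL, LLM_API_KEY, LLM_MODEL) are placeholders, not the repo's config keys:

// reason (sketch): send goal + screen state, get back think / plan / action
interface AgentStep {
  think: string;   // what the model believes is on screen
  plan: string;    // how it intends to reach the goal
  action: { name: string; args?: Record<string, string | number> };
}

async function reason(goal: string, elements: UiElement[]): Promise<AgentStep> {
  const prompt = [
    `goal: ${goal}`,
    `screen elements: ${JSON.stringify(elements)}`,
    `reply as json with keys "think", "plan" and "action".`,
  ].join("\n");

  // an openai-compatible chat endpoint for illustration - in the repo,
  // llm-providers.ts abstracts the 4 real providers behind one interface
  const res = await fetch(`${process.env.LLM_BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${process.env.LLM_API_KEY}`,
    },
    body: JSON.stringify({
      model: process.env.LLM_MODEL,
      messages: [{ role: "user", content: prompt }],
      response_format: { type: "json_object" },
    }),
  });
  const data = await res.json();
  return JSON.parse(data.choices[0].message.content) as AgentStep;
}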

3. act

executes the chosen action via adb - tap, type, swipe, launch, press back. 22 actions available.
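
most of those actions boil down to plain adb shell input commands - a sketch of a dispatcher, not the repo's actions.ts:

// act (sketch): map a chosen action to the adb command that performs it
import { $ } from "bun";

export async function act(action: { name: string; args?: Record<string, string | number> }) {
  const a = action.args ?? {};
  switch (action.name) {
    case "tap":
      await $`adb shell input tap ${a.x} ${a.y}`;
      break;
    case "type":
      // adb's input text can't take raw spaces; escape them as %s
      await $`adb shell input text ${String(a.text).replace(/ /g, "%s")}`;
      break;
    case "swipe":
      await $`adb shell input swipe ${a.x1} ${a.y1} ${a.x2} ${a.y2} 300`;
      break;
    case "back":
      await $`adb shell input keyevent KEYCODE_BACK`;
      break;
    case "launch":
      await $`adb shell monkey -p ${a.package} -c android.intent.category.LAUNCHER 1`;
      break;
    default:
      throw new Error(`unknown action: ${action.name}`);
  }
}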

4. adapt

if the screen doesn't change for 3 steps, stuck recovery kicks in. if the accessibility tree comes back empty, it falls back to screenshots.
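
stuck detection can be as simple as fingerprinting the element list and counting repeats - a sketch reusing the UiElement shape from the perceive step; isStuck is an assumed name:

// adapt (sketch): notice when the screen stops changing
const STUCK_THRESHOLD = Number(process.env.STUCK_THRESHOLD ?? 3);

let lastFingerprint = "";
let unchangedSteps = 0;

export function isStuck(elements: UiElement[]): boolean {
  const fingerprint = JSON.stringify(elements.map((e) => [e.text, e.center.x, e.center.y]));
  if (fingerprint === lastFingerprint) {
    unchangedSteps += 1;
  } else {
    unchangedSteps = 0;
    lastFingerprint = fingerprint;
  }
  // past the threshold, recovery kicks in: press back, scroll, or hand the llm
  // a screenshot instead of the (possibly empty) accessibility tree
  return unchangedSteps >= STUCK_THRESHOLD;
}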

interactive, workflows, or flows

type a goal, chain goals across apps with ai, or run deterministic steps with no llm calls.

interactive

just type

run it and describe what you want. the agent figures out the rest.

$ bun run src/kernel.ts
enter your goal: send "running late, 10 mins" to Mom on whatsapp

workflows

ai-powered · json

chain goals across multiple apps. the steps are natural language; the llm handles the navigation.

{
  "name": "weather to whatsapp",
  "steps": [
    { "app": "com.google...",
      "goal": "search chennai weather" },
    { "goal": "share to Sanju" }
  ]
}
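
a workflow runner can stay thin: each step is just a fresh goal handed to the same agent loop. a sketch with assumed names (runGoal stands in for the perceive-reason-act loop):

// workflow (sketch): run each natural-language step through the agent loop
interface WorkflowStep { app?: string; goal: string }
interface Workflow { name: string; steps: WorkflowStep[] }

async function runWorkflow(path: string, runGoal: (goal: string) => Promise<void>) {
  const workflow: Workflow = await Bun.file(path).json();
  for (const [i, step] of workflow.steps.entries()) {
    console.log(`step ${i + 1}/${workflow.steps.length}: ${step.goal}`);
    const goal = step.app ? `open ${step.app}, then ${step.goal}` : step.goal;
    await runGoal(goal); // the llm figures out the taps for this step
  }
}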

flows

instant · yaml

fixed taps and types. no llm, instant execution. for repeatable tasks.

appId: com.whatsapp
name: Send WhatsApp Message
---
- launchApp
- tap: "Contact Name"
- type: "hello from droidclaw"
- tap: "Send"

workflows

  • json format, uses ai
  • handles ui changes and popups
  • slower (llm calls each step)
  • best for complex multi-app tasks

flows

  • yaml format, no ai needed
  • breaks if ui changes
  • instant execution
  • best for simple repeatable tasks

what you can build with this

delegate to on-device ai apps, control phones remotely, turn old devices into always-on agents.

delegate to ai apps on-device

open google's ai mode, ask a question, grab the answer, forward it to whatsapp. or ask chatgpt something and share the response to slack. the agent uses apps on your phone as tools - no api keys needed for those services.

remote control with tailscale

install tailscale on phone + laptop. connect adb over the tailnet. your phone is now a remote agent - control it from anywhere. run workflows from a cron job at 8am every morning.

# from anywhere:
adb connect <phone-tailscale-ip>:5555
bun run src/kernel.ts --workflow morning.json

old phones, always on

that android in a drawer can now send standups to slack, check flight prices, digest telegram channels, forward weather to whatsapp. it automates apps that don't have apis.

automation with ai intelligence

unlike predefined button flows, the agent actually thinks. if a button moves, a popup appears, or the layout changes - it adapts. it reads the screen, understands context, and makes decisions.

things it can do right now

across any app installed on the device.

messaging

  • send whatsapp to saved or unsaved numbers
  • reply to latest sms
  • compose emails via gmail
  • telegram messages to groups
  • post standups to slack
  • broadcast to multiple contacts

research

  • search google, collect results
  • ask chatgpt / gemini, grab answer
  • check weather, stocks, flights
  • compare prices across apps
  • translate via google translate
  • compile multi-source digests

social

  • post to instagram, twitter/x
  • like and comment on posts
  • check engagement metrics
  • save youtube to watch later
  • follow / unfollow accounts
  • check linkedin notifications

productivity

  • morning briefing across apps
  • create calendar events
  • capture notes in google keep
  • check github pull requests
  • set alarms and reminders
  • triage notifications

lifestyle

  • order food from delivery apps
  • book an uber ride
  • play songs on spotify
  • check commute on maps
  • log workouts, track expenses
  • toggle do not disturb

device control

  • toggle wifi, bluetooth, airplane
  • adjust brightness, volume
  • force stop or clear cache
  • grant/revoke permissions
  • install/uninstall apps
  • run any adb shell command

what works and what doesn't

22 actions + 6 multi-step skills. here's the reality.

works well

  • native android apps with standard ui
  • multi-app workflows that chain goals
  • device settings via shell commands
  • text input, navigation, taps
  • stuck detection + recovery
  • vision fallback for empty trees

unreliable

  • flutter, react native, games
  • webviews (incomplete tree)
  • drag & drop, multi-finger
  • notification interaction
  • clipboard on android 12+
  • captchas and bot detection

can't do

  • banking apps (FLAG_SECURE)
  • biometrics (fingerprint, face)
  • bypass encrypted lock screen
  • access other apps' private data
  • audio or camera streams
  • pinch-to-zoom gestures

getting started

1. clone and install

git clone https://github.com/thisuxhq/droidclaw.git
cd droidclaw && bun install
cp .env.example .env

2. configure an llm provider

edit .env - fastest way to start is groq (free tier):

LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_key_here

provider     cost        vision   notes
groq         free        no       fastest to start
openrouter   per token   yes      200+ models
openai       per token   yes      gpt-4o
bedrock      per token   yes      claude on aws

3. connect your phone

enable usb debugging in developer options, then plug in via usb.

adb devices   # should show your device
bun run src/kernel.ts

4. tune (optional)

key               default    what
MAX_STEPS         30         steps before giving up
STEP_DELAY        2          seconds between actions
STUCK_THRESHOLD   3          steps before stuck recovery
VISION_MODE       fallback   off / fallback / always
MAX_ELEMENTS      40         ui elements sent to llm
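
roughly how those knobs drive the loop - a sketch built on the earlier dumpScreen / reason / act helpers, not the actual kernel.ts:

// kernel (sketch): the perceive -> reason -> act loop, with the tuning knobs applied
const MAX_STEPS = Number(process.env.MAX_STEPS ?? 30);
const STEP_DELAY = Number(process.env.STEP_DELAY ?? 2);
const MAX_ELEMENTS = Number(process.env.MAX_ELEMENTS ?? 40);

async function runGoal(goal: string) {
  for (let step = 1; step <= MAX_STEPS; step++) {
    const elements = (await dumpScreen()).slice(0, MAX_ELEMENTS); // perceive
    const decision = await reason(goal, elements);                // reason
    console.log(`--- step ${step}/${MAX_STEPS} ---\nthink: ${decision.think}`);
    if (decision.action.name === "done") return;
    await act(decision.action);                                   // act
    await Bun.sleep(STEP_DELAY * 1000);                           // let the ui settle
  }
  throw new Error(`gave up after ${MAX_STEPS} steps`);
}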

35 workflows + 5 flows

ready to use. workflows are ai-powered (json), flows are deterministic (yaml).

messaging 10 workflows
social 4 workflows
productivity 8 workflows
research 6 workflows
lifestyle 8 workflows
flows 5 deterministic

10 files in src/

kernel.ts          main loop
actions.ts         22 actions + adb retry
skills.ts          6 multi-step skills
workflow.ts        workflow orchestration
flow.ts            yaml flow runner
llm-providers.ts   4 providers + system prompt
sanitizer.ts       accessibility xml parser
config.ts          env config
constants.ts       keycodes, coordinates
logger.ts          session logging