give it a goal in plain english. it reads the screen, thinks about what to do, taps and types via adb, and repeats until the job is done.
```
$ bun run src/kernel.ts
enter your goal: open youtube and search for "lofi hip hop"

--- step 1/30 ---
think: i'm on the home screen. launching youtube.
action: launch (842ms)

--- step 2/30 ---
think: youtube is open. tapping search icon.
action: tap (623ms)

--- step 3/30 ---
think: search field focused.
action: type "lofi hip hop" (501ms)

--- step 4/30 ---
action: enter (389ms)

--- step 5/30 ---
think: search results showing. done.
action: done (412ms)
```
every step is one pass through the same loop: dump the accessibility tree, filter the interactive elements, send them to an llm, execute the chosen action, repeat. a code sketch follows the steps below.
captures the screen via uiautomator dump and parses the accessibility xml into tappable elements with coordinates and state.
sends screen state + goal to an llm. the model returns think, plan, action - it explains its reasoning before acting.
executes the chosen action via adb - tap, type, swipe, launch, press back. 22 actions available.
if the screen doesn't change for 3 steps, stuck recovery kicks in. when the accessibility tree comes back empty, the agent falls back to screenshots.
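a minimal sketch of that loop in typescript. this is not the real kernel.ts; `dumpElements` and `askLlm` are made-up names standing in for the xml dump in sanitizer.ts and the provider call in llm-providers.ts:

```ts
import { execSync } from "node:child_process";

type Action =
  | { kind: "tap"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done" };

// the model explains its reasoning before the action it picked
interface Decision { think: string; plan: string; action: Action }

// observe: dump the accessibility tree over adb (uiautomator xml)
function dumpElements(): string {
  execSync("adb shell uiautomator dump /sdcard/ui.xml");
  return execSync("adb shell cat /sdcard/ui.xml").toString();
}

// act: every action bottoms out in an adb input command
function execute(a: Action): void {
  if (a.kind === "tap") execSync(`adb shell input tap ${a.x} ${a.y}`);
  if (a.kind === "type") execSync(`adb shell input text ${a.text.replace(/ /g, "%s")}`);
}

// think: placeholder for the llm call, routed through a provider in the real project
async function askLlm(goal: string, screen: string, stuck: boolean): Promise<Decision> {
  throw new Error("wire up your provider here");
}

async function run(goal: string, maxSteps = 30): Promise<void> {
  let last = "";
  let stuck = 0;
  for (let step = 1; step <= maxSteps; step++) {
    const screen = dumpElements();
    stuck = screen === last ? stuck + 1 : 0; // unchanged screen counts toward stuck recovery
    last = screen;
    const d = await askLlm(goal, screen, stuck >= 3);
    console.log(`--- step ${step}/${maxSteps} ---`);
    console.log(`think: ${d.think}`);
    if (d.action.kind === "done") return;
    execute(d.action);
  }
}
```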
type a goal, chain goals across apps with ai, or run deterministic steps with no llm calls.
run it and describe what you want. the agent figures out the rest.
```
$ bun run src/kernel.ts
enter your goal: send "running late, 10 mins" to Mom on whatsapp
```
chain goals across multiple apps. natural language steps, the llm navigates.
```json
{
  "name": "weather to whatsapp",
  "steps": [
    { "app": "com.google...", "goal": "search chennai weather" },
    { "goal": "share to Sanju" }
  ]
}
```
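save it as json and pass it with the --workflow flag (the filename here is just an example):

```
bun run src/kernel.ts --workflow weather-to-whatsapp.json
```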
fixed taps and types. no llm, instant execution. for repeatable tasks.
```yaml
appId: com.whatsapp
name: Send WhatsApp Message
---
- launchApp
- tap: "Contact Name"
- type: "hello from droidclaw"
- tap: "Send"
```
delegate to on-device ai apps, control phones remotely, turn old devices into always-on agents.
open google's ai mode, ask a question, grab the answer, forward it to whatsapp. or ask chatgpt something and share the response to slack. the agent uses apps on your phone as tools - no api keys for those services needed.
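a sketch of that first chain in the workflow format above. the google app's package name is an assumption:

```json
{
  "name": "ai mode to whatsapp",
  "steps": [
    { "app": "com.google.android.googlequicksearchbox", "goal": "open ai mode and ask: best time to visit chennai" },
    { "app": "com.whatsapp", "goal": "share the answer to Sanju" }
  ]
}
```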
install tailscale on phone + laptop. connect adb over the tailnet. your phone is now a remote agent - control it from anywhere. run workflows from a cron job at 8am every morning.
```
# from anywhere:
adb connect <phone-tailscale-ip>:5555
bun run src/kernel.ts --workflow morning.json
```
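and the 8am cron job, assuming adb and bun are on cron's PATH and the repo lives in ~/droidclaw:

```
# crontab -e
0 8 * * * cd ~/droidclaw && adb connect <phone-tailscale-ip>:5555 && bun run src/kernel.ts --workflow morning.json
```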
that android in a drawer can now send standups to slack, check flight prices, digest telegram channels, forward weather to whatsapp. it runs apps that don't have apis.
unlike predefined button flows, the agent actually thinks. if a button moves, a popup appears, or the layout changes, it adapts. it reads the screen, understands context, and makes decisions across any app installed on the device.
22 actions + 6 multi-step skills. here's how to get it running.
```
git clone https://github.com/thisuxhq/droidclaw.git
cd droidclaw && bun install
cp .env.example .env
```
edit .env - fastest way to start is groq (free tier):
```
LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_key_here
```
| provider | cost | vision | notes |
|---|---|---|---|
| groq | free | no | fastest to start |
| openrouter | per token | yes | 200+ models |
| openai | per token | yes | gpt-4o |
| bedrock | per token | yes | claude on aws |
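switching providers is a one-line .env change. the openrouter key name below just follows the GROQ_API_KEY pattern and is an assumption; check .env.example for the real names:

```
LLM_PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-your_key_here
```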
enable usb debugging in developer options, plug in via usb.
```
adb devices   # should show your device
bun run src/kernel.ts
```
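a healthy `adb devices` lists the serial with state `device` (the serial below is made up). if it says `unauthorized`, accept the debugging prompt on the phone:

```
$ adb devices
List of devices attached
ABC123XYZ	device
```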
| key | default | what |
|---|---|---|
| MAX_STEPS | 30 | steps before giving up |
| STEP_DELAY | 2 | seconds between actions |
| STUCK_THRESHOLD | 3 | steps before stuck recovery |
| VISION_MODE | fallback | off / fallback / always |
| MAX_ELEMENTS | 40 | ui elements sent to llm |
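put together, a full .env using the defaults from the table. the tuning keys can all be omitted; the values shown are the documented defaults:

```
LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_key_here
MAX_STEPS=30
STEP_DELAY=2
STUCK_THRESHOLD=3
VISION_MODE=fallback
MAX_ELEMENTS=40
```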
ready to use: workflows are ai-powered (json), flows are deterministic (yaml).
```
kernel.ts          main loop
actions.ts         22 actions + adb retry
skills.ts          6 multi-step skills
workflow.ts        workflow orchestration
flow.ts            yaml flow runner
llm-providers.ts   4 providers + system prompt
sanitizer.ts       accessibility xml parser
config.ts          env config
constants.ts       keycodes, coordinates
logger.ts          session logging
```