
Overview

AutoGLM-Phone-Multilingual is a mobile assistant framework built on vision-language models. It reads the phone screen multimodally and completes tasks for the user by operating the device automatically. The system controls devices via ADB (Android Debug Bridge), perceives the screen, and plans and executes the sequence of operations. Users simply describe what they need in natural language, such as “Open eBay and search for wireless earphones,” and AutoGLM-Phone-Multilingual carries out the entire workflow.
New model launched, free for a limited time!

Input Modality: Task Instructions

Output Modality: Task Action

Supported Languages: English & Chinese

Supported Hardware Devices: Android Phone

Usage

  • Place orders for specific products from designated merchants on food delivery platforms, or reorder the meal you most recently purchased.
  • Place orders on shopping websites or check product reviews.
  • Plan routes, search nearby places, book flights and tickets, reserve hotels, and more.
  • Search for news, play songs and videos, and interact through likes, comments, and favorites.
  • Search for rentals based on location, budget, layout, and other criteria.

Resources

Introducing AutoGLM-Phone-Multilingual

1. Model Highlights

  • Technical Breadth:  Powered by the AutoGLM multimodal model combined with ADB-based device control, integrating a complete capability stack including visual understanding, task planning, and tool execution.
  • Commercial Validation:  Its practicality and stability have been verified across multiple partnerships and testing scenarios.
  • Application Value:  Delivers true end-to-end intelligence, enabling a “say it, get it” mobile control experience.
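
The capability stack described above (visual understanding, task planning, tool execution) can be pictured as a perceive-plan-act loop. The following is a minimal sketch under stated assumptions, not the framework's actual implementation: `Action`, `run_task`, the callback signatures, and the `"Finish"` stop label are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    name: str   # e.g. "Launch", "Tap"; "Finish" is a hypothetical stop signal
    args: dict  # action parameters, e.g. {"x": 100, "y": 200}


def run_task(
    instruction: str,
    capture_screen: Callable[[], bytes],
    plan_next: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Look at the screen, ask the planner for one action, execute it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()                         # perceive
        action = plan_next(instruction, screenshot, history)  # plan
        history.append(action)
        if action.name == "Finish":                           # planner says done
            break
        execute(action)                                       # act
    return history
```

In a real setup, `capture_screen` and `execute` would talk to the device over ADB and `plan_next` would call the model; here they are injected as plain callables so the loop can be exercised with stubs.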
2. Supported Apps

AutoGLM-Phone-Multilingual supports 50+ mainstream applications:
| Category | Apps |
| --- | --- |
| Social & Messaging | X, Tiktok, WhatsApp, Telegram, FacebookMessenger, GoogleChat, Quora, Reddit, Instagram |
| Productivity & Office | Gmail, GoogleCalendar, GoogleDrive, GoogleDocs, GoogleTasks, Joplin |
| Life, Shopping & Finance | Amazon shopping, Temu, Bluecoins, Duolingo, GoogleFit, ebay |
| Utilities & Media | GoogleClock, Chrome, GooglePlayStore, GooglePlayBooks, FilesbyGoogle |
| Travel & Navigation | GoogleMaps, Booking.com, Trip.com, Expedia, OpenTracks |
To see the full list of supported apps, run the scripts in the GitHub repository (feel free to give us a star).
3. Available Actions

AutoGLM-Phone-Multilingual can perform the following actions:
| Action | Description |
| --- | --- |
| Launch | Launch an app |
| Tap | Tap at specified coordinates |
| Type | Input text |
| Swipe | Swipe the screen |
| Back | Go back to previous page |
| Home | Return to home screen |
| Long Press | Long press |
| Double Tap | Double tap |
| Wait | Wait for page to load |
| Take_over | Request manual takeover (login/captcha) |
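
To make the action list more concrete, here is one way the touch and navigation actions could be translated into standard `adb shell input` commands. This is an illustrative sketch, not the project's own mapping: the function name `adb_args` and its keyword parameters are assumptions. Launch, Type, Wait, and Take_over are deliberately left out because they depend on details not given here (package names, the ADB Keyboard input method, timing, and human interaction).

```python
from typing import List


def adb_args(action: str, **kw) -> List[List[str]]:
    """Return the adb command lines (as argv lists) for one action."""
    inp = ["adb", "shell", "input"]
    if action == "Tap":
        return [inp + ["tap", str(kw["x"]), str(kw["y"])]]
    if action == "Double Tap":
        # Two taps at the same point, issued back to back.
        tap = inp + ["tap", str(kw["x"]), str(kw["y"])]
        return [list(tap), list(tap)]
    if action == "Long Press":
        # A swipe that starts and ends at the same point, held for duration_ms.
        x, y = str(kw["x"]), str(kw["y"])
        return [inp + ["swipe", x, y, x, y, str(kw.get("duration_ms", 1000))]]
    if action == "Swipe":
        coords = [str(kw[k]) for k in ("x1", "y1", "x2", "y2")]
        return [inp + ["swipe"] + coords]
    if action == "Back":
        return [inp + ["keyevent", "KEYCODE_BACK"]]
    if action == "Home":
        return [inp + ["keyevent", "KEYCODE_HOME"]]
    raise ValueError(f"no adb mapping sketched for action: {action}")
```

Each returned argv list can be passed to `subprocess.run` when a device is connected; building the lists separately keeps the mapping testable without hardware.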

Examples

Play a Taylor Swift song for me.

Invocation Guide

Environment Setup

1. Python Environment

It is recommended to use Python 3.10.

2. ADB (Android Debug Bridge)

  • Download the official ADB package and extract it to a custom directory.
https://developer.android.com/tools/releases/platform-tools?hl=zh-cn
  • Configure environment variables:
    • macOS: export PATH=${PATH}:~/Downloads/platform-tools
    • Windows: Refer to third-party tutorials to configure environment variables.
  • Verify whether ADB is installed successfully:
# adb --version

Android Debug Bridge version 1.0.41
Version 36.0.0-13206524
Installed as /opt/homebrew/bin/adb
Running on Darwin 22.4.0 (arm64)
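
The same check can be scripted. The sketch below assumes the version banner has the form shown above; the helper names `adb_on_path` and `parse_adb_version` are hypothetical.

```python
import re
import shutil
from typing import Optional


def adb_on_path() -> bool:
    """True if an `adb` executable can be found on PATH."""
    return shutil.which("adb") is not None


def parse_adb_version(output: str) -> Optional[str]:
    """Extract the version number from the adb version banner, if present."""
    match = re.search(r"Android Debug Bridge version (\d+(?:\.\d+)*)", output)
    return match.group(1) if match else None


SAMPLE = """Android Debug Bridge version 1.0.41
Version 36.0.0-13206524
Installed as /opt/homebrew/bin/adb
Running on Darwin 22.4.0 (arm64)"""

print(parse_adb_version(SAMPLE))  # prints 1.0.41
```

If `adb_on_path()` returns False, the PATH change from the previous step has most likely not taken effect in the current shell.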

3. Android Device Configuration

  • Android 7.0+ device or emulator
  • Enable Developer Mode: Settings → About phone → Tap “Build number” 10 times consecutively
  • Enable USB Debugging: Settings → Developer options → USB debugging

4. Install ADB Keyboard

Download ADBKeyboard.apk from https://github.com/senzhk/ADBKeyBoard/blob/master/ADBKeyboard.apk and install it on the device. After installation, go to Settings → Input method and enable ADB Keyboard.
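
Whether the keyboard is actually enabled can also be checked from the host. The sketch below relies on two assumptions that should be verified against your device: that `adb shell ime list -s` prints one enabled input method ID per line, and that ADB Keyboard registers under the ID `com.android.adbkeyboard/.AdbIME` (the ID used in the ADBKeyBoard project's README). The function names are hypothetical.

```python
import subprocess

# Assumed input method ID, taken from the ADBKeyBoard project's README.
ADB_KEYBOARD_IME = "com.android.adbkeyboard/.AdbIME"


def ime_enabled(ime_list_output: str, ime_id: str = ADB_KEYBOARD_IME) -> bool:
    """True if ime_id appears as a full line in `adb shell ime list -s` output."""
    return ime_id in (line.strip() for line in ime_list_output.splitlines())


def adb_keyboard_enabled_on_device() -> bool:
    """Query the connected device; requires adb on PATH and a device attached."""
    result = subprocess.run(
        ["adb", "shell", "ime", "list", "-s"],
        capture_output=True, text=True, check=True,
    )
    return ime_enabled(result.stdout)
```

Keeping the parsing in `ime_enabled` separate from the device call makes it possible to test the logic without a phone attached.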

Deployment Preparation

1. Clone the Repository

git clone https://github.com/zai-org/Open-AutoGLM.git

2. Install Dependencies

# Run from the cloned repository directory
cd Open-AutoGLM
pip install -r requirements.txt
pip install -e .

3. Configure ADB Connection

# Check connected devices
adb devices
# Output should show your device, e.g.
# List of devices attached
# emulator-5554   device
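
For scripted setups, the `adb devices` listing can be parsed to confirm that at least one device is in the `device` state (as opposed to, for example, `unauthorized` or `offline`). A minimal sketch, with `parse_devices` as a hypothetical helper name:

```python
from typing import List, Tuple


def parse_devices(output: str) -> List[Tuple[str, str]]:
    """Parse `adb devices` output into (serial, state) pairs."""
    devices = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and adb daemon status messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices.append((parts[0], parts[1]))
    return devices


def ready_devices(output: str) -> List[str]:
    """Serials of devices that are connected and authorized."""
    return [serial for serial, state in parse_devices(output) if state == "device"]


SAMPLE = "List of devices attached\nemulator-5554   device\n\n"
print(ready_devices(SAMPLE))  # prints ['emulator-5554']
```

An empty result usually means USB debugging is not enabled or the authorization prompt on the phone has not been accepted.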

4. Configure Model API

python main.py --base-url https://api-inference.modelscope.cn/v1 --model "ZAI/AutoGLM-Phone-9B" --apikey "your-zai-api-key" "Open Chrome browser"
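
To avoid typing the API key on the command line each time, the invocation can be assembled in a small wrapper. This sketch only reuses the flags shown in the command above; the function name `build_command` and the environment variable `AUTOGLM_API_KEY` are assumptions made for illustration.

```python
import os
import shlex
from typing import List


def build_command(task: str, base_url: str, model: str, api_key: str) -> List[str]:
    """Assemble the main.py invocation as an argv list (no shell quoting issues)."""
    return [
        "python", "main.py",
        "--base-url", base_url,
        "--model", model,
        "--apikey", api_key,
        task,
    ]


cmd = build_command(
    task="Open Chrome browser",
    base_url="https://api-inference.modelscope.cn/v1",
    model="ZAI/AutoGLM-Phone-9B",
    # AUTOGLM_API_KEY is a hypothetical variable name; the fallback is a placeholder.
    api_key=os.environ.get("AUTOGLM_API_KEY", "your-zai-api-key"),
)
print(shlex.join(cmd))  # shell-safe string form, useful for logging
```

Passing `cmd` to `subprocess.run(cmd, check=True)` from the repository directory would start the task; passing the task as a single list element keeps instructions containing spaces or quotes intact.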