Overview
AutoGLM-Phone-Multilingual is a mobile intelligent assistant framework built on vision-language models. It interprets phone screen content multimodally and completes tasks for the user through automated operations. The system controls devices via ADB (Android Debug Bridge), perceives the screen, and plans and executes the resulting sequence of operations. Users simply describe what they need in natural language, such as "Open eBay and search for wireless earphones.", and AutoGLM-Phone-Multilingual completes the entire workflow.
Input Modality
Task Instructions
Output Modality
Task Action
Supported Languages
English & Chinese
Supported Hardware Devices
Android Phone
Usage
Order Food Delivery
Place orders for specific products from designated merchants on food delivery platforms, or request to reorder the meal you most recently purchased.
Product Purchase
Place orders on shopping websites or check product reviews.
Transportation Services
Route planning, nearby searches, flight and ticket booking, hotel reservations, and more.
News & Information
Search for news, play songs and videos, and interact through likes, comments, and favorites.
Housing & Rentals
Search for rentals based on location, budget, layout, and other criteria.
Resources
- API Documentation: Learn how to call the API.
Introducing AutoGLM-Phone-Multilingual
1. Model Highlights
- Technical Breadth: Powered by the AutoGLM multimodal model combined with ADB-based device control, integrating a complete capability stack including visual understanding, task planning, and tool execution.
- Commercial Validation: Its practicality and stability have been verified across multiple partnerships and testing scenarios.
- Application Value: Delivers true end-to-end intelligence, enabling a "say it, get it" mobile control experience.
2. Supported Apps
AutoGLM-Phone-Multilingual supports 50+ mainstream applications. To see the full list of supported apps, run the scripts in the GitHub repository (feel free to give us a star). A selection by category:
| Category | Apps |
|---|---|
| Social & Messaging | X, TikTok, WhatsApp, Telegram, FacebookMessenger, GoogleChat, Quora, Reddit, Instagram |
| Productivity & Office | Gmail, GoogleCalendar, GoogleDrive, GoogleDocs, GoogleTasks, Joplin |
| Life, Shopping & Finance | Amazon shopping, Temu, Bluecoins, Duolingo, GoogleFit, eBay |
| Utilities & Media | GoogleClock, Chrome, GooglePlayStore, GooglePlayBooks, FilesbyGoogle |
| Travel & Navigation | GoogleMaps, Booking.com, Trip.com, Expedia, OpenTracks |
3. Available Actions
AutoGLM-Phone-Multilingual can perform the following actions:
| Action | Description |
|---|---|
| Launch | Launch an app |
| Tap | Tap at specified coordinates |
| Type | Input text |
| Swipe | Swipe the screen |
| Back | Go back to previous page |
| Home | Return to home screen |
| Long Press | Long press |
| Double Tap | Double tap |
| Wait | Wait for page to load |
| Take_over | Request manual takeover (login/captcha) |
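To make the table more concrete, the sketch below shows how several of these actions could be expressed as stock `adb shell` commands. It is an illustration only: the helper name `adb_args`, its parameter names, and the choice of commands are assumptions and not the framework's actual implementation. The function only builds argument lists and does not run them. Double Tap, Wait, and Take_over are left out because they are not single device commands in this sketch.

```python
from typing import List

# Standard Android key codes for the Home and Back buttons.
KEYCODE_HOME = 3
KEYCODE_BACK = 4


def adb_args(action: str, **params) -> List[str]:
    """Build an `adb` argument list for one action (hypothetical helper)."""
    shell = ["adb", "shell"]
    if action == "Launch":
        # `monkey` with the LAUNCHER category starts the app's main activity.
        return shell + ["monkey", "-p", params["package"],
                        "-c", "android.intent.category.LAUNCHER", "1"]
    if action == "Tap":
        return shell + ["input", "tap", str(params["x"]), str(params["y"])]
    if action == "Type":
        # `input text` expects spaces to be written as %s.
        return shell + ["input", "text", params["text"].replace(" ", "%s")]
    if action == "Swipe":
        return shell + ["input", "swipe",
                        str(params["x1"]), str(params["y1"]),
                        str(params["x2"]), str(params["y2"]),
                        str(params.get("duration_ms", 300))]
    if action == "Long Press":
        # A swipe that starts and ends at the same point acts as a long press.
        x, y = str(params["x"]), str(params["y"])
        return shell + ["input", "swipe", x, y, x, y,
                        str(params.get("duration_ms", 1000))]
    if action == "Back":
        return shell + ["input", "keyevent", str(KEYCODE_BACK)]
    if action == "Home":
        return shell + ["input", "keyevent", str(KEYCODE_HOME)]
    raise ValueError(f"no adb mapping in this sketch for action: {action}")


tap_command = adb_args("Tap", x=540, y=1200)
```

The resulting list can be passed to `subprocess.run` when a device is connected. Returning the arguments instead of executing them keeps the mapping easy to inspect and test without hardware.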
Examples
Play a Taylor Swift song for me.
Invocation Guide
Environment Setup
1. Python Environment
It is recommended to use Python 3.10.
2. ADB (Android Debug Bridge)
- Download the official ADB package and extract it to a custom directory.
- Configure environment variables:
  - macOS:
    export PATH=${PATH}:~/Downloads/platform-tools
  - Windows: Refer to third-party tutorials to configure environment variables.
- Verify that ADB is installed successfully.
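One way to run this verification from Python is sketched below. It assumes only that the `adb` executable should be reachable through `PATH`; the helper name `adb_version` is made up for this example and is not part of the framework.

```python
import shutil
import subprocess
from typing import Optional


def adb_version() -> Optional[str]:
    """Return the first line of `adb version`, or None if adb is unavailable."""
    adb_path = shutil.which("adb")
    if adb_path is None:
        return None  # adb is not on PATH
    try:
        result = subprocess.run(
            [adb_path, "version"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


version_line = adb_version()
```

If `version_line` is `None`, the most likely cause is that the `platform-tools` directory has not been added to `PATH` in the current shell session.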
3. Android Device Configuration
- Android 7.0+ device or emulator
- Enable Developer Mode: Settings → About phone → tap "Build number" 10 times consecutively
- Enable USB Debugging: Settings → Developer options → USB debugging
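After these settings are enabled, `adb devices` lists each connected device together with its state. The sketch below parses that output so a script can tell whether a device is ready (`device`) or still waiting for the USB debugging prompt to be accepted (`unauthorized`). The helper name and the sample serial numbers are invented for illustration.

```python
from typing import Dict


def parse_adb_devices(output: str) -> Dict[str, str]:
    """Map device serial to state from the text printed by `adb devices`."""
    devices: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks, the header line, and adb daemon status messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


# Sample output with made-up serial numbers.
sample = (
    "List of devices attached\n"
    "emulator-5554\tdevice\n"
    "R58M123ABC\tunauthorized\n"
)
states = parse_adb_devices(sample)
ready = [serial for serial, state in states.items() if state == "device"]
```

In real use, the `output` argument would come from running `adb devices` through `subprocess.run` with `capture_output=True` and `text=True`.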