Instant Motion Tracking with MediaPipe

Posted by Vikram Sharma, Software Engineering Intern; Jianing Wei, Staff Software Engineer; Tyler Mullen, Senior Software Engineer
Today, we are excited to release the Instant Motion Tracking solution in MediaPipe. It is built upon the MediaPipe Box Tracking solution we released previously. With Instant Motion Tracking, you can easily place fun virtual 2D and 3D content on static or moving surfaces, allowing it to seamlessly interact with the real world. This technology also powered MotionStills AR. Along with the library, we are releasing an open-source Android application to showcase its capabilities. In this application, a user simply taps the camera viewfinder to place virtual 3D objects and GIF animations, augmenting the real-world environment.
Instant Motion Tracking in MediaPipe
Instant Motion Tracking
The Instant Motion Tracking solution provides the capability to seamlessly place virtual content on static or moving surfaces in the real world. To achieve this, we provide six degrees of freedom (6DoF) tracking with relative scale, in the form of rotation and translation matrices. This tracking information is then used by the rendering system to overlay virtual content on the camera stream, creating immersive AR experiences.
The core concept behind Instant Motion Tracking is to decouple the camera’s translation and rotation estimation, treating them instead as independent optimization problems. This approach enables AR tracking across devices and platforms without initialization or calibration. We do this by first finding the 3D camera translation using only the visual signals from the camera. This involves estimating the target region's apparent 2D translation and relative scale across frames. The process can be illustrated with a simple pinhole camera model, relating translation and scale of an object in the image plane to the final 3D translation.

By finding the change in relative size of our tracked region from view position V1 to V2, we can estimate the relative change in distance from the camera.
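To make this relation concrete, here is a minimal sketch (not MediaPipe's actual code) of the pinhole arithmetic: under a pinhole model, the apparent size s of an object of physical size S at distance Z from a camera with focal length f is s ≈ f·S/Z, so f and S cancel in the ratio of apparent sizes between two views.

```kotlin
// Pinhole model: apparentSize ≈ focalLength * physicalSize / distance.
// Focal length and physical size cancel in the ratio, so:
//   distanceV2 / distanceV1 = apparentSizeV1 / apparentSizeV2
fun relativeDistanceChange(apparentSizeV1: Float, apparentSizeV2: Float): Float =
    apparentSizeV1 / apparentSizeV2

fun main() {
    // A tracked region that shrinks to half its apparent size has
    // moved twice as far from the camera.
    println(relativeDistanceChange(200f, 100f))  // prints 2.0
}
```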
Next, we obtain the device’s 3D rotation from its built-in IMU (Inertial Measurement Unit) sensor. By combining this translation and rotation data, we can track a target region with six degrees of freedom at relative scale. This information allows for the placement of virtual content on any system with a camera and IMU functionality, and is calibration free. For more details on Instant Motion Tracking, please refer to our paper.
A MediaPipe Pipeline for Instant Motion Tracking
A diagram of the Instant Motion Tracking pipeline is shown below. It consists of four major components: a Sticker Manager module, a Region Tracking module, a Matrices Manager module, and lastly a Rendering System. Each component consists of MediaPipe calculators or subgraphs.

Diagram of Instant Motion Tracking Pipeline
The Sticker Manager accepts sticker data from the application and produces initial anchors (tracked region information) based on user taps, and user gesture controls for every sticker object. Initial anchors are then sent to our Region Tracking module to generate tracked anchors. The Matrices Manager combines this data with our device’s rotation matrix to produce six degrees-of-freedom poses as model matrices. After integrating any user-specified transforms like asset scaling, our final poses are forwarded to the Rendering System to render all virtual objects overlaid on the camera frame to produce the output AR frame.
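As an illustration of what the Matrices Manager produces, here is a hedged sketch (not MediaPipe's actual calculator code) of composing a 6DoF model matrix from the IMU rotation and the tracked translation, in the column-major layout OpenGL-style renderers expect:

```kotlin
// Compose a column-major 4x4 model matrix from a row-major 3x3 rotation
// (from the IMU) and a 3D translation (from region tracking).
fun composeModelMatrix(rotation3x3: FloatArray, translation: FloatArray): FloatArray {
    require(rotation3x3.size == 9 && translation.size == 3)
    val m = FloatArray(16)
    for (row in 0..2) {
        for (col in 0..2) {
            // Store row-major entry (row, col) at its column-major index.
            m[col * 4 + row] = rotation3x3[row * 3 + col]
        }
    }
    m[12] = translation[0]  // x
    m[13] = translation[1]  // y
    m[14] = translation[2]  // z
    m[15] = 1f
    return m
}
```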
Using the Instant Motion Tracking Solution
The Instant Motion Tracking solution is easy to use thanks to the cross-platform MediaPipe framework. Given camera frames, the device rotation matrix, and anchor positions (screen coordinates) as input, the MediaPipe graph produces AR renderings for each frame. If you wish to integrate the Instant Motion Tracking library into your own system or application, please visit our documentation to build AR experiences on any device with a camera sensor and IMU functionality.
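For a rough idea of the Android wiring, here is a hedged sketch using MediaPipe's Java/Kotlin FrameProcessor API. The graph asset name and stream names below are placeholders, not necessarily what the solution ships with; consult the Instant Motion Tracking documentation for the actual names.

```kotlin
import android.content.Context
import com.google.mediapipe.components.FrameProcessor
import com.google.mediapipe.glutil.EglManager

// Create a processor that runs the graph on incoming camera frames.
fun createProcessor(context: Context): FrameProcessor {
    val eglManager = EglManager(null)
    return FrameProcessor(
        context,
        eglManager.nativeContext,
        "instant_motion_tracking.binarypb",  // assumed graph asset name
        "input_video",                       // assumed input video stream
        "output_video"                       // assumed output video stream
    )
}

// Feed the device rotation matrix (a 3x3, row-major FloatArray from the IMU)
// into the graph each frame; "imu_rotation_matrix" is an assumed stream name.
fun sendRotation(processor: FrameProcessor, rotationMatrix: FloatArray, timestampUs: Long) {
    val packet = processor.packetCreator.createFloat32Array(rotationMatrix)
    processor.graph.addPacketToInputStream("imu_rotation_matrix", packet, timestampUs)
}
```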
Augmenting The World with 3D Stickers and GIFs
The Instant Motion Tracking solution brings both 3D stickers and GIF animations into augmented reality experiences. GIFs are rendered on flat 3D billboards placed in the world, blending animated content into the real environment for fun, immersive experiences. Try it for yourself!
Demonstration of GIF placement in 3D
MediaPipe Instant Motion Tracking is already helping PixelShift.AI, a startup applying cutting-edge vision technologies to facilitate video content creation, track virtual characters seamlessly in the viewfinder for a realistic experience. Building upon Instant Motion Tracking’s high-quality pose estimation, PixelShift.AI enables VTubers to create mixed reality experiences with web technologies. The product will be released to the broader VTuber community later this year.

Instant Motion Tracking helps PixelShift.AI create mixed reality experiences
Follow MediaPipe
We look forward to publishing more blog posts about new MediaPipe pipeline examples and features. Please follow the MediaPipe label on the Google Developers Blog and the Google Developers Twitter account (@googledevs).
Acknowledgement
We would like to thank Vikram Sharma, Jianing Wei, Tyler Mullen, Chuo-Ling Chang, Ming Guang Yong, Jiuqiang Tang, Siarhei Kazakou, Genzhi Ye, Camillo Lugaresi, Buck Bourdon, and Matthias Grundmann for their contributions to this release.
ML Kit Pose Detection Makes Staying Active at Home Easier

Posted by Kenny Sulaimon, Product Manager, ML Kit; Chengji Yan and Areeba Abid, Software Engineers, ML Kit

Two months ago, we introduced the standalone version of the ML Kit SDK, making it even easier to integrate on-device machine learning into mobile apps. Since then, we’ve launched the Digital Ink Recognition API and introduced the ML Kit early access program. Our first two early access APIs were Pose Detection and Entity Extraction. We’ve received an overwhelming amount of interest in these new APIs, and today we are thrilled to officially add Pose Detection to the ML Kit lineup.

A New ML Kit API, Pose Detection


Examples of ML Kit Pose Detection
ML Kit Pose Detection is an on-device, cross-platform (Android and iOS), lightweight solution that tracks a subject's physical actions in real time. With this technology, building a one-of-a-kind experience for your users is easier than ever.
The API produces a full-body, 33-point skeletal match that includes facial landmarks (ears, eyes, mouth, and nose), along with hand and foot tracking. The API was also trained on a variety of complex athletic poses, such as yoga positions.

Skeleton image detailing all 33 landmark points
Under The Hood

Diagram of the ML Kit Pose Detection Pipeline
The power of the ML Kit Pose Detection API is in its ease of use. The API builds on the cutting-edge BlazePose pipeline and allows developers to build great experiences on Android and iOS with little effort. We offer a full-body model, support for both video and static image use cases, and multiple pre- and post-processing improvements to help developers get started with only a few lines of code.
The ML Kit Pose Detection API uses a two-step process for detecting poses. First, the API combines an ultra-fast face detector with a prominent person detection algorithm to detect when a person has entered the scene. The API detects a single (highest confidence) person in the scene and requires the user's face to be present for optimal results.
Next, the API applies a full body, 33 landmark point skeleton to the detected person. These points are rendered in 2D space and do not account for depth. The API also contains a streaming mode option for further performance and latency optimization. When enabled, instead of running person detection on every frame, the API only runs this detector when the previous frame no longer detects a pose.
The ML Kit Pose Detection API also features two operating modes, “Fast” and “Accurate”. With “Fast” mode enabled, you can expect a frame rate of around 30+ FPS on a modern Android device, such as a Pixel 4, and 45+ FPS on a modern iOS device, such as an iPhone X. With “Accurate” mode enabled, you can expect more stable x,y coordinates on both types of devices, but a slower overall frame rate.
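On Android, choosing between the two modes is a one-line decision when building the detector. A minimal Kotlin sketch using the ML Kit pose detection API:

```kotlin
import com.google.mlkit.vision.pose.PoseDetection
import com.google.mlkit.vision.pose.accurate.AccuratePoseDetectorOptions
import com.google.mlkit.vision.pose.defaults.PoseDetectorOptions

// "Fast" mode: the default detector, suited to live camera streams.
val fastOptions = PoseDetectorOptions.Builder()
    .setDetectorMode(PoseDetectorOptions.STREAM_MODE)
    .build()

// "Accurate" mode: more stable coordinates at a lower frame rate.
val accurateOptions = AccuratePoseDetectorOptions.Builder()
    .setDetectorMode(AccuratePoseDetectorOptions.SINGLE_IMAGE_MODE)
    .build()

val detector = PoseDetection.getClient(fastOptions)
```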
Lastly, we’ve also added a per-point “InFrameLikelihood” score to help app developers ensure their users are in the right position and to filter out extraneous points. This score is calculated during the landmark detection phase; a low likelihood score suggests that a landmark is outside the image frame.
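For example, you might run the detector on a frame and drop low-likelihood landmarks before applying your own logic. A sketch reusing the detector built above; the 0.8 threshold is an arbitrary illustration:

```kotlin
import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.pose.PoseDetector
import com.google.mlkit.vision.pose.PoseLandmark

fun detectPose(detector: PoseDetector, bitmap: Bitmap) {
    val image = InputImage.fromBitmap(bitmap, /* rotationDegrees = */ 0)
    detector.process(image)
        .addOnSuccessListener { pose ->
            // Keep only landmarks the model believes are inside the frame.
            val visible = pose.allPoseLandmarks.filter { it.inFrameLikelihood > 0.8f }
            val leftShoulder = pose.getPoseLandmark(PoseLandmark.LEFT_SHOULDER)
            // ... feed `visible` and `leftShoulder` into your own heuristics.
        }
        .addOnFailureListener {
            // Log the error and skip this frame.
        }
}
```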
Real World Applications


Examples of a pushup and squat counter using ML Kit Pose Detection
Keeping up with regular physical activity is one of the hardest things to do while at home. We often rely on gym buddies or physical trainers to help us with our workouts, but this has become increasingly difficult. Apps and technology can often help with this, but with existing solutions, many app developers are still struggling to understand and provide feedback on a user’s movement in real time. ML Kit Pose Detection aims to make this problem a whole lot easier.
The most common applications for pose detection are fitness and yoga trackers. It’s possible to use our API to track pushups, squats, and a variety of other physical activities in real time. These complex use cases can be achieved using the output of the API, either with angle heuristics, by tracking the distance between joints, or with your own proprietary classifier model.
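As a sketch of the angle-heuristic approach, the angle at a joint can be computed from three landmarks; for a squat counter, for instance, you might watch the angle at the knee (hip, knee, ankle):

```kotlin
import com.google.mlkit.vision.pose.PoseLandmark
import kotlin.math.abs
import kotlin.math.atan2

// Inner angle (in degrees) at `mid`, formed by the segments
// mid->first and mid->last, using the landmarks' 2D positions.
fun jointAngle(first: PoseLandmark, mid: PoseLandmark, last: PoseLandmark): Double {
    val raw = Math.toDegrees(
        (atan2(last.position.y - mid.position.y, last.position.x - mid.position.x) -
         atan2(first.position.y - mid.position.y, first.position.x - mid.position.x)).toDouble()
    )
    val angle = abs(raw)
    return if (angle > 180.0) 360.0 - angle else angle
}
```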
To get you started with classifying poses, we are sharing additional tips on how to use angle heuristics to classify popular yoga poses. Check it out here.
Learning to Dance Without Leaving Home
Learning a new skill is always tough, but learning to dance without the aid of a real time instructor is even tougher. One of our early access partners, Groovetime, has set out to solve this problem.
With the power of ML Kit Pose Detection, Groovetime allows users to learn their favorite dance moves from popular short-form dance videos, while giving users automated real time feedback on their technique. You can join their early access beta here.

Groovetime App using ML Kit Pose Detection
Staying Active Wherever You Are
Our Pose Detection API is also helping adidas Training, another one of our early access partners, build a virtual workout experience that will help you stay active no matter where you are. This one-of-a-kind innovation will help analyze and give feedback on a user’s movements, using nothing more than your phone. Integration into the adidas Training app is still in the early phases of the development cycle, but stay tuned for more updates in the future.
How to get started?
If you would like to start using the Pose Detection API in your mobile app, head over to the developer documentation or check out the sample apps for Android and iOS to see the API in action. For questions or feedback, please reach out to us through one of our community channels.
Digital Ink Recognition in ML Kit

Posted by Mircea Trăichioiu, Software Engineer, Handwriting Recognition
A month ago, we announced changes to ML Kit to make mobile development with machine learning even easier. Today we're announcing the addition of the Digital Ink Recognition API on both Android and iOS, allowing developers to create apps where stylus and touch act as first-class inputs.

Digital ink recognition: the latest addition to ML Kit’s APIs
Digital Ink Recognition is different from the existing Vision and Natural Language APIs in ML Kit: it takes neither text nor images as input. Instead, it looks at the user's strokes on the screen and recognizes what they are writing or drawing. This is the same technology that powers handwriting recognition in Gboard, Google’s own keyboard app, which we described in detail in a 2019 blog post. It's also the same underlying technology used in the Quick, Draw! and AutoDraw experiments.

Handwriting input in Gboard

Turning doodles into art with AutoDraw
With the new Digital Ink Recognition API, developers can now use this technology in their apps as well, for everything from letting users input text and figures with a finger or stylus to transcribing handwritten notes to make them searchable, all in near real time and entirely on-device.
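On Android, the flow is roughly: collect touch points into an Ink object, pick a model by language tag, and call the recognizer. A minimal Kotlin sketch; the hard-coded coordinates and timestamps stand in for points you would capture from your view's MotionEvents:

```kotlin
import com.google.mlkit.vision.digitalink.DigitalInkRecognition
import com.google.mlkit.vision.digitalink.DigitalInkRecognitionModel
import com.google.mlkit.vision.digitalink.DigitalInkRecognitionModelIdentifier
import com.google.mlkit.vision.digitalink.DigitalInkRecognizerOptions
import com.google.mlkit.vision.digitalink.Ink

// Build an Ink object from captured touch points (x, y in pixels, t in ms).
val stroke = Ink.Stroke.builder()
    .addPoint(Ink.Point.create(100f, 100f, 0L))
    .addPoint(Ink.Point.create(120f, 104f, 16L))
    .addPoint(Ink.Point.create(140f, 110f, 33L))
    .build()
val ink = Ink.builder().addStroke(stroke).build()

// Pick the recognition model for the language you want to recognize.
val modelId = DigitalInkRecognitionModelIdentifier.fromLanguageTag("en-US")!!
val model = DigitalInkRecognitionModel.builder(modelId).build()
val recognizer = DigitalInkRecognition.getClient(
    DigitalInkRecognizerOptions.builder(model).build()
)

// Candidates come back ordered by score; the first is the best guess.
recognizer.recognize(ink)
    .addOnSuccessListener { result ->
        val bestGuess = result.candidates.firstOrNull()?.text
    }
```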
Supports many languages and character sets
Digital Ink Recognition supports 300+ languages and 25+ writing systems, including all major Latin-script languages as well as Chinese, Japanese, Korean, Arabic, Cyrillic, and more. Classifiers parse written text into a string of characters.
Recognizes shapes
Other classifiers can describe shapes, such as drawings and emojis, by the class to which they belong (circle, square, happy face, etc.). We currently support an autodraw sketch recognizer, an emoji recognizer, and a basic shape recognizer.
Works offline
The Digital Ink Recognition API runs on-device and does not require a network connection. However, you must download one or more models before you can use a recognizer. Models are downloaded on demand and are around 20 MB in size. Refer to the model download documentation for more information.
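For instance, on Android the download goes through ML Kit's shared RemoteModelManager; a sketch reusing the `model` object from the example above:

```kotlin
import com.google.mlkit.common.model.DownloadConditions
import com.google.mlkit.common.model.RemoteModelManager

val modelManager = RemoteModelManager.getInstance()
modelManager.download(model, DownloadConditions.Builder().build())
    .addOnSuccessListener {
        // Model is on the device; recognize() calls can now succeed offline.
    }
    .addOnFailureListener { e ->
        // e.g. no network yet; surface the error or retry later.
    }
```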
Runs fast
The time to perform a recognition call depends on the exact device and the size of the input stroke sequence. On a typical mobile device, recognizing a line of text takes about 100 ms.
How to get started
If you would like to start using Digital Ink Recognition in your mobile app, head over to the documentation or check out the sample apps for Android and iOS to see the API in action. For questions or feedback, please reach out to us through one of our community channels.