
Giving users and developers more control over focus

Chrome 86 introduces two new features that improve both the user and developer experience when it comes to working with focus.


The :focus-visible pseudo-class is a CSS selector that lets developers opt-in to the same heuristic the browser uses when it's deciding whether to show a default focus indicator. This makes styling focus more predictable.


The Quick Focus Highlight is a user preference that causes the currently focused element to display an indicator for two seconds. The Quick Focus Highlight will always display, even if a page has disabled focus styles using CSS. It will also cause all CSS focus styles to match regardless of the input device that is interacting with the page.


An animation of the quick focus highlight showing how it temporarily highlights a link in a line of text and then fades out to not obscure the text content.

What is focus?

When a user interacts with an element the browser will often show an indicator to signal that the element has "focus". This is sometimes referred to as the "focus ring" because browsers typically put a solid or dashed ring around the focused element.


A button with a default blue focus ring


The focus ring signals to the user which element will receive keyboard events. If a user is tabbing through a form, the focus ring indicates which text field they can type into; if they've focused a submit button, they know that pressing Enter or Spacebar will activate it.

Problems with focus

For users who rely on a keyboard or other assistive technology to access the page, the focus ring acts as their mouse pointer - it's how they know what they are interacting with.


Unfortunately, many websites hide the focus ring using CSS. Oftentimes they do this because the underlying behavior of focus can be difficult to understand, and styling focus can have surprising consequences.


For example, a custom dropdown menu should use the tabindex attribute to make itself keyboard operable. But adding a tabindex to an element causes all browsers to show a focus ring on that element if it is clicked with a mouse. If a developer is surprised to see the focus ring when they click the menu, they might use the following CSS to hide it:


.custom-dropdown-menu:focus {
  outline: none;
}

This "fixes" their issue, insofar as they no longer see the focus ring when they click the menu. However, they have unknowingly broken the experience for users relying on a keyboard to access the page. As mentioned earlier, for users who rely on a keyboard to access the page, the focus ring acts as their mouse pointer. Therefore, CSS that removes the focus ring (without providing an alternative) is akin to hiding the mouse pointer.


To improve on this situation, developers need a better way to style focus - one that matches their expectations of how focus should work, and doesn't run the risk of breaking the experience for users. At the same time, users need to have the final say in the experience and should be able to choose when and how they see focus. This is where :focus-visible and the Quick Focus Highlight come in.

:focus-visible

Whenever you click on an element, browsers use an internal heuristic to determine whether they should display a default focus indicator. This is why in Chrome tabbing to a <button> shows a focus ring, but clicking it with a mouse does not. 


When you use :focus to style an element, it tells the browser to ignore its heuristic and to always show your focus style. For some situations this can break the user's expectation and lead to a confusing experience.


:focus-visible, on the other hand, will invoke the same heuristic that the browser uses when it's deciding whether to show the default focus indicator. This allows focus styles to feel more intuitive. In Chrome 86 and beyond, this should be all you need to style focus:


/* Focusing the button with a keyboard will show a dashed black line. */
button:focus-visible {
  outline: 4px dashed black;
}

By combining :focus-visible with :focus you can take things a step further and provide different focus styles depending on the user's input device. This can be helpful if you want the focus indicator to depend on the precision of the input device:


/* Focusing the button with a keyboard will show a dashed black line. */
button:focus-visible {
  outline: 4px dashed black;
}
  
/* Focusing the button with a mouse, touch, or stylus will show a subtle drop shadow. */
button:focus:not(:focus-visible) {
  outline: none;
  box-shadow: 1px 1px 5px rgba(1, 1, 0, .7);
}

The snippet above says that if the browser would normally show a focus indicator, then it should do so using a 4px dashed black outline. Additionally, the example relies on the existing :focus behavior and says that if an element has focus, but the browser would not normally show a default focus ring, then it should show a drop shadow. 


Since the browser doesn't usually show a default focus ring when a user clicks on a button, the :focus:not(:focus-visible) pattern can be an easy way to specifically target mouse/touch focus.


Note that not all browsers set focus in the same way, so the above snippet will work in Chromium-based browsers, but may not work in others.

The :focus-visible heuristic

Understanding the heuristic browsers use for focus indicators will help you know when to use :focus-visible. Unfortunately, the heuristic has never been specified, so the behavior is subtly different in every browser. The :focus-visible specification suggests one possible heuristic based on the behavior browsers currently demonstrate. Here's a quick breakdown:


Has the user expressed a preference to always see a focus indicator?

If the user has indicated that they always want to see a focus indicator, then :focus-visible will always match on the focused element, just like :focus does.


Does the element require text input?

:focus-visible will always match when an element which requires text input (for example, <input type="text">) is focused.


A quick way to know if an element is likely to require text input is to ask yourself "If I were to tap on this element using a mobile device, would I expect to see a virtual keyboard?" If the answer is "yes" then the element will match :focus-visible.


What input device is being used?

If the user is using a keyboard to navigate the page, then :focus-visible will match on any interactive element (including any element with tabindex) which becomes focused. If they're using a mouse or touch screen, then it will only match if the focused element requires text input. 


Was focus moved programmatically?

If focus is moved programmatically by calling focus(), the newly focused element will only match :focus-visible if the previously focused element matched it as well. 


For example, if a user presses a physical key, and the event handler opens a menu and moves focus to the first menu item, :focus-visible will still match and the menu item will have a focus style.


Because mouse users may frequently use keyboard shortcuts, Chrome's implementation will bypass "keyboard mode" if a meta key (such as command, control, etc.) is pressed. For example, if a user who was previously using a mouse pressed a keyboard shortcut which shows a settings dialog, :focus-visible would not match on the focused element in the settings dialog.
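To make the decision process easier to follow, here is a rough, Python-style sketch of the questions above. This is purely illustrative pseudocode, not Chrome's actual implementation, and every name in it is made up.


# Illustrative pseudocode for the heuristic suggested by the :focus-visible spec.
# This is NOT a browser's real implementation; all names are hypothetical.

def matches_focus_visible(
    always_show_focus_pref: bool,      # user preference: always show focus indicators
    requires_text_input: bool,         # e.g. <input type="text">, <textarea>
    last_input: str,                   # "keyboard", "mouse", "touch", "stylus", or "script"
    previous_element_matched: bool,    # did the previously focused element match?
) -> bool:
    # 1. A user preference to always see a focus indicator wins outright.
    if always_show_focus_pref:
        return True
    # 2. Elements that take text input always match, regardless of input device.
    if requires_text_input:
        return True
    # 3. Keyboard navigation matches any focused interactive element (including
    #    elements made focusable with tabindex). As noted above, Chrome ignores
    #    key presses that involve a meta key when deciding this.
    if last_input == "keyboard":
        return True
    # 4. Programmatic focus() inherits the state of the previously focused element.
    if last_input == "script":
        return previous_element_matched
    # Mouse, touch, or stylus focus on a non-text element: no match.
    return False

# Example: clicking a <button> with a mouse does not match :focus-visible.
print(matches_focus_visible(False, False, "mouse", False))  # -> False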

Support and polyfill

Currently, :focus-visible is only supported in Chrome 86 and other Chromium-based browsers, though there's work underway to add support to Firefox. Refer to the MDN browser compatibility table to keep track of current support.


If you'd like to use :focus-visible today you can do so with the help of the :focus-visible polyfill. Once the polyfill is loaded, you can use the .focus-visible class instead of :focus-visible to achieve similar results:


/* Define mouse/touch focus indicators. */
.js-focus-visible :focus:not(.focus-visible) {
  …
}
  
/* Define keyboard focus indicators. */
.js-focus-visible .focus-visible {
  …
}

Note that the MDN support table shows that Firefox supports a similar selector, :-moz-focusring, on which :focus-visible is based; however, the behavior of the two selectors is quite different, so it's recommended to use the :focus-visible polyfill if you need cross-browser support.


Quick Focus Highlight

:focus-visible makes it easier for developers to selectively style focus and avoids pitfalls with the existing :focus selector. While this is a great addition to the developer toolbox, for a subset of users, particularly those with cognitive impairments, it can be helpful to always see a focus indicator, and they may find it distressing when the focus indicator appears less often due to selective styling with :focus-visible.


For these users, Chrome 86 adds a setting called Quick Focus Highlight.


Quick Focus Highlight temporarily highlights the currently focused element, and causes :focus-visible to always match.


To enable Quick Focus Highlight:


  1. Go to Chrome's settings menu (or type chrome://settings into the address bar).

  2. Click Advanced then Accessibility.

  3. Enable the toggle switch to Show a quick highlight on the focused object.


Once Quick Focus Highlight is enabled, focused elements will show a white-blue outline with a blue glow (see the image below). The Highlight uses these alternating colors to ensure that it has proper contrast on any background.


The quick focus highlight on a white, black, and blue background. The rings are visible in all scenarios.


The Highlight is outset from the focused element to avoid interfering with that element's existing focus styles or drop shadows. The Highlight will fade out after two seconds to avoid obscuring page content, such as text.


FAQ

User input can be multi-modal; for example, some 2-in-1 laptops support mouse, keyboard, touch, and stylus. How does :focus-visible work with these devices?

Because :focus-visible uses the same heuristic as a default focus indicator, the experience should match what users expect on these platforms when they interact with unstyled HTML elements.


In other words, if developers use :focus-visible as their primary means to style focus, then the experience should be more consistent for all users regardless of their input device.

Does :focus-visible expose sensitive information?

Most of the time, :focus-visible matching only indicates that a user is using the keyboard, or has focused an element which takes text input.


:focus-visible could potentially be used to detect that a user has enabled a preference to always show a focus indicator, by tracking mouse and keyboard events and checking matches(":focus-visible") on elements which were focused when the keyboard is not being used. Since the precise details of when :focus-visible should match are left up to the browser's implementation, this would not be a completely reliable method.

What's the impact on users with low vision or cognitive impairments?

:focus-visible and the Quick Focus Highlight were designed to work together to help these users.


:focus-visible aims to address the common anti-pattern of developers removing the focus indicator from all of their controls. Using the browser's focus heuristic helps by creating fewer surprises for developers when the focus ring appears, meaning fewer reasons to use CSS to hide the ring.


For some users the browser's default behavior may still be insufficient. They may want to see a focus ring regardless of the type of control they're interacting with, or the input device they're using. That's where the Quick Focus Highlight can help.


The Quick Focus Highlight lets users increase the visibility of the focus indicator, and makes it so :focus-visible always matches, regardless of their input device. This combination of effects should make the currently focused element much easier to identify.


Why not have an "always on" focus indicator?

The Quick Focus Highlight does not currently support an "always on" mode because it's difficult to design a universal focus overlay that does not obscure page content. As a result, the Highlight fades out after two seconds and then defers to either the browser's default focus indicator or the page author's :focus and :focus-visible styles.


Because the Highlight is a user preference its behavior can be changed in the future if users would prefer that it always stay on.

Should we also add :focus-visible-within?

There has been discussion around adding :focus-visible-within, but a proposal will require additional use cases. If you feel like you have a good use case for :focus-visible-within please add it to the discussion issue.


We welcome your feedback!

:focus-visible and the Quick Focus Highlight are the product of years of work and feedback from developers in the :focus-visible WICG repo and the standards bodies. We'd like to say thank you to everyone who helped shape these features.


Give :focus-visible and the new Highlight a shot, and tell us what you think.

If you've found an issue with the Quick Focus Highlight, attach a screenshot and send it to our support tracker.

If you've found an issue with :focus-visible, use this template to file a chromium bug.

On-device Supermarket Product Recognition



One of the greatest challenges faced by users who are visually impaired is identifying packaged foods, both in a grocery store and also in their kitchen cupboard at home. This is because many foods share the same packaging, such as boxes, tins, bottles and jars, and only differ in the text and imagery printed on the label. However, the ubiquity of smart mobile devices provides an opportunity to address such challenges using machine learning (ML).

In recent years, there have been significant improvements in the accuracy of on-device neural networks for various perception tasks. When coupled with the increased computing power in modern smartphones, it is now possible for many vision tasks to yield high performance while running entirely on a mobile device. The development of on-device models such as MnasNet and MobileNets (based on resource-aware architecture search) in combination with on-device indexing allows one to run a full computer vision system, such as labeled product recognition, entirely on-device, in real time.

Leveraging developments such as these, we recently released Lookout, an Android app that uses computer vision to make the physical world more accessible for users who are visually impaired. When the user aims their smartphone camera at the product, Lookout identifies it and speaks aloud the brand name and product size. To accomplish this, Lookout includes a supermarket product detection and recognition model with an on-device product index, along with MediaPipe object tracking and an optical character recognition model. The resulting architecture is efficient enough to run in real-time entirely on-device.

Why On-Device?
A completely on-device system has the benefit of low latency and no reliance on network connectivity. However, this means that for a product recognition system to be truly useful to users, it must have an on-device database with good product coverage. These requirements drive the design of the datasets used by Lookout, which consist of two million popular products chosen dynamically according to the user's geographic location.

Traditional Solutions
Product recognition using computer vision has traditionally been solved using local image features extracted by, for example, the SIFT algorithm. These non-ML approaches provide fairly reliable matching but are storage-intensive per index image (typically 10KB to 40KB per image) and are less robust to poor lighting and blur. Additionally, the local nature of these descriptors means that they typically do not capture the more global aspects of a product's appearance.
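To make that per-image storage cost concrete, here is a minimal sketch, assuming OpenCV is installed, that extracts SIFT descriptors from a single index image and reports their footprint. The file name is a placeholder, and the exact numbers will vary by image.


# Minimal sketch of local-feature indexing cost, assuming OpenCV (cv2) is available.
import cv2

image = cv2.imread("product_index_image.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder file
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)

# Each SIFT descriptor is 128 float32 values; hundreds of keypoints per image add up quickly.
print(f"{len(keypoints)} keypoints, "
      f"{descriptors.nbytes / 1024:.1f} KB of descriptors for this image")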

An alternative approach with a number of advantages is to use ML: run an optical character recognition (OCR) system over the query image and the database images to extract the text present on the product packaging. The text on the query image can then be matched to the database using N-grams, which are robust to OCR errors such as spelling mistakes, misrecognitions, or failed recognition of words on the packaging. N-grams also allow for partial matches between the query document and an index document, using measures such as the Jaccard similarity coefficient, as opposed to requiring an exact match. However, with OCR, the index document size can grow very large, since one needs to store N-grams for the product packaging text along with other signals like TF-IDF. Furthermore, the reliability of the matches is a concern with the OCR+N-gram approach, since it can easily over-trigger when two different products share many common words on their packaging.
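As a small illustration of that matching step, the sketch below extracts character N-grams and computes a Jaccard similarity between OCR'd query text and an indexed product's text. The sample strings and the choice of trigrams are made up for the example, not Lookout's actual parameters.


# Sketch of fuzzy text matching with character trigrams and Jaccard similarity.
# All values here are illustrative, not the production system's.

def ngrams(text: str, n: int = 3) -> set[str]:
    """Lower-case character n-grams, which tolerate small OCR errors."""
    text = " ".join(text.lower().split())
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0

query_text = "CHOCOLTE BAR 100g"          # OCR output with a misrecognition
index_text = "Chocolate Bar 100 g dark"   # text stored for an index document

score = jaccard(ngrams(query_text), ngrams(index_text))
print(f"Jaccard similarity: {score:.2f}")  # partial match despite the OCR error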

In contrast to both the SIFT and OCR+N-Gram methods, our neural network-based approach, which generates a global descriptor (i.e., an embedding) for each image, requires only 64 bytes, significantly reducing the storage requirements from the 10-40KB per image needed for each SIFT feature index entry, or the few KBs per image for the less reliable OCR+N-gram approach. With fewer bytes consumed for each index image, more products can be included as a part of the index, yielding more complete product coverage and a better overall user experience.
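To put those numbers in perspective against the two-million-product index mentioned above: 64-byte embeddings amount to roughly 2,000,000 × 64 B ≈ 128 MB, whereas even 10KB of SIFT features per image would be on the order of 20 GB.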

Design
The Lookout system consists of a frame cache, frame selector, detector, object tracker, embedder, index searcher, OCR, scorer and result presenter.
Product recognition pipeline internal architecture.
  • Frame cache
    The frame cache manages the lifecycle of the input camera frames in the pipeline. It efficiently delivers the requested data, including YUV/RGB/gray images, to the other model components, and avoids duplicated conversions when multiple components request the same camera frame.
  • Frame selector
    When a user points the camera viewfinder towards a product, a lightweight IMU-based frame selector is run as a prefiltering stage. It selects the frames that best match a certain quality criterion (e.g., balanced image quality and latency) from the continuously incoming image stream, based on the jitter as measured by the angular rotation rate (deg/sec). This approach minimizes energy consumption by selectively processing only the high quality image frames and skipping the blurry frames.
  • Detector
    Each selected frame is then passed to a product detector model, which proposes regions of interest (a.k.a. detection bounding boxes) in the frames. The detector model architecture is a single-shot detector with an MnasNet backbone that strikes a balance between high quality and low latency.
  • Object tracker
    MediaPipe Box tracking is used to track the detected box in real time, and plays an important role in filling the gaps between detections of different objects and in reducing the detection frequency, thus reducing energy consumption. The object tracker also maintains an object map in which each object is assigned a unique object ID at runtime; these IDs are later used by the result presenter to differentiate between objects and to avoid repeating the announcement of a single object. For each detection result, the tracker either registers a new object in the map or updates an existing object with the detection bounding box, based on the Intersection over Union (IoU) between the existing object bounding boxes and the detection result (see the IoU sketch after this list).
  • Embedder
    The regions of interest (ROIs) from the detector are sent to the embedder model, which computes a 64-dimensional embedding. The embedder model is initially trained from a large classification model (i.e., the teacher model, based on NASNet), which spans tens of thousands of classes. An embedding layer is added to the model to project the input image into an ‘embedding space’, i.e., a vector space where two points being close means that the images they represent are visually similar (e.g., two images show the same product). Analyzing only the embeddings ensures that the model is flexible and does not need to be retrained every time the index is expanded to new products. However, because the teacher model is too large to be used directly on-device, the embeddings it generates are used to train a smaller, mobile-friendly student model that learns to map the input images to the same points in the embedding space as the teacher network. Finally, we apply principal component analysis (PCA) to reduce the dimensionality of the embedding vectors from 256 to 64, streamlining the embeddings for on-device storage.
  • Index searcher
    The index searcher performs a KNN search over a pre-built, compatible ScaNN index using the query embedding, and returns the top-ranked index documents along with their metadata, such as product name, packaging size, etc. To reduce the index lookup latency, all embeddings are clustered with k-means; at query time, only the relevant clusters are loaded into memory for the actual distance computation. To reduce the index size without sacrificing quality, we use product quantization at indexing time (a rough sketch of this clustered search follows the list).
  • OCR
    OCR is executed on the ROI for each camera frame to extract additional information, such as package size, product flavor variant, etc. Whereas traditional solutions use the OCR result for index searching, here we use it only for scoring. A scoring algorithm informed by the OCR text assists the scorer (below) in determining the correct result and improves precision, especially in cases where multiple products have similar packaging.
  • Scorer
    The scorer takes the embedding-based index results and the output of the OCR module, and scores each of the previously retrieved index documents (the embeddings and metadata retrieved via the index searcher). The top result after scoring is used as the final recognition from the system.
  • Result presenter
    The result presenter takes in all of the results above and surfaces them to the user by speaking the product name via a text-to-speech service.
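The data association in the object tracker comes down to a standard Intersection-over-Union computation. Below is a minimal sketch; the box format and the 0.5 threshold are assumptions for illustration, not Lookout's actual parameters.


# Sketch of IoU-based matching between a tracked box and a new detection.
# Box format (x_min, y_min, x_max, y_max) and the threshold are illustrative.

def iou(box_a, box_b) -> float:
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

tracked_box = (10, 10, 110, 210)
detection = (20, 15, 115, 220)
# Update the existing tracked object if the overlap is high enough,
# otherwise register the detection as a new object.
is_same_object = iou(tracked_box, detection) > 0.5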
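The embedder and index searcher stages can likewise be sketched as: project a 256-dimensional embedding down to 64 dimensions, then search only the nearest clusters of a partitioned index. The NumPy sketch below uses toy data and randomly chosen points in place of real k-means centroids and ScaNN with product quantization, purely to illustrate the flow.


# Toy sketch of the embed -> PCA-reduce -> clustered nearest-neighbor search flow.
# Shapes, cluster counts, and data are invented; the real system uses a ScaNN index.
import numpy as np

rng = np.random.default_rng(0)
index_embeddings = rng.normal(size=(2_000, 256)).astype(np.float32)   # toy index

# PCA projection from 256 to 64 dimensions (learned offline in practice).
mean = index_embeddings.mean(axis=0)
_, _, vt = np.linalg.svd(index_embeddings - mean, full_matrices=False)
pca = vt[:64].T                                    # 256 x 64 projection matrix
reduced_index = (index_embeddings - mean) @ pca    # 2,000 x 64

# Offline: partition the index (randomly picked points stand in for k-means centroids).
num_clusters = 20
centroids = reduced_index[rng.choice(len(reduced_index), num_clusters, replace=False)]
assignments = np.argmin(
    np.linalg.norm(reduced_index[:, None] - centroids[None], axis=-1), axis=1)

# Query time: project the query embedding, then search only the closest clusters.
query = (rng.normal(size=256).astype(np.float32) - mean) @ pca
closest = np.argsort(np.linalg.norm(centroids - query, axis=1))[:3]
candidates = np.flatnonzero(np.isin(assignments, closest))
top_k = candidates[np.argsort(np.linalg.norm(reduced_index[candidates] - query, axis=1))[:10]]
print("Top candidate index ids:", top_k)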
Early experiments with on-device product recognition in a Swiss supermarket.
Conclusion/Future Work
The on-device system outlined here can be used to enable a spectrum of new in-store experiences, including the display of detailed product information (nutritional facts, allergens, etc.), customer ratings, product comparisons, smart shopping lists, price tracking, and more. We are excited to explore some of these future applications, while continuing research into advancing the quality and robustness of the underlying on-device models.

Acknowledgements
The work described here was authored by Abhanshu Sharma, Chao Chen, Lukas Mach, Matt Sharifi, Matteo Agosti, Sasa Petrovic and Tom Binder. This work wouldn’t have been possible without the support and help we received from Alec Go, Alessandro Bissacco, Cédric Deltheil, Eunyoung Kim, Haoran Qi, Jeff Gilbert and Mingxing Tan.