LSM-2: Learning from incomplete wearable sensor data

We introduce LSM-2 with Adaptive and Inherited Masking (AIM), a novel self-supervised learning method that learns directly from incomplete wearable sensor data and performs well on classification, regression, and generative tasks without explicit imputation. Wearable devices have revolutionized health monitoring by providing continuous, multimodal physiological and behavioral data, such as heart signals, sleep patterns, activity levels, and stress indicators. Advances in sensor technology make it increasingly feasible to capture large volumes of data, but labeling remains expensive, requiring real-time user annotations or laborious clinical studies. Self-supervised learning (SSL) addresses this limitation by learning underlying structure, such as subtle physiological relationships, directly from unlabeled data. Applied at scale, SSL has the potential to enable foundation models that produce rich, generalizable representations useful for a wide range of downstream health tasks. However, when applied to the wearable domain, state-of-the-art SSL methods assume complete, uninterrupted data.

This assumption rarely holds in real-world wearable sensor streams, where gaps inevitably occur as a result of device removal, charging, intermittent loosening, motion artifacts, battery-saving modes, or environmental noise; we refer to these gaps as “missingness.” In fact, none of our 1.6 million day-long windows contained zero missing samples. Historically, the challenge of fragmented data has forced researchers to rely on either imputation methods that fill in missing segments or aggressive filtering that removes instances with incomplete data. Neither approach is ideal: imputation risks introducing unintended biases, while filtering discards valuable data. In “LSM-2: Learning from Incomplete Wearable Sensor Data”, we present Adaptive and Inherited Masking (AIM), a novel SSL training framework that learns directly from incomplete data. Rather than treating data gaps as erroneous measurements that must be filled in, AIM treats missingness as a natural artifact of real-world data and learns directly from incomplete recordings. Leveraging AIM, we develop a Large Sensor Model (LSM-2) that improves upon our previous foundation model for wearable sensor data (LSM-1, presented at ICLR ‘25). We demonstrate that LSM-2 achieves strong performance even when sensors fail or temporal windows are removed, exhibiting significantly less degradation than models trained on imputed data.

Using adaptive and inherited masking to take AIM

At the heart of AIM is how it addresses the inevitable gaps in real-world sensor data. In contrast to conventional SSL methods, which either discard incomplete data or attempt to fill in missing values, AIM embraces these gaps as inherent characteristics of wearable data. AIM extends the masked autoencoder (MAE) pre-training framework, reconstructing masked input samples to learn the underlying structure of sensor data. However, while traditional MAE methods rely on a fixed masking ratio to enable efficient dropout of masked tokens (i.e., a fixed number of masked tokens are not passed through the encoder, reducing computational complexity), fragmentation in sensor data is unpredictable, resulting in a variable number of masked tokens. AIM addresses this fundamental problem of wearable data by combining token dropout with attention masking. The set of tokens masked during pre-training consists of those inherited from the wearable sensor data (i.e., naturally missing), as well as those deliberately masked for the reconstruction training objective. AIM first applies dropout to a fixed number of masked tokens, improving pre-training efficiency by reducing the sequence length processed by the encoder. It then adaptively handles any remaining masked tokens, whether naturally missing or part of the reconstruction task, via attention masking in the encoder’s transformer blocks. During fine-tuning and evaluation on discriminative tasks, where masked tokens consist only of naturally occurring data gaps, AIM uses attention masking for all masked tokens. Through this dual masking approach, and by treating naturally occurring and artificially masked tokens as equivalent, AIM teaches the model to handle the variable fragmentation inherent to wearable sensors, as illustrated in the sketch below.
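To make the dual masking concrete, here is a minimal sketch, assuming tokenized sensor data of shape [batch, tokens, dim] and a single boolean mask that merges natural missingness with artificial masking; the tensor names, masking pattern, and the lone attention block are illustrative assumptions, not the actual LSM-2 implementation.

```python
# A minimal sketch of AIM-style dual masking (hypothetical names and shapes).
# Masked tokens, whether naturally missing or artificially masked, are split
# into a fixed-size set that is dropped from the sequence for efficiency and a
# variable-size remainder that is hidden via an attention (key-padding) mask.
import torch
import torch.nn as nn


def aim_dual_mask(tokens, masked, n_drop):
    """tokens: [B, N, D]; masked: [B, N] bool (True = masked or missing);
    n_drop: fixed number of masked tokens dropped per sequence (assumes each
    sequence has at least n_drop masked tokens, which the artificial masking
    ratio guarantees during pre-training)."""
    B, N, D = tokens.shape
    # Sort so masked tokens come first, then drop exactly n_drop of them.
    order = torch.argsort(masked.float(), dim=1, descending=True)      # [B, N]
    keep_idx = order[:, n_drop:]                                       # [B, N - n_drop]
    kept = torch.gather(tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    # Remaining masked tokens stay in the sequence but are hidden from
    # attention, so variable amounts of missingness yield a fixed length.
    key_padding_mask = torch.gather(masked, 1, keep_idx)               # True = ignore
    return kept, key_padding_mask, keep_idx


# Toy usage: one attention block consuming the kept tokens.
B, N, D, n_drop = 2, 16, 32, 4
tokens = torch.randn(B, N, D)
masked = torch.zeros(B, N, dtype=torch.bool)
masked[:, ::2] = True                     # toy pattern: natural + artificial masks combined
kept, pad_mask, _ = aim_dual_mask(tokens, masked, n_drop)
attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)
out, _ = attn(kept, kept, kept, key_padding_mask=pad_mask)
```

Keeping the post-dropout sequence length fixed is what allows efficient batching even though each recording carries a different amount of missingness.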

Training and evaluation

We leverage a dataset of 40 million hours of wearable data sampled from over 60,000 participants between March and May 2024. The dataset was thoroughly de-identified to remove participant information and protect privacy. Participants consented to the use of their data for research and development of new health and wellness products and services, and wore a variety of Fitbit and Google Pixel smartwatches and trackers. Participants were also asked to self-report sex, age, and weight.

We pre-train LSM-2 with the AIM SSL method described in the previous section. AIM implements a masked-reconstruction training objective: the model learns to understand data that is naturally missing and to impute data that is artificially masked. This unified framework allows LSM-2 to learn the underlying structure (including missingness) inherent in wearable sensor data.
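As a rough illustration of that objective, the sketch below computes the reconstruction loss only on artificially masked positions that were actually observed, since naturally missing samples have no ground truth to reconstruct; the tensor names and the mean-squared-error loss are assumptions for illustration.

```python
# A rough sketch of the unified masked-reconstruction objective under
# missingness (hypothetical tensor names; mean-squared error assumed).
import torch


def aim_reconstruction_loss(recon, target, artificial_mask, natural_mask):
    """recon, target: [B, N, D]; artificial_mask, natural_mask: [B, N] bool."""
    # Supervise only tokens that were hidden on purpose but truly observed;
    # naturally missing tokens have no ground truth and contribute no loss.
    loss_mask = artificial_mask & ~natural_mask
    per_token = ((recon - target) ** 2).mean(dim=-1)                   # [B, N]
    return (per_token * loss_mask).sum() / loss_mask.sum().clamp(min=1)
```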

We curate a set of downstream tasks to evaluate the pre-trained model, using metadata that was collected alongside the sensor signals for research and development purposes. These include self-reported diagnoses of hypertension and anxiety, as well as user-annotated activities from a set of 20 different categories, such as running, skiing, kayaking, and golf. These data were split into fine-tuning and evaluation sets such that each individual’s data appeared in only one of the two sets, never both. Data from individuals used in the pre-training stage was also excluded from the fine-tuning and evaluation stages.
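The sketch below illustrates one way such a subject-level split could be performed, using scikit-learn’s GroupShuffleSplit; the function, variable names, and 80/20 ratio are hypothetical rather than taken from the paper.

```python
# An illustrative subject-level split: every window from a given participant
# lands on one side of the split, and participants seen during pre-training
# are excluded entirely. Names and ratios are assumptions for illustration.
from sklearn.model_selection import GroupShuffleSplit


def subject_level_split(windows, labels, subject_ids, pretrain_subjects,
                        test_size=0.2, seed=0):
    # Drop windows from participants already used for pre-training.
    keep = [i for i, s in enumerate(subject_ids) if s not in pretrain_subjects]
    windows = [windows[i] for i in keep]
    labels = [labels[i] for i in keep]
    groups = [subject_ids[i] for i in keep]
    # Group-aware split: all of a subject's windows stay together.
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, eval_idx = next(splitter.split(windows, labels, groups=groups))
    finetune = ([windows[i] for i in train_idx], [labels[i] for i in train_idx])
    evaluation = ([windows[i] for i in eval_idx], [labels[i] for i in eval_idx])
    return finetune, evaluation
```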

The generative capabilities of LSM-2 are evaluated through the tasks of random imputation, temporal interpolation, temporal extrapolation (forecasting), and sensor imputation, described in our LSM-1 work.
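For illustration, the sketch below builds evaluation masks for these four tasks over a time-by-sensor grid; the shapes, ratios, and placement of the gaps are assumptions rather than the exact protocol from the paper.

```python
# Illustrative construction of the four generative-evaluation masks over a
# [time, sensor] grid. True marks positions the model must reconstruct.
import numpy as np


def make_eval_mask(task, T, S, ratio=0.2, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    mask = np.zeros((T, S), dtype=bool)
    if task == "random_imputation":          # scattered missing samples
        mask = rng.random((T, S)) < ratio
    elif task == "temporal_interpolation":   # contiguous gap in the middle
        start = (T - int(T * ratio)) // 2
        mask[start:start + int(T * ratio), :] = True
    elif task == "temporal_extrapolation":   # forecast the final stretch
        mask[T - int(T * ratio):, :] = True
    elif task == "sensor_imputation":        # entire sensor channels withheld
        drop = rng.choice(S, size=max(1, int(S * ratio)), replace=False)
        mask[:, drop] = True
    return mask
```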

The utility of the LSM-2 embeddings is evaluated via linear probes on a number of discriminative tasks. Specifically, we gauge the applicability of the LSM-2 embeddings to binary hypertension classification, binary anxiety classification, and 20-class activity recognition. We also evaluate LSM-2’s ability to model physiology via age and BMI regression tasks.
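A minimal sketch of such a linear-probe evaluation is shown below, assuming a frozen `encode` function standing in for the LSM-2 encoder and scikit-learn linear models as the probes; none of these names come from the paper.

```python
# Minimal linear-probe sketch: the pre-trained encoder is frozen, embeddings
# are extracted once, and a single linear model is fit per downstream task.
# `encode` is a hypothetical stand-in for the frozen LSM-2 encoder.
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge


def linear_probe(encode, X_train, y_train, X_eval, task="classification"):
    Z_train = np.stack([encode(x) for x in X_train])   # frozen embeddings
    Z_eval = np.stack([encode(x) for x in X_eval])
    if task == "classification":   # e.g., hypertension, anxiety, activity recognition
        probe = LogisticRegression(max_iter=1000)
    else:                          # e.g., age or BMI regression
        probe = Ridge(alpha=1.0)
    probe.fit(Z_train, y_train)
    return probe.predict(Z_eval)
```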

Key results

The AIM-based LSM-2 model demonstrates remarkable versatility, outperforming its predecessor, LSM-1, across three key areas: classifying health conditions and activities (like hypertension, anxiety, and 20-class activity recognition), reconstructing missing data (e.g., recovery of missing sensor signals), and predicting continuous health metrics (like BMI with improved correlation). Additional comparisons to supervised and pre-trained baselines may be found in our paper.

LSM-2 excels in realistic scenarios where sensors fail or data is incomplete. The figure below simulates situations where whole sensor feeds or data for entire portions of the day are missing. This mirrors the reality that different wearables may host different sensor loadouts, or that an individual may only wear their device for part of the day. LSM-2, which is built on AIM, proves significantly more robust to these ablations than LSM-1. Finally, LSM-2 exhibits improved scaling across users, data volume, compute, and model size as compared to LSM-1. While its predecessor shows signs of plateauing, LSM-2 continues to improve with more data and has yet to saturate.
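The sketch below shows one way these ablations could be simulated at evaluation time, dropping whole sensor channels or contiguous hours from an input window and recording the resulting missingness mask; the window layout and fractions are illustrative assumptions, not the paper’s exact setup.

```python
# Sketch of evaluation-time robustness ablations: remove entire sensor
# channels or a contiguous portion of the day from a [time, sensor] window.
import numpy as np


def ablate_window(window, mode, rng, frac=0.5):
    """window: [T, S] array; returns (ablated window, missingness mask)."""
    T, S = window.shape
    missing = np.zeros((T, S), dtype=bool)
    if mode == "drop_sensors":      # e.g., a device without certain sensors
        drop = rng.choice(S, size=int(S * frac), replace=False)
        missing[:, drop] = True
    elif mode == "drop_hours":      # e.g., device only worn part of the day
        start = rng.integers(0, T - int(T * frac) + 1)
        missing[start:start + int(T * frac), :] = True
    # The model receives the missingness mask rather than imputed values.
    ablated = np.where(missing, 0.0, window)
    return ablated, missing
```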

Conclusion

The LSM-2 foundation model, pre-trained with AIM, represents progress towards more useful and usable wearable health technology. Fundamentally, AIM teaches LSM-2 to understand and leverage the natural gaps in real-world sensor streams, deriving reliable insights from imperfect data. This means wearable AI can finally embrace the messy reality of sensor data, preserving data integrity while utilizing all available information.