AI & Visualization

Exploration of Projection Spaces

Dataset Origin & Context

Source: UCI Machine Learning Repository

Publication: Anguita et al., 2013

Institution: Smartlab - University of Genoa

Purpose: Human Activity Recognition (HAR) from smartphone sensors for ambient assisted living

Collection: Controlled lab environment, 30 volunteers (ages 19-48)

Dataset Statistics

Samples: 10,299

Features: 561

Activities: 6

Train/Test Split: 70% (7,352) / 30% (2,947)

Activity Classes

Static (3): STANDING, SITTING, LAYING

Dynamic (3): WALKING, WALKING_DOWN, WALKING_UP

Roughly balanced distribution (~1,700 samples/activity)

Preprocessing Pipeline

1. Data Loading: Merged train/test splits (X_all, y_all)

2. Normalization: StandardScaler (zero mean, unit variance)

3. Label Encoding: Activity names → integers (0-5)

4. Features: All 561 pre-extracted features retained
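
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used for these slides: the arrays are small synthetic stand-ins for the real splits, and the names X_train, X_test, y_train, y_test, X_scaled, and y_encoded are assumptions introduced here (only X_all and y_all appear in the original). StandardScaler and LabelEncoder are the scikit-learn classes of those names.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for the real splits (assumption: row counts shrunk for
# speed; the real splits have 7,352 and 2,947 rows with 561 columns each).
n_features = 561
X_train = rng.normal(loc=2.0, scale=3.0, size=(70, n_features))
X_test = rng.normal(loc=2.0, scale=3.0, size=(30, n_features))

activities = np.array(
    ["LAYING", "SITTING", "STANDING", "WALKING", "WALKING_DOWN", "WALKING_UP"]
)
y_train = np.resize(activities, 70)  # cycle through the six labels
y_test = np.resize(activities, 30)

# 1. Data loading: merge the train and test splits.
X_all = np.vstack([X_train, X_test])
y_all = np.concatenate([y_train, y_test])

# 2. Normalization: zero mean, unit variance per feature.
X_scaled = StandardScaler().fit_transform(X_all)

# 3. Label encoding: activity names -> integers 0-5 (alphabetical order).
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_all)

# 4. Features: all 561 columns are kept, so no selection step is needed.
print(X_scaled.shape, dict(zip(encoder.classes_, range(len(encoder.classes_)))))
```

Fitting the scaler on the merged data is reasonable for purely exploratory visualization; for a supervised evaluation one would usually fit it on the training split only.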

Pre-Extracted Features

Time domain: mean, std, mad, max, min, sma, energy, iqr, entropy

Frequency domain: FFT coefficients, spectral energy, entropy, skewness, kurtosis

Signal processing: Butterworth filter for body/gravity separation

Jerk signals: Time derivatives of acceleration and angular velocity
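
To make a few of these feature names concrete, here is a small sketch of how some time-domain statistics and a jerk signal could be computed for a single signal window. It is illustrative only and is not the dataset authors' extraction code: the helper names time_domain_features and jerk are hypothetical, the 50 Hz sampling rate is passed in as an assumed parameter, and sma, entropy, the frequency-domain features, and the Butterworth filtering step are left out for brevity.

```python
import numpy as np


def time_domain_features(window):
    """Summary statistics for one 1-D signal window (hypothetical helper)."""
    window = np.asarray(window, dtype=float)
    median = np.median(window)
    q75, q25 = np.percentile(window, [75, 25])
    return {
        "mean": float(np.mean(window)),
        "std": float(np.std(window)),
        # mad: median absolute deviation from the median
        "mad": float(np.median(np.abs(window - median))),
        "max": float(np.max(window)),
        "min": float(np.min(window)),
        # energy: mean of the squared samples
        "energy": float(np.sum(window ** 2) / len(window)),
        # iqr: interquartile range
        "iqr": float(q75 - q25),
    }


def jerk(signal, sampling_rate_hz):
    """Finite-difference approximation of the time derivative of a signal."""
    signal = np.asarray(signal, dtype=float)
    return np.diff(signal) * sampling_rate_hz


features = time_domain_features([1.0, 2.0, 3.0, 4.0, 5.0])
jerk_signal = jerk([0.0, 1.0, 4.0], sampling_rate_hz=50.0)  # assumed 50 Hz
print(features)
print(jerk_signal)
```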

Why This Dataset?

The dataset combines high dimensionality (561 features) with clear class structure (6 activities). Because the features are pre-extracted, the work can focus on visualization and pattern discovery rather than signal processing. That makes it well suited to comparing dimensionality reduction (DR) methods such as PCA, t-SNE, UMAP, and autoencoders.
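
As a rough idea of what such a comparison starts from, the sketch below projects 561-dimensional points with six classes down to 2D using scikit-learn's PCA. The data here is synthetic (six Gaussian clusters generated for illustration, not the HAR samples), and the variable names are assumptions; t-SNE, UMAP, or an autoencoder would slot in at the same place to produce their own 2D coordinates.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in: six well-separated clusters in 561 dimensions.
n_classes, n_per_class, n_features = 6, 50, 561
centers = rng.normal(scale=5.0, size=(n_classes, n_features))
X = np.vstack(
    [center + rng.normal(size=(n_per_class, n_features)) for center in centers]
)
y = np.repeat(np.arange(n_classes), n_per_class)  # class label per row

# Project to two dimensions; X_2d can be scatter-plotted and colored by y.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)
print("variance explained by 2 components:", pca.explained_variance_ratio_.sum())
```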
