Exploring the Scaling Laws of Consumer EEG: Introducing Alljoined-1.6M
Jonathan Xu
2026-03-09
Today, we are formally introducing Alljoined-1.6M, a large EEG-image dataset comprising over 1.6 million visual stimulus trials collected from 20 participants on consumer-grade hardware. As the largest consumer-grade EEG-image dataset to date, it provides a foundation for analyzing the tradeoff between dataset scale and hardware quality in neural decoding.
In this post, we share our analysis of the dataset as a contribution to the field of neural decoding. We find that deep neural networks can extract semantic content from consumer EEG and that neural scaling laws do apply to low-SNR hardware, albeit with a significantly lower scaling exponent. Because of the reduced signal-to-noise ratio, each additional consumer-grade trial yields a smaller incremental gain, and this penalty compounds with scale: at the upper bound of our dataset, matching lab-grade accuracy requires over 20x the data volume.
The ability to decode rich visual information from the human brain is a critical frontier in brain-computer interface (BCI) research. This has prompted a push to drastically scale brain scan datasets for machine learning models, particularly in the EEG domain. Naturally, this raises questions about the tradeoff between dataset size and signal quality. Can sheer scale overcome the noise of affordable hardware to facilitate state-of-the-art deep neural decoding?
Answering these questions allows us to understand whether emergent behaviors and improved performance manifest as we scale dataset size, analogous to trends in the text and vision domains. Because neural data involves complex nuances in hardware, experiment design, and signal processing, evaluating dataset quality in isolation is difficult. Therefore, our work is motivated by reproducing a popular baseline dataset, THINGS-EEG2, using consumer-grade hardware to establish a direct comparison.
Experimental design
To directly compare consumer and professional EEG hardware, we recreated a baseline experiment under identical conditions using drastically different equipment:
The Hardware: We recorded our dataset using the Emotiv Epoc Flex 2 Gel, a $2,200, 32-channel wireless consumer headset. The baseline dataset used the BrainVision actiCHamp, a $60,000, 64-channel lab-grade system with active electrodes designed to minimize signal noise.
The Experiment: Participants watched rapid-fire sequences of 20 images, with each image flashing on screen for 100 ms.
The Focus Task: To ensure participants were actually paying attention without distracting them, we gave them a simple task: in 6% of the sequences, a picture of Woody from Toy Story appeared, and at the end of each sequence participants reported whether they had spotted him. This task also let us measure reaction time and accuracy.
Understanding the Data
While consumer-grade devices lower the financial barrier to entry, they traditionally suffer from a reduced signal-to-noise ratio (SNR). To validate the physiological authenticity of our consumer-grade data, we conducted several analyses prior to training EEG-to-image reconstruction models.
1. ERP analysis
We divided the continuous EEG recordings into one-second segments, starting at the exact moment a participant saw a new image. By averaging these segments together, we extracted Event-Related Potentials (ERPs) to observe global signal patterns. Consistent with visual processing, we expected the strongest activity in the occipital cortex. As shown in our analysis, occipital channels exhibited a distinct early visual response, with a negative deflection at 100 ms followed by a prominent positive peak between 125 ms and 150 ms. This confirms the hardware captured physiologically meaningful data related to visual processing.
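The epoching-and-averaging step can be sketched in a few lines of NumPy. This is a minimal illustration under simplifying assumptions, not our actual pipeline; the array shapes, the toy signal, and the names (`compute_erp`, `onsets`) are hypothetical.

```python
import numpy as np

def compute_erp(eeg, onsets, sfreq, window_s=1.0):
    """Average fixed-length epochs time-locked to stimulus onsets.

    eeg:    (n_channels, n_samples) continuous recording
    onsets: sample indices where each image appeared
    sfreq:  sampling rate in Hz
    """
    n = int(window_s * sfreq)
    epochs = np.stack([eeg[:, t:t + n] for t in onsets
                       if t + n <= eeg.shape[1]])  # (n_epochs, n_channels, n)
    return epochs.mean(axis=0)                     # (n_channels, n) ERP

# Toy check: a 5 µV deflection at ~150 ms buried in 20 µV noise
# emerges once epochs (and channels) are averaged.
rng = np.random.default_rng(0)
sfreq, n_ch = 256, 32
eeg = rng.normal(0, 20, size=(n_ch, sfreq * 60))   # 60 s of noise
onsets = np.arange(0, sfreq * 59, sfreq)           # one stimulus per second
peak = int(0.15 * sfreq)
for t in onsets:
    eeg[:, t + peak] += 5.0                        # tiny evoked response
erp = compute_erp(eeg, onsets, sfreq)
```

Averaging suppresses uncorrelated noise by roughly the square root of the epoch count, which is why the evoked peak becomes visible even though it is far smaller than the single-trial noise.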
2. Pairwise LDA decoding
To confirm statistically significant differences between image categories, we conducted pairwise linear discriminant analyses. In both Alljoined-1.6M and THINGS-EEG2, decoding performance was well above chance. Notably, the successful decoding clusters concentrate in the 220 ms to 400 ms window, a timeframe associated with later-stage, higher-level visual processing.
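A minimal sketch of pairwise LDA decoding, assuming trials have already been reduced to feature vectors (e.g. a flattened time window). The synthetic data and the function name are illustrative; our actual cross-validation scheme and feature extraction are not reproduced here.

```python
from itertools import combinations

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def pairwise_lda_accuracy(X, y, cv=5):
    """Cross-validated LDA accuracy for every pair of category labels.

    X: (n_trials, n_features) EEG features
    y: (n_trials,) integer category labels
    """
    scores = {}
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        clf = LinearDiscriminantAnalysis()
        scores[(a, b)] = cross_val_score(clf, X[mask], y[mask], cv=cv).mean()
    return scores

# Toy data: three categories with slightly shifted means, so every
# pairwise accuracy should land well above the 50% chance level.
rng = np.random.default_rng(1)
X = rng.normal(0, 1, size=(300, 40))
y = np.repeat([0, 1, 2], 100)
X[y == 1] += 0.5
X[y == 2] -= 0.5
scores = pairwise_lda_accuracy(X, y)
```

Running this per time window (rather than on one flattened vector) is what produces the kind of temporally resolved decoding clusters described above.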
3. Verifying the Signal: Where is the model looking?
To trust our decoding results, we needed to know where our model was pulling its signal from. We used saliency maps as an interpretability tool to visualize what the model pays attention to. Across all image categories, it consistently homed in on electrode locations at the back of the head around 200 milliseconds post-stimulus. This matches established neuroscience findings on visual processing and confirms that, even on low-cost consumer hardware, deep learning models anchor their predictions in true visual cortex activity rather than experimental artifacts.
Decoding Semantics and Scaling Laws
We trained various deep learning methods to reconstruct original images from EEG data. To measure true visual quality, we had human reviewers test the outputs by visually matching the reconstructions to the originals (2AFC identification task). Using our ENIGMA architecture, these human raters achieved an 83.06% identification accuracy on the lab-grade THINGS-EEG2 dataset, compared to a 65.43% accuracy on the Alljoined-1.6M dataset.
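For reference, 2AFC identification accuracy is simply the fraction of trials in which a rater picks the true original over a distractor, with a 50% chance level. A tiny sketch with hypothetical rater responses:

```python
import numpy as np

def two_afc_accuracy(choices, correct):
    """2AFC identification accuracy: fraction of trials where the rater
    picked the true original over the distractor (chance = 50%)."""
    choices, correct = np.asarray(choices), np.asarray(correct)
    return float((choices == correct).mean())

# Hypothetical responses over 8 trials; 7 of 8 match the true original.
acc = two_afc_accuracy([0, 1, 1, 0, 1, 0, 0, 1],
                       [0, 1, 1, 0, 1, 0, 1, 1])
```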
Well-established neural scaling laws demonstrate that deep learning models predictably improve in a log-linear fashion as dataset size increases. We tested whether these same laws apply to low-SNR brain data by training the ENIGMA model on progressively larger subsets of our dataset. We found that decoding performance increased linearly with the logarithm of the trial count and showed no sign of saturating at the full dataset size.
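A log-linear scaling trend can be checked with an ordinary least-squares fit of accuracy against log10 trial count. The subset sizes and accuracies below are illustrative stand-ins, not our measured values:

```python
import numpy as np

def fit_log_linear(n_trials, accuracy):
    """Fit accuracy = a + b * log10(n_trials); b is the scaling slope."""
    b, a = np.polyfit(np.log10(n_trials), accuracy, 1)
    return a, b

# Hypothetical subset sizes and accuracies, for illustration only.
n = np.array([1e4, 5e4, 1e5, 5e5, 1.6e6])
acc = np.array([0.52, 0.56, 0.58, 0.62, 0.65])
a, b = fit_log_linear(n, acc)
```

A positive slope `b` with small residuals across subsets is what "no sign of saturating" looks like in this framing; saturation would appear as the largest subsets falling below the fitted line.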
As expected, models trained less efficiently on the consumer-grade recordings due to the inherently noisier signals of lower-cost hardware. Yet the affordability of these devices allows data collection at massive scale, so sheer volume can partially compensate for reduced signal clarity.
Conclusion
Historically, breakthroughs in understanding the brain have been bottlenecked by the prohibitive costs of pristine data collection. Alljoined-1.6M explores an alternative paradigm by optimizing instead for massive scale and hardware accessibility. We demonstrate that scale alone can enable advanced semantic decoding and image reconstruction on $2,200 hardware, proving there is still valuable signal in lower-quality data.
However, our findings also argue strongly for improving data quality and collection techniques before scaling naively. Contact impedance and amplifier noise largely explain the reduced scaling efficiency of consumer EEG hardware. In fact, the performance of ENIGMA trained at the absolute limit of our consumer-grade dataset is matched by a lab-grade BrainVision system using about 23x less data.
To unlock the real-world potential of neural decoding, we need the ability to extract highly expressive, semantically rich data using hardware that is both scalable and of uncompromising quality. Moving forward, our internal efforts are entirely focused on this intersection. At our company, we are actively pursuing multi-modal experiments designed to truly decode thought, maintaining a strict, rigorous focus on improving data quality and fundamentally understanding where the true signal originates.
We are very excited to share the Alljoined-1.6M dataset with the research community to build stronger intuitions for exploring high-level semantic decoding on consumer-grade hardware.
Appendix
Hardware Ablations: Channel Count and Signal Quality
To isolate the variables driving the performance gap between the datasets, we analyzed specific hardware differences.
Electrode Density (64 vs. 32 channels)
We performed an ablation analysis to evaluate how channel count affects decoding performance by sub-sampling the 64-channel baseline dataset while maintaining coverage over the occipital cortex.
While performance drops with fewer channels, channel count is not the primary factor behind the gap between the datasets. Crucially, performance gains began to level off beyond 24 channels. This suggests future real-world BCIs might achieve robust decoding performance with highly streamlined channel arrays.
Furthermore, our experiments with 32-channel configurations revealed that optimal layouts require strategic coverage of both the occipital cortex and the ventral stream to capture the specific spatial nuances of semantic object recognition. In this experiment, Layout 1 performed the best.
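The sub-sampling itself is straightforward: restrict the data to a named channel subset while preserving posterior coverage. The channel names and reduced layout below are hypothetical, not our exact Layout 1:

```python
import numpy as np

def subsample_channels(eeg, ch_names, keep):
    """Restrict an (n_channels, n_samples) array to a named channel subset."""
    idx = [ch_names.index(c) for c in keep]
    return eeg[idx], list(keep)

# Hypothetical 10-20 montage names; the reduced layout keeps the
# occipital/parieto-occipital sites that carry most visual signal.
names = ["Fp1", "Fp2", "Cz", "Pz", "O1", "Oz", "O2", "PO7"]
eeg = np.arange(8 * 4).reshape(8, 4).astype(float)
sub, kept = subsample_channels(eeg, names, ["O1", "Oz", "O2", "PO7", "Pz"])
```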
Electrode and Amplifier Noise
The performance disparity is fundamentally driven by contact quality and hardware noise. Scalp EEG signals are extremely faint, typically ranging from 10 to 100 microvolts (µV).
Electrodes: The BrainVision system uses active electrodes, so contact impedance can be reliably measured and kept below 10 kΩ. The Emotiv relies on passive electrodes, where impedance can easily exceed 100 kΩ, leading to significant signal loss.
Amplifier: The lab-grade BrainVision amplifier introduces roughly 2.5 µV RMS of baseline noise. By contrast, the cheaper components in the Emotiv introduce 30-50 µV RMS of noise. Our ablations confirm that semantic decoding capabilities degrade rapidly once hardware noise exceeds 5 µV RMS.
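These figures translate directly into a signal-to-noise ratio in decibels. As a back-of-the-envelope check, take a nominal 10 µV cortical signal against each amplifier's noise floor, using 40 µV as the midpoint of the Emotiv's 30-50 µV range:

```python
import numpy as np

def snr_db(signal_rms_uv, noise_rms_uv):
    """Signal-to-noise ratio in decibels from RMS amplitudes."""
    return 20 * np.log10(signal_rms_uv / noise_rms_uv)

lab = snr_db(10, 2.5)   # BrainVision: ~2.5 µV RMS amplifier noise
con = snr_db(10, 40)    # Emotiv: midpoint of the 30-50 µV RMS range
```

On these numbers the lab amplifier sits about 12 dB above its noise floor while the consumer signal sits about 12 dB below it, which is consistent with the sharp degradation beyond 5 µV RMS noted above.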
Taken together, these differences in noise and contact quality significantly degrade sample efficiency at scale. By the upper bounds of our 1.6M-trial dataset, the consumer-grade hardware proved over 20x less data-efficient than the lab-grade baseline for reaching equivalent performance.
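The data-efficiency multiplier follows from comparing two log-linear scaling fits: evaluate the consumer-grade fit at the full trial count, then solve the lab-grade fit for the trial count that reaches the same accuracy. The fit parameters below are invented purely to illustrate the arithmetic (chosen to land near the reported multiplier); they are not our estimated coefficients:

```python
import numpy as np

def data_multiplier(a_lab, b_lab, a_con, b_con, n_con):
    """Ratio of consumer to lab trials needed for equal accuracy, given
    log-linear fits acc = a + b * log10(n) for each system."""
    target = a_con + b_con * np.log10(n_con)  # consumer accuracy at n_con
    n_lab = 10 ** ((target - a_lab) / b_lab)  # trials lab needs to match it
    return n_con / n_lab

# Hypothetical coefficients: a steeper lab-grade slope (b_lab > b_con)
# is what drives the widening efficiency gap at scale.
mult = data_multiplier(a_lab=0.224, b_lab=0.09,
                       a_con=0.35, b_con=0.05, n_con=1.6e6)
```

Because the gap between the two fits grows with log trial count, the multiplier itself grows as the dataset scales, which is why the penalty "compounds" rather than staying constant.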