Wildfires and Combustion

Learning the Tar–Char Continuum in Wildfire Smoke from SP2 Waveforms

Harshit Gujral

M. Mhanna [1], T. Sipkens [1], J. Corbin [1]

Department of Computer Science, University of Toronto, Toronto, Canada.

Background: Emerging evidence suggests that a substantial share of particles in wildfire smoke lies along the tar–char continuum rather than the soot continuum. Characterizing this continuum will enable accurate quantification and modelling of the climate impacts, atmospheric chemistry, and toxicology of wildfire smoke.

Aim: We conduct a series of experiments with autoencoder machine learning models and develop a training and validation framework to classify tar-ball, charball, and black carbon particles in wildfire smoke from Single Particle Soot Photometer (SP2) measurements.

Methods: We process four-channel SP2 waveforms, remove low-quality and saturated events, align the particle signals, and use an autoencoder to learn a compact representation of each event. In the current framework, incandescent and non-incandescent particles are handled separately, and similar events are then grouped in the learned space and assigned to particle families. To evaluate these assignments, we join clustered events with labeled reference data, collapse the truth labels into three target classes, and assess agreement using confusion matrices, balanced accuracy, macro-F1, class composition, and latent-space visualizations.
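The evaluation step described above can be illustrated with a minimal sketch. It assumes each cluster is assigned the most common reference label among its events (majority vote), which is one plausible assignment rule and not necessarily the one used in this work. The class names, function names, and toy data are hypothetical. Balanced accuracy is computed as the mean per-class recall, and macro-F1 as the unweighted mean of per-class F1 scores.

```python
from collections import Counter, defaultdict

# Hypothetical names for the three collapsed target classes.
CLASSES = ["black_carbon", "tar_ball", "charball"]


def majority_vote_labels(cluster_ids, true_labels):
    """Map each cluster to the most common reference label among its events."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, true_labels):
        members[cid].append(label)
    return {cid: Counter(labels).most_common(1)[0][0]
            for cid, labels in members.items()}


def confusion_matrix(true_labels, pred_labels, classes=CLASSES):
    """Rows are reference classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, pred_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def balanced_accuracy(matrix):
    """Mean per-class recall over classes present in the reference labels."""
    recalls = [row[i] / sum(row) for i, row in enumerate(matrix) if sum(row) > 0]
    return sum(recalls) / len(recalls)


def macro_f1(matrix):
    """Unweighted mean of per-class F1; a class with no counts scores 0."""
    n = len(matrix)
    scores = []
    for i in range(n):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp
        fp = sum(matrix[r][i] for r in range(n)) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Toy data (invented for illustration, not SP2 measurements).
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
true_labels = ["black_carbon"] * 3 + ["tar_ball", "tar_ball", "charball",
                                      "charball", "charball", "tar_ball"]

mapping = majority_vote_labels(cluster_ids, true_labels)
pred_labels = [mapping[cid] for cid in cluster_ids]
matrix = confusion_matrix(true_labels, pred_labels)
print(balanced_accuracy(matrix), macro_f1(matrix))
```

Note that deriving the cluster-to-class mapping and scoring it on the same events gives an optimistic estimate; a held-out split for the mapping step would be a more conservative design.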

Results: The framework recovers a clear structure in the SP2 data and shows a considerable difference between the incandescent and non-incandescent branches. Incandescent clusters are highly pure, while non-incandescent clusters are more mixed. Performance is strongest for black carbon, whereas most confusion appears between tar-ball and charball particles, especially in the non-incandescent subset. The cluster-based labeling reaches a balanced accuracy of 0.62, while an auxiliary feature-based audit reaches 0.67, indicating that the learned representation captures useful particle-level information but that separation within the tar–char side remains more difficult.
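Cluster purity and class composition, as discussed above, can be sketched as follows. This is an illustrative definition only: purity is taken here as the fraction of a cluster's events that carry its most common reference label, which may differ from the definition used in this work. The cluster identifiers and toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_composition(cluster_ids, true_labels):
    """Return, per cluster, the fraction of events in each reference class."""
    counts = defaultdict(Counter)
    for cid, label in zip(cluster_ids, true_labels):
        counts[cid][label] += 1
    return {
        cid: {label: n / sum(c.values()) for label, n in c.items()}
        for cid, c in counts.items()
    }


def cluster_purity(composition):
    """Purity of a cluster = share of its dominant reference class."""
    return {cid: max(fractions.values()) for cid, fractions in composition.items()}


# Toy data (invented): one incandescent and one non-incandescent cluster.
cluster_ids = ["inc_0"] * 4 + ["non_1"] * 4
true_labels = ["black_carbon"] * 4 + ["tar_ball", "tar_ball", "tar_ball", "charball"]

composition = cluster_composition(cluster_ids, true_labels)
purity = cluster_purity(composition)
print(purity)
```

In this toy case the incandescent cluster contains a single class, while the non-incandescent cluster is mixed, mirroring the qualitative pattern reported above without reproducing any measured values.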

Implication: These results show that autoencoder-based representation learning, combined with a structured validation framework, is a promising way to classify wildfire smoke particles from SP2 measurements. The approach already identifies black carbon reliably and makes visible where tar-ball and charball separation remains uncertain, providing a path for improving labels, training choices, and physically grounded validation.