Starting values for optimization

Nonlinear regression and other optimization problems are solved iteratively and require good starting values, which are often hard to find. A new article [1] gives some new insight into this topic.
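Why starting values matter is easy to see even with one parameter. The Python sketch below is a generic multi-start strategy, not the self-guided search proposed in [1]: it runs a simple compass (pattern) search from several starting values and keeps the result with the lowest sum of squared errors. The model y = sin(w·x), the synthetic data and all function names are illustrative assumptions.

```python
import math

def sse(w, xs, ys):
    """Sum of squared errors of the hypothetical model y = sin(w * x)."""
    return sum((y - math.sin(w * x)) ** 2 for x, y in zip(xs, ys))

def compass_search(w, xs, ys, step=0.1, tol=1e-8):
    """Local search: move to a neighbour w +/- step if it lowers the SSE,
    otherwise halve the step. Ends in a local minimum near the start."""
    f = sse(w, xs, ys)
    while step > tol:
        for cand in (w - step, w + step):
            f_cand = sse(cand, xs, ys)
            if f_cand < f:
                w, f = cand, f_cand
                break
        else:  # no neighbour improved: refine the step
            step /= 2.0
    return w, f

# Noise-free synthetic data with true parameter w = 2.0
xs = [0.2 * i for i in range(1, 51)]
ys = [math.sin(2.0 * x) for x in xs]

# Multi-start: one local search per starting value, keep the best result
starts = [0.5 + 0.5 * k for k in range(10)]
results = [compass_search(w0, xs, ys) for w0 in starts]
best_w, best_sse = min(results, key=lambda r: r[1])
for w0, (w, f) in zip(starts, results):
    print(f"start {w0:.1f} -> w = {w:.4f}, SSE = {f:.4g}")
```

The start at w0 = 2.0 lands on the zero-error solution, while distant starts such as w0 = 5.0 stall in local minima of the SSE surface; this is exactly the situation in which a smarter way of finding starting values pays off.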

References

  1. F. Vogt, "A self-guided search for good local minima of the sum-of-squared-error in nonlinear least squares regression", Journal of Chemometrics, 2014. http://dx.doi.org/10.1002/cem.2662

New software

Despite the vacation season, descriptions of two interesting new pieces of software have appeared: PML: A Parallel Machine Learning Toolbox for Data Classification and Regression [1] and Hot PLS – a framework for Hierarchically Ordered Taxonomic classification by Partial Least Squares [2].

References

  1. R. Jing, J. Sun, Y. Wang, M. Li, and X. Pu, "PML: A Parallel Machine Learning Toolbox for Data Classification and Regression", Chemometrics and Intelligent Laboratory Systems, 2014. http://dx.doi.org/10.1016/j.chemolab.2014.07.005
  2. K.H. Liland, A. Kohler, and V. Shapaval, "Hot PLS – a framework for Hierarchically Ordered Taxonomic classification by Partial Least Squares", Chemometrics and Intelligent Laboratory Systems, 2014. http://dx.doi.org/10.1016/j.chemolab.2014.07.010

Partial Least Squares-Slice transform hybrid model

This new idea was recently proposed in Chemolab [1]. Since the PLS-SLT hybrid model is equivalent to a PLS-based piecewise linear model in the y-space, it should interest everyone involved in multivariate calibration.
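To make "piecewise linear" concrete, here is a minimal Python sketch of the generic slice transform: a value x lying between boundaries q[j-1] and q[j] is written as a convex combination of those two boundaries, and replacing the boundary values q by new values p turns this identity into a piecewise linear map. How [1] couples the transform with PLS is not reproduced here; the function names and the example numbers are assumptions.

```python
from bisect import bisect_left

def slice_vector(x, q):
    """Slice-transform row S_q(x) for boundaries q[0] < q[1] < ... < q[M].

    x is written as a convex combination of the two boundaries enclosing it,
    so that sum(S[j] * q[j]) == x up to rounding.
    """
    if not q[0] <= x <= q[-1]:
        raise ValueError("x lies outside the boundary range")
    j = max(1, bisect_left(q, x))           # upper boundary of x's bin
    r = (x - q[j - 1]) / (q[j] - q[j - 1])  # relative position in the bin
    s = [0.0] * len(q)
    s[j - 1] = 1.0 - r
    s[j] = r
    return s

def piecewise_linear(x, q, p):
    """S_q(x) . p: the piecewise linear map sending each q[j] to p[j]."""
    return sum(sj * pj for sj, pj in zip(slice_vector(x, q), p))
```

For example, with q = [0, 1, 2, 4] and p = [0, 2, 3, 3.5], the value 1.5 is mapped to 2.5, halfway between p[1] and p[2]; fitting p instead of fixing it is what makes such a model flexible yet still linear in its parameters.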

References

  1. P. Shan, S. Peng, Y. Bi, L. Tang, C. Yang, Q. Xie, and C. Li, "Partial Least Squares-Slice transform hybrid model for nonlinear calibration", Chemometrics and Intelligent Laboratory Systems, 2014. http://dx.doi.org/10.1016/j.chemolab.2014.07.015

Baseline filtering in LIBS

A recent article [1] by Yaroshchyk and Eberhardt deals with some problems of baseline (continuum background) correction in laser-induced breakdown spectroscopy (LIBS) data. As a starting point, I recommend this article [2] together with the further discussion [3] [4] and new findings [5] [6].
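None of the specific algorithms of [1] or [2] is reproduced below. As a minimal illustration of the iterative idea behind many automatic baseline methods, this Python sketch fits a straight line, clips the signal to min(signal, line) and repeats, so peaks are cut off step by step and the line sinks onto the baseline. The straight-line baseline, the iteration count and the synthetic Gaussian peak are assumptions for brevity; real spectra usually need a higher-order polynomial, a spline or, for example, the quantile regression approach of [2].

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def iterative_linear_baseline(xs, ys, n_iter=100):
    """Fit a line, clip the working signal to the line, repeat."""
    work = list(ys)
    for _ in range(n_iter):
        a, b = fit_line(xs, work)
        work = [min(w, a + b * x) for x, w in zip(xs, work)]
    a, b = fit_line(xs, work)
    return [a + b * x for x in xs]

# Synthetic signal: sloping linear baseline plus one Gaussian peak
xs = [float(i) for i in range(100)]
true_baseline = [1.0 + 0.01 * x for x in xs]
signal = [t + 5.0 * math.exp(-((x - 50.0) ** 2) / 18.0)
          for x, t in zip(xs, true_baseline)]
estimate = iterative_linear_baseline(xs, signal)
max_err = max(abs(e - t) for e, t in zip(estimate, true_baseline))
print(f"largest baseline error: {max_err:.2e}")
```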

References

  1. P. Yaroshchyk, and J.E. Eberhardt, "Automatic correction of continuum background in LIBS using a model-free algorithm", Spectrochimica Acta Part B: Atomic Spectroscopy, 2014. http://dx.doi.org/10.1016/j.sab.2014.06.020
  2. Ł. Komsta, "Comparison of Several Methods of Chromatographic Baseline Removal with a New Approach Based on Quantile Regression", Chromatographia, vol. 73, pp. 721-731, 2011. http://dx.doi.org/10.1007/s10337-011-1962-1
  3. Z. Zhang, and Y. Liang, "Comments on the Baseline Removal Method Based on Quantile Regression and Comparison of Several Methods", Chromatographia, vol. 75, pp. 313-314, 2012. http://dx.doi.org/10.1007/s10337-012-2192-x
  4. Ł. Komsta, "Response to Letter to the Editor Regarding: Comparison of Several Methods of Chromatographic Baseline Removal with a New Approach Based on Quantile Regression", Chromatographia, vol. 75, pp. 315-316, 2012. http://dx.doi.org/10.1007/s10337-012-2191-y
  5. Ł. Górski, F. Ciepiela, and M. Jakubowska, "Automatic baseline correction in voltammetry", Electrochimica Acta, vol. 136, pp. 195-203, 2014. http://dx.doi.org/10.1016/j.electacta.2014.05.076
  6. K.H. Liland, E. Rukke, E.F. Olsen, and T. Isaksson, "Customized baseline correction", Chemometrics and Intelligent Laboratory Systems, vol. 109, pp. 51-56, 2011. http://dx.doi.org/10.1016/j.chemolab.2011.07.005

Random Forests with missing data

Random Forests are not a very popular technique in chemometrics, but there are reports of their use in QSRR [1], NIR multivariate calibration [2] and metabolomics [3]. The problem of missing data in this technique (together with variable selection) is the topic of a new article in CSDA [4] by Hapfelmeier and Ulm. Enjoy reading!

References

  1. T. Hancock, R. Put, D. Coomans, Y. Vander Heyden, and Y. Everingham, "A performance comparison of modern statistical techniques for molecular descriptor selection and retention prediction in chromatographic QSRR studies", Chemometrics and Intelligent Laboratory Systems, vol. 76, pp. 185-196, 2005. http://dx.doi.org/10.1016/j.chemolab.2004.11.001
  2. D. Donald, D. Coomans, Y. Everingham, D. Cozzolino, M. Gishen, and T. Hancock, "Adaptive wavelet modelling of a nested 3 factor experimental design in NIR chemometrics", Chemometrics and Intelligent Laboratory Systems, vol. 82, pp. 122-129, 2006. http://dx.doi.org/10.1016/j.chemolab.2005.05.013
  3. M. Eliasson, S. Rännar, and J. Trygg, "From Data Processing to Multivariate Validation - Essential Steps in Extracting Interpretable Information from Metabolomics Data", Current Pharmaceutical Biotechnology, vol. 12, pp. 996-1004, 2011. http://dx.doi.org/10.2174/138920111795909041
  4. A. Hapfelmeier, and K. Ulm, "Variable selection by Random Forests using data with missing values", Computational Statistics & Data Analysis, vol. 80, pp. 129-139, 2014. http://dx.doi.org/10.1016/j.csda.2014.06.017

K-CM neural network

Prof. Todeschini's group has presented a new method called K-CM [1]. It combines a neural network approach with fuzzy profiling of samples and k-NN. An interesting idea indeed.
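K-CM itself (its neural network and profiling parts) is beyond a short example. Only the familiar k-NN ingredient is sketched here, as a plain majority-vote classifier in Python; the function name, the Euclidean distance, k = 3 and the toy training set are illustrative assumptions.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training samples closest to x
    (Euclidean distance); equal vote counts go to the class that
    appears first among the nearest neighbours."""
    ranked = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-class training set in two dimensions
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_x, train_y, (0.5, 0.5)))  # -> A
print(knn_predict(train_x, train_y, (5.5, 5.2)))  # -> B
```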

References

  1. M. Buscema, V. Consonni, D. Ballabio, A. Mauri, G. Massini, M. Breda, and R. Todeschini, "K-CM: a new artificial neural network. Application to supervised pattern recognition", Chemometrics and Intelligent Laboratory Systems, 2014. http://dx.doi.org/10.1016/j.chemolab.2014.06.013

New approaches in MCR-ALS

People involved in MCR-ALS methodology should take a look at two interesting recent papers. The first describes the performance and validation of MCR-ALS with a quadrilinear constraint on noisy datasets [1]; the second proposes an approach for incomplete fused multiset data [2]. Both papers are co-authored by R. Tauler, one of the inventors and main developers of MCR-ALS methodology, so enjoy reading!

References

  1. A. Malik, and R. Tauler, "Performance and validation of MCR-ALS with quadrilinear constraint in the analysis of noisy datasets", Chemometrics and Intelligent Laboratory Systems, vol. 135, pp. 223-234, 2014. http://dx.doi.org/10.1016/j.chemolab.2014.04.002
  2. M. De Luca, G. Ragno, G. Ioele, and R. Tauler, "Multivariate curve resolution of incomplete fused multiset data from chromatographic and spectrophotometric analyses for drug photostability studies", Analytica Chimica Acta, vol. 837, pp. 31-37, 2014. http://dx.doi.org/10.1016/j.aca.2014.05.056

Independence in high dimensional data

In a new issue of Statistics & Probability Letters, Mao proposes a new test of independence for high-dimensional data [1]. It may be useful in some chemometric applications, even if it is only one more approach alongside existing ones, such as the test by Schott [2].
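For orientation, here is a Python sketch of a sum-of-squared-correlations statistic of the kind studied by Schott [2]; it is not Mao's new test [1]. The moments come from the fact that, for independent normal variables and N observations, each squared sample correlation r² follows Beta(1/2, (N-2)/2), and correlations of different pairs are independent under the null. Function names and the simulated data are assumptions, and the normal approximation is only a rough guide for small p.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def sum_r2_statistic(columns):
    """Standardised sum of squared pairwise correlations.

    Under independence: E[r^2] = 1/(N-1) and
    Var[r^2] = 2(N-2)/((N-1)^2 (N+1)); summing over the p(p-1)/2 pairs
    gives an approximately N(0, 1) statistic. Large values = dependence.
    """
    p, n = len(columns), len(columns[0])
    s = sum(pearson(columns[i], columns[j]) ** 2
            for i in range(p) for j in range(i))
    mean = p * (p - 1) / (2 * (n - 1))
    var = p * (p - 1) * (n - 2) / ((n - 1) ** 2 * (n + 1))
    return (s - mean) / math.sqrt(var)

# Simulated example: p = 5 variables, N = 50 observations
rng = random.Random(1)
N = 50
base = [rng.gauss(0.0, 1.0) for _ in range(N)]
dependent = [base, [v + 0.01 * rng.gauss(0.0, 1.0) for v in base]] + [
    [rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(3)]
independent = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(5)]
z_dep = sum_r2_statistic(dependent)
z_indep = sum_r2_statistic(independent)
print(f"dependent: {z_dep:.2f}, independent: {z_indep:.2f}")
```

Because the statistic sums over all pairs, a single strongly correlated pair already pushes it far into the upper tail, whereas weak dependence spread over many pairs is detected only as p grows; high-dimensional tests such as [1] and [2] are designed for that regime.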

References

  1. G. Mao, "A new test of independence for high-dimensional data", Statistics & Probability Letters, vol. 93, pp. 14-18, 2014. http://dx.doi.org/10.1016/j.spl.2014.05.024
  2. J.R. Schott, "Testing for complete independence in high dimensions", Biometrika, vol. 92, pp. 951-956, 2005. http://dx.doi.org/10.1093/biomet/92.4.951

LMM and goodness of fit

Linear mixed models (LMM) are not often used in chemometrics. However, I would like to draw attention to a very interesting article proposing goodness-of-fit tests for such models [1]. It really widens the understanding of these models and encourages further study.

References

  1. M. Tang, E.V. Slud, and R.M. Pfeiffer, "Goodness of fit tests for linear mixed models", Journal of Multivariate Analysis, vol. 130, pp. 176-193, 2014. http://dx.doi.org/10.1016/j.jmva.2014.03.012

SIMCA extension

In the latest issue of the Journal of Chemometrics, A. Pomerantsev and O. Rodionova propose a new method for calculating the type II error in SIMCA [1]. Anyone new to SIMCA should start with this chapter [2].

References

  1. A.L. Pomerantsev, and O.Y. Rodionova, "On the type II error in SIMCA method", Journal of Chemometrics, vol. 28, pp. 518-522, 2014. http://dx.doi.org/10.1002/cem.2610
  2. S. Wold, and M. Sjöström, "SIMCA: A Method for Analyzing Chemical Data in Terms of Similarity and Analogy", ACS Symposium Series, pp. 243-282, 1977. http://dx.doi.org/10.1021/bk-1977-0052.ch012