Code/Software available from Clayton Scott

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

If you use code found here in a publication or presentation, I would appreciate it if you would acknowledge the source.

Domain generalization by marginal transfer learning.
Mixture proportion estimation via kernel mean embedding: Technique for estimating the maximum proportion of one distribution present in another, based on kernel mean embedding
Mixture proportion estimation: ROC-based technique for estimating the maximum proportion of one distribution present in another
Sparse approximation of a kernel mean: For scaling kernel density estimates and kernel mean embeddings of distributions.
Robust kernel density estimation: Views the kernel density estimate as a mean in a Hilbert space, and estimates the mean robustly via M-estimation
Surrogate losses for label-dependent costs: Figures.
Cluster nearest neighbor algorithm for file matching, and associated EM algorithm for fitting a mixture of PPCA models with missing attributes.
TCEM: EM algorithm for fitting a multivariate Gaussian mixture model with truncated and censored data.
Nested support vector machines for cost-sensitive and one-class classification
SVM path algorithms for cost-sensitive and one-class classification
$L_2$ kernel classification, optimizing the integrated squared error of the difference of densities
MN-SCAnn: Nonparametric annotation of multivariate, contaminated data
Weighted L2E for partial mixture estimation
2$\nu$-SVM, a cost-sensitive extension of the $\nu$-SVM
Dyadic decision trees with free-splits, for classification and other set estimation problems
COPAP: Cyclic order preserving assignment problem for shape matching
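
As a rough illustration of the robust kernel density estimation entry above, the following Python sketch treats the kernel density estimate as a weighted mean of kernel functions in a Hilbert space and reweights the points by kernelized iteratively reweighted least squares with a Huber-type loss. It is not the downloadable code: the function names, the Gaussian kernel, the Huber threshold (set to the median initial residual), and the toy data are all assumptions made for this example.

```python
import numpy as np

def gaussian_gram(X, sigma):
    """Gram matrix of an (unnormalized) Gaussian kernel with bandwidth sigma."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def robust_kde_weights(K, n_iter=100):
    """Hypothetical helper: weights w (w >= 0, sum 1) of a robust mean sum_i w_i k(., x_i)."""
    n = K.shape[0]
    w = np.full(n, 1.0 / n)  # start from the ordinary KDE (uniform weights)
    a = None
    for _ in range(n_iter):
        # Squared RKHS distance from each k(., x_i) to the current estimate,
        # computed with the kernel trick.
        sq_norm = np.diag(K) - 2.0 * (K @ w) + w @ K @ w
        r = np.sqrt(np.maximum(sq_norm, 1e-12))
        if a is None:
            a = np.median(r)  # assumption: Huber threshold from initial residuals
        # Huber reweighting psi(r) / r: 1 for small residuals, a / r for large ones.
        phi = np.where(r <= a, 1.0, a / r)
        w = phi / phi.sum()
    return w

rng = np.random.default_rng(0)
inliers = rng.normal(size=(95, 1))
outliers = rng.uniform(6.0, 10.0, size=(5, 1))  # toy contamination
X = np.vstack([inliers, outliers])
w = robust_kde_weights(gaussian_gram(X, sigma=1.0))
print("mean inlier weight:", w[:95].mean())
print("mean outlier weight:", w[95:].mean())
```

Points far from the bulk of the data receive smaller weights than under the ordinary kernel density estimate, which assigns every point weight 1/n.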

Details

Click the link to download.

Domain generalization by marginal transfer learning. Implements the method described in
G. Blanchard, A. Deshmukh, U. Dogan, G. Lee, and C. Scott, "Domain Generalization by Marginal Transfer Learning."
Mixture proportion estimation via kernel mean embedding. This code implements the algorithm described in
H. Ramaswamy, C. Scott, and A. Tewari, "Mixture Proportion Estimation via Kernel Embedding of Distributions," arXiv:1603.02501.
The code is written in Python 2.7 and requires the scipy, numpy, matplotlib, and cvxopt packages.
Mixture proportion estimation (version 2). This code implements the algorithm described in
C. Scott, "A Rate of Convergence for Mixture Proportion Estimation, with Application to Learning from Noisy Labels," AISTATS 2015.
Under the hood this code contains a scalable implementation (programmed by Daniel LeJeune) of kernel logistic regression using random Fourier features, which should be useful in a number of other contexts.
Sparse approximation of a kernel mean.
E. Cruz Cortes and C. Scott, "Sparse approximation of a kernel mean."
Robust kernel density estimation.
J. Kim and C. Scott, "Robust kernel density estimation," Journal of Machine Learning Research, vol. 13, pp. 2529-2565, 2012.
Surrogate losses for label-dependent costs. Generates the figures in this paper:
C. Scott, "Calibrated Surrogate Losses for Classification with Label-Dependent Costs," Electronic Journal of Statistics, vol. 6, pp. 958-992, 2012.
Cluster nearest neighbor algorithm for file matching, and associated EM algorithm for fitting a mixture of PPCA models with missing attributes.
G. Lee, W. Finn, and C. Scott, "Statistical file matching of flow cytometry data," J. Biomedical Informatics, vol. 44, no. 4, pp. 663-676, 2011.
TCEM: EM algorithm for fitting a multivariate Gaussian mixture model with truncated and censored data.
G. Lee and C. Scott, "EM algorithms for multivariate Gaussian mixture models with truncated and censored data," Computational Statistics and Data Analysis, vol. 56, no. 9, pp. 2816-2829, 2012.
Nested support vector machines: Matlab code to generate cost-sensitive and one-class SVMs that are properly nested (unlike standard SVMs) as the cost-asymmetry or density level parameter is varied. The solution paths are piecewise linear with a user-selected number of breakpoints.
G. Lee and C. Scott, "Nested support vector machines," to be published in IEEE Trans. Signal Processing.
SVM path algorithms: Matlab code to generate solution paths for the cost-sensitive SVM with varying cost-asymmetry, and the one-class SVM with varying density level parameter. The algorithms were inspired by the path algorithm of Hastie et al., which varies a regularization parameter, and were implemented for comparison with the nested SVM code above. The OC-SVM path algorithm was detailed here:
G. Lee and C. Scott, "The one class support vector machine solution path," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007), vol. 2, pp. II-521-II-524, Honolulu, USA, April 2007.
The CS-SVM algorithm differs from the one developed by Bach et al. in that we capture the cost asymmetry in a single parameter. The algorithm first finds the solution path over the regularization parameter with the cost asymmetry parameter fixed at a specific value (the negative sample size divided by the total sample size). Then, for any fixed value of the regularization parameter, it finds the solution path as the cost asymmetry parameter varies. The first of these two path algorithms is detailed in the following class project report by Gyemin Lee.
G. Lee, "The Solution Path for the Balanced 2C-SVM," EECS 559 Class Project Report, University of Michigan, Fall 2006.

The second path algorithm has no documentation, but follows similar principles to the other algorithms.
$L_2$ kernel classification: Matlab code implementing a classification method based on the $L_2$ distance, or integrated squared error, detailed in
J. Kim and C. Scott, "$L_2$ kernel classification," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, no. 10, Oct. 2010, pp. 1822-1831.
MN-SCAnn: Matlab code for nonparametric annotation of multivariate, contaminated data, detailed here:
C. Scott and E. Kolaczyk, "Nonparametric assessment of contamination in multivariate data using generalized quantile sets and FDR," J. Computational and Graphical Statistics, June 1, 2010, 19(2): 439-456.
Partial mixture estimation: R code and documentation for semi-parametric partial mixture estimation using a weighted L2 distance, applied to microarray differential expression analysis and detailed in
D. Rossell, R. Guerra and C. Scott, "Semi-parametric differential expression analysis via partial mixture estimation," Statistical Applications in Genetics and Molecular Biology, vol. 7, no. 1, article 15, 2008.
2$\nu$-SVM: An implementation of the 2$\nu$-SVM, a cost-sensitive extension of the $\nu$-SVM, based on the LIBSVM package and described in this paper:
M. Davenport, R. Baraniuk, and C. Scott, "Tuning support vector machines for minimax and Neyman-Pearson classification," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, no. 10, Oct. 2010, pp. 1888-1898.
Dyadic decision trees: Matlab/mex code for solving several set estimation problems, including traditional binary and multi-class classification, Neyman-Pearson classification, minimum volume set estimation, and density level set estimation. Estimates are based on free-split recursive dyadic partitions. Practical for problems of dimension less than 10. Thanks to Gilles Blanchard for helpful discussions regarding the implementation of the dyadic binning algorithm.
C. Scott and R. Nowak, "Minimax-optimal classification with dyadic decision trees," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1335-1353, April 2006.
C. Scott and R. Nowak, "Learning minimum volume sets," Journal of Machine Learning Research, vol. 7, pp. 665-704, April 2006.
Cyclic contour matching: Matlab code for aligning two point sets obtained by sampling cyclic contours. Implements the algorithms and reproduces the examples found in this paper:
C. Scott and R. Nowak, "Robust contour matching via the order preserving assignment problem," IEEE Transactions on Image Processing, vol. 15, no. 7, pp. 1831-1838, July 2006.
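
The mixture proportion estimation (version 2) entry above mentions kernel logistic regression using random Fourier features. For readers unfamiliar with the idea, the short Python sketch below shows how random Fourier features approximate a Gaussian kernel, so that a linear model on the features stands in for a kernel method. It is only an illustration and is not taken from the downloadable code; the function name, the kernel parameterization exp(-gamma * ||x - y||^2), and the toy data are assumptions made for this example.

```python
import numpy as np

def rff_features(X, n_features, gamma, rng):
    """Hypothetical helper: features Z with Z @ Z.T ~ exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies are drawn from the spectral density of the Gaussian kernel,
    # which is N(0, 2 * gamma * I) for this parameterization.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # toy data
gamma = 0.5

Z = rff_features(X, 5000, gamma, rng)
K_approx = Z @ Z.T  # inner products of random features

sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-gamma * sq_dists)  # exact Gaussian kernel matrix

max_err = np.abs(K_approx - K_exact).max()
print("largest entrywise approximation error:", max_err)
```

Fitting ordinary (linear) logistic regression on Z then approximates kernel logistic regression at a cost that grows linearly with the sample size, which is what makes the approach scalable.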

This work was supported in part by NSF Awards 0830490 and 0953135.
