Monte Carlo Methods and Their Applications in Big Data Analysis

Due to recent emergence of many big data problems, Monte Carlo methods tend to become powerful tools for analyzing big data sets. In this chapter, we first review the fundamental principles of Monte Carlo methods. Then, we describe several popular variance reduction techniques, including stratified sampling, control variates, antithetic variates, and importance sampling, to improve Monte Carlo sampling efficiency. Finally, application examples of estimation of sum, Monte Carlo linear solver, image recovery, matrix multiplication, and low-rank approximation are shown as case studies to demonstrate the effectiveness of Monte Carlo methods in data analysis.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic €32.70 /Month

Buy Now

Price includes VAT (France)

eBook EUR 96.29 Price includes VAT (France)

Softcover Book EUR 126.59 Price includes VAT (France)

Hardcover Book EUR 126.59 Price includes VAT (France)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Similar content being viewed by others

Statistical Leveraging Methods in Big Data

Chapter © 2018

General Introduction to Monte Carlo and Multi-level Monte Carlo Methods

Chapter © 2019

Comparative Study of Inference Methods for Bayesian Nonnegative Matrix Factorisation

Chapter © 2017

References

  1. M. Hilbert, P. López, The world’s technological capacity to store, communicate, and compute information. Science 332, 60–65 (2011) ArticleGoogle Scholar
  2. R.L. Burden, J.D. Faires, Numerical Analysis (Brooks/Cole, Cengage Learning, Boston, 2011) MATHGoogle Scholar
  3. J.M. Hammersley, D.C. Handscomb, Monte Carlo Methods (Chapman and Hall, Methuen & Co., London, and Wiley, New York, 1964) BookMATHGoogle Scholar
  4. J.S. Liu, Monte Carlo Strategies in Scientific Computing (Springer, New York, 2008) MATHGoogle Scholar
  5. Y. Li, M. Mascagni, Grid-based Monte Carlo application, in Grid Computing Third International Workshop/Conference (2002), pp. 13–24 Google Scholar
  6. Y. Li, M. Mascagni, Analysis of large-scale grid-based Monte Carlo applications. Int. J. High Perform. Comput. Appl. 17, 369–382 (2003) ArticleGoogle Scholar
  7. G.E. Forsythe, R.A. Leibler, Matrix inversion by a Monte Carlo method. Math. Comput. 4, 127–129 (1950) ArticleMathSciNetGoogle Scholar
  8. H. Ji, Y. Li, Reusing random walks in Monte Carlo methods for linear systems, in Proceedings of International Conference on Computational Science, vol. 9 (2012), pp. 383–392 Google Scholar
  9. H. Ji, M. Mascagni, Y. Li, Convergence analysis of Markov chain Monte Carlo linear solvers using Ulam–von Neumann Algorithm. SIAM J. Numer. Anal. 51, 2107–2122 (2013) ArticleMathSciNetMATHGoogle Scholar
  10. I.T. Dimov, B. Philippe, A. Karaivanova, C. Weihrauch, Robustness and applicability of Markov chain Monte Carlo algorithms for eigenvalue problems. Appl. Math. Model. 32, 1511–1529 (2008) ArticleMathSciNetMATHGoogle Scholar
  11. A. Weber, SIPI image database, The USC-SIPI Image Database, Signal & Image Processing Institute, Department of Electrical Engineering, Viterbi School of Engineering, Univ. of Southern California (2012), http://sipi.usc.edu/database
  12. J.F. Cai, E.J. Candès, Z. Shen, A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20, 1956–1982 (2010) ArticleMathSciNetMATHGoogle Scholar
  13. P. Drineas, R. Kannan, M.W. Mahoney, Fast Monte Carlo algorithms for matrices I: approximating matrix multiplication. SIAM J. Comput. 36, 132–157 (2006) ArticleMathSciNetMATHGoogle Scholar
  14. S. Eriksson-Bique, M. Solbrig, M. Stefanelli, S. Warkentin, R. Abbey, I.C. Ipsen, Importance sampling for a Monte Carlo matrix multiplication algorithm, with application to information retrieval. SIAM J. Sci. Comput. 33, 1689–1706 (2011) ArticleMathSciNetMATHGoogle Scholar
  15. G.H. Golub, C.F. Van Loan, Matrix Computations (Johns Hopkins University Press, Baltimore, 2012) MATHGoogle Scholar
  16. N. Halko, P.G. Martinsson, J.A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53, 217–288 (2011) ArticleMathSciNetMATHGoogle Scholar
  17. M.W. Mahoney, Randomized algorithms for matrices and data. Found. Trends Mach. Learn. 3, 123–224 (2011) MATHGoogle Scholar
  18. H. Ji, Y. Li, GPU accelerated randomized singular value decomposition and its application in image compression, in Proceedings of Modeling, Simulation, and Visualization Student Capstone Conference, Sulfolk (2014) Google Scholar

Acknowledgements

This work is partially supported by NSF grant 1066471 for Yaohang Li and Hao Ji acknowledges support from ODU Modeling and Simulation Fellowship.

Author information

Authors and Affiliations

  1. Department of Computer Science, Old Dominion University, Norfolk, VA, 23529, USA Hao Ji & Yaohang Li
  1. Hao Ji