Posts
Effect of Attention on Text Classification Performance
In this post, we continue from our previous study on text classification using a recurrent neural network (RNN) model. In that study, we explored how different Byte-Pair Encoding (BPE) settings and RNN architectures impact classification performance and found that the BPE configuration has a significant influence. This follow-up study focuses on the effect of incorporating an attention mechanism into the RNN model.
Word Embedding and Text Classification Performance
This report investigates using a recurrent neural network (RNN) model for classifying sentences extracted from Chinese Wikipedia articles. We evaluate the classification performance across various Byte-Pair Encoding (BPE) settings and RNN architectures, finding that BPE settings significantly influence classification outcomes.
Power Law Distribution: Word Frequency
In a previous note, we explored one mechanism that leads to power law distributions: the probability of a random walk returning to its starting point for the first time. In this note, we examine another mechanism, one that generates a power-law distribution of word frequencies within a text.
Power Law Distribution: First Return Time of a Random Walk
Power law distributions are prevalent in various fields. This note derives the probability of a random walk returning to its starting point for the first time (the first return time), which can be approximated by a power law distribution over sufficiently long time scales. Simulation results closely align with both the exact and approximate probabilities.
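As a pointer to the main result (a standard fact about the one-dimensional simple random walk, stated here in my own notation): the walk first returns to the origin at step $2n$ with probability

$$
f_{2n} = \frac{1}{2n-1}\binom{2n}{n}\,2^{-2n} \;\approx\; \frac{1}{2\sqrt{\pi}\,n^{3/2}}, \qquad n \gg 1,
$$

which is the power-law tail (exponent $-3/2$) that the simulations approximate.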
Multiplicative Processes and the Log-normal Distribution
This note derives the log-normal distribution from random multiplicative processes and confirms the result with investment simulations. We also examine how dispersion and inequality among investments grow over time and quantify the inequality using the Gini index.
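The core of the argument, paraphrased for reference (notation mine): a quantity that grows by random multiplicative factors has a logarithm that is a sum of independent terms, so the Central Limit Theorem applies to the log:

$$
X_n = X_0 \prod_{i=1}^{n} r_i \quad\Longrightarrow\quad \ln X_n = \ln X_0 + \sum_{i=1}^{n} \ln r_i,
$$

making $\ln X_n$ approximately normal and hence $X_n$ approximately log-normal.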
Central Limit Theorem and Cauchy Distribution
This note demonstrates the Central Limit Theorem (CLT) using the Fourier transform of the probability density function (PDF) and emphasizes the requirement that the mean and variance of the random variable must exist. It then contrasts this with the Cauchy distribution as a counter-example, where neither the mean nor the variance is defined, illustrating why the CLT does not hold in this case.
Biorthogonal Basis and Reproducing Kernels
This note initially explores the concept of a biorthogonal basis in a finite vector space. It subsequently applies a similar methodology to derive the reproducing kernel basis in function spaces, enabling the approximation of functions using their pointwise values and the associated dual basis.
Variational Autoencoder for CelebA Image Analysis
In this note, we implement a variational autoencoder (VAE) using the neural network framework in Wolfram Language and train it on the CelebFaces Attributes (CelebA) dataset. New images can be generated by sampling from the learned latent space. We explore how the VAE captures and manipulates image features, particularly the concept of attractiveness.
Comparison of Proportion Tests
This note compares several statistical methods for detecting differences in failure rates between two groups. We explore Fisher’s exact test, the Chi-squared test, and a Bayesian Monte Carlo approach, focusing on their conceptual simplicity, visual interpretability, and insights into uncertainty.
Unit Root Test in the AR(1) Time Series with Monte Carlo Method
Unit root testing is crucial in determining the stationarity of time series data, especially in autoregressive processes like AR(1). In this note, we explore the effectiveness of the Monte Carlo method in unit root testing for AR(1) processes compared to traditional methods like the Augmented Dickey-Fuller (ADF) test.
Unveiling Multidimensional Insights: Radviz Projection and Feature Importance in Regression
Radviz projection simplifies the representation of multidimensional data onto a 2D plane. In this note, we delve into the computation of Radviz projections and demonstrate their application in uncovering important features in multivariate regression analysis.
Numerical Investigation of the Lorenz System
We solve the ordinary differential equations of the Lorenz system to generate time series for future prediction with various models, including XGBoost and deep neural networks. Furthermore, we numerically compute the Lyapunov exponents of the Lorenz system to gain insights into its chaotic behavior.
Causal Inference By Regression
We use a simple example to illustrate that causal inference by regression is unreliable in realistic cases where measurement noise is present.
Simple Derivation and Intuitive Understanding of Independence Test Using HSIC
In this article, we derive the Hilbert-Schmidt Independence Criterion (HSIC) formula in a clear and straightforward manner. We also explore how to estimate statistical significance using bootstrap sampling and develop an intuitive understanding of why mapping data into a feature space is crucial for independence testing.
Understanding Kernel Principal Component Analysis (Kernel PCA)
Kernel Principal Component Analysis (Kernel PCA) is a powerful technique used in machine learning for dimensionality reduction. It allows us to perform principal component analysis on data that has been nonlinearly mapped to a higher-dimensional feature space. This article will provide a step-by-step derivation of the Kernel PCA formula, followed by an illustrative example to showcase its practical application. We will also compare our results with explicit mapping in feature space and the Kernel PCA implementation in Scikit-Learn.
Encoding Rotated Images with Autoencoder
In this study, we explore the application of PyTorch-based autoencoders, featuring convolutional layers, to encode images from the Fashion-MNIST dataset. Our autoencoder effectively encodes and decodes original images. However, a notable challenge arises when we feed rotated images into the model. These rotated images are often decoded incorrectly and classified into different categories. We address this issue by training the model on a combination of original and randomly rotated images, enabling it to decode rotated input correctly.
Regression Uncertainty Estimation with Conformal Prediction
In this note, we estimate regression prediction intervals using various conformal prediction methods. The regression model employed is a Gaussian Process regressor. We compare the intervals produced by conformal prediction with those from Gaussian Process regression. Without conformal prediction, it is crucial to estimate the variances of the observation noise and the predicted mean accurately, which requires optimizing the kernel parameters and the noise variance to maximize the marginal likelihood. With conformal prediction, this requirement can be dropped.
Positive Definiteness of Kernels
This post summarizes the proof of the positive definiteness of the multivariate squared exponential kernel (radial basis function) and the exponential kernel. The proofs primarily rely on sources such as (Wendland 2004) and stackexchange.com. Additionally, Python numpy commands are included for numerically testing the positive definiteness of a matrix.
Singularity in Covariance Matrix in Gaussian Process Regression
This post discusses the issue of singularity in the covariance matrix when performing Gaussian Process regression, particularly when dealing with a large number of training data points, as shown in a previous post. We explore two approaches to handle this numerical problem: adjusting the kernel parameters and introducing jitter to the diagonal of the covariance matrix. Additionally, we evaluate the use of low-rank matrix approximations for the covariance matrix.
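As a minimal illustration of the jitter approach (a sketch with illustrative values, not code from the post):

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential (RBF) kernel matrix."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

x = np.linspace(0, 1, 200)           # many closely spaced points -> nearly singular K
K = rbf_kernel(x, x)
jitter = 1e-6                        # small constant added to the diagonal
K_stable = K + jitter * np.eye(len(x))
L = np.linalg.cholesky(K_stable)     # succeeds; factorizing plain K may fail
```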
Gaussian Process Regression
This article discusses Gaussian Process regression, a non-parametric approach for modeling the relationship between input variables and their corresponding outputs. It presents the conditional probability distribution of a multivariate Gaussian and the covariance matrix computation using a kernel function. The article also covers the optimization of kernel hyperparameters to maximize the likelihood function. An implementation of Gaussian Process regression in Python is provided. The article includes examples of noiseless and noisy observation cases and demonstrates the prediction of values with mean and confidence intervals.
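A minimal numpy sketch of the noisy-observation case, assuming fixed kernel hyperparameters and a known noise level (all values illustrative, not taken from the article):

```python
import numpy as np

def rbf(a, b, ell=1.0, sigma_f=1.0):
    """Squared-exponential kernel."""
    d = a[:, None] - b[None, :]
    return sigma_f**2 * np.exp(-0.5 * (d / ell) ** 2)

rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, 12)
y_train = np.sin(x_train) + 0.1 * rng.normal(size=x_train.size)
x_test = np.linspace(-4, 4, 200)

sigma_n = 0.1                                   # observation-noise standard deviation
K = rbf(x_train, x_train) + sigma_n**2 * np.eye(x_train.size)
K_s = rbf(x_train, x_test)
K_ss = rbf(x_test, x_test)

mean = K_s.T @ np.linalg.solve(K, y_train)      # predictive mean
cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)    # predictive covariance
std = np.sqrt(np.clip(np.diag(cov), 0, None))   # mean +/- 1.96*std gives a ~95% interval
```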
Conditional Distribution of Multivariate Gaussian Variables: A Simple Derivation
We present a straightforward derivation for calculating the conditional probability distribution of multivariate Gaussian variables.
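The result being derived, stated for reference in the usual partitioned form (notation mine): if

$$
\begin{pmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{pmatrix}
\sim \mathcal{N}\!\left(
\begin{pmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{pmatrix},
\begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix}
\right),
$$

then

$$
\mathbf{x}_1 \mid \mathbf{x}_2 \sim \mathcal{N}\!\left(
\boldsymbol{\mu}_1 + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),\;
\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).
$$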
Multivariate Gaussian Distribution As Linear Transformation of Independent Normally Distributed Random Variables
This note explores the relationship between multivariate Gaussian variables and linear transformations of independent, normally distributed random variables. The main results include the derivation of the probability density function (PDF) of the multivariate Gaussian distribution and the observation that infinitely many linear transformations can turn independent, normally distributed random variables into multivariate Gaussian variables. The note also demonstrates specific ways of constructing such transformations using decomposition techniques such as singular value decomposition and Cholesky decomposition.
Transformations Corresponding to Kernels
Mercer’s Theorem is a fundamental result in kernel theory. It states that for a symmetric, positive semi-definite kernel, there exists a mapping $\phi$ that sends the input vector $\mathbf{x}$ to a higher-dimensional space such that the inner product of the transformed vectors equals the kernel value, $\phi(\mathbf{x})\cdot\phi(\mathbf{y}) = K(\mathbf{x},\mathbf{y})$. This is what is referred to as the kernel trick in support vector machines (SVMs).
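A small numeric check of the idea (a toy example of my own, not from the post): the degree-2 polynomial kernel on 2-D inputs equals the dot product of an explicit 6-dimensional feature map.

```python
import numpy as np

def poly_kernel(x, y):
    """Degree-2 polynomial kernel K(x, y) = (x.y + 1)^2."""
    return (x @ y + 1.0) ** 2

def phi(x):
    """Explicit feature map whose dot product reproduces the kernel above."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2,
                     np.sqrt(2) * x1 * x2])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose(poly_kernel(x, y), phi(x) @ phi(y))   # both equal (x.y + 1)^2 = 4
```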
Using Regression to Check Variable Dependence in Three Types of Directed Acyclic Graphs
This article explores how regression can be used to determine the dependence between variables in three types of directed acyclic graphs (DAGs): pipe, confounder, and collider. The theoretical analysis of these graphs can be found in the linked blog post.
Variable Dependence in Three Types of Directed Acyclic Graphs
This note examines the dependence between variables in three types of directed acyclic graphs (DAGs): pipe, confounder, and collider.
Maximum Likelihood Estimation is An Approximation to Minimization of KL Distance
This is a brief derivation showing that maximum likelihood estimation approximates the minimization of the Kullback-Leibler (KL) distance.
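The one-line version of the argument (a standard result, written here in my own notation):

$$
\frac{1}{n}\sum_{i=1}^{n}\log p_\theta(x_i)
\;\xrightarrow[n\to\infty]{}\;
\mathbb{E}_{x\sim p_{\text{data}}}\!\left[\log p_\theta(x)\right]
= -\,D_{\mathrm{KL}}\!\left(p_{\text{data}}\,\|\,p_\theta\right) + \text{const},
$$

so maximizing the average log-likelihood approximately minimizes the KL distance from the data distribution to the model.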
Effect of Noise in Data On Regression: Linear Model vs. Neural Network
We have observed that, when the data are noisy, a linear regression model performs as well as or better than more complex nonlinear models such as neural networks. In this note, we compare a linear model and a feed-forward neural network for regression with various amounts of noise in the data.
Data Smoothing with P-splines: An Implementation with scikit-learn and PyMC
This note uses P-splines (penalized splines) for data smoothing. Penalizing the differences between the coefficients of adjacent spline bases makes the fit smoother. The smoothness control is implemented in two ways: 1) adding the coefficient differences as a regularization term to the least-squares minimization in scikit-learn; and 2) modeling the coefficients as a Gaussian random walk in PyMC, a probabilistic programming library.
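A minimal sketch of the first approach, assuming scikit-learn's `SplineTransformer` for the B-spline basis and a second-order difference penalty (parameter values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = np.sin(x) + 0.3 * rng.normal(size=x.size)

B = SplineTransformer(n_knots=20, degree=3).fit_transform(x[:, None])  # B-spline basis matrix

k = B.shape[1]
D = np.diff(np.eye(k), n=2, axis=0)      # second-order differences of the coefficients

lam = 10.0                               # smoothing strength (illustrative)
coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
y_smooth = B @ coef                      # penalized least-squares fit
```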
Bias in Potential Outcomes in Causal Inference
This note summarizes my understanding of the bias in potential outcomes while reading the book Causal Inference: The Mixtape (https://mixtape.scunning.com).
Derivation of Linear Regression Coefficients and Their Variation with Minimal Matrix Algebra
This is a simple calculation of linear regression coefficients and their variances using covariances and variances, with minimal need for matrix algebra. This method can prove the regression anatomy theorem in a straightforward way.
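For a single regressor, the identities involved (standard results, my notation) are

$$
\hat{\beta} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)},
\qquad
\operatorname{Var}(\hat{\beta}) = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2},
$$

where $\sigma^2$ is the variance of the error term.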
XGBoost with GPUs and Multi-core CPU
We run XGBoost on a multi-core CPU and on GPUs. On the CPU, throughput peaks at 16 cores and does not improve with more. A single GPU is about 29% faster than the CPU with 16 cores. Interestingly, we do not observe a further speed-up when going from one GPU to two.
Hamiltonian Monte Carlo vs. Metropolis
This note compares the Metropolis and Hamiltonian Monte Carlo algorithms, using autocorrelation and effective sample size as metrics. A unimodal target distribution is used here; multimodal target distributions will be discussed in a future note.
Estimation of Variability From Observed Data: A Bayesian Perspective
The uncertainty of estimates derived from data follows directly from the posterior distribution of the data-generating model. This note uses the Bayesian approach to discuss two cases, one of which explains the bootstrap method.
Dirichlet Process in Mixture Model
We use the Dirichlet process to generate the weights in the mixture model to determine the optimal number of components automatically.
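A common way to generate such weights is the stick-breaking construction; a minimal numpy sketch, truncated at a fixed number of components (my simplification, not code from the note):

```python
import numpy as np

def stick_breaking_weights(alpha, n_components, rng):
    """Truncated stick-breaking draw of mixture weights from a DP(alpha) prior."""
    betas = rng.beta(1.0, alpha, size=n_components)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining             # sums to (almost) 1 for a large truncation

rng = np.random.default_rng(0)
w = stick_breaking_weights(alpha=2.0, n_components=30, rng=rng)
print(w[:5], w.sum())                    # a few dominant weights, total close to 1
```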
Multivariate Orthogonal Linear Regression Using PyMC
This note describes a multivariate orthogonal linear regression method using the PyMC probabilistic programming package. The formulation is based on an intuitive geometrical interpretation.
Flag of Ukraine with Matplotlib
In support of Ukraine and the Ukrainian people, I made a Ukrainian flag with matplotlib.
Logic of Science: Review of Bernoulli's Fallacy
The book is well researched and very lucid, and its arguments for the Bayesian approach are convincing. It is a useful book to read before Jaynes’ Probability Theory.
Numerical Solution to Monty Hall Problem using PyMC
We numerically solved the Monty Hall Problem with PyMC3, a probabilistic programming package in Python. The PyMC code is adapted from Austin Rochford’s Introduction to Probabilistic Programming with PyMC.
Student's t Mixture Model with PyMC
In this note, we compare the Gaussian mixture model and the Student’s t mixture model on two-dimensional data with an unbalanced proportion of clusters, shown in Figure 1. The results demonstrate that the Student’s t mixture model performs much better.
A Simple Non-Bayesian Solution to Monty Hall Problem
This short note describes a simple non-Bayesian solution to the Monty Hall problem. A charming Bayesian analysis can be found in the book Bernoulli’s Fallacy.
Solution to An Example Problem in Bernoulli's Fallacy
In this note, we solve an example problem in the book Bernoulli’s Fallacy using three approaches: (1) maximum likelihood, (2) Bayes’ theorem, and (3) MCMC simulation with PyMC3 package in Python.
First Blog
Blogging on GitHub Pages is quick to get started with. Jekyll, however, seems to require some effort to learn well.