TLDR Takeaways:

Don't sit in the car nearest the engine. I measured 10% less volatility at the car furthest from the engine.

The local train that makes all the stops has about 45% less volatility side-to-side than the express train (fastest).

There are quite a few large jolts, and the largest ones are reliably in the side-to-side direction.

You can use similar robust quantitative techniques for analyzing the volatility of train vibrations as used in financial markets and other turbulent systems.

It's important to use dynamic measures in a dynamic system. I often see the mistake of using a static value for mean or kurtosis, unaware of how it changes over time.

Watch out for nonlinear change. Dynamic kurtosis and meta-distributions hold clues.

Listening and observing is foundational for this type of work. The proper statistical techniques emerge from first principles analysis of the probability distribution and minimum assumptions.

The Caltrain bounces up and down quite a bit as it rolls down the tracks. Not a lot of people are thrilled with the roughness of the ride, especially during rush hour. I don’t mind it. It gives my accelerometer something to do.

An accelerometer measures how much something shakes. My accelerometer rode a rush hour bullet train and took 250 samples per second measuring vibrations side to side, up and down and forward and backward.

I was curious to do this because I spend a fair amount of time probing dynamic systems with time-series data and tools from quantitative finance, fluid dynamics, music theory, and complex systems. I like rhythm and vibration and I trade financial market volatility at sha.capital. There's chaos and complexity all around us, and volatility catches my attention. A few days ago I experienced a particularly volatile train ride and thought about analyzing the train's vibrations with a similar premise as a financial market volatility analysis. Both are matters of changes in vibrations, just different mediums.

## Vibration is Error

For most mechanical machines, vibration is error. It's waste and bad user-experience. Each vibration of the train on the tracks is a miniature derailment. Small vibrations come from the engine and internal passenger movement, but the contribution is negligible compared to the metal on metal rotational vibrations.

One characteristic of nonlinear dynamic systems is the occasional big change, or jump in volatility. What is the nature of the Caltrain turbulence? Do passengers get jolted more side-to-side or up-down? Proper functioning (staying on the tracks) implies an equilibrium system, so a stable mean should exist in the up-down dimension, but I'll need to look at the stability of the mean in the side-to-side dimension due to the track's curves.

A brief volatility analysis will answer these questions.

Rather than relying on assumptions up front, the proper methodology for analyzing volatility can only emerge once we have a good sense of the basic nature of the system's dynamics.

The first step is to start macro: inspect the distribution of the data and get a sense of its tail thickness. This tells us if we are in a linear or nonlinear paradigm, pointing us to the appropriate set of statistical and existential heuristics. I like starting all data analyses first with visual inspection. Then we'll look at distribution characteristics like kurtosis and central tendency and how they change over time. This is a dynamic system, so every measurement and statistical technique should respect the dynamism. It would be improper to use static, non-robust tools like mean or standard deviation in a non-Gaussian context. Instead, we'll be asking questions that can only be answered by looking at the dynamic kurtosis, stability of central tendency, volatility of volatility, and meta-probability distributions.

Let's begin with the data set. The data represents vibrations measured 250 times per second during a train ride that starts from rest, accelerates to ~65 mph for ~10 minutes, then slows and stops. This was during rush hour, with close to the maximum number of bodies on board. Here’s the acceleration data (measured in ±g) and probability distribution for 182k total samples:

The distributions are quite similar in structure. Far from the normal distribution of a random process, we see an extremely pointed peak due to the high frequency of small vibrations. This makes intuitive sense. The high peak for up-down vibrations means small vibrations are more frequent compared with the other dimensions. Vibrations in the up-down dimension are on top of the 1g of force from gravity already present. While large deviations still occur, gravity dampens up-down vibrations, meaning more vibrations occur closer to the central tendency. These maximum deviations are constrained by things like max speed, and passengers rarely experience roughness of more than 1g (or 2g total vertically).
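One way to put a number on "pointed peak, thick tails" is excess kurtosis, which is about 0 for a Gaussian and positive for peaked, heavy-tailed data. The following is a minimal sketch, not the analysis used for the plots: the function name `excess_kurtosis` is my own, and the two series are synthetic stand-ins (a Gaussian and a Laplace distribution), not the Caltrain measurements.

```python
import numpy as np

def excess_kurtosis(x):
    """Fourth standardized central moment minus 3, so a Gaussian scores about 0."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    return np.mean(centered ** 4) / variance ** 2 - 3.0

# Synthetic stand-ins, NOT the Caltrain data (assumption for illustration).
rng = np.random.default_rng(0)
gaussian = rng.normal(size=100_000)   # thin tails, rounded peak
peaked = rng.laplace(size=100_000)    # sharp peak, fatter tails

print("gaussian excess kurtosis:", excess_kurtosis(gaussian))
print("laplace excess kurtosis: ", excess_kurtosis(peaked))
```

A value well above 0 on real accelerometer data would be a hint to reach for robust tools before trusting a mean or standard deviation.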

How do we measure volatility? This is a surprisingly deep question, depending on the system in question and how we think about change. Volatility is a measure of change, and there are many techniques. Acceleration, for example, measures change in velocity. If we go one layer deeper and look at the volatility of volatility, or how volatility changes over time, we can get a sense of shocks or impulses, which is where most passenger discomfort comes from.

What exactly are we trying to measure? Change. Change from what? From the central tendency. What's the central tendency? That's where we must start to reason from. We don't want to make any inaccurate assumptions about the data and end up using the wrong statistical techniques (cough, standard deviation).

The first check is for central tendency and stability.

The mean of observations in the Up-Down dimension slowly begins to stabilize after 50,000 observations, or around 3 minutes of data collection.

Notice how unstable the mean of vibrations is in the side-to-side dimension. It's not exactly a "tendency" if it's always meandering. Said another way, the volatility and skew of the mean itself are too high for it to be a reliable measure of central tendency. With this knowledge we can throw out any measure of volatility that assumes a stable, defined mean, such as standard deviation or mean deviation.
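A stability check like this boils down to tracking the mean of the first n samples as n grows and seeing whether it settles. Here is a small sketch under my own assumptions: `expanding_mean` and `late_wander` are hypothetical helper names, the 50% settle point is arbitrary, and the two series are toy data rather than the recorded ride.

```python
import numpy as np

def expanding_mean(x):
    """Mean of the first n samples, for every n from 1 to len(x)."""
    x = np.asarray(x, dtype=float)
    return np.cumsum(x) / np.arange(1, x.size + 1)

def late_wander(running, settle_fraction=0.5):
    """Range covered by a running statistic after the first settle_fraction of samples."""
    running = np.asarray(running, dtype=float)
    tail = running[int(running.size * settle_fraction):]
    return float(np.max(tail) - np.min(tail))

# Toy series (assumption): one oscillates around zero, one also drifts,
# loosely mimicking a track that curves during the ride.
steady = np.array([0.1, -0.1] * 50)
drifting = steady + np.linspace(0.0, 1.0, steady.size)

print("steady wander:  ", late_wander(expanding_mean(steady)))
print("drifting wander:", late_wander(expanding_mean(drifting)))
```

A running mean that keeps wandering late in the record is the signal that mean-based volatility measures are on shaky ground.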

## Hello median

When a non-normal data set contains large deviations that destabilize the mean, median is the better estimator of central tendency. It's a more stable and robust measure, meaning extreme deviations don't cause it to wander very far.

Here's a dynamic measure of central tendency. Notice the difference in stability between the mean and median:

Conclusion: the median is much more stable than the mean for this data set. We'll therefore use median for estimating central tendency, and the robust median absolute deviation for estimating volatility.
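To see why the median resists jolts, it helps to compute both running statistics on a tiny series with a single large deviation. This is an illustrative sketch only: the function names and the toy values are assumptions, and the sorted-list approach is chosen for readability rather than speed.

```python
import bisect

def expanding_median(values):
    """Median of the first n values for every n, using a list kept in sorted order."""
    sorted_vals, out = [], []
    for v in values:
        bisect.insort(sorted_vals, v)
        n = len(sorted_vals)
        mid = n // 2
        if n % 2:
            out.append(sorted_vals[mid])
        else:
            out.append(0.5 * (sorted_vals[mid - 1] + sorted_vals[mid]))
    return out

def expanding_mean(values):
    """Mean of the first n values for every n."""
    total, out = 0.0, []
    for n, v in enumerate(values, start=1):
        total += v
        out.append(total / n)
    return out

# Toy series (assumption): nine quiet samples followed by one jolt.
quiet_then_jolt = [0.0] * 9 + [5.0]
print("running mean ends at:  ", expanding_mean(quiet_then_jolt)[-1])
print("running median ends at:", expanding_median(quiet_then_jolt)[-1])
```

The single jolt drags the running mean away from zero while the running median does not move, which is the behavior the plot above shows at scale.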

Volatility of Vibrations using Median Absolute Deviation:

Up-Down: 0.085

Side-to-Side: 0.113

Front-Back: 0.088

Side-to-side vibrations are the most volatile: their median absolute deviation is roughly 30% higher than in the other two dimensions.
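For reference, median absolute deviation is the median of the absolute distances from the median. The sketch below uses the plain, unscaled form; whether the figures above include the common 1.4826 Gaussian-consistency factor is not stated, so treat that as an open assumption. The function name is mine, and the dictionary simply restates the three reported values to check the percentage gap.

```python
import numpy as np

def median_absolute_deviation(x):
    """Unscaled MAD: median of |x - median(x)|."""
    x = np.asarray(x, dtype=float)
    return float(np.median(np.abs(x - np.median(x))))

# One wild outlier barely matters to the MAD.
print(median_absolute_deviation([1, 2, 3, 4, 100]))

# Values reported above, restated to compare axes.
reported = {"up_down": 0.085, "side_to_side": 0.113, "front_back": 0.088}
for axis in ("up_down", "front_back"):
    extra = reported["side_to_side"] / reported[axis] - 1
    print(f"side-to-side vs {axis}: {extra:.0%} more volatile")
```

With the reported numbers, side-to-side comes out about 33% above up-down and about 28% above front-back, consistent with the rough 30% figure.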

## Meta-Distribution & Volatility of Volatility 🧐

Anything that changes has an associated measure of that change, or volatility. Volatility itself changes over time, and this can be measured by looking at the volatility of volatility. It's the next higher derivative, or the rate of change of acceleration. It's also the third derivative of position. In physics it's called a jerk, jolt, or jump.

It's important to look at volatility of volatility because we care more about how the train is jerking than the smaller, ambient vibrations. The real action is in the jolts. We experience them quite unpleasantly, as anyone familiar with whiplash can attest.

Kurtosis is a statistical parameter that describes the shape of the tails of a distribution. High kurtosis indicates the presence of impacts or shocks. It's defined from the fourth standardized moment of the vibration signal, which in this case is acceleration data. Mechanical engineers often use kurtosis for fault identification in vibratory systems, so it makes sense to think about mechanical train vibration in terms of error. And kurtosis itself varies over time nonlinearly, so we have to analyze it dynamically.

Let's compute the volatility of volatility (changes in acceleration) and dynamic kurtosis to give us a sense of the jolts.
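A rough sketch of both quantities, under stated assumptions: jerk is approximated as the sample-to-sample difference in acceleration times the 250 Hz sample rate, and "dynamic kurtosis" is taken to mean excess kurtosis over an expanding window, computed from cumulative raw moments. The function names are hypothetical, the cumulative-moment shortcut trades some numerical precision for speed, and the demo series is a toy, not the recorded ride.

```python
import numpy as np

SAMPLE_RATE_HZ = 250  # sampling rate stated in the article

def jerk(accel_g, sample_rate_hz=SAMPLE_RATE_HZ):
    """Finite-difference rate of change of acceleration, in g per second."""
    return np.diff(np.asarray(accel_g, dtype=float)) * sample_rate_hz

def expanding_excess_kurtosis(x):
    """Excess kurtosis of x[:n] for each n; NaN while the variance is zero."""
    x = np.asarray(x, dtype=float)
    n = np.arange(1, x.size + 1)
    m1, m2, m3, m4 = (np.cumsum(x ** k) / n for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    # Fourth central moment expressed through raw moments.
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    out = np.full(x.size, np.nan)
    ok = var > 0
    out[ok] = mu4[ok] / var[ok] ** 2 - 3.0
    return out

# Toy series (assumption): steady rattle, then a single large jolt.
rattle_then_jolt = np.array([-1.0, 1.0] * 50 + [10.0])
kurt = expanding_excess_kurtosis(rattle_then_jolt)
print("kurtosis before jolt:", kurt[-2])
print("kurtosis after jolt: ", kurt[-1])
print("largest |jerk| (g/s):", np.max(np.abs(jerk(rattle_then_jolt))))
```

In the toy series a single jolt makes the expanding kurtosis jump in one step, the same kind of nonlinear jump that destabilizes a running mean.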

Again, we see that side-to-side is where we get hit with the largest jolts. Four of the five largest deviations are side-to-side jerks. This holds for other trips I've measured as well. We can see that kurtosis generally increases as we take more samples from the system, and often jumps nonlinearly (these jumps are responsible for destabilizing the mean).

The good news is that while the g-forces are greater side-to-side than up and down, the human body is better at absorbing g-forces perpendicular to the spine.

## Takeaways

Don't sit in the car nearest the engine. I measured 10% less volatility at the car furthest from the engine.

The local train that makes all the stops has about 45% less volatility side-to-side than the express train (fastest).

Looking at data from a few different trips, Caltrain does a good job subjecting passengers to no more than 1g of roughness in any direction.

There are quite a few large jolts, and the largest ones are reliably in the side-to-side direction.

You can use similar robust quantitative techniques for analyzing the volatility of train vibrations as used in financial markets and other turbulent systems.

It's important to use dynamic measures in a dynamic system. I often see the mistake of using a static value for mean or kurtosis, unaware of how it changes over time.

Watch out for nonlinear change. Kurtosis and meta-distribution hold clues.

Vibration fatigue applies to human passengers as well as machines.

Magnetic levitation trains are the gold standard here because of the total removal of mechanical friction (coincidentally, I worked on a small scale maglev train for my electrical engineering senior design project). Unfortunately, maglev doesn't appear to be part of California's plans.

Listening and observing is foundational for this type of work. The proper statistical techniques emerge from first principles analysis of the probability distribution and minimum assumptions.

If you notice any errors, please let me know!
