Saturday, October 8, 2011

A Note about Data

The elevation profile for the Hecker Pass ride

Source Data

The elevation profiles presented here are based on data collected from my Garmin Forerunner 305. This device measures elevation solely through GPS; it doesn't have a barometric altimeter.

My experience with the device is that it gives accurate, absolute results when it has a good view of the sky, and generates random numbers when I'm riding in canyons or (sometimes) along steep ridges. Between these two extremes the readings appear to be realistic, but can be very slow to update and may vary by 100 feet or more.

I'm basing my opinion of the clear-sky results on the readings I get when the absolute elevation is known, such as on mountaintops or at the ocean. I'm basing the canyon observation on the way the value can vary wildly as I slowly ride along steady grades. In some cases, the values can rise and fall so much, so quickly, that the only explanation would be that I was shot out of a cannon.

I use the Garmin Connect web site, which applies some level of correction, apparently through USGS data or similar. The GPX files I'm using to generate these graphs come through that web site, and I'm not sure whether Garmin applies those corrections to the downloaded data or not.

Calculating Accumulated Climb


A simple approach to calculating the total climb would be to compare each elevation data point with the previous point, and accumulate any positive differences.

There are a couple of problems with that approach. First, the raw data coming from the device is not entirely accurate. Even on a perfectly constant grade the raw data fluctuates up and down, so the naive approach would give you some climb on flat roads. It might even register some climb on a consistent descent. You don't want that.

The second problem is that you really want to count only "meaningful" climbs. This problem isn't in the device, it's in our heads. Even if the measurements from the device were perfect, roads do in fact undulate. If you counted every little rise after every little dip, you would again accumulate climb even on roads that you perceive as flat. You don't want that, either.

To address the first problem, I'm calculating a decaying average of the raw elevation readings from the device, and using that for all subsequent calculations. In my case I'm using a decay level of 10, so:
    ele = (raw + ele*9) / 10.
The elevation graph is actually showing the decaying average, not the raw data, although at this scale the difference is minimal.
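As a rough sketch of the decaying average described above (not the actual program): the function name `smoothElevation` is hypothetical, and seeding the average with the first raw reading is an assumption, since the post doesn't say how the average starts.

```cpp
#include <cassert>
#include <vector>

// Smooth raw elevation readings with a decaying average:
//     ele = (raw + ele * (decay - 1)) / decay
// With decay = 10 this is the formula quoted above.
// Assumption: the first raw reading seeds the average.
std::vector<double> smoothElevation(const std::vector<double>& raw, int decay = 10) {
    assert(decay >= 1);
    std::vector<double> smoothed;
    smoothed.reserve(raw.size());
    double ele = 0.0;
    bool first = true;
    for (double r : raw) {
        ele = first ? r : (r + ele * (decay - 1)) / decay;
        first = false;
        smoothed.push_back(ele);
    }
    return smoothed;
}
```

A single bad reading only moves the average by a tenth of its error, which is why the noise is damped, and also why the smoothed value lags behind real changes in elevation.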

To address the second problem, I'm only accumulating climbs when they exceed a threshold of about 50 feet. I have a running "base" value, which is the lowest elevation (using the decaying average) I've seen since the last reset. Once the elevation climbs 50 feet above the base, I credit the delta and reset the base. Thus if you're riding rollers on the way to the hill, you'll end up getting credited for those little rises... but just once. You will be cheated out of 25 feet (on average) at the top of each climb, because a final rise of less than 50 feet is never credited.
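The thresholded accumulation might look something like the sketch below. It is an illustration under assumptions, not the actual program: the name `accumulatedClimb` is made up, elevations are taken to be the smoothed values in meters, the default threshold of 15.24 m is simply 50 feet converted, and the base is assumed to reset to the current elevation once a climb is credited.

```cpp
#include <cassert>
#include <vector>

// Accumulate climb only in chunks of at least `threshold` meters.
// `ele` holds smoothed elevations in meters.
double accumulatedClimb(const std::vector<double>& ele, double threshold = 15.24) {
    assert(threshold > 0.0);
    if (ele.empty()) return 0.0;
    double base = ele.front();  // lowest elevation seen since the last reset
    double climb = 0.0;
    for (double e : ele) {
        if (e < base) {
            base = e;           // new low: measure the next climb from here
        } else if (e - base >= threshold) {
            climb += e - base;  // credit the whole rise above the base
            base = e;           // assumption: restart from the current height
        }
    }
    return climb;
}
```

With a threshold of 15, the sequence 0, 10, 20, 30, 25, 40 is credited 40: the rise to 20 is counted, the small dip from 30 to 25 is ignored, and the rise from 20 to 40 is counted. A series of 5-unit rollers is credited nothing.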

Ultimately this is a subjective measurement, but this approach gives me a result I can accept. If I look at the graph itself and estimate the climb by eye, I normally get to 80-90% of the calculated result, which is probably reasonable. Since it's based solely on the data from the device, if the elevation noise varies too much (more than 50 feet despite the decaying average) then I'll accumulate too much climb. The decaying average removes most of those noisy variations, but on some specific rides the effect has been noticeable.

Calculating Grade


It's easy to calculate the overall grade of a hill, and realistically that's the most important number to me as a rider. But in the interest of teasing some additional information from the data I've collected, I decided to try to calculate the grade at a finer granularity. An instantaneous grade, if possible.

The grade is basically the first derivative (with respect to distance) of noisy elevation data, so it's especially susceptible to bad data. I tried a number of methods to get sensible results. What I settled on was this: every 100 meters of linear travel, I calculate the change in the decaying average elevation and report that as a percentage.
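A minimal sketch of that interval-based grade calculation follows. The names `GradeSample` and `gradeProfile` are hypothetical, and the inputs are assumed to be parallel arrays of cumulative distance and smoothed elevation, both in meters. Because GPS points rarely land exactly on a 100-meter boundary, this version closes an interval at the first point that is at least 100 meters past the previous one, which is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct GradeSample {
    double distance;  // meters from the start, at the end of the interval
    double percent;   // rise over run for the interval, times 100
};

// Report one grade value per `interval` meters of travel.
// `dist` is cumulative distance and `ele` is smoothed elevation, in meters.
std::vector<GradeSample> gradeProfile(const std::vector<double>& dist,
                                      const std::vector<double>& ele,
                                      double interval = 100.0) {
    assert(dist.size() == ele.size());
    assert(interval > 0.0);
    std::vector<GradeSample> out;
    if (dist.empty()) return out;
    double startDist = dist.front();
    double startEle = ele.front();
    for (std::size_t i = 1; i < dist.size(); ++i) {
        double run = dist[i] - startDist;
        if (run >= interval) {
            out.push_back({dist[i], 100.0 * (ele[i] - startEle) / run});
            startDist = dist[i];  // the next interval starts here
            startEle = ele[i];
        }
    }
    return out;
}
```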

This produces a satisfying result, apart from the rare regions of extremely noisy data. The spikes in grade normally correspond to actual steep ramps, the reported grade of those ramps is believable, and to the extent that you can get a sense of overall grade from the graph, it seems accurate.

The problem with this approach is that it's arbitrary. Two rides over the same road will produce different graphs. They'll be broadly similar, but one point of variation will almost always be in the maximum grade, which is the highest spike in the graph and therefore the most visible result.

The Map


To show a Google map, I start off with a KML file, again downloaded from connect.garmin.com. The raw KML files have thousands of points, but it appears that Google Maps will only show about 200 points before starting to break the path into segments. So I needed to find a way to cut down the number of points dramatically while retaining as much of the overall shape as possible.

My first attempt was to try to eliminate intermediate points that didn't matter much. So I took each set of three points (let's call them A, B and C) and determined the angle between AB and AC. If the angle was small, then B was close to the line AC, and we wouldn't miss it much. That might have worked in principle, but it was hard to debug the angle calculation. I also iterated over the points several times trying to get down to 200, and in later rounds the algorithm would fixate on one segment, gradually straightening it out. Harrumph.
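For illustration only, here is one way the angle test could be written. This is not the code from that abandoned attempt; the names `Point`, `angleAtA` and `isNearlyStraight` are made up, and the points are treated as flat x/y coordinates, which is only a rough approximation for raw latitude and longitude. Using `atan2` on the cross and dot products avoids the rounding trouble that `acos` has with nearly parallel vectors.

```cpp
#include <cassert>
#include <cmath>

struct Point { double x; double y; };  // planar coordinates (an assumption)

// Angle in radians, from 0 to pi, between the vectors A->B and A->C.
// Returns 0 if either vector has zero length.
double angleAtA(Point a, Point b, Point c) {
    double abx = b.x - a.x, aby = b.y - a.y;
    double acx = c.x - a.x, acy = c.y - a.y;
    double cross = abx * acy - aby * acx;
    double dot = abx * acx + aby * acy;
    return std::atan2(std::fabs(cross), dot);
}

// B is a candidate for removal when it lies almost on the line from A to C.
bool isNearlyStraight(Point a, Point b, Point c, double maxAngle) {
    assert(maxAngle >= 0.0);
    return angleAtA(a, b, c) <= maxAngle;
}
```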

As I was debugging that problem, viewing intermediate results in gnuplot, I finally noticed that just sampling every nth point gave me much better results than my "smart" approach. It works best when I'm going slowly (which is most of the time), but it does tend to straighten out roads when going quickly (descending, perhaps). Maybe I'll get around to adjusting for speed, but for now simple sampling seems to give reasonable results.
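The simple sampling can be sketched as below. The name `sampleEveryNth` is hypothetical, the limit of 200 comes from the observation about Google Maps above, and always keeping the final point so the path ends where the ride ended is an assumption rather than something the post specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Keep every nth point so that at most `maxPoints` remain.
// Assumption: the last point is always kept.
template <typename T>
std::vector<T> sampleEveryNth(const std::vector<T>& points, std::size_t maxPoints = 200) {
    assert(maxPoints >= 2);
    if (points.size() <= maxPoints) return points;
    // Round the stride up so the samples plus the final point fit the limit.
    std::size_t stride = (points.size() + maxPoints - 2) / (maxPoints - 1);
    std::vector<T> out;
    for (std::size_t i = 0; i < points.size(); i += stride) {
        out.push_back(points[i]);
    }
    if ((points.size() - 1) % stride != 0) {
        out.push_back(points.back());  // the loop skipped the final point
    }
    return out;
}
```

For a 1000-point track this picks a stride of 6 and keeps 168 points, comfortably under the limit.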

The Tools


I normally upload my ride data to connect.garmin.com, then later pull it back down through the "export" function as GPX and KML files. To generate the graph I run the GPX file through a C++ program, and pump the results through gnuplot. I'd like to use an SVG result, but Blogger wants only bitmap images, so I use the pngcairo terminal.

The gnuplot script I generate looks something like this:
set grid
set y2tics
set xlabel 'Distance (miles)'
set ylabel 'Elevation (feet)'
set y2label 'Grade (%)'
set style line 1 linewidth 2 linecolor rgb '#60b060'
set style line 2 linewidth 1 linecolor rgb '#b06060'
set object 1 rectangle from graph 0, graph 0 to graph 1, graph 1 \
  behind fillcolor rgb '#f0ffff' fillstyle solid 1.0
set terminal pngcairo size 800,400
plot '-' using ($1*0.62137):(($2 > 0) ? $2 : 0) \
         with filledcurves above x1 fill solid 0.6 border linestyle 2 \
         axes x1y2 title 'Grade (%)', \
       '' using ($1*0.62137):($2*3.2808) \
         with filledcurves above x1 \
         fill transparent solid 0.8 border linestyle 1 \
         title 'Elevation'
The data is generated by the program and is metric throughout; it's converted to miles and feet only in this display step. The grade values include negative grades (i.e., descents), but I mask those off because they made the graph more difficult to read.

To produce the map I run the KML through a different C++ program that samples the points and removes any points near my house (otherwise these maps would all point directly to my house, which I'd prefer not to do). I upload the result to Google Maps using the My Places feature, then generate embedding code for that.
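Removing the points near a given location could be done along these lines. This is a sketch under assumptions, not the actual program: the names `LatLon`, `distanceMeters` and `dropNear` are invented, the radius is whatever you choose, and the distance uses the haversine formula on a spherical Earth, which is more than accurate enough at this scale.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct LatLon { double lat; double lon; };  // degrees

// Great-circle distance in meters (haversine formula, spherical Earth).
double distanceMeters(LatLon a, LatLon b) {
    const double kEarthRadius = 6371000.0;  // mean radius in meters
    const double kDegToRad = 3.14159265358979323846 / 180.0;
    double dLat = (b.lat - a.lat) * kDegToRad;
    double dLon = (b.lon - a.lon) * kDegToRad;
    double h = std::sin(dLat / 2) * std::sin(dLat / 2) +
               std::cos(a.lat * kDegToRad) * std::cos(b.lat * kDegToRad) *
               std::sin(dLon / 2) * std::sin(dLon / 2);
    h = std::min(1.0, h);  // guard against rounding just above 1
    return 2.0 * kEarthRadius * std::asin(std::sqrt(h));
}

// Keep only the track points farther than `radius` meters from `home`.
std::vector<LatLon> dropNear(const std::vector<LatLon>& track, LatLon home, double radius) {
    assert(radius >= 0.0);
    std::vector<LatLon> out;
    for (const LatLon& p : track) {
        if (distanceMeters(p, home) > radius) out.push_back(p);
    }
    return out;
}
```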

I started with KML because that's the natural format for Google Maps, but at this point I'm using none of the original file except the lat/lon/elevation points, and therefore could just use the GPX for both purposes.
