

Monday, June 25, 2012

Are certain events "better" than others?

Today's post will go in a slightly different direction than the previous three. Instead of focusing on the athletes, I'd like to focus on the events themselves and try to address the question (in the context of CrossFit competitions): "Are certain events better than others?"

Intuitively, I think most would agree the answer is yes. An event like, say, "Fran" will almost certainly test a person's fitness better than something like a competition to hit the longest drive with a golf ball. One way to look at this is using the purely CrossFit definition of fitness: of the 10 general physical skills (http://library.crossfit.com/free/pdf/CFJ_Trial_04_2012.pdf), I would say Fran hits on just about all of them (except maybe accuracy, balance and agility), while a long-drive competition hits on maybe three (accuracy, coordination and speed).

But it gets tricky to prove which events are better than others using only the 10 general physical skills. Just in the above example, there is some wiggle room in saying which skills are tested even by those two events. So let me propose another definition of what makes a good test of fitness: a good CrossFit event will provide a strong indication of an athlete's ability to perform well in a wide variety of OTHER tests. To help explain, consider this example:


My contention here (and most CrossFitters would probably agree) is that "Elizabeth" is a better test of fitness than either a 5K run or a max bench press. For one, it tests more of the 10 physical skills, but more importantly, I believe it does a better job of indicating which athletes would perform well in a wide variety of other tests. In this example, I assumed that there is generally no correlation between running a 5K and bench pressing. As such, I assigned random rankings to those events. However, a person who is strong on "Elizabeth" will probably do fairly well at both a 5K and a max bench press. So in this example, the person who has the best combined rank on the 5K and bench press also has the top rank on Elizabeth.

This is an extreme example, and obviously I have rigged it, but it gives us an idea of how we can get a feel for which events are good tests of fitness. The way we can do this is to look at the correlation between an athlete's finish on one event and their combined finish on a variety of other events. In the above example, the correlation between "Elizabeth" and the combined ranks on the other two is 98%. The correlation for the 5K run vs. the other two is 0%, and for the bench press vs. the other two it is -8%. What this says is that the bench and the 5K don't tell us as much as "Elizabeth." All we need to test is "Elizabeth," because that tells us just as much as testing all three events. Note that we are talking purely about TESTING fitness here, not training for it. It might be the case that the 5K is worthwhile in training, but not as much in testing.

So on this theoretical basis, I decided to look at the events we have seen thus far in 2012. Like I did in my first analysis, I limited the field to athletes who completed all 6 events at regionals, which gives us a sample of about 250 men and 250 women. I used my adjusted regional results (see first two posts), and my measure of how well an athlete did on each event was simply the rank*. For each event, I looked at the correlation between an athlete's rank and his/her combined rank on the other 10 events.
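For anyone who wants to replicate this, here is a minimal sketch of the calculation in Python, assuming the adjusted ranks live in a pandas DataFrame with one row per athlete and one column per event (the file name and column layout here are hypothetical):

import pandas as pd

def event_correlations(ranks: pd.DataFrame) -> pd.Series:
    """For each event, correlate an athlete's rank on that event
    with his/her combined rank on all the other events."""
    correlations = {}
    for event in ranks.columns:
        others_combined = ranks.drop(columns=event).sum(axis=1)
        correlations[event] = ranks[event].corr(others_combined)
    return pd.Series(correlations).sort_values(ascending=False)

# Hypothetical usage:
# ranks = pd.read_csv("mens_adjusted_ranks.csv")  # one column per event
# print(event_correlations(ranks))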

Let's start with a visual representation. Here is a scatter plot of the men's Open WOD 3 ranks (x-axis) vs. the combined ranks on all other events.


It is fairly clear from this plot that a better rank on Open Event 3 (further left) corresponds to better results on the other events. Now let's look at the same scatter plot for men's Regional WOD 1 ("Diane").


Whoa. While there does appear to be some weak correlation, it's pretty clear that the results from "Diane" don't do much to predict how an athlete will do on the other events. I don't find this particularly surprising. For years, we have seen otherwise solid athletes struggle with handstand push-ups, and at the elite level, there is just no way to make up any ground on the deadlifts at a weight as light as 225. So basically we are testing handstand push-ups, which do tell us something about an athlete's overall fitness, but not much - certainly not as much as we can learn by testing 18 minutes of box jumps, medium load push press and toes-to-bar.

So how do the events stack up in terms of correlation**? Well, here are the results, with women first and men second:



Well, would you look at that? Men's Open Event 3 had the highest correlation and Men's Regional Event 1 had the lowest. You'd almost think I chose those two graphs on purpose. It is clear, though, that for both men and women, Open Event 3, Regional Event 4 and Regional Event 2 were strong predictors of success across the board, while Regional Event 3 and Open Event 1 did not tell us as much. Regional Event 1 did have a somewhat higher correlation for women than for men, possibly because the event was not so blazing fast.

We can see another trend from this chart as well: events with more movements tend to be better predictors of overall fitness. While this is not surprising, I think it is an important point. Single-modality events simply do not tell us as much about an athlete as a couplet, triplet or chipper***. I do not believe we should eliminate them from competitions for this reason, but I do think some consideration should be given to weighting these events less heavily. The Games struck a good balance last year, in my opinion, by grouping the single-modality events together into "Skills Tests," which didn't put as much weight on any one of those movements. I think giving a max effort snatch or an extremely heavy dumbbell snatch the same weight as something like Regional Event 4 may not be appropriate (somewhere, Chris Spealler is nodding his head right now).

This is certainly not a topic with one absolute right or wrong answer. I would be very interested to see other opinions, not only on what I have done, but also on what defines a "good" CrossFit event.

*Note: I also looked at this another way, which was to give each athlete a score on each event that was equal to the percentage of work done relative to the overall top score/time. For now, I will ignore those results because they are generally the same as these.
**To give some perspective to what these correlations mean, you can square the values to get the "r-squared." Men's Open Event 3, for instance, has an r-squared of 56%, while Men's Regional Event 1 has an r-squared of 20%. One rough interpretation of the r-squared is that it tells you how much of the variance in the other events' scores is explained by the event we are using as a predictor. So Men's Open Event 3 explains about 56% of the variance in the other events' scores.

***Yes, Regional Event 5 actually had two movements, but the double-unders didn't have much of an impact other than as a tiebreaker. You could also argue that Regional Event 3 was basically only one movement, too, since the impact of the running was negligible for most athletes.


Sunday, June 17, 2012

So who CAN win the CrossFit Games?

Last week, I did some work to evaluate the regional performances of all of the Games Qualifiers. I tried to make this as fair as possible by adjusting the scores in each event based on the week of competition. You can see my previous two posts for the details.

This work produced some intriguing results, but in one respect, the results were not surprising at all: the champions of this analysis were Rich Froning and Annie Thorisdottir, the reigning champions and most people's choice to win the Games again this year. For Rich, the margin was fairly wide, but the race was a bit tighter on the women's side. Still, we know that the CrossFit Games are all about the unknown and unknowable, which means new events that will surely shake up those regional standings. Intuitively, we know other athletes have a "chance" to win the Games. But who really has a decent shot, and how good of a shot do they have?

On the CrossFit Games Update the past two weeks, Pat Sherwood, Rory McKernan and Miranda Oldroyd have thrown out some other names that they expect to be in the mix. There's no question that the athletes mentioned have a shot at dethroning Rich and Annie, but let's see if we can use the data we have available to back up those predictions (and maybe add some more of our own).

The basic concept of my work here was to take the results from this year's Regional and Open to "simulate" the Games. This is similar to Monte Carlo Simulation for those who are familiar with that technique. Here's how my system works:

We assume the Games have 10 events, as they did last year. For each simulated event, we randomly choose one of the events that has already occurred this year (either from the Regionals or the Open) and use the known results. We do this for all 10 events, add up the point totals and crown a winner. We then repeat this process 1000 times, which allows the elements of random chance to take effect and give us some rough probabilities of each athlete winning.

That's the basic idea. One item worth noting is that in my system, I allow events to be selected multiple times. For instance, it is possible that Regional Event 2 might be chosen three times, or it might not be chosen at all. The reason is that these previous results are meant to represent theoretical "events" at the Games. I'm not saying that at the Games, they will have three separate events that are identical to Regional Event 2. What I'm saying is that there could be three separate events at the Games that produce similar results. Also, if we didn't allow events to be picked twice, we'd only leave out one event each time (11 events in Regionals and Open combined, 10 events at the Games), and our results would be pretty dull.
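For the programming-inclined, here is a minimal sketch of the simulation loop in Python, assuming a points table with one row per athlete and one column of Games-style points (1 point per place) for each of the 11 known events (the names and layout are hypothetical, not my actual workbook):

import random
import pandas as pd

def simulate_games(points: pd.DataFrame, n_events: int = 10,
                   n_sims: int = 1000, seed: int = 42) -> pd.Series:
    """Simulate the Games by drawing past events with replacement.

    points: one row per athlete, one column per known 2012 event,
            values are Games-style points (1 point per place).
    Returns the number of simulated wins per athlete."""
    rng = random.Random(seed)
    wins = pd.Series(0, index=points.index)
    for _ in range(n_sims):
        # Draw 10 theoretical Games events from the known events,
        # with replacement.
        drawn = [rng.choice(list(points.columns)) for _ in range(n_events)]
        totals = points[drawn].sum(axis=1)
        wins[totals.idxmin()] += 1  # lowest point total wins
    return wins.sort_values(ascending=False)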

I should also note that for the Regional results, I am using my adjusted results that I mentioned earlier. I feel that makes this as fair as possible.

Here's an example of one simulation of the Women's Competition:


OK, so what about the results? Well, if we run the simulation allowing all Regional and Open events to be selected, here's what happened: [Updated 6/19/2012 - Used updated results from regionals (see first post) and re-ran simulations. Results are largely the same but some values have shifted a bit.]

Men's winner: Rich Froning (886 times), Neal Maddox (68), Dan Bailey (41), Jason Khalipa (1) and Ben Smith (1)

Women's winner: Julie Foucher (439 times), Annie Thorisdottir (277), Kristan Clever (227), Camille Leblanc-Bazinet (35), Azadeh Boroumand (16) and Michelle Letendre (6)

Rich Froning, you are killing me. I'm trying to make this interesting and there you go winning almost 89% of the time.

However, I don't feel like our work is done here. Should we really be including events from the Open? We know some of these athletes put more effort into the Open events, while others focused on their Regional training and just did what was necessary to qualify. Plus there is the issue of inconsistent judging, as well as the fact that the Open will be 4-5 months old by the time the Games roll around. So what if we simulated the Games, but only using the Regional results? Well...

Men's winner: Rich Froning (702 times), Dan Bailey (152), Neal Maddox (101), Jason Khalipa (33) and Ben Smith (12)

Women's winner: Annie Thorisdottir (653 times), Azadeh Boroumand (207), Julie Foucher (104), Michelle Letendre (22), Kristan Clever (6), Camille Leblanc-Bazinet (6) and Elizabeth Akinwale (2)

Well, it's becoming pretty evident that the women's race is wide open. Even as the clear favorite under this scenario, there were 59 instances in which Annie didn't even finish on the podium. That only happened 31 times to Rich in this scenario (only 3 times in the previous scenario).

Let's try one more thing. Although I don't feel that the Open events are that indicative of the Games results, I'd rather not discount them entirely. After all, they hit on some movements that weren't represented as well at the Regionals, such as burpees, box jumps and thrusters. So my not-exactly-scientific solution is to simulate 8 of 10 Games events using only the Regional results, then use the Open results for the other two Games events, but we'll treat them like the "skills tests" from last year. For each "skills test," we'll draw three Open events, sum up the results and then rank everyone to get one score for the event.
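Building on the sketch above, the skills-test scoring might look like this (again, purely illustrative; names are mine):

import random
import pandas as pd

def skills_test_points(open_points: pd.DataFrame,
                       rng: random.Random) -> pd.Series:
    """Draw three Open events with replacement, sum the points, then
    re-rank everyone so the combined result counts as a single event."""
    drawn = [rng.choice(list(open_points.columns)) for _ in range(3)]
    combined = open_points[drawn].sum(axis=1)
    return combined.rank(method="min")  # 1 point per place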

And the results...

Men's winner: Rich Froning (765 times), Neal Maddox (121), Dan Bailey (106), Jason Khalipa (7) and Ben Smith (1)

Women's winner: Annie Thorisdottir (581 times), Julie Foucher (282), Kristan Clever (61), Azadeh Boroumand (48), Camille Leblanc-Bazinet (24), Michelle Letendre (2) and Lindsay Valenzuela (2)

So you take your pick on which simulation you feel is most appropriate. Of course, there are other ways to do this that might have their own merits.

A couple other random observations before we go: [Updated 6/19/2012 - Slight updates here due to re-running the simulations.]

I find it interesting that Jason Khalipa, despite being second in my Regional comparison, does not win often in these simulations. The reason, it seems, is that as good as Jason is, he is rarely better in any given event than Rich. He beat Rich in two events at Regionals, but not once in the Open. However, in these simulations, he was almost always on the podium (about 80% of the time). The takeaway here is that to be competitive in the Games, you have to be consistent. But to WIN, you have to be absolutely great in a handful of events, and even if you have a hole or two in your game somewhere, it is possible (but not likely) that it won't be exposed too badly.

Valerie Voboril was 15th in the Regional comparison, but she finished third overall nine times in the first scenario and two times in my last scenario. She was the lowest finisher to make the podium in the final simulation. In my first simulation, Denae Brown (32nd in regionals) pulled off a stunner and finished third once.

The men's race was more stable. In the last two scenarios, no one outside the top 5 finishers from the Regional Comparison finished on the podium. Kenneth Leverich (10th) finished third 12 times and Chase Daniels (18th) finished third once in the first scenario.

Please remember that this is all in fun. I'm not by any means saying that the athletes who don't win any of the 1,000 simulations CANNOT win the Games. There are so many variables that it's going to be very tough to predict the results of the Games based on Regional results only. But for one of those other athletes to do it, the fact is they WILL have to compete at a level higher than they have so far. Good luck to you all!

Monday, June 11, 2012

Results of the fairer Regional comparison

Let's get right to it. The goal of this analysis was not simply to quantify the impact of a week's preparation on the result in a workout; the ultimate goal is to crown a true champion.

Although I don't necessarily consider these results as a "prediction" for the Games results, I think the ideal way to compare these athletes is with the CrossFit Games scoring system. That means assigning 1 point per place in each event and assigning the final ranking based on the sum of those points. It also means ignoring all athletes who did not qualify for the Games. Those other athletes helped us to get to this point, but now it is time to leave them behind.
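As a quick sketch, the scoring might be implemented like this (the column layout is hypothetical, and note that an event where a higher score is better, like the max snatch, would need its ranking reversed):

import pandas as pd

def games_points(results: pd.DataFrame) -> pd.Series:
    """Apply CrossFit Games scoring: 1 point per place in each event,
    summed across events; the lowest total wins.

    results: one row per Games qualifier, one column per event, with
             adjusted times/scores where lower is better."""
    points = results.rank(method="min")  # place within each event
    return points.sum(axis=1).sort_values()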

As mentioned in the previous post, we scaled each athlete's scores based on his/her week of competition. Now, we can compare these times/scores and crown our champion. Note that I did NOT make cuts after event 5. I'm not a big fan of cuts in the first place, and the fact is all of these people completed that workout, so in my opinion, they should all get credit for it. First, for the women. [Update 6/19/2012: Slight changes due to minor calculation error earlier. No athlete moved more than one spot up or down. Julie Foucher did move into a tie for second after being 2 points behind Azadeh Boroumand earlier.]




Due to the size of the table, it's a bit tricky for me to post all the details. I will look into a way to get them up here. In the meantime, here are some observations that may or may not be apparent on the chart.

1) No surprise at the top. Despite competing in week 5, the adjustments did not bring Annie's times back to Earth enough for anyone to catch her. It got a lot closer, though. Without the adjustments, she would have been a full 20 points ahead of second place (in that case, Julie Foucher would have been second).
2) Along those lines, Azadeh Boroumand (who competed in week 1) was one of the biggest winners here. She jumped from fourth to second thanks to the adjustments, cutting her point total from 53 to 40.
3) The biggest positive movers because of the adjustments were Talayna Fortunato (15th to 10th) and Candace Ruiz (29th to 24th).
4) The biggest negative mover was Cheryl Brost (17th to 23rd). Sorry, Cheryl.
5) A few world records changed hands. Kristan Clever took back her record on Event 1, Boroumand took back hers on Event 2, and Lindsay Valenzuela took back hers on Event 3. No one messed with Julie Foucher's record on Event 4, though. Straight beast mode there.
*Note that no scores from Events 5 and 6 were impacted except Canada East getting added time on Event 6. I believe HQ didn't count Michele Letendre's record on Event 6 anyway.

OK, on to the men. [Update 6/19/2012: Slight changes due to minor calculation error earlier. No athlete moved more than one spot up or down.]



Observations:
1) Rich Froning didn't really make things interesting. He dominated despite competing in Week 3. I was halfway hoping for him or Annie to be displaced to make this analysis a bit more intriguing, but it just wasn't happening. The lesson: if there is a way to bet on the CrossFit Games, do NOT bet against Rich Froning.
2) No shifts at the very top due to the weekly adjustments. The gap narrowed a bit between Dan Bailey (Week 3) and Jason Khalipa (Week 4), but not enough for Bailey to displace him.
3) The big winners here were Matt Chan, Brandon Phillips and Patrick Burke. Chan (Week 2) moved up from 14th to 7th, Phillips (Week 1) moved from 28th to 19th and Burke (Week 2) moved from 31st to 21st.
4) The losers, unsurprisingly, were from week 5. Austin Stack dropped from 22nd to 32nd, Spencer Hendel dropped from 8th to 17th and Frederik Aegidus dropped from 25th to 33rd.
5) World records? Well, Dan Bailey slipped past Neal Maddox for the record on Event 3, but that was it. Guido Trinidad moved to within a second of regaining the Event 4 record from Rich Froning, but could not quite do it. Again, do NOT bet against Rich Froning.

And there you have it. If anyone is interested in reviewing my work, I can send you the full Excel workbook used to create this. And again, I'll look into a way to get those detailed final results for both men and women. There have to be some other nerds out there like me with an interest in that.

I also welcome any feedback on this. Like I said, these are not my predictions for the Games, just an idea of who truly had the BEST regional performances. Maybe we'll get to the predictions in a later post.





A fairer Regional comparison

Welcome to CFG Analysis. I'll keep the intro short today (perhaps I'll save a more long-winded version for another day) and just say that I'm a CrossFit athlete as well as a bit of a math nerd (I get paid to do it, so at least someone thinks I have some credibility). I've been intrigued by the statistical side of CrossFit since day 1: the constant, meticulous record-keeping for every workout and the frequent monitoring of progress over days, months and years. The CrossFit Games has taken this to another level.

I got the idea to start this blog after looking at the 2012 Regional Comparison featured on the CrossFit Games Update Show last week (http://scores.loadtimerounds.com/static/g12/regional-comparison.html?file=regional-women.json - great work that I truly appreciate). The comparison was interesting for sure, but in my opinion, it didn't tell the whole story. What was missing, at least in my mind, was some way to account for the advantage certain regions have over others because of the staggered schedule. Anecdotally, we heard accounts of athletes shaving minutes off their times in certain workouts over the course of several weeks of preparation, and as an athlete myself, I knew that the additional time for the later regions undoubtedly improved those scores. But to what degree?

I began my analysis by taking the results for the top 16 finishers from each region and comparing the averages across regions. A couple of notes here - first, I limited this to the top 16 because a) these were the only athletes to complete Event 6 and b) this reduced the impact of outliers at the low end of the competition; second, I disregarded Asia, Africa and Latin America from this portion of the analysis because the level of competition in those regions is simply not on par with the rest of the world at this point.

Additionally, I accounted for the time cap by adding 10 seconds for each missed rep on Events 2 and 6 and 5 seconds for each missed rep on Event 4. I did this because the 1 second per rep that was initially in the data would skew any averages I did. In reality, a finish of 17:10 on Event 6 meant the athlete had 6 burpee-box jumps, 1 farmers carry and 3 muscle-ups left - this would take far more than 10 seconds (perhaps even more than the 100 seconds I assumed). Also, I made one final adjustment to the Canada East women's Event 6, adding 60 seconds to all scores, because the bars were loaded with only 205 pounds for the deadlifts instead of 225 (http://games.crossfit.com/video/event-summary-canada-east-womens-workout-6).
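In code, that penalty adjustment is simple; here is a minimal sketch (the function name is mine):

def capped_score_seconds(cap_seconds: int, missed_reps: int, event: int) -> int:
    """Replace the original 1-second-per-missed-rep tiebreaker with a
    stiffer penalty: 10 seconds per missed rep on Events 2 and 6,
    5 seconds per missed rep on Event 4."""
    penalty = {2: 10, 4: 5, 6: 10}[event]
    return cap_seconds + penalty * missed_reps

# Example: capped on Event 6 (17:00) with 10 reps remaining:
# capped_score_seconds(17 * 60, 10, 6) -> 1120 seconds, i.e. 18:40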

OK, at this point, some trends started to emerge. Below are the average weekly scores for each workout in the men's division.


Although not drastic in the chart, it is clear that for most of the workouts, particularly 1-4, the times tend to decrease (keep in mind that Event 5 is the only one where a higher score is an improvement). For instance, the average time in Event 1 for week 1 (for my sample) was 4:51, but by week 5 that average had dropped to 4:07. Initially, it seemed that a simple linear regression on each event might give us an idea of the effect of the additional weeks on the scores.

However, there were a couple of issues with this. First, the pattern is likely not truly linear. It is not reasonable to assume that athletes continue to cut time on a workout at the same rate for weeks on end. More likely, the improvement happens more rapidly early on and diminishes as the weeks progress - otherwise, we'd be seeing times of 0:00 on these workouts eventually. A trickier issue, however, was the fact that the athletes comprising the regions themselves were not all equal. Clearly, I had to account for this in order to get a meaningful result.

My solution was perhaps not optimal, but it was relatively simple, not too time-consuming, and in my opinion, got the job done. For each region, I looked at the number of the top 16 finishers who were in the top 180 worldwide in the Open. This gave me some idea of the strength of each region, and the results generally jibed with what we believe to be true: the most difficult men's regions were Mid Atlantic (75% of top 16 in the top 180 in the Open), Southern California (69%), Central East (63%), NorCal (63%) and Northeast (63%). The weakest (excluding Asia, Africa and Latin America) were Canada West (13%) and Europe (19%). A plot of the Event 1 men's scores compared with my metric for regional strength is below.


Now that I had a metric to control for regional strength, I decided to do a multivariate linear regression on each event, with region strength and week of competition as my two independent variables and average score on the event as the dependent variable. Although I do not believe a linear model is ideal (as discussed above), it was the simplest option here and because we are not projecting out into the future, it wouldn't produce any unreasonable results. The results, I believe, were in line with what we would expect. For almost every event across the men's and women's competition, the coefficient for the region strength indicated that it was positively correlated with an improved average score (except men's event 3, where the effect was basically negligible). But what we really care about is the effect of the week of competition on the scores, because we can use that to get a fairer comparison between athletes in different regions. Below are the coefficients for week of competition (-1.00 means each week caused a drop of 0:01 in the time or an increase of 1 lb. in the snatch).
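For concreteness, here is a minimal sketch of one such regression using ordinary least squares in Python (the function name and data layout are my own; this is illustrative, not my actual workbook):

import numpy as np

def week_coefficient(weeks, strengths, avg_scores):
    """Regress a region's average event score on its week of
    competition and its strength metric; return the coefficient
    on week of competition.

    weeks:      week each region competed (1 through 5)
    strengths:  fraction of the region's top 16 inside the Open top 180
    avg_scores: the region's average score on the event"""
    weeks = np.asarray(weeks, dtype=float)
    strengths = np.asarray(strengths, dtype=float)
    y = np.asarray(avg_scores, dtype=float)
    X = np.column_stack([np.ones(len(weeks)), weeks, strengths])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]  # seconds (or lbs) per additional week of prep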

Men
Event 1 -10.50, Event 2 -11.86, Event 3 -10.78, Event 4 -18.04, Event 5 +0.13, Event 6 -1.10

Women
Event 1 -10.66, Event 2 -10.75, Event 3 -16.11, Event 4 -7.90, Event 5 -0.33, Event 6 +37.69



For both men and women, I assumed the true coefficients for Events 5 and 6 were 0. The women's result was especially curious, but the extremely wide range of results on that workout (often because women struggled at muscle-ups) probably threw things off. It's probably also fair to assume that no significant snatching strength can be gained in 4 weeks, especially for athletes of this caliber.

Now it was time to adjust each athlete's results to compensate for their week of competition. At this point, I decided that adding or subtracting a flat amount per week was not fair, because athletes are more likely to improve as a percentage of their starting time/score. For instance, Dan Bailey is not gaining 11 seconds per week when he has a Diane time of under 2:00. To fix this, I took my coefficients and divided them by the average score for each event for all competitors in my analysis. To apply these percentages, for each athlete, I would take the week of competition, subtract 3 (the midpoint) and multiply that number by the percentage impact per week for that event. That result would be used to adjust the athlete's score (a short sketch of the adjustment follows the percentages below). The final percentage impacts I used were:


Men
Event 1 -4.4%, Event 2 -1.3%, Event 3 -3.7%, Event 4 -1.4%, Event 5 +0%, Event 6 +0%

Women
Event 1 -3.7%, Event 2 -1.2%, Event 3 -3.4%, Event 4 -0.5%, Event 5 +0%, Event 6 +0%
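As promised, a minimal sketch of applying those percentage impacts (men's values from above; the function name and the exact sign convention are my assumptions - the intent is that a Week 5 time gets slower and a Week 1 time gets faster, normalizing everyone to the Week 3 midpoint):

# Percentage impact of one extra week of prep, per event (men's values
# from above; a negative number means scores improved each week).
MEN_WEEKLY_IMPACT = {1: -0.044, 2: -0.013, 3: -0.037, 4: -0.014,
                     5: 0.0, 6: 0.0}

def adjust_to_midpoint(raw_score: float, week: int, event: int) -> float:
    """Normalize a score to the Week 3 midpoint by backing out the
    estimated weekly improvement (sign convention is an assumption)."""
    return raw_score * (1 - (week - 3) * MEN_WEEKLY_IMPACT[event])

# Example: a 4:30 (270 s) Event 1 time from Week 5 becomes roughly
# 270 * 1.088 = 293.8 s, or about 4:54, after adjustment.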

Whew... This is getting long. We'll finish up the analysis (with your final leaderboard) in the next post.