How I Saved Time On Activity Classification Model Data Collection

Dan Beech
6 min read · Feb 18, 2021

So you want to make an activity classification model? If you want to streamline the data collection process then read on for a fun way to cut out time that I learned during the journey to creating mine.

Photo by Unsplash on Unsplash

I used CreateML for this. The software is great and easy to use, but data collection can take forever. The method below applies to any collection of time series motion data, so feel free to read on even if you aren’t using the same software.
When I first started the data collection process my plan was to use os_log to output the data from my Apple Watch, process that data, and I’d be done. Easy!
It turns out I needed a much larger set of data than I imagined, and to rub salt in the wound, the pandemic hit and due to an underlying health condition I was confined to my house.
That meant that I couldn’t pass off my Apple Watch to some friends, have them complete samples for me, and sit back and relax. I was going to have to do all of them myself, and I really didn’t want to spend weeks spinning my arm around in circles. Yeah, I know the results will be skewed because the model will be trained to recognise my specific movements, but desperate times call for desperate measures, and with deadlines slowly but steadily creeping up on me, these were indeed desperate times.

Photo by Julian O'hayon on Unsplash

The New Plan

I needed a way of getting many samples in a short time, so I put on my Watch and analysed where most of my time was lost during the collection.
Most of my time was spent stopping and starting the data collection between samples. So I decided to try getting a load of samples in a row to cut that down.
This was a step in the right direction, but it just wasn’t good enough. Samples were completed either too fast or too slow, which increased my processing time. I was still losing time making sure that all the samples fit correctly into the prediction windows in CreateML.

That won’t do. I needed to refine the idea.

Photo by Shawn Sim on Unsplash

So I went back to the drawing board. I liked the idea of collecting large sample sets in one session, but wanted to minimise the processing of the data to cut out as much time as possible.
Now anybody who knows me knows that I’m a musician, and I’ve spent way too much time using a metronome over the years.
Metronomes are a drummer’s best friend (sometimes), and when you’re recording music you get well acquainted with them. For those who don’t know, a metronome clicks at a constant BPM (beats per minute), and there’s one built into Google for you to use.

So I was going to use a metronome to keep my speed at a constant rate, but what about the processing time?

I searched the App Store for an app I could use on my Watch that would provide me with samples in CSV form, and where I could set my own sample rate.

SensorLog to the rescue!

This meant that I could collect the data at the sample rate I wanted. From the samples per second (Hz) and the beats per minute of the metronome, I could work out how many rows of my CSV output corresponded to one sample, while using the clicks of the metronome to make sure that I hit my cues for starting and completing an action every time.
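The arithmetic is simple enough to sketch in a few lines of Python. This is only an illustration: the function name and the `beats_per_sample` parameter (how many metronome clicks one sample spans) are made up for this example, and the numbers at the bottom are placeholders rather than a recommendation.

```python
def rows_per_sample(sample_rate_hz: float, bpm: float, beats_per_sample: float) -> float:
    """Number of CSV rows recorded while `beats_per_sample` metronome beats go by."""
    seconds_per_beat = 60.0 / bpm
    return sample_rate_hz * seconds_per_beat * beats_per_sample


# Placeholder numbers: a 50 Hz recording, a 60 BPM click, and a sample spanning two beats.
print(rows_per_sample(50, 60, 2))  # 100.0
```

If the result isn't a whole number, the sample rate and tempo don't line up cleanly, and it's probably worth nudging one of them until they do.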

So, for example, if my action takes 1 second to complete, I can set my metronome to 60 BPM, which equals one click per second (or 120 BPM if I want more intermediary clicks). I set my sample rate to whatever Hz I want to feed into CreateML.
I put the metronome in 4/4, and count along with the clicks like this:

1... 2... 3... 4... 1... 2... etc.

I start the action on each 1 I count with the click of the metronome, and aim to finish it in time with when I count to 2. Then I start the next action when I count 3, and finish it when I count the 4th and final click.
So basically: start your action on odd numbers, and end it in time with the even numbers.

At 60 BPM, that means 30 samples for every minute you collect data!
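To sanity check that count, here is a rough Python sketch of the odd/even cue pattern. The function name is hypothetical, and beats are numbered from 0 in the code, so beat 0 is the first "1" of the count.

```python
def action_cues(bpm: float, session_seconds: float) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds: start on a counted odd beat, end on the next even beat."""
    seconds_per_beat = 60.0 / bpm
    total_beats = int(session_seconds * bpm / 60)
    cues = []
    # Step two beats at a time: one beat of action, then one beat of rest before the next start.
    for beat in range(0, total_beats - 1, 2):
        cues.append((beat * seconds_per_beat, (beat + 1) * seconds_per_beat))
    return cues


cues = action_cues(bpm=60, session_seconds=60)
print(len(cues))  # 30
print(cues[:2])   # [(0.0, 1.0), (2.0, 3.0)]
```

Changing `bpm` or `session_seconds` shows how many samples a different tempo or a longer session would give you.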

You will need to change this to suit your samples and the time it takes to complete them, but it will allow you to keep your samples uniform in collection which means minimal processing!
If, like me, you’re collecting the samples only from yourself, try to vary how you do the action. It’s tough to avoid falling into a rhythm with the metronome clicking away, but try to make some actions more pronounced and some more subtle. Do some that complete slightly faster than others, some slightly slower. This will aid the accuracy of your model during training.

Photo by Luke Chesser on Unsplash

Using this method I went from collecting single samples of each action, to collecting multiple samples and processing them to ensure they were all correct, to dropping in CSV files that needed next to no processing whatsoever.
I would set a timer, put on the metronome, start up SensorLog, and spend 10 minutes spinning my arm around in a particular action. When you consider that each action took roughly 1 second to complete, that’s a lot of samples gathered much quicker: around 300 per session at the 30-per-minute pace above.

If you trim off any excess rows that occurred before your first sample (why not give yourself a 4 beat count-in like a band?), and any rows that occurred between your last sample and you pressing stop, CreateML will chop up the CSV into individual samples using the prediction window. Because the samples are so uniform, it’s very unlikely that any of them won’t fit perfectly into those windows. Getting the prediction windows to line up nicely took a couple of rounds of trial and error, but once you know your settings fit your samples, you’re good to go!
I used a prediction window of 100 with a sample rate of 50 Hz, so each window covers two seconds of data. This was the best fit for my data and provided the best training results.
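If you want to check a recording before handing it to CreateML, the trimming and windowing idea can be mimicked in Python. This is a sketch of the concept only, not how CreateML does it internally. The function name, the `lead_in_rows` parameter and the two-column CSV are all made up for the example, and it assumes the leftover rows at the end add up to less than one full window.

```python
import csv
import io


def trim_and_window(rows, lead_in_rows, window_size):
    """Drop the count-in rows, then cut the rest into full windows, discarding any partial tail."""
    body = rows[lead_in_rows:]
    full_windows = len(body) // window_size
    body = body[: full_windows * window_size]
    return [body[i:i + window_size] for i in range(0, len(body), window_size)]


# Tiny synthetic recording standing in for a real export (column names are invented).
raw = "t,x\n" + "".join(f"{i},{i * 0.1:.1f}\n" for i in range(10))
reader = csv.reader(io.StringIO(raw))
header = next(reader)
rows = list(reader)

windows = trim_and_window(rows, lead_in_rows=2, window_size=3)
print(len(windows))   # 2
print(windows[0][0])  # ['2', '0.2']
```

With real data you would pass `window_size=100` for the settings above. How many lead-in rows to drop depends on when you actually pressed record, so that number is best found by looking at the data rather than assumed.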

The cherry on top is that you can AirDrop the CSV files straight onto your Mac from SensorLog, saving yourself even more time. This is quite a novel way of speeding up the data collection, and it almost made my data collection fun.

A final word of warning: if your actions require you to hold your arm straight out in front of you at a 90 degree angle to complete them, you may end up with one super strong (or painful) shoulder and one normal one. So take regular breaks!
Thanks for reading! Good luck with your apps! :)
