First things first, by "in Python", I mean "using NumPy and SciPy, two fantastic scientific computing packages for Python that I don't know what I'd do without." I need them to do my job, every day I do it.
Second things second, what the heck is "parameterizing curves"? A lot of times, when poking around for things I want to do with data, I pick up terms here and there that give me the sense that they're just important. If you search around for "smoothing curves" or "interpolating data", you're bound to end up reading a little about B-splines.
I'm not going to go into the mathy stuff, but I'll instead give you a nice picture that explains what B-splines can do.
Here's an extracted contour of a mouse. If you strain your eyes, you can see the head in the upper right, and the butt in the lower left, and the tail's been omitted. But, notice a couple things:
- Our eyes pick out a clear contour
- The contour is not sampled at even intervals
- There's some noise and jitter in the contour
Here's what points calculated from a B-spline fit to the above data look like:
Proof in the puddin', as they say. So how do you do it? Again, not going to get into the principles behind it, just going to show you the code that does it:
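Here's a minimal sketch of the four steps, numbered in the comments. Take it as an illustration under assumptions rather than gospel: the noisy ellipse is a made-up stand-in for the mouse contour, and every variable name is hypothetical. The only real pieces are SciPy's splprep and splev.

```python
import numpy as np
from scipy import interpolate as interp

# Hypothetical stand-in for the mouse contour: an ellipse at roughly pixel
# scale, sampled at uneven angles with a little jitter added.
rng = np.random.default_rng(0)
theta = np.sort(rng.uniform(0, 2 * np.pi, 80))
x = 60 * np.cos(theta) + rng.normal(0, 1.0, theta.size)
y = 30 * np.sin(theta) + rng.normal(0, 1.0, theta.size)

# 1. Pick a smoothing factor (here, simply the number of input points).
smoothing = len(x)

# 2. Fit a periodic (closed) parametric B-spline to the noisy points.
tck, _ = interp.splprep([x, y], s=smoothing, per=1)

# 3. Make 100 evenly spaced parameter values from 0 to 1.
u_new = np.linspace(0, 1, 100)

# 4. Evaluate the spline at those parameter values.
x_new, y_new = interp.splev(u_new, tck)
```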
Okay, let's break it down, using those numbers as indices.
1. The smoothing factor of a spline defines how tightly it'll fit to the points you provide it. Keep your smoothing factors within the range [ numPointsIn - sqrt( 2 * numPointsIn ) , numPointsIn + sqrt( 2 * numPointsIn ) ]. Where'd I get that range?
The trusty documentation for the function in the next line.
2. Here we actually calculate the spline. We feed it our noisy, incomplete points, and two other parameters. One, the smoothing parameter we set before, and then a "per" parameter I'll 'splain down below. A couple things here we should talk about.
- What's "tck" stand for? "t" is for knots ( no idea why ), "c" is for coefficients, and k is for the degree of the polynomial ( k is often used for these things ).
- What's that underscore doing there? Well, that's a handy Python convention for saying you don't give a hoot about a value returned from a function. Here it basically says "give me the first thing, toss the second". Comes in handy when functions spit out varying numbers of outputs based on how you call them, but you rarely end up caring about most of them.
- per = 1. What's that? Well, our contour array doesn't end where it starts, meaning it doesn't form a closed loop. When we set "per", standing for "periodic", to 1, we're asking the B-spline to close itself off. Useful for contours like this. Might not suit your needs, but it definitely does suit ours here.
3. Why a variable from 0 to 1? What a spline of the type we're making does is to map a set of points onto a new, synthetic continuously varying variable. That's called "parameterization", and it gives us the neat ability to tell where we are in the curve, in the sense of beginning to end. I'll show you more about that in a bit. But, what we're basically saying is we want 100 evenly-spaced values all around our contour.
4. Evaluate the spline, giving us our beautiful, smooth, even x and y values that are plotted in a rich blue above.
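To make items 1 and 2 a bit more concrete, here's a rough sketch that computes the suggested smoothing range and peeks inside tck. The noisy ellipse and all the variable names are made up for illustration. One caveat worth knowing: the SciPy docs suggest this range for the case where the weights are the inverse of the standard deviation of your data, so treat it as a starting point, not a law.

```python
import numpy as np
from scipy import interpolate as interp

# Hypothetical stand-in contour: a noisy, unevenly sampled ellipse.
rng = np.random.default_rng(1)
theta = np.sort(rng.uniform(0, 2 * np.pi, 80))
x = 60 * np.cos(theta) + rng.normal(0, 1.0, theta.size)
y = 30 * np.sin(theta) + rng.normal(0, 1.0, theta.size)

# Item 1: the suggested range for the smoothing factor.
m = len(x)
s_low = m - np.sqrt(2 * m)
s_high = m + np.sqrt(2 * m)
print("try smoothing factors between", s_low, "and", s_high)

# Item 2: fit the spline, then unpack tck into its three pieces.
tck, _ = interp.splprep([x, y], s=m, per=1)
t, c, k = tck
print("number of knots:", len(t))
print("coefficient arrays (one per coordinate):", len(c))
print("degree:", k)
```

Nudging s toward the low end hugs the points more tightly; nudging it toward the high end smooths harder.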
If you want to start interpolating and smoothing, that's all you need to know! But here's an extra tidbit.
So, what if you didn't add that underscore in #2? Well, the next variable you could get out of the spline fitting function is "u", which we calculated separately ourselves. Why didn't we use the one they gave us? Let's plot the one they would give us (black), versus the one we made (blue).
Ew, jaggedy, ugly black u. Beautiful straight blue u. Why's that? The "u" given back to us from interp.splprep will correspond to the points that go into fitting the B-spline, which means the same unevenly sampled points you had in the first place. What's neat about this, though, is it provides you a measure of where your uneven sampling is. That can actually be a pretty useful metric sometimes, if you need a single-dimension metric to quantify data quality along a winding contour, like a coastline.
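If you'd rather see that in numbers than in a plot, here's a small sketch comparing the two. Same caveats as before: the noisy ellipse is a made-up stand-in for real contour data, and the names are hypothetical. Using the step size between neighboring u values as a sampling-density measure is just one simple way to do it.

```python
import numpy as np
from scipy import interpolate as interp

# Hypothetical stand-in contour: a noisy, unevenly sampled ellipse.
rng = np.random.default_rng(2)
theta = np.sort(rng.uniform(0, 2 * np.pi, 80))
x = 60 * np.cos(theta) + rng.normal(0, 1.0, theta.size)
y = 30 * np.sin(theta) + rng.normal(0, 1.0, theta.size)

# Keep the second return value this time instead of tossing it.
tck, u_fit = interp.splprep([x, y], s=len(x), per=1)

# The evenly spaced parameter we built ourselves.
u_even = np.linspace(0, 1, 100)

# Steps between neighboring parameter values: big steps in u_fit mark
# stretches of the contour where the input points were sparse.
step_fit = np.diff(u_fit)
step_even = np.diff(u_even)
print("spread of fitted steps:", step_fit.std())
print("spread of even steps:", step_even.std())
print("sparsest gap falls between input points",
      step_fit.argmax(), "and", step_fit.argmax() + 1)
```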
There's another notion in B-splines called "control points", which I don't have fully figured out, but am digging into. If you, dear reader, know how to work with setting explicit control points in SciPy, please do let me know, I'm extremely interested. If I find out, or if somebody finds out and tells me, I'll add a little section about that here. Ciao, happy data wrangling!