
How Sonic Tractor Beams Work

January 19, 2017 by Jeremy Lee

This practical tractor beam uses ultrasonic waves to levitate small spheres at short distances.

Asier Marzo and his colleagues have developed a practical "tractor beam" that levitates small spheres at short range with ultrasonic waves, borrowing the "dark trap" techniques pioneered with optical tweezers. But how does it work?

Did you know that tractor beams exist now?

In fact, there are two types: optical and sound.

Optical tractor beams (using lasers) have been around for a while, but they don't exert much force for the energy you expend to create them. You can move around cells and pollen on microscope slides with optical tweezers, but to levitate even a grain of sand you'd need so much light you'd end up vaporizing the thing you're trying to lift. Fun, but not useful.

If you really want to transfer mechanical energy to an object, there are more efficient methods, like air pressure waves (sound).

We should also distinguish what makes a tractor beam different from a jet of air or other means of levitation: its ability to hold an object in a static position within the beam. It's not about pushing or pulling; it's about holding something still.

So here's my first (and only) issue with these "tractor beams": the name. It might be the term of art, but you know "tractor beams" are supposed to work in space (see: Star, both Trek and Wars) and these only work in air.

Everyone knows what you call a tool that manipulates small objects using ultrasonic waves. It's a Sonic Screwdriver!

Build your own portable Acoustic Tractor Beam. Courtesy of Asier Marzo

Now if you don't really care how or why it works and you just want to build one, then Asier Marzo Perez and his group at the University of Bristol have published a fantastic build video and article on Instructables. You'll need two dozen ultrasonic transducers, an Arduino (acting as the 40 kHz frequency generator), an H-bridge stepper driver module as a power amp for the transducers, and a 3D printer to make a "dish" with the right geometry.

Image courtesy of Asier Marzo, from: Build your own portable Acoustic Tractor Beam


I'm not entirely sure I approve of university researchers doing their own cool electronics projects and YouTube videos (that's supposed to be my job!) but I do appreciate the difficult process of turning advanced theory into fun demonstrations.

If you're still with me (and not ordering parts off eBay) you're probably wondering "what sorcery is this?" So let's get into the details of how it actually works.

Creating Pressure Points in Thin Air

Most people have spent time rearranging their speaker systems to get better sound quality. You intuitively know that there's a sweet spot in between the stacks where the stereo sounds best and clearest. You might even have a setup with satellite speakers and whatnots that need careful placement around your favorite chair.

But have you ever considered just how hard it is to set up a sound system that works for 10,000 people at once?

Early mega-concerts solved it by just having a really, really big stereo system up at the main stage loud enough for everyone to hear, with a sort of "demilitarized zone" to keep people out of direct hearing-damage territory. But just like a home stereo, there was a sweet spot (beam) down the middle of the stage, with some funny side-lobes in the wings.

Another approach is to scatter many smaller speakers around the venue, but that's when you run into the echo problem. If you have, say, three powerful speaker systems separated by 100 meters, that's a 0.3 second delay between each, and most people will be in a position to hear one powerful nearby speaker and a ring of echoes from the others. You hear this all the time in sports PA systems... stems... ems..

If you have three big speaker stacks, logically there would be one equidistant point where the sound would be perfect. Even the phase of the incoming waves would be in sync. All the sound waves would add, none would cancel, and you'd have a peak in the sound energy at that point.


"blown away"  by Maxell


Going back to a bit of hand-waving oversimplification, now imagine that "static air pressure" and "sound energy" are basically the same thing. The more sound energy you have in a chunk of air, the more pressure it has compared to another equal volume of air.

This all gets complicated fast because it's not real air pressure. You can't get the flows that normally result from actual pressure gradients because any air that tries to move along the gradient wants to move straight back again once it gets out of the artificial party zone.

Plus, when it comes to levitating spheres, there are all kinds of other mechanical effects at the physical boundary that are related to “thermoviscous” heat transport; these effects are still being investigated. In the same way that ice skates don't glide directly on solid ice (but rather on a thin film of liquid water at the surface), the air at the boundary of the sphere might not behave as simply as we think.

Fluid dynamics is tricky stuff. In the right situations, the non-obvious effects dominate—especially if we help them out. We’ll get into that. First, we need to get our sound system set up.

Moving the Sweet Spot

Going back to the rock concert analogy, imagine you're the sound engineer. If your mixing desk is a good one, you'll have "delay" sliders as well as the usual volume controls. If you can introduce small extra delays to some of the speakers, you can shift the "sweet spot" from the equidistant point to pretty much anywhere in the stadium you want. Sound travels at roughly 300 m/s (closer to 340 m/s in room-temperature air), so every millisecond of delay you add is worth about 30 cm of travel, and the sweet spot shifts toward that speaker accordingly.
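The delay-slider idea can be sketched in a few lines of Python. This is only an illustration: the speaker positions, the chosen seat, and the function name `alignment_delays` are invented for the example, and the speed of sound is taken as 343 m/s rather than the round 300 m/s used above.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in room-temperature air (assumed value)

def alignment_delays(speakers, target):
    """Return the extra delay (seconds) for each speaker so that all
    wavefronts reach `target` at the same instant.

    The farthest speaker gets zero delay; nearer ones wait for it.
    """
    distances = [math.dist(s, target) for s in speakers]
    farthest = max(distances)
    return [(farthest - d) / SPEED_OF_SOUND for d in distances]

# Hypothetical stadium: three stacks (x, y in meters) and one chosen seat.
speakers = [(0.0, 0.0), (100.0, 0.0), (50.0, 80.0)]
seat = (30.0, 20.0)

delays = alignment_delays(speakers, seat)

# Arrival time at the seat = desk delay + time of flight, per speaker.
arrivals = [delay + math.dist(s, seat) / SPEED_OF_SOUND
            for delay, s in zip(delays, speakers)]
print(delays)
print(arrivals)  # all three values should agree
```

Every entry in `arrivals` comes out the same, which is exactly what "moving the sweet spot" means: the seat now behaves like the equidistant point.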

Music only has one sweet spot where all the waves rejoin perfectly, but what if we broadcast something repeating, like a monotone? A single frequency wave can synchronize with time-shifted copies of itself, so we get multiple places where the waves add; a classic multi-source interference pattern.

This leads to the "fundamental equation" of tractor beams, which we're going to break down into its parts.



$$P(\mathbf{r}) = \sum_{n} P_0 \, A_n \, \frac{D_f(\theta_n)}{d_n} \, e^{i(\varphi_n + k d_n)}$$

The air pressure at any point in the field is equal to the amount of "in phase" sound energy arriving at that point from all the speakers combined.

$P(\mathbf{r})$: The acoustic pressure created at point r

$A_n$: Power put into the speaker

$P_0$: Speaker efficiency at pushing air

$D_f(\theta_n)$: Radiation pattern (directivity field) calculated from off-axis angle

$d_n$: Distance from speaker to point (in meters)

$\varphi_n$: Phase delay added to the signal by mixing desk

$k d_n$: Phase delay added by time-of-flight (i.e., 2π times distance divided by wavelength)

Each source is assumed to have a "radiation pattern" which encodes the idea that speakers are louder at the front (ultrasonic transducers especially have a fairly tight beam) and get quieter the further away you are. If you know how powerful the speaker is, and how wide the beam of sound coming out the front is, then you can calculate the raw volume you’ll hear from each speaker.

The D_f(θ_n) radiation pattern for circular speakers is built from a Bessel function and depends only on the angle θ_n between you and the speaker's centerline axis.

The distance from our seat to each source speaker also matters, in terms of how many audio wavelengths fit along the way. The leftover fraction of a wavelength determines the phase change of the arriving signal due to the delay of traveling through the air, which adds to any delay we may have set with the desk.

The e^{i(...)} term is a trick borrowed from the Fourier transform. Think of it as a little "clock hand" that is rotated by the amount in the exponent: a way of turning the phase number into a compass direction, like when they say in old movies "They’re attacking from three o'clock!" (That’s fine, it’s only a quarter to two.)

What we're trying to calculate is how many of the signals hitting a point are in phase with each other, also known as a phase correlation. The simplest way to get there is to turn all the phases into "clock hand" vectors (complex numbers) and then lay all the vectors end to end (sum them) and see how far away from the origin you got.



If the vectors are correlated, they'll all point roughly in the same direction and you'll get far. If the phases are all random, you'll do a drunkard's walk around the origin and end up almost where you started. It doesn't matter what order you sum the steps, you'll get to the same place. 
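The "clock hand" walk is easy to try with Python's built-in complex numbers. This is a toy sketch with invented phase values: eight hands that all agree, and eight hands spread evenly around the dial as the extreme case of no agreement at all.

```python
import cmath
import math

def phasor_sum(phases):
    """Lay unit-length 'clock hands' end to end and return the end point."""
    return sum(cmath.exp(1j * p) for p in phases)

n = 8
in_phase = phasor_sum([0.3] * n)                               # all hands agree
spread = phasor_sum([2 * math.pi * k / n for k in range(n)])   # evenly spread

print(abs(in_phase))  # close to 8: every step goes the same way
print(abs(spread))    # close to 0: the steps cancel out

# The order of the steps does not change where you end up.
shuffled = phasor_sum([2 * math.pi * k / n for k in reversed(range(n))])
```

The distance from the origin, `abs(...)`, is the correlation measure described above: large when the phases line up, near zero when they do not.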

In this case, we pre-scale all the little clock arrows with the "loudness" of that speaker from the given distance, so it's possible for many quiet in-phase sources to be canceled by one loud nearby out-of-phase speaker. Note that the final P(r) sum is also a complex number; phase is still part of the answer, not just magnitude. That has some subtle implications later.

And that's it. You just run that equation over all points in space to calculate how much "acoustic pressure" your speaker system is delivering to each seat, and you get essentially a 3D diffraction pattern.
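As a rough sketch of what "running the equation" looks like, the sum can be written directly in Python. Several things here are assumptions rather than details from the Bristol work: the transducers are modeled as flat circular pistons of 4.5 mm radius facing straight up, the directivity is the textbook piston pattern 2·J1(x)/x, the speed of sound is 343 m/s, and the names `pressure`, `directivity`, and `bessel_j1` are made up for this example.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0      # m/s (assumed)
FREQUENCY = 40_000.0        # Hz
K = 2 * math.pi * FREQUENCY / SPEED_OF_SOUND   # wavenumber, rad/m
RADIUS = 0.0045             # m, assumed piston radius of one transducer

def bessel_j1(x, terms=25):
    """First-order Bessel function via its power series (fine for small x)."""
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + 1))
               * (x / 2) ** (2 * m + 1) for m in range(terms))

def directivity(theta):
    """Far-field pattern of a circular piston: 2*J1(x)/x, x = k*a*sin(theta)."""
    x = K * RADIUS * math.sin(theta)
    return 1.0 if abs(x) < 1e-9 else 2 * bessel_j1(x) / x

def pressure(point, transducers, efficiency=1.0):
    """Complex acoustic pressure at `point` from all transducers combined.

    Each transducer is (position, amplitude, phase) and faces +z.
    """
    total = 0j
    for position, amplitude, phase in transducers:
        d = math.dist(point, position)
        cos_theta = (point[2] - position[2]) / d
        theta = math.acos(max(-1.0, min(1.0, cos_theta)))
        total += (efficiency * amplitude * directivity(theta) / d
                  * cmath.exp(1j * (phase + K * d)))
    return total

# Two transducers 2 cm apart, and a listening point 5 cm above their midpoint.
pair = [((-0.01, 0.0, 0.0), 1.0, 0.0), ((0.01, 0.0, 0.0), 1.0, 0.0)]
flipped = [pair[0], ((0.01, 0.0, 0.0), 1.0, math.pi)]   # one desk knob at pi
on_axis = (0.0, 0.0, 0.05)

loud = abs(pressure(on_axis, pair))      # the two arrows add
quiet = abs(pressure(on_axis, flipped))  # the two arrows cancel
print(loud, quiet)
```

Evaluating `pressure` over a grid of points instead of a single one gives the 3D pattern described above. Turning one phase knob by half a cycle is enough to turn a loud spot into a silent one.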

Generally, we want to design a particular field shape, and the Bristol team used an optimization algorithm to work the problem backwards, looking for patterns which best satisfied their requirements. The algorithm is essentially “tuning the knobs” on the big mixing desk until the sweet spot (or spots) end up where they wanted them to be.

The Gor’kov Potential

Knowing where the pressure nodes are is not the same as knowing where our ball will end up when we put it into the acoustic field. Intuitively, the high-pressure zones will push our ball towards low-pressure. But there’s also a secondary effect which is very hard to explain, but critical to building an efficient trap.

As well as being repelled by pressure, it turns out levitated balls are attracted to flow; that is, they're attracted to the edges of the pressure nodes where there is a sudden transition. We don’t typically notice this because this effect is weak compared to direct pressure.

But this small ‘secondary’ term is there and it’s very, very important when creating tractor beams.

The exact nature of this secondary effect depends on a lot of factors, including the size of the particle and the wavelength of the audio. If you really want to know the math, these two papers are quite useful:

Regardless, just try to remember that as well as being blown by the wind, things also like to be in the flow.
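For a rough feel of the math, the Gor'kov potential of a small sphere is usually written in the following textbook form. The symbols are not from this article: $a$ is the sphere radius, $\rho_0$ and $c_0$ are the density and sound speed of the air, $\rho_p$ and $c_p$ are those of the particle, and $\langle\cdot\rangle$ is an average over one cycle. It is a sketch that only holds when the particle is much smaller than the wavelength.

$$U = \frac{4\pi}{3}a^3\left[f_1\,\frac{\langle p^2\rangle}{2\rho_0 c_0^2} - f_2\,\frac{3\rho_0\langle v^2\rangle}{4}\right],\qquad f_1 = 1 - \frac{\rho_0 c_0^2}{\rho_p c_p^2},\qquad f_2 = \frac{2(\rho_p - \rho_0)}{2\rho_p + \rho_0},\qquad \mathbf{F} = -\nabla U$$

The first term is the "blown by the wind" part (pressure $p$), the second is the "in the flow" part (air velocity $v$), and the minus sign between them is what makes fast-moving air attractive. The force on the ball follows from the gradient of $U$.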

The place where both of those things balance is the point of minimum Gor’kov potential, and that is where the ball settles.

Quiet Traps

In practice, a single pressure node is like a punchy fist: it can move things around by simple repulsion, but not with finesse. Intuitively, the best thing to do would be to surround the object we want to move in a “ring” of pressure zones, so the object is stable in the center. A trap!

In optics, this is called a "dark trap" because the stable point is where the laser is darkest. By analogy, in acoustics, it's called a "quiet trap".

It turns out there are three fundamental field shapes that create a sharp Gor’kov potential minimum at a single spot. They differ in the shape of the cage.


Because you knew it was coming eventually.


The following diagrams are from the paper: Holographic acoustic elements for manipulation of levitated objects by Asier Marzo, Sue Ann Seah, Bruce W. Drinkwater, Deepak Ranjan Sahoo, Benjamin Long, and Sriram Subramanian.

They show the acoustic pressure field (front and top view) plus the pressure phase diagram, and some 3D models of the field surface for each trap configuration.

How to interpret the phase diagram: At a physical level, each high-pressure zone is a result of sound waves coinciding regularly at that point. They're not static pressure zones—they "pulse" at an ultrasonic rate. If two zones oscillate together, they are in phase. If they occur alternately, they are 180 degrees out of phase. Completely silent points have no phase. Since only the relative difference matters, the phase information is represented as a cyclic "rainbow" of colors.
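The "rainbow" encoding can be mimicked with Python's standard library. This is a guess at the general idea, not the paper's actual colormap: the phase of a complex pressure value is mapped onto the hue wheel, so zones half a cycle apart land on opposite colors. The function name `phase_to_rgb` and the sample values are invented.

```python
import cmath
import colorsys
import math

def phase_to_rgb(pressure):
    """Map the phase of a complex pressure to a hue on the color wheel."""
    hue = (cmath.phase(pressure) % (2 * math.pi)) / (2 * math.pi)
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)

zone_a = 2.0 + 0j     # a high-pressure zone, phase 0
zone_b = -2.0 + 0j    # equally loud, but half a cycle (180 degrees) later

print(phase_to_rgb(zone_a))  # (1.0, 0.0, 0.0), red
print(phase_to_rgb(zone_b))  # (0.0, 1.0, 1.0), cyan, opposite side of the wheel
```

Because the hue wraps around, a phase just under a full cycle is drawn almost the same color as phase zero, which is why only relative differences between zones are meaningful.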


The Bottle Trap

Image courtesy of Asier Marzo, from Holographic acoustic elements for manipulation of levitated objects

At first glance, this trap looks ideal. It’s got a nice ring, big top and bottom, and seems to hold the ball neatly from all sides. But it’s also quite weak and unstable because the ball has a lot of room in the middle to rattle around.

It does work and, with the walls on every side, it's the easiest to understand. But it's not the most efficient trap.

The Vortex Trap

Image courtesy of Asier Marzo, from Holographic acoustic elements for manipulation of levitated objects


This is an interesting one, called the “vortex” because the ball is like a cow in a tornado. It's sucked up into the middle of the ring (the place of maximum flow, but minimum pressure) and the vortex motion will actually spin the particle around.

Think of this one as “screwdriver mode”. If you look at the top-down phase diagram (e), you can see the swirl of phases that tends to spin any objects in the trap.

The Twin Trap

Image courtesy of Asier Marzo, from Holographic acoustic elements for manipulation of levitated objects


The twin trap is actually the best configuration. In the same way the ball is sucked into the center of the vortex trap above, the enormous “flow” between the two out-of-phase pressure zones pulls the ball into the quiet zone between them.

That’s why the ball doesn’t fall out of the trap, either through the sides or the ends. It might seem like the cage only has two walls, but it’s balanced in the middle where those big primary forces cancel out, leaving only the “stay in the flow” term, which keeps the ball stable at the mid-point.

By concentrating all the energy into two polar-opposite pressure nodes (instead of spreading it out into a wider ring or more walls), we can create the most concentrated gradient of all the traps. Note how much “brighter” the pressure spots are compared to the vortex trap.

Projecting the Field

An acoustic phased array is just a wall of speakers all pointed in roughly the same direction. In ideal circumstances, all of the traps can be projected from the same array of transducers. It’s a software choice, set by our mixing desk knobs!


Flat phased array of ultrasonic transducers. Image from Realization of compact tractor beams using acoustic delay-lines, courtesy of Asier Marzo


This next diagram shows the amount of delay added to each speaker in the wall, expressed as a color. It's like our delay knobs glow in rainbow shades as we spin them.

The patterns look complicated at first, but can be broken down into four building blocks: three simple patterns, one per trap type (which thankfully bear some resemblance to the traps they create), plus a common "focusing" phase correction which puts the trap at a particular focal distance. Each projected pattern is one of the three trap patterns added to the focusing correction.
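Here is a minimal sketch of that layering for the twin trap. It makes several simplifying assumptions that are not from the article: a small 4x4 flat array with 1 cm pitch, idealized point sources with no directivity, and a twin-trap pattern modeled as a half-cycle (π) split between the two halves of the array. All function names are invented.

```python
import cmath
import math

K = 2 * math.pi * 40_000.0 / 343.0   # wavenumber for 40 kHz in air, rad/m
PITCH = 0.01                          # m between transducers (assumed)

# 4x4 flat array in the z = 0 plane, centered on the origin.
offsets = [(-1.5 + i) * PITCH for i in range(4)]
array = [(x, y, 0.0) for x in offsets for y in offsets]
focus = (0.0, 0.0, 0.05)              # trap position, 5 cm above the array

def focus_phases(array, focus):
    """Phase that cancels each source's time-of-flight phase at `focus`."""
    return [-K * math.dist(p, focus) for p in array]

def twin_signature(array):
    """Add half a cycle (pi) to the sources on one side of the array."""
    return [math.pi if x > 0 else 0.0 for x, _, _ in array]

def pressure(point, array, phases):
    """Sum of idealized point sources: amplitude 1/d, no directivity."""
    total = 0j
    for p, phi in zip(array, phases):
        d = math.dist(p, point)
        total += cmath.exp(1j * (phi + K * d)) / d
    return total

lens = focus_phases(array, focus)
twin = [a + b for a, b in zip(lens, twin_signature(array))]

at_focus_lens = abs(pressure(focus, array, lens))   # everything adds up
at_focus_twin = abs(pressure(focus, array, twin))   # the two halves cancel
beside = (0.004, 0.0, 0.05)                         # 4 mm off-axis
beside_twin = abs(pressure(beside, array, twin))    # inside one of the lobes
print(at_focus_lens, at_focus_twin, beside_twin)
```

With only the focusing correction, the focal point is the loudest spot. Adding the half-and-half split turns that same point into a quiet spot with loud lobes on either side.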


Image from Holographic acoustic elements for manipulation of levitated objects, courtesy of Asier Marzo


Making It Easy

If we know exactly what shape we want (twin trap!) and have little need to change it, we don’t have to emit complicated wavefronts from a phased array. Instead of a unique signal going to each speaker, why not simplify the hardware by sending the same signal to lots of the speakers and control the time-of-flight delay by changing the distance?

That is, forget the complicated delay electronics—just drag the speakers around!
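The trade between electronic delay and physical distance is simple arithmetic, sketched here with an assumed speed of sound of 343 m/s and an invented helper name, `phase_to_offset`.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s (assumed)
FREQUENCY = 40_000.0     # Hz

def phase_to_offset(phase_rad):
    """Extra path length (m) that produces the same delay as `phase_rad`."""
    wavelength = SPEED_OF_SOUND / FREQUENCY
    return phase_rad / (2 * math.pi) * wavelength

half_cycle_mm = phase_to_offset(math.pi) * 1000
print(round(half_cycle_mm, 2))  # 4.29
```

So at 40 kHz, pulling a transducer back by a little over 4 mm has the same effect at the focus as delaying its signal by half a cycle.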

This is where the 3D-printed dish comes in. It creates an accurate frame that makes it easy to position the transducers to project a strong twin trap when driven by a single common 40 kHz signal. No delay sliders needed: they’re baked into the geometry.

Even better, we can still move the position of the trap up or down along the center axis by applying identical phase shifts to about half of the transducers. So we can still "extend" or "retract" our tweezers' point (actually, a repeating conveyor line of them) under software control, but without having to do complicated calculations on the fly. Just one knob will do it naturally.

Image from Build your own portable Acoustic Tractor Beam, courtesy of Asier Marzo


If you think about it, the “tractor beam projector” isn’t really beaming some levitation effect directly onto the sphere. It’s actually focusing on two spots immediately beside it. The ball is acting as a conduit between these zones and, by keeping a foot on each dance-floor, it gets the energy it needs to stay there.


Particle Size

This really only works on objects smaller than the wavelength of the sound involved. At 40 kHz the wavelength in air is about 8.6 mm, which in practice gets you particles up to about 4 mm across. If you want to tractor larger things, the frequency drops into audible range and that's called a "siren". But one advantage is the size and shape aren't particularly relevant under that limit because the tractor force scales with volume.
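The size limit is easy to estimate. In this sketch the speed of sound (343 m/s) and the rule of thumb that the largest particle is about half a wavelength are both assumptions, the latter chosen because it matches the 4 mm figure. The function names are invented.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C (assumed)

def wavelength_mm(frequency_hz):
    """Wavelength of sound in air, in millimeters."""
    return SPEED_OF_SOUND / frequency_hz * 1000

def frequency_for_size_hz(diameter_mm):
    """Frequency whose half-wavelength equals the given particle diameter."""
    return SPEED_OF_SOUND / (2 * diameter_mm / 1000)

print(round(wavelength_mm(40_000), 1))       # 8.6 (mm), so half is about 4.3 mm
print(round(frequency_for_size_hz(20)))      # 8575 (Hz), well inside hearing range
```

By this estimate, a 2 cm object would need a tone below 9 kHz, which is where the "siren" problem comes from.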



The "sculpted dish" device doesn't use a lot of power (<6 watts) at its optimal distance, but that grows quickly if you use more generic phased arrays (20-50 watts) or want to extend the focal distance. At some point, you reach the limit for how much noise you can get out of the little boom-boxes.

The beam does not noticeably heat the target, but there is a detectable warm spot if you point the beam at a wall. (The acoustic power has to go somewhere!) Asier also warns that fragile/harmonic objects (like glass) could be damaged with sufficient power if you hit their resonant frequency, though the team hasn't seen that actually happen yet. He also pointed out that high-power ultrasonic systems are used in medicine for ablating tissue, so there's that to be aware of.


About a microgram is what you can lift with this hardware. Also, while it's trivial compared to other forces, remember the weight of the object is technically being carried by the faces of the speaker cones in your array via back-pressure. (Conservation of momentum still applies, kids. It's the law.)


The holding force also depends on the "acoustic contrast" of the object compared to the medium. If the particle has strange acoustic properties or the medium is almost the same density as the target, you will lose traction. In air, all solid materials have extremely strong contrast, but if you are working in thick liquids you'll need to make sure the target object is dense enough.
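To put rough numbers on "acoustic contrast", here is a sketch using the standard Gor'kov contrast factors for a small sphere: `f1` compares how compressible the particle is against the medium, and `f2` compares their densities. The material values below are ballpark figures assumed for illustration, not measurements from the Bristol group.

```python
def contrast_factors(rho_p, c_p, rho_0, c_0):
    """Gor'kov contrast factors for a small sphere in a fluid.

    rho_* are densities (kg/m^3), c_* are sound speeds (m/s);
    subscript p is the particle, 0 is the surrounding medium.
    """
    f1 = 1 - (rho_0 * c_0 ** 2) / (rho_p * c_p ** 2)   # compressibility contrast
    f2 = 2 * (rho_p - rho_0) / (2 * rho_p + rho_0)     # density contrast
    return f1, f2

# Rough, assumed material values: (density, sound speed).
AIR = (1.2, 343.0)
WATER = (1000.0, 1480.0)
FOAM_BEAD = (30.0, 900.0)             # expanded polystyrene
SOLID_POLYSTYRENE = (1050.0, 2350.0)

in_air = contrast_factors(*FOAM_BEAD, *AIR)
in_water = contrast_factors(*SOLID_POLYSTYRENE, *WATER)
print(in_air)    # both factors close to 1: strong contrast
print(in_water)  # density factor close to 0: much weaker grip
```

Even a featherweight foam bead is so much denser and stiffer than air that both factors sit near their maximum. A plastic bead in water has almost no density contrast, which is the loss of traction described above.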

Asier Marzo and the tractor beam. Image courtesy of the Bristol Interaction and Graphics group

So... what’s it for?

When asked, Asier had this to say:

If microscopes are our new eyes, acoustic manipulation could be our new hands. Not for levitating humans or cars but to manipulate small particles in mid-air or even inside our body.