Some Really Cool DSP

I finally took the time to really understand the polyphase synthesis filterbank and how to use it to reconstruct previously channelized signals. Once again, I looked to fred harris' work on the subject and have cracked it. The keys are in creating an M-to-2fs PFB channelizer, a 2-to-Pn PFB synthesizer, and perfect reconstruction filters (-6 dB at the edge of the passband).

I plan on writing up a more thurough look into this, and all of the code to do the proper synthesis filter will be going into GNU Radio soon, but for now, some pretty pictures.

First, we start off with a QPSK signal with very little noise:

Here, I split the signal up into 4 equally spaced channels. Channel 0 is around DC (0 Hz) and goes from -1000 - 1000 Hz, channel 1 is from 1000 - 3000 Hz, and channel 3 is from -3000 to -1000 Hz. Now, channel 2 actually spans from 3000 - 4000 and then wraps around to go from -4000 to -3000 Hz. These have a 1000 Hz bandwidth but are sampled at 2 samps/symbol so the channel sample rate is 2000 Hz.

I then go through the 2-to-Pn synthesizer where Pn is actually 8. This synthesis filter takes 4 channels in and produces four channels but at twice the sampling rate, so we now have a signal that is the original signal but sampled at twice the orignal rate. Notice also that it's offset by a bit in frequency in the PSD. That's a result of my plotting that I don't feel like correcting (it's related to the reason why channel 2 spans the edge of the positive and negative spectrum). Obviously, the constellation is not rotating, so we're ok.

Notice also that this signal was first split up, filtered, and then put back together again. Perfectly. That constellation should be enough proof of that. This has some huge implications to what we can do with this concept once I generalize it. This was a simple demonstration to get the algorithm and math correct, but now we can have some serious fun!

For those of you into such things, this is the filter I used for this example:

I generated it using GNU Radio firdes.low_pass_2 filter generator with these parameters:

  • Gain = 1
  • Sample Rate = 8000 Hz
  • Bandwidth = 1000 Hz
  • Transition bandwidth = 200 Hz
  • Attenuation = 80 dB
  • Window: Blackman-harris



Volk Benchmarking

Benchmarking Volk in GNU Radio

The intention of Volk is to increase the speed of our signal processing capabilities in GNU Radio, and so there needs to be a way to look at this. In particular, there were some under-the-hood changes to the scheduler to allow us to more efficiently use Volk (by trying to provide only aligned Volk call; see Volk Integration to GNU Radio for more on that). These changes ended up putting more lines of code into the scheduler so that every time a block's work function is called, the scheduler has more computations and logic to perform.

Because of these changes, I am interested in understanding the performance hit taken by the change in the scheduler as well as the performance gain we would get from using Volk. If the hit to the scheduler is less than a few percentage points while the gain from Volk is much larger, then we win.

Performance Measuring

There is a lot of debate about what the right performance measurement is for something like this. In signal processing algorithms, we are interested in looking at the speed that it can process a sample (or bit, symbol, etc.), so a time-based measurement is what we are going to look at. Specifically, how much time does it take a block to process $N$ number of samples?

If we are interested in timing a block, the next question is to ask what clock to use? And if we look into this, everyone has their own opinion on it. There's wall time, but that's suspect because it doesn't account for interruptions by the OS. There's the user and system times, but they don't seem to really represent the time it actually takes a program to produce the output; and do we combine those times or just use one of them? This also really represents a lower bound if no sharing were occurring and with no other system overhead.

In the end, I decided what I cared about, and what our users would care about, is the expected time taken by a block to run. So I'm going with the wall clock method here. Then there's the question of mean, min, or max time? They all represent different ways to look at the results. It is, frankly, easy enough to capture all three measurements and let you decided later which is important (for that matte, it would be an easy edit to the benchmark tools to also collect the user and system time for those who want that info, too).

The results shown in this post simply represent the mean of the wall time for a certain number of iterations for processing a certain number of samples for each block. I am also going to show the results from only one machine here to keep this post relatively short. 

Measurement Tools

I built a few measurement tools to both help me keep track of things and allow anyone else who wants to test their system's performance to do so easily. These tools are located in gnuradio- examples/python/volk_benchmark. It includes two Python programs for collecting the data and a plotting program to display the data in different ways. I won't repost here the details of how to use them. There's a lengthy and hopefully complete README in the directory to describe their use.

Measurement Results

For these measurements, I have two data collection programs: and The first one runs through all of the math functions that were converted to using Volk and the second runs through all of the type conversions that were 'Volkified.' These could have easily been done as one program, but it makes a bit of logical sense to separate them.

The system I ran these tests on is an Intel quad-core i7 870 (first gen) at 2.93 GHz with 8 GB of DDR3 RAM. It has support for these SIMD architectures: sse sse2 ssse3 sse4_1 sse4_2.

I'm interested in comparing the results of three cases. The first case is the 'control' experiment, which is the 3.5.1 version of GNU Radio which has no edits to the scheduler or the use of Volk. Next, I want to look at the scheduler with the edits but still no Volk, which I will refer to as the 3.5.2 version of GNU Radio. The 'volk' case is the 3.5.2 version that uses Volk for the tests.

The easiest way to handle these cases was to have two parallel installs, one for version 3.5.1 and the other for 3.5.2. To test the Volk and non-Volk version of 3.5.2, I simply edited the ~/.volk/volk_config file and switch all kernels to use the 'generic' version (see the README file in the volk_benchmark directory for more details on this).

For the results shown below, click on the image for an enlarged version of the graph.

Looking at the type conversion blocks, we get the following graph:

Volk Type Conversion Results

Another way to look at the results is to look at the percent speed difference between the 3.5.2 versions and the 3.5.1. So this graph shows us how much increase (positive) or decrease (negative) of speed the two cases have over the 3.5.1 control case.

Percent Improvement Over v3.5.1 for Type Conversion Blocks


These are the same graphs for the math kernels.

Volk Math Results
 Percent Improvement Over v3.5.1 for Math Blocks

There are two interesting trends here. The most uninteresting one is that Volk generally provides a massive improvement in the speed of the blocks, the more complicated the block (like complex multiplies) the more we gain from using Volk.

The first really interesting result is the improvement in speed between the schedulers from 3.5.1 and 3.5.2. As I mentioned earlier, we increased the number of lines of code in the scheduler that make calculations and logic and branching calls. I expected us to do worse because of this. My conjecture here is that by providing mostly aligned blocks of memory, there is something happening with data movement and/or the cache lines that is improved. So simply aligning the data (as much as possible) is a win even without Volk.

The other area this interesting is that in the rare case, the Volk call comes out to be worse than the generic and/or the v3.5.1 version of the block. The only math call where this happens is with the conjugate block. I can only assume that conjugating a complex number is so trivial (the sign flip of the imaginary part) that the code for it is highly optimize already. We are, though, talking about less than 5% hit on the performance, though. On the other hand, the multiply conjugate block, which is mostly when the conjugate is used in signal processing, is around 350% faster.

The complex to float conversion is a bit more of a head scratcher. Again, though, we are talking about a minor (< 3%) difference. But stiil, that these do not perform better is really interesting. Hopefully, we can analyze this farther and come up with some conclusions as to why this is occuring and maybe even improve the performance more.



Volk Integration to GNU Radio

Getting Volk into GNU Radio

We've been talking about integrating Volk into GNU Radio for what seems like forever. So what took us so damn long? Well, it's coming, very shortly, and I wanted to take a moment to discuss both the issues of Volk in GNU Radio and how to make use of it with some brand-new additions.

The main problem with using Volk in GNU Radio is the alignment requirements of most SIMD systems. In many SIMD architectures, Intel most notably (and we'll stick with them in these examples as it's what I'm most familiar with), we have a byte-alignment requirement when loading data into the SIMD-specific registers. When moving data in and out, there is a concept of aligned and unaligned loads and stores. You take a hit when using the unaligned versions, though, and they are not desirable. In SSE code, we need to by 16-byte aligned while the newer AVX architecture wants a 32-byte alignment.

But we have the dynamic scheduler in GNU Radio that moves data between blocks in chunks of items (where an item is whatever you want: floats, complex floats, samples, etc.). The scheduler tries to maximize system throughput by moving as large a chunk as possible to give the work function lots of data to crunch at once. Larger chunks minimize the overhead of going into the scheduler to get more data. But because we are never sure how much data any one block has ready for the next in the chain of GNU Radio blocks, we cannot always guarantee the number of items available, and so we cannot guarantee a specific byte alignment of our data streams.

We have one thing going for us, though: all buffers are page-aligned at the start. This is great since a page alignment is good enough for any current or foreseeable SIMD alignment requirement (16 or 32 bytes right now, and when we get to the problem of requiring more than 4k alignments, I'll be happy enough to readdress the problem then). So the first call to work on a buffer is always aligned. 

But what if the work function is called with a number of items that breaks the alignment? What are we supposed to do then?

The first attempt at a solution was to use the concept of setting a set_output_multiple value for the block. This call tells the scheduler that the block can only handle chunks of data that contain a number of items that is a multiple of this value. So if we have floats in SSE chips, we need a multiple of 4 floats per call to the work function. It will never be called with less than 4 or some odd number that will ruin our alignment. 

But there's a problem with that approach. The scheduler doesn't really function well when given that restriction. Specifically, there are two issues. First, if the data stream being processed is finite and that number is not a multiple of what's required by the alignment, then the last number of items won't ever be processed. That's not the biggest deal in the world as GNU Radio is typically meant to stream data, but it could be a problem for many applications of processing data from a file.

The second problem, though, is latency. When processing packetized data, we cannot produce a packet until we have enough samples to make the packet. But at some point, we have the last few samples sitting in the buffer waiting to be processed. Because of our output multiple restriction, we leave those sitting behind until more samples are available so that the scheduler can pass them on. That would mean a fairly large amount of added latency to handle a packet, and that's unacceptable.

No, what we need is a solution that keeps the data flowing as best as it can while still working towards keeping the buffers aligned.

Branch Location

This post discusses issues that will hopefully be merged into the main source code of GNU Radio soon. However, I would like it to undergo significant testing, first, and so have only published a branch at:

as the branch safe_align.

Scheduler Alignment

Instead of using the set_output_multiple approach, we need a solution that meets the following goals:

  • Minimize effect to latency; maximize throughput.
  • Try to maintain alignment of buffers whenever possible.
  • When not possible to keep alignment, pass on data quickly.
    • minimize latency accrued by holding data.
  • Re-establish alignment but not at the expense of extra calls.
    • pass on the largest buffer possible that re-establishes alignment.
    • don't pass the minimum required. The extra overhead of calling a purposefully-truncated work function is greater than the benefit of realigning quickly.

In it's implementation, we want to minimize any added computation to the scheduler and slow down our code.

In the approach that we came up with, the scheduler looks at the number of items it has available for the block. If there are enough items to keep the buffers aligned, it passes on the largest number of samples possible that maintains the alignment. If there aren't enough, then it sends them along anyway, but it sets a flag that tells the block of the alignment problem.

When the buffers are misaligned, the scheduler must try to correct the alignment. There are two ways of doing this. The easiest way is just to pass on the minimum number of items possible that re-establishes alignment. The problem with this approach is that the number is really small, so you are asking the work function to handle 1, 2, or 3 items, say. Then it has to go back to the scheduler and ask for more. This kind of behavior incurs a tremendous amount of overhead in that it deals more with moving the data than processing it.

The second way of handling the unalignment is to take the amount of data currently available and pass on the largest possible chunk of items that will re-establish the alignment. This forces us to handle another call to work with unaligned data, but the penalty for doing that is much less than the overhead of purposefully handling small buffers. In my experiments and analysis, most of the data comes across aligned, anyway, so these calls are minimal.

To accomplish these new rules, the GNU Radio gr_block class (which is a parent class to all blocks) has these new functions:

  void set_alignment (int multiple);

  int  alignment () const { return d_output_multiple; }

  void set_unaligned (int na);

  int unaligned () const { return d_unaligned; }

  void set_is_unaligned (bool u);

  bool is_unaligned () const { return d_is_unaligned; }

A block's work function can check it's alignment and make the proper decision on what to do based on that information. The block can test the is_unaligned() and call. If it indicates that the buffers are aligned, than the aligned Volk kernel can be called. Otherwise, it can either process the data directly or call an unaligned kernel.

In order not to make this blog post longer than it already is, I will post a separate blog post discussing the method and results of benchmarking all of this work. In it, just to tease, I'll be showing a few surprising results. First, I'll show that the use of Volk can give us dramatic improvements for a lot of simple blocks (ok, that's not surprising). Second, on the tested processors, I see almost no penalty for making unaligned loads and stores. And third, lest you think that last claim makes all of this work unnecessary, my test show that the efforts to keep the alignment going in the new scheduler actually improves the processing speed even without using Volk. So there is a two-fold benefit to this work: one from the scheduler itself and then a second effect of Volk. 

Making Unaligned Kernels

Because we will be processing unaligned buffers in this approach, we need to either handle these cases with generic implementations or use an unaligned kernel. The generic version of the code would be like what is already in a block now that we would like to transition to using Volk. This would be the standard C/C++ for-loop math.

A useful approach, though, is to make use of unaligned Volk kernel. Even though an unaligned load is a bit more costly than an aligned call, we try to maximize the size of the buffers to process and the overall affect is still faster than a generic for loop. So it behooves us to call the unaligned version in these cases, which might mean making a new kernel specifically for this. 

Luckily, in most cases, the only difference between an aligned Volk kernel and an unaligned one is the use of loadu instead of load and storeu instead of store. These two simple differences makes it really easy to create an unaligned kernel.

With this approach, a GNU Radio block can look really simple. Let's use the gr_multiply_cc block as an example. Here's the old version of the call:

gr_multiply_cc::work (int noutput_items,
  gr_vector_const_void_star &input_items,
  gr_vector_void_star &output_items)
  gr_complex *optr = (gr_complex *) output_items[0];
  int ninputs = input_items.size ();
  for (size_t i = 0; i < noutput_items*d_vlen; i++){
    gr_complex acc = ((gr_complex *) input_items[0])[i];
    for (int j = 1; j < ninputs; j++)
      acc *= ((gr_complex *) input_items[j])[i];
    *optr++ = (gr_complex) acc;
  return noutput_items;

That version uses a for-loop over both he number of inputs and number of items. Here's what it looks like when we call Volk.

gr_multiply_cc::work (int noutput_items,
     gr_vector_const_void_star &input_items,
     gr_vector_void_star &output_items)
  gr_complex *out = (gr_complex *) output_items[0];
  int noi = d_vlen*noutput_items;
  memcpy(out, input_items[0], noi*sizeof(gr_complex));
  if(is_unaligned()) {
    for(size_t i = 1; i < input_items.size(); i++)
      volk_32fc_x2_multiply_32fc_u(out, out, (gr_complex*)input_items[i], noi);
  else {
    for(size_t i = 1; i < input_items.size(); i++)
      volk_32fc_x2_multiply_32fc_a(out, out, (gr_complex*)input_items[i], noi);

  return noutput_items;

Here, we only loop over each input, but the calls themselves are to the Volk multiply complex kernel. We test the unaligned flag first. If the buffers are flagged as unaligned, we use the volk_32fc_x2_multiply_32fc_u kernel where the final "u" indicates that this is an unaligned kernel. So for each input stream, we process the data this way. In particular, this kernel only takes in two streams at once to multiply together, so we take the output and multiply it by the next input stream after having first pre-loaded the output buffer with the first input stream.

Now, if the block's buffers are aligned, the flag will indicate as much and the aligned version of the kernel is called. Notice that the only difference between the kernels is the "a" at the end instead of the "u" to indicate that this is an aligned kernel.

If we didn't have an unaligned kernel available, we could either create one or just call the old version of the gr_multiply_cc's work function in this case.

Blocks Converted so Far

These next few sections are starting to get really low-level and specific, so feel free to stop reading unless you are really interested in the development work. I include this as much for the historical reference as anything.

Most of these blocks that I have so far moved over to using Volk fall into the category of the "low-hanging fruit." That means that, mostly, the Volk kernels existed or were easy to create from existing Volk kernels (such as making unaligned versions of them), that the block only needed a single Volk kernel to perform the activity required, and that had very straight-forward input to output relationships.

On occasion, I went and added a few things that I thought were useful. The char->short and short->char type conversions did not exist, but they were already a Volk kernel, so making them a GNU Radio block was easy and, hopefully, useful.

I also added a gr_multiply_conjugate_cc block. This one made a lot of sense to me. First, it was really easy to add the two lines it took to convert the Volk kernel that did a complex multiply into the conjugate and multiply kernel that's there now. Since this is such an often-used function in DSP, it just seemed to make sense to have a block that did it. My benchmarking shows a notable improvement in speed by combining this operation into a single block, too. Just to note, this block takes in two (and only two) inputs where the second stream is the one that gets conjugated.

What follows is a list of blocks o different types convered to using Volk

Type conversion blocks

  • gnuradio-core/src/lib/general/gr_char_to_float
  • gnuradio-core/src/lib/general/gr_char_to_short
  • gnuradio-core/src/lib/general/gr_complex_to_xxx
  • gnuradio-core/src/lib/general/gr_float_to_char
  • gnuradio-core/src/lib/general/gr_float_to_int
  • gnuradio-core/src/lib/general/gr_float_to_short
  • gnuradio-core/src/lib/general/gr_int_to_float
  • gnuradio-core/src/lib/general/gr_short_to_char
  • gnuradio-core/src/lib/general/gr_short_to_float

Filtering blocks

  • gnuradio-core/src/lib/filter/gr_fft_filter_ccc
  • gnuradio-core/src/lib/filter/gr_fft_filter_fff
  • gnuradio-core/src/lib/filter/gri_fft_filter_ccc_generic
  • gnuradio-core/src/lib/filter/gri_fft_filter_fff_generic

General math blocks

  • gnuradio-core/src/lib/general/gr_add_ff
  • gnuradio-core/src/lib/general/gr_conjugate_cc
  • gnuradio-core/src/lib/general/gr_multiply_cc
  • gnuradio-core/src/lib/general/gr_multiply_conjugate_cc
  • gnuradio-core/src/lib/general/gr_multiply_const_cc
  • gnuradio-core/src/lib/general/gr_multiply_conjugate_cc
  • gnuradio-core/src/lib/general/gr_multiply_const_cc
  • gnuradio-core/src/lib/general/gr_multiply_const_ff
  • gnuradio-core/src/lib/general/gr_multiply_ff

Gengen to General

One thing that might confuse people who have previously developed in the guts of GNU Radio is how I moved some of the blocks from gengen to general. Many GNU Radio blocks perform some function, like basic mathematical operations on two or more streams, that behave identically from a code standpoint but which use different data types. These have been put into the gengen directory as templated files where a Python script is used to autogenerate the type-specific class. This was before Swig would properly handle actual C++ templates, so we were left doing it this way.

Well, with Volk, we don't really have the option to template classessince the Volk call is highly specific to the data type used. So when moving certain math block for a specific type out of gengen, we went with the simple solution of removing that data type from the autogeneration scripts and placing it into general as a stand-alone block that can call the right Volk function. Good examples are the gr_multiply_cc and gr_multiply_ff blocks to see what I mean.

This really seems like the simplest possible answer to the problem. It maintains our block structure that we've been using for almost a decade now and keeps things clean and simple for both developers and users. The downside is some duplication of code, but with the Volk C functions, that is somewhat inevitable and not a huge issue to deal with.


"GNU Radio is Crap" and Other Such Insights

There seem to be two kinds of people that I meet when talking about GNU Radio. Those who loves it and those who hate it. I rarely get anyone either in between or who just want to know more about it. I am writing this to explore why this happens and why some people think GNU Radio is, as is often put, crap. 

First, let me take a step back. I was recently at the Software Radio Implementation Forum (SRIF) in Hong Kong to give a talk on GNU Radio. There were some enthusiastic people in the audience who I talked to afterwards about the project. Also there was a talk on Microsoft's Sora project, another software radio system. The presentation was very enlightening, especially in regards to identifying the differences between GNU Radio and Sora. In large part, we have come at things from two completely different philosophies, which I think plays into the main thread of this article.

GNU Radio has always been about enabling the development and experimentation of communications. It was designed around the flow graph and the block structure to piece radios together. We wanted developers and designers. As such, we spent little time working on grand applications and demonstrations of the project, save maybe for the original ATSC decoder. We have lots of examples, but most of them are about how to use GNU Radio, not about showing what GNU Radio can do. There's a subtle but important difference between the two.

Sora, on the other hand, came at things with the intent of building and showing off complex applications in software radio, ostensibly to prove that GPPs could handle complex waveforms. Their initial project enabled them to do 802.11 in real-time. They then moved to LTE. Their use of memory and look-up tables to replace computations and their initial focus on using SIMD programming for speed have made a very fast, solidly performing SDR system. But it is really only now that they are building Sora into a project for developers. At this SRIF talk, I heard about them building "bricks," which we would call "blocks," that will fit together and create a radio. 

Now, this is, obviously, my take on the history of the project from the presentations and papers that I have seen and read. 

So the way that I see it is that the projects came from different directions. We started with the development and programming model, but lack good applications to showcase our product. They have great apps, but are just now getting things together as a developers platform. This represents two different mindsets that I see in the computer and programming community as a whole. There are those who want to develop and those who want to use. In a very basic and non-nuanced sense, these attitudes are the distinction between Windows users and Linux users. Linux users don't mind getting their hands dirty and working a bit harder for the freedoms and power Linux gives them. Windows users are more interested in using computers. (I confess that such as simplified comparison makes me feel like a bad stand-up comic, but I hope the point is made without belaboring the subject).

From this perspective, those people who think GNU Radio is crap, when I get a chance to talk to them about their problems, tend to be from the application side of things. They try to use one of our examples out of the box and treat it as though it is meant to be a full-on application. Except that they are examples of how to do things. We have not yet built a real digital communications application as part of GNU Radio. Our benchmark scripts are meant for benchmarking and exploring how to do digital communications. They were never meant to be used as part of a deployed communications platform. If you look at it, we don't even use equalizers or channel coding, so they are fragile and hard to use. But we've never claimed any differently.

Still, our examples are there for the world to see, and I can't be surprised when people mistake the intentions. And so I similarly cannot be upset when I get the reactions from people who wanted to use GNU Radio to make an quick application based on OFDM or MIMO. We simply haven't provided them with the means to do so.

On the other hand, when someone comes to the project with a developers mindset and wants to do something complicated, they can spend the time to work with the code and see what kind of capabilities are offered. We have some great developers who have done some amazing things with GNU Radio, and so we know that it's not crap. But it's not shiny, either.

To conclude, I'm left with the difficult problem of trying to think if there is a way to solve this problem. There is definitely no single solution or magic bullet. I'd love to see evne more complete applications get published on CGRAN or sites like Github. And for some of those that are already published, some amount of care needs to be taken to ensure some quality control, by which I mean good documentation and a user interface to the products as well as easy maintanence as GNU Radio versions evolve. I'd also love more feedback and additions to our examples in GNU Radio, even to the extent that we can get more of what we would call applications (there's a reason we have an examples and apps directory in our new tree

structure). These ideas involve the buy-in of the community and a slight change in our ideas and understanding of how to construct and present applications to the world.


SNR Estimators

In GNU Radio, we have been slowly evolving our digital communications capabilities. One thing that becomes quickly and painfully obvious to anyone doing real over-the-air communications with digital modulations is that the modulators and demodulators are the easy part. It's the synchronization that's the hard part and where most of your work as a designer goes.

Generally speaking, synchronization means three (maybe four) things: frequency, timing, and phase. The fourth is automatic gain control (AGC). While AGC isn't really "synchronization," it follows similar principles of an adaptive loop. Different modulation schemes have different methods for AGC and synchronization. These tend to fall into categories of narrowband, wideband (and then ultrawideband), and OFDM, but we can easily dispense with these categories and go in depth into differences within them, too. For instance, narrowband PSK and FSK systems have pretty widely different requirements out of a receiver for demodulation and the appropriate synchronization algorithms reflect this.

But this post is about SNR estimation. The reason to talk about synchronization here it twofold. First, like synchronizers, SNR estimation techniques can vary widely depending on the modulation scheme being used. Second, most synchronization schemes like to be reactive to changes in SNR. You can often find different algorithms that work well in low SNR cases but not high, or they are too costly to perform in high SNR where the signal quality allows you to use something simpler and/or more accurate. Take for example an equalizer. The constant modulus equalizer is great for PSK signals, but we know that an LMS decision-directed equalizer works better. But a decision-directed equalizer only works in cases where the SNR is large enough that the majority of samples are correct. So we often start with a blind CMA for acquisition purposes and then move to a decision-directed approach once we've properly locked on to the signal.

We've been meaning to add SNR estimators to GNU Radio for some time, now, and I finally developed enough of an itch to start doing it. But as I said, each modulation could use a different equalizer, and there are various equalizers designed for this modulation in that channel or that modulation in such and such channel. If you have access to IEEExplore, a simple search for "snr estimator" produces 1,279 results. Now, I know that's nothing to a Google search, but twelve hundred scholarly (we hope) articles on a topic is a lot to get through, and you will quickly see that there are finely-tuned estimators for different modulations and channels.

What I did was give us a start into this area. I took a handful of the most generic, computationally realistic estimators that I could find for PSK signals and implemented them. I'll give a shout out here to Normon Beaulieu, who has written a lot in this field. I've found that a lot of his work in SNR estimators to be accessible and useful, and he presents his work in ways that can be easily translated into actual, working code.

I also took the tack of looking for estimators that worked in AWGN channels. Now, if you've ever heard me speak on the subject of communications education or research, you've probably heard me scoff at anyone who develops a system under simulated AWGN conditions. In the case of an SNR estimator, though, I thought about this and had to come to the conclusion that the only way to handle this is to have an estimator that you can plug in variables for your channel model, which of course assumes that you have or can estimate these parameters. So in the end, I followed Beaulieu's lead in at least one of this papers and took algorithms that could be both simplified and tested by assuming AWGN conditions. I did, however, provide one of these algorithms (what I refer to as the M2M4 algorithm) in a way that allows a user to specify parameters to better fit a non-AWGN channel and non-PSK signals. Using the AWGN-based algorithms with this other version of the M2M4 seemed like a good compromise for being computable without more information but at least providing a tip of my hat to the issue of non-AWGN channels. If nothing else, these estimators should give us a ballpark estimate.

I also specifically developed these estimators based on a parent class that would easily allow us to add more estimators as they are developed. Right now, the parent class is specifically for MPSK, but we can add other estimator parent classes for other modulations; maybe have them all inherit from a single class in the end -- this is fine, since the inheritance would really be hidden from the end user. The class itself is in the gr-digital component and called digital_impl_mpsk_snr_est. It's constructor is simply:

digital_impl_mpsk_snr_est(double alpha);

Where the parameter alpha is the value used in a running average as every estimator I've seen is based on expected values of a time series, which we estimate with the running average. This value defaults to 0.001 and should be kept small.

I have created four estimators that inherit from this block. These are named the "simple," "skew," "M2M4," and "SVR" estimators. The last two come from [1]. The "skew" estimator uses a skewness measure that was developed in conversation with fred harris. The simple estimator is probably written up and documented somewhere, but it's the typical measurement based on the mean and variance of the constellation cloud.  I've tried to document these as best as possible in the header files themselves, so I'll refer you to the Doxygen documentation for details (note that as of the writing of this blog post, these estimators are only in the Git repository but will be available starting in the 3.5.1 release). The "SNR estimators"  group in the Doxygen manual can be used to find all of the available estimators and details about how to use them.

In particular, the M2M4 and SVR methods were developed around fading channels and both use the kurtosis of the modulation signal (k_a) and kurtosis of the channel (k_w) in their calculations. This is great if these values are know or can be estimated. In the case of the M2M4 algorithm, I provide a version of it, called the digital_impl_snr_est_m2m4, as an example of a non-PSK and non-AWGN method; right now, this block is unavailable through any actual GNU Radio block. It's untested and unverified, but I wanted it there for reference and hopefully to use later.


SNR Use in Demodulation

The main intent of having an SNR estimator block is to enable the use of SNR information by other blocks. As such, there are two GNU Radio blocks that are defined for doing this in different ways. First off, let's say that the SNR estimation is done after timing recovery (see harris' paper "Let’s Assume the System is Synchronized", which is unfortunately fairly costly but worth it if you can get a copy). So in GNU Radio terms, this means that the SNR is estimated down stream of most of the synchronization blocks. While I would like to just pass a tag along with the SNR information, that does won't work for every block, like the frequency recovery, AGC, and timing recovery loops that come before. Instead, we will have to pass them a message with this information. However, some blocks exist downstream that want this info, too, like the channel equalizer.

To accommodate both possible uses, I created two blocks. The digital_mpsk_snr_est_cc block is a flowgraph with a single input and single output port, so it's meant to go inline in a flowgraph. This block produces tags with the SNR every N samples, where N is set by the user (the second arg in the constructor and through the set_tag_nsamples(int N) function). The downstream blocks can then look for the tag with the key "snr" and pull out this information when they need it. The value of N defaults to 10,000 as an arbitrary, fairly large number. You'll want to set this depending on the speed you expect the SNR conditions to change.

The second block is a probe, which is GR speak for a sink. It's called digital_probe_mpsk_snr_est_c and only takes a single complex input stream. Right now, it just acts as a sink where the application running the flow graph can query the SNR by calling the "snr()" function on this block (the same is true for the digital_mpsk_snr_est_cc block, too). However, this block uses a similar constructor in that you set a value N for the number of samples between messages. In this case, instead of sending a tag down stream, it will send a message every N samples. The problem with this is that our message passing system isn't really advanced or easy enough to use to set this up properly. Recent work by Josh Blum might fix this, though.

Eventually, though, we hope to be able to create flow graphs where the SNR estimation is passed around to other blocks to allow them to adjust their behavior. In the case I'm interested in right now, I'd like to pass this info to the frequency lock loop to stop it if the SNR falls below a certain level so that it doesn't walk away when there is no signal to acquire.


[1]  D. R. Pauluzzi and N. C. Beaulieu, "A comparison of SNR estimation techniques for the AWGN channel," IEEE Trans. Communications, Vol. 48, No. 10, pp. 1681-1691, 2000.


Page 1 ... 2 3 4 5 6 ... 8 Next 5 Entries »