The PRU is a fantastic bit of hardware. With two of these running at 200MHz, and direct register-mapped access to GPIO pins, there's a lot of cool stuff you can do with them.
For example, in a recent project, I used one of the PRUs to generate a precise 40MHz square wave clock signal with 40% duty cycle, and the other to read the signal pin of a camera module into shared RAM. It worked extremely well, allowing me to obtain camera data at hundreds of FPS, and freed up the main CPU to do some fairly heavy image processing - all without involving an expensive camera capture rig or an external PC.
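The 40% figure falls out of the cycle arithmetic. A rough sketch, assuming the PRU spends one 5ns cycle per output step and that the clock is produced by holding the pin high or low for whole cycles: 200MHz / 40MHz leaves 5 cycles per period, which cannot be split 50/50, so 2 high + 3 low (40%) is one of the closest options. The `plan_clock` helper and `clock_plan` type are hypothetical names used only for illustration, and this runs on a host, not on the PRU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: how many core cycles the pin stays high and low. */
typedef struct {
    uint32_t period_cycles;
    uint32_t high_cycles;
    uint32_t low_cycles;
} clock_plan;

/*
 * Split one output period into whole core cycles.
 * Returns 0 on success, -1 if the requested frequency or duty cycle
 * cannot be expressed in whole cycles.
 */
int plan_clock(uint32_t core_hz, uint32_t out_hz, uint32_t duty_percent,
               clock_plan *plan)
{
    assert(plan != NULL);
    if (out_hz == 0 || duty_percent > 100 || core_hz % out_hz != 0) {
        return -1;
    }
    uint32_t period = core_hz / out_hz;
    /* 64-bit product so large periods cannot overflow. */
    uint64_t high_x100 = (uint64_t)period * duty_percent;
    if (high_x100 % 100 != 0) {
        return -1; /* duty cycle does not land on a cycle boundary */
    }
    plan->period_cycles = period;
    plan->high_cycles = (uint32_t)(high_x100 / 100);
    plan->low_cycles = period - plan->high_cycles;
    return 0;
}
```

Under these assumptions, asking for 200MHz in, 40MHz out, 40% duty yields 5/2/3 cycles, while asking for 50% is rejected.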
It's for an academic project which is ongoing (the camera control is just one part of the system). When it's done, I'd be happy to share source (and indeed we may simply open-source everything related to the project to accompany the publication).
BeagleBone's PRUs are an enabler of Bela, "the embedded platform for ultra-low latency audio and sensor processing" which was ~1000% funded on KickStarter earlier this year.
Bela is being used at the Augmented (Musical) Instruments Lab in London (and now the community) to make rich, responsive digital musical instruments. It has a Web IDE, supports C++, Pure Data, Faust, SuperCollider, etc., but again thanks to the BB's PRUs, it supports audio-rate sensor sampling!
What a beautiful project. I started to hack around for a guitar amp emulator on the bb in 2014, but it quickly became too involved. Thanks for sharing.
The PRU is pretty cool. Just had to simulate a dozen eCAP devices and it's just a few lines of code. An eCAP records the period (in ticks) of a signal on an input pin. It can be triggered on a rising or falling signal, and can record absolute ticks or the number of ticks since the last trigger (differential mode). Here's the main loop:
    while (1)
    {
        uint32_t sample = __R31;
        uint32_t change = sample ^ pEcap->sample;
        if (change)
        {
            // Calculate which bits have seen a desired edge (it changed, and we trigger on that change)
            uint32_t edge = change & trigger;
            if (edge)
            {
                uint32_t ts = ReadTimestamp();
                int bit = 1;
                int iBit = 0;
                while (iBit < 30)
                {
                    if (edge & bit)
                    {
                        // Store the ts (or the difference in ts) into the capture table for this bit
                        // and increment the slot index (aICap)
                        pEcap->ECAP[iBit][pEcap->aICap[iBit]++] = (pEcap->differential & bit) ? (ts - tsLast[iBit]) : ts;
                        pEcap->aICap[iBit] %= 4;
                        // Next time we calculate the difference from this time
                        tsLast[iBit] = ts;
                    }
                    iBit++;
                    bit <<= 1;
                }
                pEcap->edgeDetected |= edge;
            }
            pEcap->sample = sample;
            trigger = sample ^ pEcap->edgeUpDown;
        }
    }
It detects an edge within a few dozen nanoseconds (low jitter). A while loop like this would kill a main processor thread; and it would have terrible latency when other threads were scheduled during a trigger event.
And it detects edges on 30 pins in parallel! I could work on the "which pin had a trigger" code to reduce the period calculation from X30 to log(30) but I have no need for that fine latency in my current application.
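One way the "which pin had a trigger" step could be tightened is sketched below: instead of testing all 30 bits, find the lowest set bit with a binary search (about log2(32) = 5 comparisons), handle that pin, clear the bit, and repeat until no edges remain. This is a host-side illustration, not the code running on the PRU; `capture_state`, `lowest_set_bit`, and `record_edges` are hypothetical names that only loosely mirror the fields in the loop above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PINS 30
#define SLOTS 4

/* Hypothetical stand-in for the capture table used in the loop above. */
typedef struct {
    uint32_t capture[NUM_PINS][SLOTS]; /* recorded timestamps or deltas */
    uint32_t slot[NUM_PINS];           /* next slot to write, per pin */
    uint32_t last_ts[NUM_PINS];        /* previous trigger time, per pin */
    uint32_t differential;             /* bit set => store delta, not absolute */
} capture_state;

/* Index of the lowest set bit, by binary search. Caller guarantees v != 0. */
static int lowest_set_bit(uint32_t v)
{
    int i = 0;
    if ((v & 0xFFFFu) == 0) { v >>= 16; i += 16; }
    if ((v & 0xFFu) == 0)   { v >>= 8;  i += 8;  }
    if ((v & 0xFu) == 0)    { v >>= 4;  i += 4;  }
    if ((v & 0x3u) == 0)    { v >>= 2;  i += 2;  }
    if ((v & 0x1u) == 0)    { i += 1; }
    return i;
}

/* Record a timestamp for every pin flagged in `edge`; returns pins handled. */
int record_edges(capture_state *st, uint32_t edge, uint32_t ts)
{
    assert(st != NULL);
    int handled = 0;
    edge &= (1u << NUM_PINS) - 1u; /* ignore bits above the 30 pins */
    while (edge) {
        int pin = lowest_set_bit(edge);
        uint32_t bit = 1u << pin;
        st->capture[pin][st->slot[pin]] =
            (st->differential & bit) ? (ts - st->last_ts[pin]) : ts;
        st->slot[pin] = (st->slot[pin] + 1u) % SLOTS;
        st->last_ts[pin] = ts;
        edge &= edge - 1u; /* clear the bit just handled */
        handled++;
    }
    return handled;
}
```

The work now scales with the number of pins that actually saw an edge rather than with the pin count, which matters most in the common case where only one pin changes at a time.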
I actually had no idea that the Beaglebone had these guys hiding onboard. A nice alternative to something like a RasPi with an Arduino hat or a USB-connected Arduino.
This is great for projects where you control the whole stack and don't plan to support anything else. I guess the only downside to going down this path is that you're locking in the Beaglebone Black as your sole hardware platform and losing some modularity.
For example it would be great to control a 3D printer by running something like OpenGB (http://opengb.readthedocs.io/) on the main CPU and something like Marlin (https://github.com/MarlinFirmware) on a PRU. But that would require a Beaglebone-centric approach which wouldn't work on other hardware combinations.
That's just an observation though - I'm really impressed that the PRUs are there!
So, structure your project that you can either use the internal PRU or an external motor controller… I don't see the big problem.
I'd say that most embedded related projects will have to deal with newer revisions of their hardware, maybe because an older part is no longer available or because people came up with more intelligent or less buggy circuits over time. So that's already a few (prob. very minor) variations you'll have to support.
And besides very trivial projects, you should always try to have at least one "dummy" implementation of everything, to facilitate automated testing of your code.
So, in your case, your 3D printer controller could support the internal PRU of the Sitara, an external servo controller, or some dummy library that just logs positions to a file, for testing.
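A minimal sketch of that structure in C, assuming a single `move_to` operation is enough to illustrate the idea. All names here (`motion_backend`, `dummy_backend`, `draw_square`) are hypothetical; a PRU-backed or external-controller backend would fill in the same function pointer with its own implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The only thing application code knows about: an abstract motion backend. */
typedef struct motion_backend {
    int (*move_to)(struct motion_backend *self, double x, double y, double z);
} motion_backend;

#define DUMMY_LOG_MAX 16

/* Dummy backend: remembers positions, and optionally writes them to a file. */
typedef struct {
    motion_backend base; /* must stay first so the cast in dummy_move_to is valid */
    FILE *sink;          /* may be NULL to skip file logging */
    size_t count;
    double log[DUMMY_LOG_MAX][3];
} dummy_backend;

static int dummy_move_to(motion_backend *self, double x, double y, double z)
{
    dummy_backend *d = (dummy_backend *)self;
    if (d->count >= DUMMY_LOG_MAX) {
        return -1; /* log is full */
    }
    d->log[d->count][0] = x;
    d->log[d->count][1] = y;
    d->log[d->count][2] = z;
    d->count++;
    if (d->sink != NULL) {
        fprintf(d->sink, "move %.3f %.3f %.3f\n", x, y, z);
    }
    return 0;
}

void dummy_backend_init(dummy_backend *d, FILE *sink)
{
    assert(d != NULL);
    d->base.move_to = dummy_move_to;
    d->sink = sink;
    d->count = 0;
}

/* Application logic written purely against the interface. */
int draw_square(motion_backend *m, double side)
{
    assert(m != NULL && m->move_to != NULL);
    const double pts[5][2] = {
        {0, 0}, {side, 0}, {side, side}, {0, side}, {0, 0}
    };
    for (size_t i = 0; i < 5; i++) {
        if (m->move_to(m, pts[i][0], pts[i][1], 0.0) != 0) {
            return -1;
        }
    }
    return 0;
}
```

An automated test can call `draw_square(&d.base, 10.0)` on a `dummy_backend` and inspect `d.log` without any hardware attached.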
On reflection - you're absolutely right. In fact this is exactly the way OpenGB is designed!
OpenGB uses an abstract base class (called IPrinter) to describe a printer interface. At the moment there exists a Marlin implementation and a Dummy implementation (as you describe) of IPrinter.
Other comments mention BeagleG and MachineKit. It should be pretty trivial to add IPrinter implementations of both of these.
A friend of mine made a G-code interpreter for the Beaglebone. It opens up a circular buffer and passes commands to a small program running on the PRU: https://github.com/hzeller/beagleg
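For readers unfamiliar with the pattern, here is a generic single-producer, single-consumer ring buffer of the kind described. It is not taken from BeagleG; the `motion_cmd` fields and the `queue_push`/`queue_pop` names are made up for illustration, and it runs on a host. On real hardware the queue would live in memory shared between the CPU and the PRU, and the CPU side would likely need memory barriers that this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_LEN 8u /* must be a power of two so the index math survives wraparound */

/* Hypothetical command layout, for illustration only. */
typedef struct {
    int32_t steps_x;
    int32_t steps_y;
    uint32_t ticks;
} motion_cmd;

typedef struct {
    volatile uint32_t head; /* advanced only by the producer (main CPU) */
    volatile uint32_t tail; /* advanced only by the consumer (PRU program) */
    motion_cmd slots[QUEUE_LEN];
} cmd_queue;

/* Producer side: returns 1 if the command was queued, 0 if the queue is full. */
int queue_push(cmd_queue *q, const motion_cmd *cmd)
{
    assert(q != NULL && cmd != NULL);
    if (q->head - q->tail == QUEUE_LEN) {
        return 0;
    }
    q->slots[q->head % QUEUE_LEN] = *cmd;
    q->head = q->head + 1u; /* publish only after the slot is written */
    return 1;
}

/* Consumer side: returns 1 and fills *out if a command was available. */
int queue_pop(cmd_queue *q, motion_cmd *out)
{
    assert(q != NULL && out != NULL);
    if (q->head == q->tail) {
        return 0;
    }
    *out = q->slots[q->tail % QUEUE_LEN];
    q->tail = q->tail + 1u;
    return 1;
}
```

Because each index has exactly one writer, neither side needs a lock, which suits a consumer that cannot afford to block.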
I'm reading the MachineKit docs and I kind of get how it works. From the gcode docs (http://www.machinekit.io/docs/fdm/fdm-gcode/) it seems like it should be reasonably simple to add support for it to OpenGB.
Modern SoCs are a circus of various CPU cores, and there's a huge amount of work going on at the companies that design them to make them work together. Even for those SoCs that run open-source software, many of those secondary cores aren't visible to people outside the company creating the SoC. They're hidden behind binary blob "firmware". One of the systems I worked on over 5 years ago had ARM Cortex-A7s running the main OS (could be Linux), talking to a proprietary DSP running some in-house RTOS, talking to a 6502... The systems I'm currently working on have far more cores, and more modern ones, than that. As a user, or even as a developer, you wouldn't know.
Kudos to Ken for lifting the curtain a bit on the Sitara's PRUs!
It's a shame there isn't more developer availability of these cores. I'm kind of shocked that the way to program these is still just through assembly, but really not that surprised.
There's actually a cortex-m3 core also hiding in the Beaglebone's chip, doing power management.
Regarding C vs asm, I started writing some 8MHz 4-bit capture code in C but found asm was easier to reason about timings. One line of code is 5ns, done (apart from some memory accesses).
They're really limited in terms of instruction set, which makes getting a compiler working tough. Programs are generally short and not too tricky to write in asm.
Wow, a lot more PRU interest than I expected. Thanks everyone for the comments! Looking online, I got the impression that hardly anyone was using the PRU. Are there any forums I should visit where PRU programmers hang out?
I think the main thing to think about PRUs is that they are the way to do a class of things in software that you'd normally break out some custom logic or an FPGA to implement. Sure, they're hard to initially set up for, but way less effort than breaking out the PCB design software.
There is also a C compiler available, I haven't used it, but Tridge (of Samba fame) gave a demo at linuxconf a couple of years back in which he launched a plane (remotely) and ran the guidance software on a BB, with the PRUs doing all the servo/etc work coded in C ..... and compiled the linux kernel onboard at the same time ....
1) Read the fine print when it comes to processor manufacturers telling how much time something takes. Although PRU subsystem is deterministic and __most__ instructions take 5ns, there are quite a few cases which take an order of magnitude more time[1]. Sure, the access times might be deterministic but that doesn't make it easy to know how much something will take.
2) Remote-controlled airborne vehicles need a lot less computation power than I expected. PX4 runs its "main loop" at just 400Hz and I've seen PX4 or ArduPilot devs (might even be tridge) saying that 50Hz would be enough. Sure, you need accurate timing for PWM outputs (~1us resolution) and, most important, low jitter.
The first point has bitten me personally: I found it non-trivial to get reliable < 100ns interrupt jitter on Cortex-M4. It really came down to what was happening on the bus between CPU/memory/peripherals at the point the interrupt was supposed to fire (e.g. getting the documented latency of 10-odd cycles when the CPU is idle but a lot more when there is a DMA transfer in progress).
The comparison to breaking out the PCB software is a touch misleading. You'd certainly also find it to be an onerous task to break out the PCB software and design a board for the Sitara SoC that sits on the Beaglebone. Realistically the comparison would be to use an FPGA devboard. Modern FPGAs often have an ARM processor built right into them, and if not you can use a Microblaze/NIOSII softcore processor in conjunction with the programmable logic. Such a setup will happily run Linux, do your real-time tasks, and use all the typical peripherals that one of TI's SoCs will.
I'm not saying that using an FPGA would be easier necessarily (indeed, likely not, just because the tools are so terrible) but there are many options. Last I checked the documentation for the PRUs was pretty bad. Mostly a smattering of wiki pages and a couple of powerpoints. And even that is a huge improvement over even just a year or two ago.
> I think the main thing to think about PRUs is that they are the way to do a class of things in software that you'd normally break out some custom logic or an fpga to implement.
This article makes repeated references like the following:
> If you want to perform real-time operations, the BeagleBone's ARM processor won't work well since Linux isn't a real-time operating system.
I get that Linux is not a real time system, but the article seems to be making the implication that the ARM processor cannot support real time operations. I was under the impression that the real-ness of time was determined solely by the operating system and not by the hardware. Is this not the case, or is the article just making the assumption that no one is going to port a real time operating system to the BeagleBone?
I'm sure there is a real-time version of Linux that would work on the ARM, but these systems still have worst-case scheduling latencies of around 100us, and around 10us in just normal operation.
So they are perfectly fine for some subset of real-time tasks, where you just need some sort of guarantee like "this thread will not be starved of CPU for 10 milliseconds". But that kind of latency and jitter isn't going to work out when you want to generate something like a fast PWM signal or send out audio over I2S in software or generally do any task that relies on reproducible (fast) timings. You could connect a servo to your real-time Linux generated PWM signal and it would tremble like it's drunk.
And honestly, modern beefy ARM cores like the one the BeagleBone uses have been tuned for throughput above everything else, so you will find it extremely difficult to get anywhere near the timing that a PRU can do and still have anything resembling a useful system.
There's a nifty presentation from TI [1] where they have the ARM core do nothing but toggle one of its GPIOs and it takes 200ns for that to go through all the various layers of caches, pin muxing, buses, interconnects, what have you. And that's a GHz processor!
Amusingly enough, performance-enhancing features like cache or a flexible interrupt controller get in the way of a system's "real-timeness", because the timing of operations is no longer clearly deterministic. That's why ARM has cores like the Cortex-R series (R as in Realtime) that do promise real-time characteristics. The Sitara in the BeagleBone has a Cortex-A (A as in Application), which is not designed for realtime.
Simply having interrupts will mess with hard real-time processes. Totally sketchtastic: some modern processors have background stuff that takes control every so often.
AFAIK you can disable (or rather not enable) the cache(s) and use a simple interrupt scheme in any standard ARM core (at least it's possible on the Raspberry Pi).
The PRU is the unsung gem of the BeagleBone. I used it to prototype an optical control system that needed sub-100ns timing. As the article hints, it's not an easy beast to get started with, but it offers crazy realtime abilities.
The CPU loaded a bytecode program into the PRU's memory and told it to run.
The PRU ran a bytecode execution engine that was handcoded in assembly. The core of the engine was a loop:
- perform the routine for the current bytecode
- when the routine finished, spin on the PRU's embedded timer (the IEP) until 256 ticks (1.28us) had passed since the start of the cycle
- advance to the next bytecode and repeat
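The loop above can be sketched on a host machine as follows. This is an illustration of the fixed-time-slot idea only, not the original assembly: the opcodes, the simulated tick counter standing in for the IEP timer, and the per-routine tick costs are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_TICKS 256u /* 256 ticks * 5ns = 1.28us per bytecode */

/* Hypothetical opcodes, for illustration only. */
enum { OP_NOP, OP_LIGHT_ON, OP_LIGHT_OFF, OP_HALT };

typedef struct {
    uint32_t now; /* simulated free-running tick counter (stand-in for the IEP) */
    int light;    /* simulated output pin */
} engine;

/* Run one routine and return the (made-up) number of ticks it consumed. */
static uint32_t run_op(engine *e, uint8_t op)
{
    switch (op) {
    case OP_LIGHT_ON:  e->light = 1; return 12;
    case OP_LIGHT_OFF: e->light = 0; return 12;
    default:           return 3;
    }
}

/* Execute bytecodes until OP_HALT or end of program; returns bytecodes run. */
size_t run_program(engine *e, const uint8_t *code, size_t len)
{
    assert(e != NULL && code != NULL);
    size_t pc = 0;
    while (pc < len && code[pc] != OP_HALT) {
        uint32_t start = e->now;
        e->now += run_op(e, code[pc]);
        /* Spin until the slot is used up, so every bytecode takes the same time. */
        while ((uint32_t)(e->now - start) < SLOT_TICKS) {
            e->now++;
        }
        pc++;
    }
    return pc;
}
```

Padding every bytecode to the same slot length means the timing of step N depends only on N, not on which routines ran before it, provided no routine overruns its slot.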
External devices were connected either via I2C (those that didn't need super-tight timing) or to the pins controlled by the PRU. Some of the bytecodes controlled execution flow like loops while others were specific to the application, things like turn light on, move lens, trigger camera, etc.
There were a few more bells and whistles, like memory locations where flags were set by the engine to let the CPU know how progress was going.
It was a fun project. Unfortunately, the manufacturer of one of the core devices decided to stop making it, so that start-up had to go in a different direction.