DSP processors are, in general, very I/O-balanced processors. This means
they offer a variety of high-speed serial and parallel peripheral
interfaces. Ideally, these interfaces are designed so that they can be
operated with very little or no overhead on the processor core, leaving
enough CPU time for running the OS and processing the incoming or
outgoing data.
A Blackfin processor, for example, has multiple flexible and
independent Direct Memory Access (DMA) controllers. DMA transfers can
occur between the processor’s internal memories and any of its
DMA-capable peripherals. Additionally, DMA transfers can be performed
between any of the DMA-capable peripherals and external devices
connected to the external memory interfaces, including the SDRAM
controller and the asynchronous memory controller.
Among other interfaces, the Blackfin processor provides a Parallel
Peripheral Interface (PPI) that can connect directly to parallel D/A
and A/D converters, ITU-R-601/656 video encoders and decoders, and
other general-purpose peripherals, such as CMOS camera sensors. The PPI
consists of a dedicated input clock pin, up to 3 frame synchronization
pins, and up to 16 data pins.
Figure 1 below shows how easily a CMOS imaging sensor can be wired to a
Blackfin processor without additional active hardware components.
Figure 1: Micron CMOS Camera Sensor wiring diagram
The example discussed here is a simple program that reads from a CMOS
camera sensor, assuming a PPI driver is compiled into the kernel or
loaded as a kernel module. Two different PPI drivers are available: a
generic, full-featured driver that supports various PPI operation modes
(ppi.c), and a simple PPI Frame Capture Driver (adsp-ppifcd.c). The
latter is used here.
The application opens the PPI device driver and performs some I/O
controls (ioctls) to set the number of pixels per line and the number
of lines to be captured. After the application invokes the read system
call, the driver arms the DMA transfer. The PPI peripheral detects the
start of a new frame by monitoring the Line-Valid and Frame-Valid
strobes.
A specific relationship between the two signals indicates the start of
a frame and kicks off the DMA transfer, which captures (pixels per
line) x (lines) samples. The DMA engine stores the incoming samples at
the address allocated by the application. After the transfer is
finished, execution returns to the application.
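The buffer handling in the capture sequence above can be sketched as
follows. This is only an illustration under stated assumptions: the
function names ppi_frame_bytes and read_frame are hypothetical, the
bytes-per-sample value depends on the sensor and PPI configuration, and
the driver-specific ioctl calls are left out because their names are
not given here.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Bytes needed for one frame: pixels per line x lines x bytes per
 * sample. bytes_per_sample is an assumption that depends on how the
 * sensor and the PPI are configured. */
size_t ppi_frame_bytes(size_t pixels_per_line, size_t lines,
                       size_t bytes_per_sample)
{
    return pixels_per_line * lines * bytes_per_sample;
}

/* Read up to len bytes from fd into buf, retrying on short reads and
 * on EINTR. Returns the number of bytes read (less than len only at
 * end of file) or -1 on error. A frame capture driver would normally
 * return the whole frame from a single read(); the loop is defensive. */
ssize_t read_frame(int fd, void *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;              /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

An application would allocate ppi_frame_bytes(...) bytes, configure the
driver through its ioctls, and then call read_frame on the opened PPI
device to block until the DMA transfer has filled the buffer.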
The image is then converted into the PNG (Portable Network Graphics)
format, utilizing libpng included in the uClinux distribution. The
converted image is then written to stdout. Assuming the compiled
program executable is called readimg, the converted output image can be
written to a file by redirecting the program's standard output on the
command line.
Audio, video and still image silicon products widely use an
I2C-compatible Two Wire Interface (TWI) as a system configuration bus.
The configuration bus allows a system master to access device-internal
configuration registers, such as brightness. Usually, I2C devices are
controlled by a kernel driver, but it is also possible to access all
devices on an adapter from user space through the /dev interface. For
example, a user-space program can write a value of 0x248 into register
9 of an I2C slave device identified by I2C_DEVID.
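A minimal sketch of such a user-space register write is shown here. It
rests on several assumptions that are not stated in the text: the
adapter is exposed by the standard Linux i2c-dev interface, the
register is 16 bits wide, and the device expects the register index
followed by the value with the most significant byte first. The
function names are hypothetical.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Same request number as I2C_SLAVE in <linux/i2c-dev.h>; defined
 * locally so the sketch does not depend on that header. */
#define I2C_SLAVE_REQUEST 0x0703

/* Pack a register index and a 16-bit value into a 3-byte message,
 * most significant byte first (assumed byte order). Returns msg. */
uint8_t *i2c_pack_reg16(uint8_t reg, uint16_t value, uint8_t msg[3])
{
    msg[0] = reg;
    msg[1] = (uint8_t)(value >> 8);
    msg[2] = (uint8_t)(value & 0xFF);
    return msg;
}

/* Write value into register reg of the slave at slave_addr through
 * the given adapter device node. Returns 0 on success, -1 on error. */
int i2c_write_reg16(const char *adapter, int slave_addr,
                    uint8_t reg, uint16_t value)
{
    uint8_t msg[3];
    int rc = -1;
    int fd = open(adapter, O_RDWR);

    if (fd < 0)
        return -1;
    i2c_pack_reg16(reg, value, msg);
    /* Select the slave, then send register index and value at once. */
    if (ioctl(fd, I2C_SLAVE_REQUEST, (long)slave_addr) == 0 &&
        write(fd, msg, sizeof msg) == (ssize_t)sizeof msg)
        rc = 0;
    close(fd);
    return rc;
}
```

With these helpers, the write described above would look like
i2c_write_reg16("/dev/i2c-0", I2C_DEVID, 9, 0x248), where the device
node name is the usual i2c-dev naming and may differ on a given system.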
The power of Linux is the inexhaustible number of applications released
under various open source licenses that can be cross compiled to run on
the embedded uClinux system. Cross compiling can sometimes be a little
tricky, which is why it is discussed here.
Cross compiling
Linux or UNIX is not a single platform; there is a wide range of
choices. Most programs distributed as source code come with a
so-called 'configure' script. This is a shell script that must be run
to recognize the current system configuration, so that the correct
compiler switches, library paths and tools will be used.
When there isn't a configure script, the developer can manually
modify the Makefile to add target-processor-specific changes, or can
integrate the package into the uClinux distribution. Detailed
instructions can be found in [18]. The configure script is usually a
big script, and it takes quite a while to execute. When this script is
created from recent autoconf releases, it will work for
Blackfin/uClinux with minor or no modifications.
The configure shell script inside a source package can be executed
for cross compilation using the following command line:
CC='bfin-uclinux-gcc -O2 -Wl,-elf2flt' ./configure --host=bfin-uclinux --build=i686-linux
Alternatively:
./configure --host=bfin-uclinux --build=i686-linux LDFLAGS='-Wl,-elf2flt' CFLAGS=-O2
There are at least two events that can stop the running script: (1)
some of the files used by the script are too old, or (2) tools or
libraries are missing. If the supplied scripts are too old to execute
properly for bfin-uclinux, or they don't recognize bfin-uclinux as a
possible target, the developer needs to replace config.sub with a more
recent version (e.g. from an up-to-date gcc source directory). Only in
very few cases is cross compiling not supported by the configure.in
script, which is written manually by the author and used by autoconf.
In this case the latter file can be modified to remove or change the
failing test case.
Network Oscilloscope Demo
The Network Oscilloscope Demo shown in Figure 2 below is one of the
sample applications included in the Blackfin/uClinux distribution,
alongside the VoIP Linphone Application and the Networked Audio Player.
The purpose of the Network Oscilloscope Project is to demonstrate a
simple remote GUI (Graphical User Interface) mechanism to share access
and data, distributed over a TCP/IP network. Furthermore, it
demonstrates the integration of several open source projects and
libraries as building blocks into a single application.
For instance, gnuplot, a portable command-line driven interactive
data file and function plotting utility, is used to generate graphical
data plots, while thttpd, a CGI (Common Gateway Interface) capable web
server, services incoming HTTP requests. CGI is typically used to
generate dynamic webpages. It is a simple protocol for communication
between web forms and a specified program. A CGI script can be written
in any language, including C/C++, that can read stdin, write to stdout,
and read environment variables.
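To illustrate how little a CGI program needs, here is a minimal sketch
in C. It assumes the form is submitted with the GET method, so that the
web server passes the form fields in the QUERY_STRING environment
variable; the field name "freq" and both function names are
hypothetical, and URL-decoding of values is left out for brevity.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look up name in a query string of the form "a=1&b=2" and copy its
 * value into out (truncated to outlen - 1 characters). Returns out if
 * the field was found, NULL otherwise. No URL-decoding is performed. */
char *cgi_get_param(const char *query, const char *name,
                    char *out, size_t outlen)
{
    size_t namelen = strlen(name);
    const char *p = query;

    if (outlen == 0)
        return NULL;
    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, '&');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len > namelen && strncmp(p, name, namelen) == 0 &&
            p[namelen] == '=') {
            size_t vlen = len - namelen - 1;
            if (vlen >= outlen)
                vlen = outlen - 1;
            memcpy(out, p + namelen + 1, vlen);
            out[vlen] = '\0';
            return out;
        }
        p = end ? end + 1 : NULL;   /* advance to the next field */
    }
    return NULL;
}

/* Write a minimal CGI response: a header block, a blank line, and a
 * body that echoes one form field. */
void cgi_respond(FILE *out, const char *query)
{
    char freq[32];

    fputs("Content-Type: text/plain\r\n\r\n", out);
    if (query != NULL && cgi_get_param(query, "freq", freq, sizeof freq))
        fprintf(out, "sample frequency: %s\n", freq);
    else
        fputs("no sample frequency given\n", out);
}
```

A complete CGI program would simply call
cgi_respond(stdout, getenv("QUERY_STRING")) from main; the web server
forwards whatever the program writes to stdout to the browser.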
The Network Oscilloscope works as follows. A remote web browser
contacts the HTTP server running on uClinux where the CGI script
resides, and asks it to run the program. Parameters from the HTML form,
such as sample frequency, trigger settings and display options, are
passed to the program through the environment. The called program
samples data from an externally connected Analog to Digital Converter
(ADC) using a Linux device driver (adsp-spiadc.c).
Incoming samples are preprocessed and stored in a file. The CGI
program then starts gnuplot as a process and requests it to generate a
PNG or JPEG image based on the sampled data and form settings. The
webserver takes the output of the CGI program and tunnels it through to
the web browser. The web browser displays the output as an HTML page,
including the generated image plot.
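The step in which the CGI program hands the stored samples to gnuplot
could be sketched as follows. This is not the demo's actual code: the
function names and the file names in the usage note are hypothetical,
and the sketch assumes that a gnuplot binary with PNG support is on the
PATH and that the data file holds one sample per line.

```c
#include <stdio.h>

/* Build a short gnuplot script that plots data_file into png_file.
 * Returns the script length, or -1 if buf is too small. */
int gnuplot_script(char *buf, size_t buflen,
                   const char *data_file, const char *png_file)
{
    int n = snprintf(buf, buflen,
                     "set terminal png\n"
                     "set output '%s'\n"
                     "plot '%s' with lines\n",
                     png_file, data_file);

    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}

/* Start gnuplot as a child process and feed it the script through a
 * pipe. Returns 0 if gnuplot exited successfully, -1 otherwise. */
int run_gnuplot(const char *script)
{
    FILE *gp = popen("gnuplot", "w");

    if (gp == NULL)
        return -1;
    fputs(script, gp);
    return pclose(gp) == 0 ? 0 : -1;
}
```

A caller would fill a buffer with
gnuplot_script(buf, sizeof buf, "samples.dat", "plot.png"), pass it to
run_gnuplot, and then reference the generated image from the HTML page
it writes to stdout.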
Figure 2
Real-time capabilities of uClinux
Since Linux was originally developed for server and desktop usage, it
has no hard real-time capabilities, like most other operating systems
of comparable complexity and size. Nevertheless, Linux, and in
particular uClinux, has excellent so-called "soft real-time"
capabilities. This means that while Linux or uClinux cannot guarantee a
certain interrupt or scheduler latency, they show very favorable
performance characteristics compared with other operating systems of
similar complexity. If one needs a so-called "hard real-time" system
that can guarantee scheduler or interrupt latency time, there are a few
ways to achieve such a goal:
1) Provide the real-time capabilities in the form of an underlying
minimal real-time kernel such as RT-Linux (http://www.rtlinux.org) or
RTAI (http://www.rtai.org). Both solutions use a small real-time
kernel that runs Linux as a real-time task with lower priority.
Programs that need predictable real time are designed to run on the
real-time kernel and are specially coded to do so. All other tasks and
services run on top of the Linux kernel and can utilize everything that
Linux can provide. This approach can guarantee deterministic interrupt
latency while preserving the flexibility that Linux provides.
2) Provide the real-time capabilities using Xenomai [19]. Xenomai is a
real-time development framework cooperating with the Linux kernel in
order to provide pervasive, interface-agnostic, hard real-time support
to user-space applications, seamlessly integrated into the GNU/Linux
environment. It is based on an abstract RTOS core, usable for building
any kind of real-time interface, over a nucleus which exports a set of
generic RTOS services. Any number of RTOS personalities, called
"skins", can then be built over the nucleus, each providing its own
specific interface to the applications by using the services of a
single generic core to implement it. Aside from its own native and
POSIX interfaces, Xenomai also provides emulators for the VxWorks,
VRTX, pSOS+ and uITRON personalities. People interested in learning
more about this project can refer to the on-line documentation [21].
For the initial Blackfin port, included in Xenomai v2.1 [20], the
worst-case scheduling latency observed so far with user-space Xenomai
threads on a Blackfin BF533 is slightly lower than 50 us under load,
with an expected margin of improvement of 10-20 us in the future.
Xenomai and RTAI use Adeos [22] as an underlying Hardware Abstraction
Layer (HAL). Adeos is a real-time enabler for the Linux kernel. To this
end, it enables multiple prioritized O/S domains to exist
simultaneously on the same hardware, connected through an interrupt
pipeline.
Both Xenomai and Adeos have been ported to the Blackfin architecture
by Philippe Gerum, who leads both projects. This development has been
significantly sponsored by Openwide, a specialist in embedded and
real-time solutions for Linux [23].
Nevertheless, in most cases hard real time is not needed,
particularly for consumer multimedia applications, in which the time
constraints are dictated by the ability of the user to recognize
glitches in audio and video. Those physically detectable constraints
that have to be met normally lie in the range of milliseconds, which is
no big problem on fast chips like the Blackfin processor. In Linux
kernel 2.6.x, the new stable kernel release, those qualities have even
been improved with the introduction of the new O(1) scheduler.
Figures 3 and 4 below show the context switch time for a default Linux
2.6.x kernel running on Blackfin/uClinux:
Figure 3
Figure 4
Context switch time was measured with lat_ctx from lmbench. The
processes are connected in a ring of Unix pipes. Each process reads a
token from its pipe, possibly does some work, and then writes the token
to the next process. As the number of processes increases, the effect
of the cache diminishes. For 10 processes, the average context switch
time is 16.2 us with a standard deviation of 0.58 us; 95% of the time
it is under 17 us.
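The ring-of-pipes arrangement can be sketched in a few lines of C. This
is not the lmbench source, only an illustration of the token-passing
structure: the function names are hypothetical, no per-process work is
done, and the sketch relies on fork(), so it assumes a Linux host with
an MMU rather than a no-MMU uClinux target.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Close every pipe end except the two descriptors to keep. */
static void close_others(int (*fds)[2], int n, int keep_rd, int keep_wr)
{
    int i;

    for (i = 0; i < n; i++) {
        if (fds[i][0] != keep_rd) close(fds[i][0]);
        if (fds[i][1] != keep_wr) close(fds[i][1]);
    }
}

/* Pass a one-byte token `rounds` times around a ring of nproc
 * processes. Process i reads from pipe i-1 and writes to pipe i; the
 * caller acts as process 0. Returns 0 on success, -1 on failure. */
int ring_pass(int nproc, int rounds)
{
    int (*fds)[2];
    int i, r, status, rc = 0, forked = 0;
    char token = 'T';

    if (nproc < 1 || rounds < 0)
        return -1;
    fds = malloc((size_t)nproc * sizeof *fds);
    if (fds == NULL)
        return -1;
    for (i = 0; i < nproc; i++) {
        if (pipe(fds[i]) < 0) {
            while (i-- > 0) { close(fds[i][0]); close(fds[i][1]); }
            free(fds);
            return -1;
        }
    }
    for (i = 1; i < nproc && rc == 0; i++) {
        pid_t pid = fork();

        if (pid < 0) {
            rc = -1;
        } else if (pid == 0) {
            /* Child i: forward the token once per round, then exit. */
            close_others(fds, nproc, fds[i - 1][0], fds[i][1]);
            for (r = 0; r < rounds; r++)
                if (read(fds[i - 1][0], &token, 1) != 1 ||
                    write(fds[i][1], &token, 1) != 1)
                    _exit(1);
            _exit(0);
        } else {
            forked++;
        }
    }
    /* Parent: keep only its own two ends so that a failing child
     * shows up as end-of-file instead of a hang. */
    close_others(fds, nproc, fds[nproc - 1][0], fds[0][1]);
    for (r = 0; r < rounds && rc == 0; r++)
        if (write(fds[0][1], &token, 1) != 1 ||
            read(fds[nproc - 1][0], &token, 1) != 1)
            rc = -1;
    close(fds[0][1]);
    close(fds[nproc - 1][0]);
    while (forked-- > 0)
        if (wait(&status) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            rc = -1;
    free(fds);
    return rc;
}
```

Timing the parent's loop and dividing by nproc x rounds gives a rough
per-hop figure that includes pipe overhead; lat_ctx goes further and
subtracts that overhead to isolate the context switch itself.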
Conclusion
Blackfin processors offer a good price/performance ratio (800 MMAC @
400 MHz for less than $5/unit in quantities), advanced power management
functions, and small mini-BGA packages. This represents a very
low-power, cost-efficient and space-efficient solution. The Blackfin's
advanced DSP and multimedia capabilities qualify it not only for audio
and video appliances, but also for all kinds of industrial, automotive,
and communication devices.
Development tools are well tested and documented, and include
everything necessary to get started and to finish successfully on time.
Another advantage of the Blackfin processor in combination with uClinux
is the availability of a wide range of applications, drivers, libraries
and protocols, often as open source or free software. In most cases,
only basic cross compilation is necessary to get that software up and
running.
Combine this with such invaluable tools as Perl, Python, MySQL and
PHP, and developers have the opportunity to develop even the most
demanding feature-rich applications in a very short time frame, often
with enough processing power left for future improvements and new
features.
Since obtaining his MSc (Computer Based Engineering) and Dipl-Ing.(FH)
(Electronics and Information Technologies) Degree from the Reutlingen
University, Michael Hennerich has worked as a design engineer on a
variety of DSP based applications. Michael now works as a DSP
Applications and Systems Engineer at Analog Devices Inc. in Munich.
This article is excerpted from a paper of the same name presented at
the Embedded Systems Conference Silicon Valley 2006. Used with
permission of the Embedded Systems Conference. For more information,
please visit www.embedded.com/esc/sv.