Subject: DSP Trick: Host to C40 memory read/writes with (almost) zero DSP overhead
Date: Thu, 13 May 1999 13:50:46 +0000
From: Ran Cabell <rcabell@norfolk.infi.net>
Newsgroups: comp.dsp

THIS WORK IS PLACED IN THE PUBLIC DOMAIN
Name: Reading/writing to C40 memory from the host computer with (almost) zero DSP overhead.
Category: DSP Chip/ Specific Instruction Set/ General Software trick
Application: For DSP development boards hosted in another computer (such as a PC), if communication between the host and the C40 takes place via one of the C40 comm ports (as with the Loughborough Sound Images QPCB and QPCS boards, and the HEPC3/HEPC4 boards from Hunt Engineering), this trick lets you read and write to anywhere in the memory space of the DSP using only the DMA coprocessor. The only code executed in the CPU of the DSP is a very small setup routine that is called once and then never called again.
Advantages: Who needs shared memory? Code running on a host can monitor and change any value in DSP memory with no intervention from the CPU on the DSP. The only overhead on the DSP is due to bus conflicts between the CPU and the DMA coprocessor. The small setup routine is the only code executed on the DSP.
Disadvantages: Code running on the host can change any value in DSP memory with no intervention from the CPU on the DSP, which can obviously be dangerous. All communication is initiated by the host.
Introduction: This trick uses the wonderful autoinitialization capabilities of the C40 DMA coprocessor. Two 16-word link tables are set up in DSP memory. The first table always copies initialization data from the comm port into the second link table, and then transfers control to the second link table. The second link table transfers data to/from the DSP memory from/to the comm port, depending on what values were copied in by the first link table. Once the second data transfer is finished, control is returned to the first link table, which waits for the next block of initialization data to come from the host. Control continually cycles between these two link tables.
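The cycle is easier to follow in a toy model. The sketch below is plain C that runs on any host, not C40 code: memory is a small word array, the comm port is a pair of pseudo-addresses, and every name in it (sim_t, sim_cycle, sim_demo, PORT_IN, PORT_OUT) is made up for illustration. It ignores the control word and all timing, and only shows how a six-word header from the host decides what the next transfer does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 32u
#define PORT_IN   0x1000u   /* pseudo-address: words arriving from the host */
#define PORT_OUT  0x1001u   /* pseudo-address: words leaving for the host   */

typedef struct {
    uint32_t mem[MEM_WORDS];       /* stand-in for DSP memory        */
    const uint32_t *in;            /* everything the host sends      */
    size_t in_len, in_pos;
    uint32_t out[MEM_WORDS];       /* everything the host reads back */
    size_t out_len;
} sim_t;

static uint32_t load(sim_t *s, uint32_t addr)
{
    if (addr == PORT_IN) {
        assert(s->in_pos < s->in_len);
        return s->in[s->in_pos++];
    }
    assert(addr < MEM_WORDS);
    return s->mem[addr];
}

static void store(sim_t *s, uint32_t addr, uint32_t value)
{
    if (addr == PORT_OUT) {
        assert(s->out_len < MEM_WORDS);
        s->out[s->out_len++] = value;
        return;
    }
    assert(addr < MEM_WORDS);
    s->mem[addr] = value;
}

/* One trip around the loop: the first table pulls six words from the port
   into the second table, then the second table performs that transfer. */
void sim_cycle(sim_t *s)
{
    uint32_t xfer[6];   /* control, source, source inc, count, dest, dest inc */
    for (int i = 0; i < 6; i++)
        xfer[i] = load(s, PORT_IN);

    uint32_t src = xfer[1], dst = xfer[4];
    for (uint32_t n = xfer[3]; n > 0; n--) {
        store(s, dst, load(s, src));
        src += xfer[2];
        dst += xfer[5];
    }
}

/* Host writes {7, 9} to address 10, then reads both words back.
   Returns the i-th word that came back (i must be 0 or 1). */
uint32_t sim_demo(size_t i)
{
    static const uint32_t stream[] = {
        0, PORT_IN, 0, 2, 10, 1,    /* header: port -> mem[10..11] */
        7, 9,                       /* payload for the write       */
        0, 10, 1, 2, PORT_OUT, 0    /* header: mem[10..11] -> port */
    };
    sim_t s = {0};
    s.in = stream;
    s.in_len = sizeof stream / sizeof stream[0];
    sim_cycle(&s);
    sim_cycle(&s);
    assert(i < s.out_len);
    return s.out[i];
}
```

Note that the host never sends a "command" as such. The header is simply the next set of DMA register values, which is why no CPU code is needed to interpret it.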
The Trick: This trick involves code on both the DSP and the host.
First, the DSP side of things. The following function, "init_dma", is the only function that needs to be called on the DSP; it sets up the two DMA link tables and then starts the DMA coprocessor, which continually cycles between the two link tables.
I use structures and macros in this code snippet that are defined in the Texas Instruments parallel runtime support library (prtslib); their purpose should be obvious. The C40 comm port reserved for host communications is referenced by the variable "MY_CP", which has a value from 0 to 5. The first dma link table, which copies six values from the comm port to the second link table, is called dma_setup. The second link table, which does the actual data transfer to/from the DSP memory, is called dma_xfer. The dma_xfer link table is initialized here with all zeros, except that its link pointer is set to point back to the dma_setup link table.
=================================================
/*
 * Reserve space in memory for two DMA link tables (16 words each).
 * Ideally these should be placed as low in the memory map as possible.
 * This should ensure that in most situations the DMA data transfers
 * will work, even if a stack overrun occurs.
 */
asm("_dmaregs .usect \".dmaregs\", 32" );
extern DMA_REG dmaregs;
#define MY_CP 1
void
init_dma( void )
{
    COMPORT_REG *cp_ptr = COMPORT_ADDR( MY_CP );
    DMA_REG *dma_ptr = DMA_ADDR( MY_CP );
    DMA_CONTROL autoInit;
    DMA_REG *dma_setup = &dmaregs;
    DMA_REG *dma_xfer;

    /* Pointer arithmetic... */
    dma_xfer = dma_setup + 1;

    /* Initialize the dma control word for dma_setup */
    autoInit._intval = 0x0;
    autoInit._bitval.dma_pri = 1;   /* rotating cpu/dma priority */
    autoInit._bitval.transfer = 2;  /* autoinit when xfer counter = 0 */
    autoInit._bitval.sync = 1;      /* source interrupt synch */
    autoInit._bitval.start = 3;     /* DMA start */

    /* Setup link table to xfer packet from cp to dma_xfer struct */
    set_dma_auto( dma_setup,                  /* this link table */
                  (long)autoInit._intval,     /* control word */
                  (void*)(&cp_ptr->in_port),  /* source */
                  0,                          /* source increment */
                  6,                          /* transfer count */
                  dma_xfer,                   /* destination */
                  1,                          /* destination inc */
                  dma_xfer );                 /* next link table */

    /* Setup skeleton link table */
    set_dma_auto( dma_xfer,                   /* this link table */
                  0,                          /* control word */
                  0,                          /* source */
                  0,                          /* source increment */
                  0,                          /* transfer count */
                  0,                          /* destination */
                  0,                          /* destination inc */
                  dma_setup );                /* next link table */

    /* setup dma to do dma_setup transfer first */
    DMA_RESET( MY_CP );
    dma_ptr->dma_regs.count = 0;
    dma_ptr->_gctrl._intval = 0x9; /* autoInit without start bits */
    /* Link to the table itself, not to the address of the local pointer */
    dma_ptr->dma_link = (unsigned long*)dma_setup;
    /* start dma so it loads in dma_setup table */
    DMA_RESTART( MY_CP );

    /* Enable dma interrupts from comm port corresponding to MY_CP */
    asm(" or 10h, die"); /* read interrupt */
    asm(" or 40h, die"); /* write interrupts */
}
Note the rotating cpu/dma priority, which may or may not be suitable for your application. Also note that you want the transfers to and from the comm port to be interrupt synchronized, or else they will halt the peripheral bus on the C40 while either waiting for data from the host or waiting for the host to retrieve data.
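The control-word fields set in init_dma can also be packed by hand, which is useful on the host where the prtslib bit-field structures are not available. The helper below is only a sketch: the name dma_ctrl_word and the shift constants are mine, and the bit positions are inferred from the field values in init_dma together with the 0x9, 0xc00049 and 0xc00089 constants that appear in this post, so check them against the C4x user's guide before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions inferred from the constants in this post (assumption). */
enum {
    DMA_PRI_SHIFT      = 0,   /* 2 bits: CPU/DMA priority, 1 = rotating      */
    DMA_TRANSFER_SHIFT = 2,   /* 2 bits: transfer mode, 2 = autoinit at zero */
    DMA_SYNC_SHIFT     = 6,   /* 2 bits: 1 = source sync, 2 = dest sync      */
    DMA_START_SHIFT    = 22   /* 2 bits: 3 = start                           */
};

/* Pack the four fields used in this post into one 32-bit control word. */
uint32_t dma_ctrl_word(unsigned pri, unsigned transfer,
                       unsigned sync, unsigned start)
{
    /* Each field is two bits wide. */
    assert(pri < 4 && transfer < 4 && sync < 4 && start < 4);

    return ((uint32_t)pri      << DMA_PRI_SHIFT)
         | ((uint32_t)transfer << DMA_TRANSFER_SHIFT)
         | ((uint32_t)sync     << DMA_SYNC_SHIFT)
         | ((uint32_t)start    << DMA_START_SHIFT);
}
```

With these positions, dma_ctrl_word(1, 2, 1, 3) gives 0xc00049 (source synchronized, used when the comm port is the source) and dma_ctrl_word(1, 2, 2, 3) gives 0xc00089 (destination synchronized, used when the comm port is the destination).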
On the host side, I use library routines that hide the mechanics of this data transfer from the user. For example, suppose the user wishes to read "number" integers from the DSP at the address "address", and store them in a buffer called "buffer". They call the following function:
#define AUTO_WRITE_SYNC 0xc00089
#define CP_ONE_OUT_PORT 0x100052
int
read_int( unsigned int address,
          int *buffer,
          int number )
{
    int ret;
    struct DMA_REGSET dma_read;

    /* First send down six values to fill in */
    /* the dma_xfer structure on the dsp */
    dma_read.gctrl = (unsigned int)AUTO_WRITE_SYNC;
    dma_read.src = address;
    dma_read.src_index = (unsigned int)1;
    dma_read.count = number;
    dma_read.dest = (unsigned int)CP_ONE_OUT_PORT;
    dma_read.dest_index = 0;

    ret = write( DSP_DEVICE, &dma_read, sizeof(dma_read) );
    if( ret != (int)sizeof(dma_read) )
        return 0;

    /* Then collect the data; a negative return is assumed to mean failure */
    ret = read( DSP_DEVICE, buffer, number*sizeof(int) );
    if( ret < 0 )
        return 0;
    return ret / (int)sizeof(int);
}
Here read(DSP_DEVICE, ...) and write(DSP_DEVICE, ...) stand for whatever functions you use on the host to transfer data to and from the DSP.
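The DMA_REGSET structure is not shown in this post. One way it might look on the host is sketched below, using fixed-width types so that each field is exactly one 32-bit C40 word regardless of the host compiler. The names dma_regset_sketch and make_read_request are hypothetical; the field order simply follows the order of the assignments in read_int, and the two constants are the ones defined for read_int. Byte order on the link between host and DSP is not handled here.

```c
#include <assert.h>
#include <stdint.h>

#define AUTO_WRITE_SYNC 0xc00089u   /* from this post */
#define CP_ONE_OUT_PORT 0x100052u   /* from this post */

/* Hypothetical layout: six 32-bit words in the order read_int fills them. */
struct dma_regset_sketch {
    uint32_t gctrl;
    uint32_t src;
    uint32_t src_index;
    uint32_t count;
    uint32_t dest;
    uint32_t dest_index;
};

/* Build the six-word header asking the DSP to send `number` words,
   starting at `address`, out through the comm port. */
struct dma_regset_sketch make_read_request(uint32_t address, uint32_t number)
{
    struct dma_regset_sketch r;

    /* A zero count is rejected here purely as a precaution. */
    assert(number > 0);

    r.gctrl      = AUTO_WRITE_SYNC;
    r.src        = address;
    r.src_index  = 1;               /* step through DSP memory       */
    r.count      = number;
    r.dest       = CP_ONE_OUT_PORT;
    r.dest_index = 0;               /* the port address stays fixed  */
    return r;
}
```

The header is then sent with the same write(DSP_DEVICE, ...) call as in read_int, and sizeof the structure is 24 bytes, matching the six words that dma_setup expects.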
The write_int function is almost identical:
#define AUTO_READ_SYNC 0xc00049
#define CP_ONE_IN_PORT 0x100051
int
write_int( unsigned int address,
           int *buffer,
           int number )
{
    int ret;
    struct DMA_REGSET dma_write;

    /* First send down six values to fill in */
    /* the dma_xfer structure on the dsp */
    dma_write.gctrl = (unsigned int)AUTO_READ_SYNC;
    dma_write.src = (unsigned int)CP_ONE_IN_PORT;
    dma_write.src_index = (unsigned int)0;
    dma_write.count = number;
    dma_write.dest = address;
    dma_write.dest_index = 1;

    ret = write( DSP_DEVICE, &dma_write, sizeof(dma_write) );
    if( ret != (int)sizeof(dma_write) )
        return 0;

    /* Then send the data; a negative return is assumed to mean failure */
    ret = write( DSP_DEVICE, buffer, number*sizeof(int) );
    if( ret < 0 )
        return 0;
    return ret / (int)sizeof(int);
}
It should be pretty obvious how to implement the read_float and write_float functions.
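For what it is worth, here is one way a read_float could be layered on a word-level reader. This is a sketch under assumptions, not the routine from my library: read_float_sketch, read_words_fn and fake_read_words are made-up names, the reader is passed in as a function pointer so the example can run without hardware, and the words coming back are assumed to already be IEEE-754 single-precision bit patterns. The C40's native floating-point format is not IEEE, so if the DSP variables are stored in native format a conversion step is needed on one side or the other; that step is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook with the same shape as read_int in this post. */
typedef int (*read_words_fn)(unsigned int address, int *buffer, int number);

#define CHUNK_WORDS 64

int read_float_sketch(read_words_fn read_words, unsigned int address,
                      float *buffer, int number)
{
    int chunk[CHUNK_WORDS];
    int done = 0;

    assert(read_words != NULL && buffer != NULL && number >= 0);
    if (sizeof(float) != sizeof(int))
        return 0;                       /* word sizes must match */

    while (done < number) {
        int want = number - done;
        if (want > CHUNK_WORDS)
            want = CHUNK_WORDS;
        /* C40 addresses count 32-bit words, so advance by elements. */
        int got = read_words(address + (unsigned int)done, chunk, want);
        if (got <= 0)
            break;
        /* Copy raw bit patterns; no numeric conversion happens here. */
        memcpy(buffer + done, chunk, (size_t)got * sizeof(float));
        done += got;
    }
    return done;
}

/* Stand-in transport for trying the sketch without hardware: every word
   is 0x3f800000, the IEEE-754 bit pattern of 1.0f. */
int fake_read_words(unsigned int address, int *buffer, int number)
{
    (void)address;
    for (int i = 0; i < number; i++)
        buffer[i] = 0x3f800000;
    return number;
}
```

Going through an int buffer and memcpy, rather than casting the float pointer, keeps the host compiler from making aliasing assumptions about the data.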
In order to determine the address of a variable on the DSP, you can either parse the map file or the out file. Since this is a common task, my library contains a routine called "get_address" which returns the 32-bit location in DSP memory of a given variable.
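A lookup of this kind can be sketched as follows. This is not my get_address routine: find_symbol_address is a hypothetical name, and the input is assumed to contain one "<hex address> <symbol>" pair per line, which is a simplification of what a real linker map file looks like. C variables get a leading underscore in the symbol table (as with _dmaregs above), so a variable x is looked up as "_x".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Scan map_text line by line for "<hex address> <symbol>".
   Returns 1 and stores the address on a match, 0 otherwise.
   The line format is an assumption made for this sketch. */
int find_symbol_address(const char *map_text, const char *symbol,
                        uint32_t *address_out)
{
    const char *line = map_text;

    assert(map_text != NULL && symbol != NULL && address_out != NULL);

    while (*line != '\0') {
        size_t len = strcspn(line, "\n");
        char buf[256];
        char name[128];
        unsigned long addr;

        if (len < sizeof buf) {             /* skip over-long lines */
            memcpy(buf, line, len);
            buf[len] = '\0';
            if (sscanf(buf, "%lx %127s", &addr, name) == 2 &&
                strcmp(name, symbol) == 0) {
                *address_out = (uint32_t)addr;
                return 1;
            }
        }
        line += len;
        if (*line == '\n')
            line++;
    }
    return 0;
}
```

The returned address can be passed straight to read_int or write_int as the "address" argument.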
I've found it very useful to create an interface to the write_int, read_int, etc. routines in a MATLAB mex-file. This lets you create MATLAB routines such as "putdsp(x, 'is')", which sends the MATLAB variable 'x' to the DSP as an integer scalar (assuming there is an integer variable 'x' on the DSP). The putdsp routine automatically finds the address of 'x' on the DSP, and determines how many values to send based on the size of the MATLAB variable 'x'.
Ran Cabell