CYUSB3014 USB 3.0 Controller

High-Speed Communication with an FPGA

Posted by Shane on August 2, 2019

[TOC]

USB3.0

USB 2.0 has a maximum theoretical bandwidth of 480 Mb/s (60 MB/s), while USB 3.0 reaches 5.0 Gb/s (500 MB/s).

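Note: the 5 Gb/s line rate does not translate to 5 Gb/s divided by 8, which would be 625 MB/s. Like SATA, USB 3.0 uses a 10-bit transmission scheme (8b/10b encoding, where every 8 data bits travel as a 10-bit symbol), so the full-speed payload rate is only 500 MB/s.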
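The arithmetic behind the 500 MB/s figure can be sketched in C. This is only an illustration; the function name `payload_bytes_per_sec` and its parameters are made up for this example and are not part of any USB library.

```c
#include <assert.h>
#include <stdint.h>

/* Payload rate in bytes per second for a serial link that sends every
 * `data_bits` of payload as a `symbol_bits`-wide symbol on the wire.
 * (Hypothetical helper, for illustration only.) */
uint64_t payload_bytes_per_sec(uint64_t line_rate_bps,
                               unsigned data_bits, unsigned symbol_bits)
{
    assert(data_bits > 0 && symbol_bits >= data_bits);
    /* fraction of line bits that carry payload, then bits -> bytes */
    return line_rate_bps * data_bits / symbol_bits / 8;
}
```

Calling `payload_bytes_per_sec(5000000000ULL, 8, 10)` models the 10-bit scheme described above, while passing equal values for `data_bits` and `symbol_bits` reproduces the plain divide-by-8 figure quoted for USB 2.0.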

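USB 3.0 introduces full-duplex data transfer: of its five additional lines, two carry transmitted data, two carry received data, and one is ground. In other words, USB 3.0 can read and write at full speed simultaneously. Earlier USB versions do not support full-duplex transfers.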

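FX3 USB 3.0 Controller Circuit

The FX3 USB 3.0 controller and the FPGA are connected through the Slave FIFO interface, which provides high-throughput data transfer.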

fx3_d[31:0]

fx3_a[1:0] (CTL[11]/A1 CTL[12]/A0)

fx3_clk (PCLK)

fx3_slcs# (CTL[0]/SLCS#)

fx3_slwr# (CTL[1]/SLWR#)

fx3_sloe# (CTL[2]/SLOE#)

fx3_rd# (CTL[3]/SLRD#)

fx3_pktend# (CTL[7]/PKTEND#)

fx3_flaga (CTL[4]/FLAGA)

fx3_flagb (CTL[5]/FLAGB)

fx3_flagc (CTL[8]/GPIO)

fx3_flagd (CTL[9]/GPIO)

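The reset pin is connected to both the power-on reset circuit and a push button for manual reset.

Boot mode (F = pin left floating):

PMODE[0]

PMODE[1]

PMODE[2]

PMODE[2:0] = F1F: boot from I2C; if that fails, fall back to USB boot

PMODE[2:0] = F11: USB boot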

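FX3 USB 3.0 Controller Examples

USB Loopback Example Using the FX3 Internal DMA

Cypress provides the firmware for this example at ..\Cypress\EZ-USB FX3 SDK\1.3\firmware\basic_examples\cyfxbulklpautoenum

The example implements loopback over a pair of USB bulk endpoints on the FX3. Put simply, loopback means that whatever data the FX3 receives, it sends straight back out. Seen from the debugging software on the PC, whatever the PC sends to the FX3 comes back immediately as identical data.

Throughout this loopback, the ARM9 core inside the FX3 takes no part in moving the data itself. Once the firmware has configured everything, DMA transfers the data between the USB endpoints automatically.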

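UVC Transfer Protocol Example on the FX3

Modifying the Slave FIFO Configuration in the FX3 Firmware

Overview

The FX3 and the FPGA communicate over the GPIF II interface, which lets the two exchange data, and that data ultimately reaches the PC as well. In other words, the FX3 acts as a bridge that carries data between the FPGA and the PC over USB 3.0.

The GPIF II configuration parameters can be generated with Cypress's GPIF II Designer tool and then passed into the FX3 firmware project for compilation.

Loopback Example over the FPGA-FX3 Slave FIFO Interface

The PC sends data to the FX3, and the FX3 uses the indicator signal flaga to tell the FPGA that data is waiting to be read. The FPGA reads the data through the Slave FIFO interface and buffers it in an internal FIFO. Once the read completes, the FPGA starts a Slave FIFO write that returns the received data through the FX3 to the PC. The whole exchange can be inspected inside the FPGA by capturing all Slave FIFO interface signals with the SignalTap II embedded logic analyzer.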

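StreamOUT Example over the FPGA-FX3 Slave FIFO Interface

Overview:

In StreamOUT, the PC sends bulk data to the FX3, the FX3 uses the indicator signal flaga to tell the FPGA that data is waiting, and the FPGA then reads the data through the Slave FIFO interface and buffers it in an internal FIFO. The whole exchange can be inspected inside the FPGA by capturing all Slave FIFO interface signals with the SignalTap II embedded logic analyzer.

[PC] <==USB3.0==> [FX3] <==Slave FIFO==> [FPGA]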

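The usb_controller module is the main module implementing the Slave FIFO and its related functions. As soon as the FX3 read/write state machine detects readable data in the FX3 Slave FIFO, it enters the read state, reads out all the data in the Slave FIFO, and buffers it in on-chip RAM.

In brief, the FX3 read/write state machine powers up in FXS_REST and then moves to FXS_IDLE, where it checks whether the Slave FIFO has readable data. If so, it enters FXS_READ and reads out all the data in the FX3 Slave FIFO, then waits briefly in FXS_RDLY, stays in FXS_RSOP for one clock cycle, and finally returns to FXS_IDLE.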
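The state sequence just described can be modelled in a few lines of C, which is handy for checking the transitions before simulating the RTL. This is a behavioural sketch only: the function name `fx_read_next` and the argument names `data_ready` and `more_data` are illustrative, and the six-cycle delay is taken from the `delaycnt >= 4'd6` test in the usb_controller listing.

```c
#include <assert.h>
#include <stdbool.h>

/* State names follow the text; the numeric values are arbitrary here. */
typedef enum { FXS_REST, FXS_IDLE, FXS_READ, FXS_RDLY, FXS_RSOP } fx_state_t;

#define RDLY_CYCLES 6u   /* length of the short delay, per the listing */

/* Returns the state after one clock edge.
 *   data_ready : flag telling the FPGA that the FX3 has data to read
 *   more_data  : stays true while the FX3 buffer still holds data
 *   delaycnt   : cycles already spent in FXS_RDLY */
fx_state_t fx_read_next(fx_state_t s, bool data_ready, bool more_data,
                        unsigned delaycnt)
{
    assert(delaycnt <= 15u);   /* models a 4-bit counter */
    switch (s) {
    case FXS_REST: return FXS_IDLE;
    case FXS_IDLE: return data_ready ? FXS_READ : FXS_IDLE;
    case FXS_READ: return more_data ? FXS_READ : FXS_RDLY;
    case FXS_RDLY: return delaycnt >= RDLY_CYCLES ? FXS_RSOP : FXS_RDLY;
    case FXS_RSOP: return FXS_IDLE;
    }
    return FXS_IDLE;           /* unknown state: recover to idle */
}
```

Stepping the function once per simulated clock reproduces the REST, IDLE, READ, RDLY, RSOP, IDLE cycle; it models only the transitions, not the control-line timing.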

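StreamIN Example over the FPGA-FX3 Slave FIFO Interface

In StreamIN, the FPGA generates bulk data and sends it to the FX3 through the Slave FIFO interface until the FX3's FIFO channels are all full. Because the FX3 is a USB device, the PC, as the USB host, must initiate reads of the data frames waiting in the FX3. As soon as an FX3 FIFO frees up, the FPGA writes a new data frame.

The state machine powers up in FXS_REST and then moves to FXS_IDLE, where it checks whether the Slave FIFO is empty and can accept data. If so, it enters FXS_WRIT and writes data into the FX3 Slave FIFO, then stays in FXS_WSOP for one clock cycle, and finally returns to FXS_IDLE.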

USB_CONTROLLER.v

module usb_controller
(
    input	wire	[ 0: 0]	clk			,	//100MHz? 200MHz? 300MHz? 400MHz?
    input	wire	[ 0: 0]	rst_n		,	
    input	wire	[ 0: 0]	fx3_flaga	,	//slave fifo write full when addr 00
    input	wire	[ 0: 0]	fx3_flagb	,	//slave fifo almost write full when addr 00, 6 byte data can be written after this negative
    input	wire	[ 0: 0]	fx3_flagc	,	//slave fifo read empty when addr 11
    input	wire	[ 0: 0]	fx3_flagd	,	//slave fifo read empty almost when addr 11
    output	reg		[ 0: 0]	fx3_pclk	,	//slave fifo sync clock
    output	reg		[ 0: 0]	fx3_slcs_n	,	//slave fifo chip select
    output	reg		[ 0: 0]	fx3_slwr_n	,	//slave fifo write enable
    output	reg		[ 0: 0]	fx3_slrd_n	,	//slave fifo read enable
    output	reg		[ 0: 0]	fx3_sloe_n	,	//slave fifo output enable
    output	reg		[ 0: 0]	fx3_pktend_n,	//package end
    output	reg		[ 1: 0]	fx3_a		,	//
    inout	wire	[31: 0]	fx3_db			//32-bit bidirectional data bus
);
    wire	[ 9: 0]	fifo_used;	//number of words currently held in the FIFO
    reg		[ 0: 0]	fifo_rdreq;	//FIFO read request, active high
    reg		[ 0: 0]	fx3_dir;	//FX3 transfer direction: 1 = read, 0 = write
    reg		[ 9: 0]	num;		//word counter
    reg		[ 3: 0]	delaycnt;	//delay counter
    reg		[ 3: 0]	fxstate;	//state register
    parameter	FXS_REST	=	4'd0;
    parameter	FXS_IDLE	=	4'd1;
    parameter	FXS_READ	=	4'd2;
    parameter	FXS_RDLY	=	4'd3;
    parameter	FXS_RSOP	=	4'd4;
    parameter	FXS_WRIT	=	4'd5;
    parameter	FXS_WSOP	=	4'd6;
    
    always @ (posedge clk or negedge rst_n)
        if(!rst_n)
            fx3_dir <= 1'b1;	//read
    	else if(FXS_RSOP == fxstate)
            fx3_dir <= 1'b0;	//write
    	else if(FXS_WSOP == fxstate)
            fx3_dir <= 1'b1;	//read
    
    //Periodically read data from the FX3 FIFO and store it locally
    
    //Read/write state machine
    always @ (posedge clk or negedge rst_n) begin
        if(!rst_n) begin
            fxstate <= FXS_REST;
        end
    	else begin
            case(fxstate)
            	FXS_REST: begin
                    fxstate <= FXS_IDLE;
                end
                FXS_IDLE: begin
                    if(fx3_flaga)
                        fxstate <= FXS_READ;	//start reading; a burst must contain 8 to 1024 words
                    else
                        fxstate <= FXS_IDLE;
                end
                FXS_READ: begin
                    if(!fx3_flagb)
                        fxstate <= FXS_RDLY;
                    else
                        fxstate <= FXS_READ;
                end
                FXS_RDLY: begin	//read the 6 words that follow flagd going low
                    if(delaycnt >= 4'd6)
                        fxstate <= FXS_RSOP;
                    else
                        fxstate <= FXS_RDLY;
                end
                FXS_RSOP: begin
                	fxstate <= FXS_IDLE;
                end
            default: fxstate <= FXS_IDLE;
            endcase
    	end
    end
    
    //Word counter used to generate the read/write timing
    always @ (posedge clk or negedge rst_n) begin
        if(!rst_n) begin
        	num <= 10'd0;
        end
        else if(fxstate == FXS_READ) begin
        	num <= num + 1'b1;
        end
        else begin
            num <= 10'd0;
        end
    end
    
    //Delay counter: 6 clock cycles
    always @ (posedge clk or negedge rst_n) begin
        if(!rst_n) begin
        	delaycnt <= 4'd0;
        end
        else if(FXS_RDLY==fxstate)
            delaycnt <= delaycnt + 1'b1;
        else begin
            delaycnt <= 4'd0;
        end
    end
    
    //Generate the FX3 slave FIFO control signal timing
    parameter FX3_ON	=	1'b0;
    parameter FX3_OFF	=	1'b1;
    
    always @ (posedge clk or negedge rst_n) begin
        if(!rst_n) begin
        	fx3_slcs_n	<=	FX3_OFF;
            fx3_slwr_n	<=	FX3_OFF;
            fx3_slrd_n	<=	FX3_OFF;
            fx3_sloe_n	<=	FX3_OFF;
            fx3_pktend_n<=	FX3_OFF;
            fx3_a		<=	2'b11;	//FIFO address to operate on
        end
        else if(FXS_IDLE == fxstate) begin
            fx3_slcs_n <= FX3_OFF;
            fx3_slwr_n <= FX3_OFF;
            fx3_slrd_n <= FX3_OFF;
            fx3_sloe_n <= FX3_OFF;
            fx3_pktend_n<=FX3_OFF;
            fx3_a <= 2'b11;
            /*
            if(fx3_dir)
                fx3_a <= 2'b11;//read
            else
                fx3_a <= 2'b00;//write
                */
        end
        else if(FXS_READ == fxstate) begin
        	fx3_slcs_n <= FX3_ON;
            fx3_slwr_n <= FX3_OFF;
            fx3_slrd_n <= FX3_ON;
            fx3_sloe_n <= FX3_ON;
            fx3_pktend_n<=FX3_OFF;
            fx3_a <= 2'b11;
        end
        else if(FXS_RDLY == fxstate) begin
            if(4'd2==delaycnt) begin
            	fx3_slcs_n <= FX3_ON;
                fx3_slwr_n <= FX3_OFF;
                fx3_slrd_n <= FX3_OFF;
                fx3_sloe_n <= FX3_ON;
                fx3_pktend_n<=FX3_OFF;
                fx3_a <= 2'b11;
            end
            else if(delaycnt == 4'd6) begin
            	fx3_slcs_n <= FX3_OFF;
                fx3_slwr_n <= FX3_OFF;
                fx3_slrd_n <= FX3_OFF;
                fx3_sloe_n <= FX3_OFF;
                fx3_pktend_n<= FX3_OFF;
                fx3_a <= 2'b11;
            end
            else begin
            end
        end
        else begin
            fx3_slcs_n <= FX3_OFF;
            fx3_slwr_n <= FX3_OFF;
            fx3_slrd_n <= FX3_OFF;
            fx3_pktend_n<= FX3_OFF;
        end
    end
    
//Buffer the data returned by slave FIFO reads
    reg [31: 0]	fx3_rdb;	//data word read from the FX3
    reg	[ 0: 0]	fx3_rdb_en;	//read data valid flag, active high
    wire[31: 0]	fx3_wdb;	//data word to write to the FX3
    
    always @ (posedge clk or negedge rst_n) begin
        if(!rst_n) begin
        	fx3_rdb <= 32'd0;
            fx3_rdb_en <= 1'b0;
        end
        else if( (FXS_READ==fxstate) && (10'd4 <= num) ) begin
        	fx3_rdb <= fx3_db;
            fx3_rdb_en <= 1'b1;
        end
        else if( (FXS_RDLY==fxstate) && (4'd5>delaycnt) ) begin
        	fx3_rdb <= fx3_db;
            fx3_rdb_en <= 1'b1;
        end
        else begin
            fx3_rdb <= 32'd0;
            fx3_rdb_en <= 1'b0;
        end
    end
    //assign fx3_db = fx3_dir ? 32'hzzzzzzzz : fx3_wdb;
    assign fx3_db = 32'hzzzzzzzz;
    
    //Buffer the words read from the FX3 in RAM
    reg	[ 7: 0]	ram_addr;
    always @ (posedge clk or negedge rst_n) begin
        if(!rst_n) begin
        	ram_addr <= 8'd0;
        end
        else if(FXS_IDLE==fxstate) begin
        	ram_addr <= 8'd0;
        end
        else if(fx3_rdb_en)
            ram_addr <= ram_addr + 1'b1;
        else begin
            
        end
    end
    
    //RAM instance
    usbrd_ram_debug
    usbrd_ram_debug_inst
    (
        .address	(	ram_addr	)	,
        .clock		(	clk			)	,
        .data		(	{fx3_rdb[7:0], fx3_rdb[15:8], fx3_rdb[23:16], fx3_rdb[31:24]}),
        .wren		(	fx3_rdb_en	)	,
        .q			(				)
    );
    
endmodule

CORDIC-Based Signal Generator

CORDIC stands for Coordinate Rotation Digital Computer. The idea is to approach the answer through a series of successive coordinate rotations.

| i | θi (°) | cos θi | Π cos θi | 1/Π cos θi |
|---|---|---|---|---|
| 0 | 45.0 | 0.7071067812 | 0.7071067812 | 1.414213562 |
| 1 | 26.56505118 | 0.894427191 | 0.632455532 | 1.58113883 |
| 2 | 14.03624347 | 0.9701425001 | 0.6135719911 | 1.629800601 |
| 3 | 7.125016349 | 0.9922778767 | 0.6088339125 | 1.642484066 |
| 4 | 3.576334375 | 0.9980525785 | 0.6076482563 | 1.645688916 |
| 5 | 1.789910608 | 0.9995120761 | 0.6073517701 | 1.646492279 |
| 6 | 0.8951737102 | 0.999877952 | 0.6072776441 | 1.646693254 |
| 7 | 0.4476141709 | 0.9999694838 | 0.6072591123 | 1.646743507 |
| 8 | 0.2238105004 | 0.9999923707 | 0.6072544793 | 1.64675607 |
| 9 | 0.1119056771 | 0.9999980927 | 0.6072533211 | 1.646759211 |
| 10 | 0.05595289189 | 0.9999995232 | 0.6072530315 | 1.646759996 |
| 11 | 0.02797645262 | 0.9999998808 | 0.6072529591 | 1.646760193 |
| 12 | 0.01398822714 | 0.9999999702 | 0.607252941 | 1.646760242 |
| 13 | 0.006994113675 | 0.9999999925 | 0.6072529365 | 1.646760254 |
| 14 | 0.003497056851 | 0.9999999981 | 0.607252935 | 1.646760257 |
| 15 | 0.001748528427 | 0.9999999995 | 0.6072529351 | 1.646760258 |
A floating-point C implementation of the iteration:

#include <stdio.h>

double cordic(double angle_para);

int main(void)
{
    double para;
    
    para = 30.0;    /* angle in degrees */
    cordic(para);
    
    return 0;
}

/* Rotation-mode CORDIC: prints cos and sin of angle_para (in degrees)
 * and returns the sine. */
double cordic(double angle_para)
{
    /* tan(theta_i) = 2^-i, and the matching theta_i in degrees */
    const double tangent[] = {1.0, 1/2.0, 1/4.0, 1/8.0, 1/16.0, 1/32.0, 1/64.0, 1/128.0, 1/256.0, 1/512.0};
    const double angle[] = {45.0, 26.565, 14.036, 7.125, 3.576, 1.790, 0.895, 0.448, 0.224, 0.112};
    int i, signal;
    double x_cos;
    double y_sin;
    double x_temp;
    double y_temp;
    double z;
    double z_next;
    
    x_cos = 0.0;
    y_sin = 0.0;
    z = angle_para;
    z_next = 0.0;
    x_temp = 0.6073; // Πcosθi
    y_temp = 0;
    signal = 1;     /* rotation direction, same sign as the residual angle z */
    
    for(i=0; i<10; i++)
    {
        x_cos = x_temp - signal*y_temp*tangent[i];
        y_sin = y_temp + signal*x_temp*tangent[i];
        z_next = z - signal*angle[i];
        
        x_temp = x_cos;
        y_temp = y_sin;
        z = z_next;
        
        if(z_next>0)
            signal = +1;
        else
            signal = -1; 
    }
    
    printf("cos = %f, sin = %f\n", x_cos, y_sin);
    
    return y_sin;
    
}

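Step-by-Step Procedure

Set the number of iterations to 16, so x0 = 0.607253 (Π cos θi for i from 0 to 15) and y0 = 0, then supply the angle θ to evaluate. θ must lie within [-99.7°, 99.7°].

Iterate the following three equations for i from 0 to 15:

X(i+1) = X(i) - d(i)Y(i)2^(-i)

Y(i+1) = Y(i) + d(i)X(i)2^(-i)

Z(i+1) = Z(i) - d(i)θ(i)

Note: Z(0) = θ, and d(i) has the same sign as Z(i).

After 16 iterations, X(16) and Y(16) are cos θ and sin θ respectively.

A MATLAB implementation of the CORDIC algorithm: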

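close all;
clear;
clc;
% initialization
die = 16; % number of iterations
x = zeros(die+1, 1);
y = zeros(die+1, 1);
z = zeros(die+1, 1);
x(1) = 0.607253; % initial value: product of cos(theta_i)
z(1) = pi/4; % angle theta to evaluate, in radians
% iteration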
for i=1: die
	if z(i) >= 0
		d = 1;
	else
		d = -1;
	end
	
	x(i+1) = x(i) - d*y(i)*( 2^(-(i-1)) );
	y(i+1) = y(i) + d*x(i)*( 2^(-(i-1)) );
	z(i+1) = z(i) - d*atan( 2^(-(i-1)) );
end

cosa = vpa( x(17), 10 );
sina = vpa( y(17), 10 );
c = vpa( z(17), 10);

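An FPGA offers several ways to speed up computation, such as ping-pong buffering and pipelining.

The CORDIC algorithm maps naturally onto a 16-stage pipeline.

To avoid floating-point arithmetic while still meeting the precision requirement, every variable is scaled by 2^16, and signed regs with arithmetic right shifts are used.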
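Before writing the RTL, the fixed-point scheme can be tried out in software. The C sketch below is a bit-level model under stated assumptions: angles are in degrees scaled by 2^16, the scale factor is the `32'h09b74` constant, and the angle table reuses the `rot0` to `rot15` values from the Verilog listing. The names `cordic_q16` and `asr` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CORDIC_STAGES 16
#define CORDIC_K      0x9B74   /* 0.607253 * 2^16 */

/* atan(2^-i) in degrees, scaled by 2^16 (values of rot0..rot15) */
static const int32_t rot[CORDIC_STAGES] = {
    2949120, 1740992, 919872, 466944, 234368, 117312, 58688, 29312,
    14656, 7360, 3648, 1856, 896, 448, 256, 128
};

/* Arithmetic right shift (like Verilog >>> on a signed reg) that avoids
 * the implementation-defined behaviour of >> on negative values in C. */
static int32_t asr(int32_t v, int n)
{
    return v < 0 ? ~(~v >> n) : v >> n;
}

/* angle_q16: angle in degrees * 2^16, expected within the first quadrant.
 * Outputs cos and sin scaled by 2^16. */
void cordic_q16(int32_t angle_q16, int32_t *cos_q16, int32_t *sin_q16)
{
    int32_t x = CORDIC_K, y = 0, z = angle_q16;

    assert(cos_q16 != NULL && sin_q16 != NULL);
    for (int i = 0; i < CORDIC_STAGES; i++) {
        int32_t xs = asr(x, i), ys = asr(y, i);  /* old x, y shifted by i */
        if (z < 0) { x += ys; y -= xs; z += rot[i]; }  /* rotate clockwise */
        else       { x -= ys; y += xs; z -= rot[i]; }  /* counter-clockwise */
    }
    *cos_q16 = x;
    *sin_q16 = y;
}
```

Each loop iteration corresponds to one pipeline stage, so the model is useful as a reference when comparing simulation waveforms. Dividing the outputs by 65536.0 gives the usual floating-point values, accurate to roughly three decimal places with this table.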

module cordic#(
	parameter pipeline = 16,
    parameter k = 32'h09b74	//k = 0.607253*2^16, 32'h09b74
)
(
    input	wire			[ 0: 0]	clk_50MHz	,
    input	wire			[ 0: 0]	rst_n		,
    input	wire			[31: 0]	phase		,
    output	reg		signed	[31: 0]	sin			,
    output	reg		signed	[31: 0]	cos			,
    output	reg		signed	[31: 0]	error		
);
	`define	rot0  32'd2949120    	//45*2^16
	`define	rot1  32'd1740992    	//26.5651*2^16
	`define rot2  32'd919872    	//14.0362*2^16
	`define rot3  32'd466944    	//7.1250*2^16
	`define rot4  32'd234368    	//3.5763*2^16
	`define rot5  32'd117312    	//1.7899*2^16
	`define rot6  32'd58688    		//0.8952*2^16
	`define rot7  32'd29312    		//0.4476*2^16
	`define rot8  32'd14656    		//0.2238*2^16
	`define rot9  32'd7360    		//0.1119*2^16
	`define rot10 32'd3648    		//0.0560*2^16
	`define rot11 32'd1856			//0.0280*2^16
	`define rot12 32'd896    		//0.0140*2^16
	`define rot13 32'd448    		//0.0070*2^16
	`define rot14 32'd256    		//0.0035*2^16
	`define rot15 32'd128    		//0.0018*2^16
    
    reg signed	[31: 0]	x0 = 0, y0 = 0, z0 = 0;
    reg	signed	[31: 0]	x1 = 0, y1 = 0, z1 = 0;
    reg	signed	[31: 0]	x2 = 0, y2 = 0, z2 = 0;
    reg	signed	[31: 0]	x3 = 0, y3 = 0, z3 = 0;
    reg	signed	[31: 0]	x4 = 0, y4 = 0, z4 = 0;
    reg	signed	[31: 0]	x5 = 0, y5 = 0, z5 = 0;
    reg	signed	[31: 0]	x6 = 0, y6 = 0, z6 = 0;
    reg	signed	[31: 0]	x7 = 0, y7 = 0, z7 = 0;
    reg	signed	[31: 0]	x8 = 0, y8 = 0, z8 = 0;
    reg	signed	[31: 0]	x9 = 0, y9 = 0, z9 = 0;
    reg	signed	[31: 0]	x10 = 0, y10 = 0, z10 = 0;
    reg	signed	[31: 0]	x11 = 0, y11 = 0, z11 = 0;
    reg	signed	[31: 0]	x12 = 0, y12 = 0, z12 = 0;
    reg	signed	[31: 0]	x13 = 0, y13 = 0, z13 = 0;
    reg	signed	[31: 0]	x14 = 0, y14 = 0, z14 = 0;
    reg	signed	[31: 0]	x15 = 0, y15 = 0, z15 = 0;
    reg	signed	[31: 0]	x16 = 0, y16 = 0, z16 = 0;
    reg	[ 1: 0]	quadrant[pipeline: 0];
    
    always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	x0 <= 1'b0;
            y0 <= 1'b0;
            z0 <= 1'b0;
        end
        else begin
            x0 <= k;
            y0 <= 32'd0;
            z0 <= {phase[15: 0], 16'd0};	//angle in degrees, scaled by 2^16
        end
    end

    always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	x1 <= 1'b0;
            y1 <= 1'b0;
            z1 <= 1'b0;
        end
        else if(z0[31]) begin
            x1 <= x0 + y0;
        	y1 <= y0 - x0;
            z1 <= z0 + `rot0;
        end
        else begin
			x1 <= x0 - y0;
        	y1 <= y0 + x0;
            z1 <= z0 - `rot0;
        end
    end
    
   
    always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	x2 <= 1'b0;
            y2 <= 1'b0;
            z2 <= 1'b0;
        end
        else if(z1[31]) begin
            x2 <= x1 + (y1 >>> 1);
            y2 <= y1 - (x1 >>> 1);
            z2 <= z1 + `rot1;
        end
        else begin
            x2 <= x1 - (y1 >>> 1);
            y2 <= y1 + (x1 >>> 1);
            z2 <= z1 - `rot1;
        end
    end
    
    always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	x3 <= 1'b0;
            y3 <= 1'b0;
            z3 <= 1'b0;
        end
        else if(z2[31]) begin
            x3 <= x2 + (y2 >>> 2);
            y3 <= y2 - (x2 >>> 2);
            z3 <= z2 + `rot2;
        end
        else begin
            x3 <= x2 - (y2 >>> 2);
            y3 <= y2 + (x2 >>> 2);
            z3 <= z2 - `rot2;
        end
    end

    always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
            x4 <= 1'b0;
            y4 <= 1'b0;
            z4 <= 1'b0;
        end
        else if(z3[31]) begin
            x4 <= x3 + (y3 >>> 3);
            y4 <= y3 - (x3 >>> 3);
            z4 <= z3 + `rot3;
        end
        else begin
            x4 <= x3 - (y3 >>> 3);
            y4 <= y3 + (x3 >>> 3);
            z4 <= z3 - `rot3;
        end
    end
    
    always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	x5 <= 1'b0;
            y5 <= 1'b0;
            z5 <= 1'b0;
        end
        else if(z4[31]) begin
            x5 <= x4 + (y4 >>> 4);
            y5 <= y4 - (x4 >>> 4);
            z5 <= z4 + `rot4;
        end
        else begin
            x5 <= x4 - (y4 >>> 4);
            y5 <= y4 + (x4 >>> 4);
            z5 <= z4 - `rot4;
        end
    end

    always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	x6 <= 1'b0;
            y6 <= 1'b0;
            z6 <= 1'b0;
        end
        else if(z5[31]) begin
            x6 <= x5 + (y5 >>> 5);
            y6 <= y5 - (x5 >>> 5);
            z6 <= z5 + `rot5;
        end
        else begin
            x6 <= x5 - (y5 >>> 5);
            y6 <= y5 + (x5 >>> 5);
            z6 <= z5 - `rot5;
        end
    end
    
    always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	x7 <= 1'b0;
            y7 <= 1'b0;
            z7 <= 1'b0;
        end
        else if(z6[31]) begin
            x7 <= x6 + (y6 >>> 6);
            y7 <= y6 - (x6 >>> 6);
            z7 <= z6 + `rot6;
        end
        else begin
            x7 <= x6 - (y6 >>> 6);
            y7 <= y6 + (x6 >>> 6);
            z7 <= z6 - `rot6;
        end
    end
    
    always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	x8 <= 1'b0;
            y8 <= 1'b0;
            z8 <= 1'b0;
        end
        else if(z7[31]) begin
            x8 <= x7 + (y7 >>> 7);
            y8 <= y7 - (x7 >>> 7);
            z8 <= z7 + `rot7;
        end
        else begin
            x8 <= x7 - (y7 >>> 7);
            y8 <= y7 + (x7 >>> 7);
            z8 <= z7 - `rot7;
        end
    end
    
    always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	x9 <= 1'b0;
            y9 <= 1'b0;
            z9 <= 1'b0;
        end
        else if(z8[31]) begin
            x9 <= x8 + (y8 >>> 8);
            y9 <= y8 - (x8 >>> 8);
            z9 <= z8 + `rot8;
        end
        else begin
            x9 <= x8 - (y8 >>> 8);
            y9 <= y8 + (x8 >>> 8);
            z9 <= z8 - `rot8;
        end
    end
 
    always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	x10 <= 1'b0;
            y10 <= 1'b0;
            z10 <= 1'b0;
        end
        else if(z9[31]) begin
            x10 <= x9 + (y9 >>> 9);
            y10 <= y9 - (x9 >>> 9);
            z10 <= z9 + `rot9;
        end
        else begin
            x10 <= x9 - (y9 >>> 9);
            y10 <= y9 + (x9 >>> 9);
            z10 <= z9 - `rot9;
        end
    end
    
     always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	x11 <= 1'b0;
            y11 <= 1'b0;
            z11 <= 1'b0;
        end
        else if(z10[31]) begin
            x11 <= x10 + (y10 >>> 10);
            y11 <= y10 - (x10 >>> 10);
            z11 <= z10 + `rot10;
        end
        else begin
            x11 <= x10 - (y10 >>> 10);
            y11 <= y10 + (x10 >>> 10);
            z11 <= z10 - `rot10;
        end
    end

     always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	x12 <= 1'b0;
            y12 <= 1'b0;
            z12 <= 1'b0;
        end
        else if(z11[31]) begin
            x12 <= x11 + (y11 >>> 11);
            y12 <= y11 - (x11 >>> 11);
            z12 <= z11 + `rot11;
        end
        else begin
            x12 <= x11 - (y11 >>> 11);
            y12 <= y11 + (x11 >>> 11);
            z12 <= z11 - `rot11;
        end
    end
 
     always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	x13 <= 1'b0;
            y13 <= 1'b0;
            z13 <= 1'b0;
        end
        else if(z12[31]) begin
            x13 <= x12 + (y12 >>> 12);
            y13 <= y12 - (x12 >>> 12);
            z13 <= z12 + `rot12;
        end
        else begin
            x13 <= x12 - (y12 >>> 12);
            y13 <= y12 + (x12 >>> 12);
            z13 <= z12 - `rot12;
        end
    end
  
     always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	x14 <= 1'b0;
            y14 <= 1'b0;
            z14 <= 1'b0;
        end
         else if(z13[31]) begin
            x14 <= x13 + (y13 >>> 13);
            y14 <= y13 - (x13 >>> 13);
            z14 <= z13 + `rot13;
        end
        else begin
            x14 <= x13 - (y13 >>> 13);
            y14 <= y13 + (x13 >>> 13);
            z14 <= z13 - `rot13;
        end
    end
    
     always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	x15 <= 1'b0;
            y15 <= 1'b0;
            z15 <= 1'b0;
        end
         else if(z14[31]) begin
            x15 <= x14 + (y14 >>> 14);
            y15 <= y14 - (x14 >>> 14);
            z15 <= z14 + `rot14;
        end
        else begin
            x15 <= x14 - (y14 >>> 14);
            y15 <= y14 + (x14 >>> 14);
            z15 <= z14 - `rot14;
        end
    end
    
     always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	x16 <= 1'b0;
            y16 <= 1'b0;
            z16 <= 1'b0;
        end
         else if(z15[31]) begin
            x16 <= x15 + (y15 >>> 15);
            y16 <= y15 - (x15 >>> 15);
            z16 <= z15 + `rot15;
        end
        else begin
            x16 <= x15 - (y15 >>> 15);
            y16 <= y15 + (x15 >>> 15);
            z16 <= z15 - `rot15;
        end
    end
    
    
    always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
            quadrant[0]		<=	1'b0;
            quadrant[1]		<=	1'b0;
            quadrant[2]		<=	1'b0;
            quadrant[3]		<=	1'b0;
            quadrant[4]		<=	1'b0;
            quadrant[5]		<=	1'b0;
            quadrant[6]		<=	1'b0;
            quadrant[7]		<=	1'b0;
            quadrant[8]		<=	1'b0;
            quadrant[9]		<=	1'b0;
            quadrant[10]	<=	1'b0;
            quadrant[11]	<=	1'b0;
            quadrant[12]	<=	1'b0;
            quadrant[13]	<=	1'b0;
            quadrant[14]	<=	1'b0;
            quadrant[15]	<=	1'b0;
            quadrant[16]	<=	1'b0;
        end
        else begin
            quadrant[0] <= phase[17:16];
            quadrant[1] <= quadrant[0];
            quadrant[2] <= quadrant[1];
            quadrant[3] <= quadrant[2];
            quadrant[4] <= quadrant[3];
            quadrant[5] <= quadrant[4];
            quadrant[6] <= quadrant[5];
            quadrant[7] <= quadrant[6];
            quadrant[8] <= quadrant[7];
            quadrant[9] <= quadrant[8];
            quadrant[10] <= quadrant[9];
            quadrant[11] <= quadrant[10];
            quadrant[12] <= quadrant[11];
            quadrant[13] <= quadrant[12];
            quadrant[14] <= quadrant[13];
            quadrant[15] <= quadrant[14];
            quadrant[16] <= quadrant[15];
        end
    end
    
    always @ (posedge clk_50MHz or negedge rst_n) begin
        if(!rst_n) begin
        	cos <= 1'b0;
            sin <= 1'b0;
            error<= 1'b0;
        end
        else begin
            error <= z16;
            case(quadrant[16])
                //if the phase is in first quadrant,the sin(x)=sin(a), cos(x)=cos(a)
                2'b00: begin
                	cos <= x16;
                    sin <= y16;
                end
                //if the phase is in second quadrant,the sin(x)=sin(a+90)=cos(a), cos(x) = cos(a+90)=-sin(a)
                2'b01: begin
                    cos <= ~(y16) + 1'b1;	//-sin
                    sin <= x16;
                end
                //if the phase is in third quadrant, the sin(x)=sin(a+180)=-sin(a), cos(x)=cos(a+180)=-cos(a)
                2'b10: begin
                    cos <= ~(x16) + 1'b1;	//-cos
                    sin <= ~(y16) + 1'b1;	//-sin
                end
                //if the phase is in fourth quadrant, the sin(x)=sin(a+270)=-cos(a), cos(x)=cos(a+270)=sin(a)
                2'b11: begin
                    cos <= y16;				//sin
                    sin <= ~(x16) + 1'b1;	//-cos
                end
                default: begin
                end
            endcase
        end
    end
    
    
    
endmodule
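The quadrant handling at the end of the module can be checked in software as well. The C sketch below mirrors the phase word layout (bits [17:16] hold the quadrant, bits [15:0] the angle within it) and the sign and swap rules from the comments in the `case` statement. The names `pack_phase`, `apply_quadrant` and `cos_sin_t` are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t cos_v; int32_t sin_v; } cos_sin_t;

/* Packs an angle of 0..359 degrees into a phase word:
 * bits [17:16] = quadrant, bits [15:0] = angle inside the quadrant. */
uint32_t pack_phase(unsigned deg)
{
    assert(deg < 360u);
    return ((uint32_t)(deg / 90u) << 16) | (deg % 90u);
}

/* c and s are cos(a) and sin(a) of the in-quadrant angle a. Returns the
 * cos and sin of a + 90 * quadrant degrees. */
cos_sin_t apply_quadrant(unsigned quadrant, int32_t c, int32_t s)
{
    cos_sin_t r;

    assert(quadrant < 4u);
    switch (quadrant) {
    case 0:  r.cos_v =  c; r.sin_v =  s; break;  /* a       */
    case 1:  r.cos_v = -s; r.sin_v =  c; break;  /* a + 90  */
    case 2:  r.cos_v = -c; r.sin_v = -s; break;  /* a + 180 */
    default: r.cos_v =  s; r.sin_v = -c; break;  /* a + 270 */
    }
    return r;
}
```

Note that this helper encodes exactly 90° as quadrant 1 with angle 0. An encoding that uses quadrant 0 with angle 90 instead is equivalent, since both yield the same sine and cosine.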

Simulation testbench:

`timescale 1ps/1ps

module Cordic_tb;
    //input
    reg		[ 0: 0]	clk_50MHz	;
    reg		[ 0: 0]	rst_n		;
    reg		[15: 0]	cnt			;
    reg		[15: 0]	cnt_n		;
    reg		[31: 0]	Phase		;
    reg		[31: 0]	Phase_n		;
    wire	[31: 0]	Sin			;
    wire	[31: 0]	Cos			;
    wire	[31: 0]	Error		;
    
    //unit under test (module and port names are case-sensitive)
    cordic uut
    (
        .clk_50MHz	(	clk_50MHz	),
        .rst_n		(	rst_n		),
        .phase		(	Phase		),
        .sin		(	Sin			),
        .cos		(	Cos			),
        .error		(	Error		)
    );
    
    initial 
    begin
    	#0
        	clk_50MHz = 1'b0;
        #10000
        	rst_n = 1'b0;
        #10000
        	rst_n = 1'b1;
        #10000000
        	$stop;
    end
    
    always #10000
    begin
    	clk_50MHz = ~clk_50MHz;
    end
    
    always @ (posedge clk_50MHz or negedge rst_n)
    begin
        if(!rst_n)
            cnt <= 1'b0;
        else
            cnt <= cnt_n;
    end
    
    always @ (*)
    begin
        if(cnt == 16'd359)
            cnt_n = 1'b0;
        else 
            cnt_n = cnt + 1'b1;
    end
    
    //Generate the phase: Phase[17:16] is the quadrant, Phase[15:0] is the angle within it
    always @ (posedge clk_50MHz or negedge rst_n)
    begin
        if(!rst_n) 
            Phase <= 1'b0;
        else
            Phase <= Phase_n;
    end
    
    always@(*)
    begin
       if(cnt <=16'd90)
          Phase_n = cnt;
       else if(cnt >16'd90 && cnt <=16'd180)
          Phase_n ={2'b01,cnt -16'd90};
       else if(cnt >16'd180 && cnt <=16'd270)
          Phase_n ={2'b10,cnt -16'd180};
       else
          Phase_n ={2'b11,cnt -16'd270};
    end
endmodule

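Synchronous Slave FIFO Interface Example on Shane's Hardware

The FPGA connects to the CYUSB3014 USB 3.0 chip through the synchronous slave FIFO interface.

Firmware and Software Components

The firmware and software components used are:

  • FX3 synchronous slave FIFO firmware
  • The Control Center and Streamer software tools

The figure below shows the conceptual interconnect between the FPGA and the FX3:

The example consists of the following parts:

  • Loopback transfer: the FPGA first reads the contents of an entire buffer from the FX3 and then writes it back into the FX3. The USB host should issue OUT/IN token packets to send and receive this data. The Control Center tool can be used for this.
  • Short packet: the FPGA first transfers one full packet to the FX3 and then sends a short packet. The USB host should issue IN token packets to receive this data.
  • Zero-length packet (ZLP) transfer: the FPGA first transfers one full packet to the FX3 and then sends a zero-length packet. The USB host should issue IN token packets to receive this data.
  • Streaming (IN) transfer: the FPGA performs a one-way transfer, continuously writing data to the FX3 through the synchronous slave FIFO. The USB host should issue IN token packets to receive this data.
  • Streaming (OUT) transfer: the FPGA performs a one-way transfer, continuously reading data from the FX3 through the synchronous slave FIFO. The USB host should issue OUT token packets to send this data.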

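FPGA Implementation Details

The design is implemented on an Altera Cyclone IV (EP4CE40F23I7N) together with the CYUSB3014.

To get the maximum performance out of the FX3, the GPIF II interface runs at 100 MHz. The board carries an external 50 MHz active crystal oscillator, and a PLL inside the FPGA generates the 100 MHz clock from that 50 MHz input.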
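To put the 100 MHz figure in context, the peak rate of a parallel bus is simply its width in bytes times its clock frequency. The small C sketch below works this out for the 32-bit data bus used here. The function name `bus_bytes_per_sec` is made up for this example, and the result is an upper bound that ignores flag handshaking and idle cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Peak throughput of a synchronous parallel bus that moves one word per
 * clock cycle. (Hypothetical helper, for illustration only.) */
uint64_t bus_bytes_per_sec(unsigned bus_width_bits, uint64_t clock_hz)
{
    assert(bus_width_bits > 0 && bus_width_bits % 8 == 0);
    return clock_hz * (bus_width_bits / 8);
}
```

For a 32-bit bus at 100 MHz this gives 400 MB/s, which is why the interface clock, rather than the 50 MHz oscillator frequency, sets the ceiling on the FPGA side of the link.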

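Streaming IN Example (FPGA Writes to the Slave FIFO)

The figure below shows the state machine implemented in the Verilog RTL for streaming IN transfers.

  • stream_in_idle state:

    This state initializes all registers and signals used by the state machine.

    The slave FIFO control lines are:

    PKTEND# = 1; SLOE# = 1; SLRD# = 1; SLCS# = 0; SLWR# = 1; A[1:0] = 0;

  • stream_in_wait_flagb state:

    The state machine enters this state whenever flaga_d = 1, and waits for flagb_d.

  • stream_in_write state:

    The state machine enters this state whenever flagb_d = 1 and starts writing to the slave FIFO interface. The slave FIFO control lines are:

    PKTEND# = 1; SLOE# = 1; SLRD# = 1; SLCS# = 0; SLWR# = 0; A[1:0] = 0;

  • stream_in_write_wr_delay state:

    The state machine enters this state whenever flagb_d = 0. The slave FIFO control lines are:

    PKTEND# = 1; SLOE# = 1; SLRD# = 1; SLCS# = 0; SLWR# = 1; A[1:0] = 0;

    After one clock cycle the state machine returns to stream_in_idle. According to the general formula for the partial-flag case, once the partial flag flagb goes to 0 the FX3 still needs to sample SLWR# asserted for two more cycles. Allowing for the one-cycle propagation delay from the FPGA to the interface, the FPGA keeps SLWR# asserted for one more cycle after it samples flagb_d (the flip-flop output of flagb) at 0.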
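The four states above can be exercised with a small C model before running the RTL in a simulator. This is a behavioural sketch under simplifying assumptions: it follows the transitions listed in the bullets, treats SLWR# as asserted exactly while the machine is in the write state, and ignores any extra register stage on the output. The function names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    STREAM_IN_IDLE,
    STREAM_IN_WAIT_FLAGB,
    STREAM_IN_WRITE,
    STREAM_IN_WRITE_WR_DELAY
} stream_in_state_t;

/* State after one clock edge, given the registered flags. */
stream_in_state_t stream_in_next(stream_in_state_t s, bool flaga_d, bool flagb_d)
{
    switch (s) {
    case STREAM_IN_IDLE:       return flaga_d ? STREAM_IN_WAIT_FLAGB : STREAM_IN_IDLE;
    case STREAM_IN_WAIT_FLAGB: return flagb_d ? STREAM_IN_WRITE : STREAM_IN_WAIT_FLAGB;
    case STREAM_IN_WRITE:      return flagb_d ? STREAM_IN_WRITE : STREAM_IN_WRITE_WR_DELAY;
    default:                   return STREAM_IN_IDLE;   /* wr_delay lasts one cycle */
    }
}

/* SLWR# is active low and asserted only in the write state. */
bool slwr_n(stream_in_state_t s)
{
    return s != STREAM_IN_WRITE;
}

/* Runs n clock cycles from idle and counts the cycles with SLWR# low. */
int count_write_cycles(const bool *flaga_d, const bool *flagb_d, int n)
{
    stream_in_state_t s = STREAM_IN_IDLE;
    int writes = 0;

    assert(flaga_d != NULL && flagb_d != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        if (!slwr_n(s))
            writes++;
        s = stream_in_next(s, flaga_d[i], flagb_d[i]);
    }
    return writes;
}
```

Feeding in a flag trace captured with the logic analyzer gives the number of words the FPGA would attempt to write, which can then be compared against what the host actually receives.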

module slaveFIFO2b_streamIN(
	input  reset_in_,            //input reset active low
	input  clk,                  //input clk 50 MHz
	inout  [31:0]fdata,          //data bus
	output [1:0]faddr,           //output fifo address  
	output slrd,                 //output read select
	output slwr,                 //output write select
	input  flaga,                //full flag
	input  flagb,                //partial full flag
      	input  flagc,                //empty flag
	input  flagd,                //empty partial flag
	output sloe,                 //output output enable select
	output clk_out,              //output clk 100 Mhz and 180 phase shift
	output slcs,                 //output chip select
	output pktend,               //output pkt end
	output [1:0]PMODE,
	output RESET
//	output PMODE_2               //used for debugging
);


reg [2:0]current_stream_in_state;
reg [2:0]next_stream_in_state;
reg [31:0]data_gen_stream_in;

//parameters for StreamIN mode state machine
parameter [2:0] stream_in_idle                    = 3'd0;
parameter [2:0] stream_in_wait_flagb              = 3'd1;
parameter [2:0] stream_in_write                   = 3'd2;
parameter [2:0] stream_in_write_wr_delay          = 3'd3;


reg  flaga_d;
reg  flagb_d;
reg  flagc_d;
reg  flagd_d;

//declared here, ahead of their first use in the output assignments below
wire slwr_streamIN_;
reg  slwr_streamIN_d1_;


//output signal assignment 
assign slrd   = 1'b1;
assign slwr   = slwr_streamIN_d1_;   
assign faddr  = 2'd0;
assign sloe   = 1'b1;
assign fdata  = (slwr_streamIN_d1_) ? 32'dz : data_gen_stream_in;	
assign PMODE  = 2'b11;		
assign RESET  = 1'b1;	
assign slcs   = 1'b0;
assign pktend = 1'b1;

wire clk_100;
wire lock;
wire reset_;

//clock generation(pll instantiation)
pll inst_clk_pll
	(
		.areset(1'b0/*reset2pll*/),
		.inclk0(clk),  
		.c0(clk_100),
		.locked(lock)
	);


//ddr is used to send out the clk(ODDR2 instantiation)
//
ddr inst_ddr_to_send_clk_to_fx3                       
        ( 
	.datain_h(1'b0),
	.datain_l(1'b1),
	.outclock(clk_100),
	.dataout(clk_out) 
	);

assign reset_ = lock;

///flopping the INPUTs flags
always @(posedge clk_100, negedge reset_)begin
	if(!reset_)begin 
		flaga_d <= 1'd0;
		flagb_d <= 1'd0;
		flagc_d <= 1'd0;
		flagd_d <= 1'd0;
	end else begin
		flaga_d <= flaga;
		flagb_d <= flagb;
		flagc_d <= flagc;
		flagd_d <= flagd;
	end	
end

assign slwr_streamIN_ = ((current_stream_in_state == stream_in_write)) ? 1'b0 : 1'b1;

always @(posedge clk_100, negedge reset_)begin
	if(!reset_)begin 
		slwr_streamIN_d1_ <= 1'b1;
	end else begin
		slwr_streamIN_d1_ <= slwr_streamIN_;
	end	
end


//streamIN mode state machine
always @(posedge clk_100, negedge reset_)begin
	if(!reset_)begin 
		current_stream_in_state <= stream_in_idle;
	end else begin
		current_stream_in_state <= next_stream_in_state;
	end	
end

//StreamIN mode state machine combo
always @(*)begin
	next_stream_in_state = current_stream_in_state;
	case(current_stream_in_state)
	stream_in_idle:begin
		if(flaga_d == 1'b1)begin
			next_stream_in_state = stream_in_wait_flagb; 
		end else begin
			next_stream_in_state = stream_in_idle;
		end	
	end
	stream_in_wait_flagb :begin
		if (flagb_d == 1'b1)begin
			next_stream_in_state = stream_in_write; 
		end else begin
			next_stream_in_state = stream_in_wait_flagb; 
		end
	end
	stream_in_write:begin
		if(flagb_d == 1'b0)begin
			next_stream_in_state = stream_in_write_wr_delay;
		end else begin
		 	next_stream_in_state = stream_in_write;
		end
	end
        stream_in_write_wr_delay:begin
			next_stream_in_state = stream_in_idle;
	end
	endcase
end

//data generator counter for StreamIN modes
always @(posedge clk_100, negedge reset_)begin
	if(!reset_)begin 
		data_gen_stream_in <= 32'd0;
	end else if(slwr_streamIN_d1_ == 1'b0) begin
		data_gen_stream_in <= data_gen_stream_in + 1;
	end 
end


endmodule

Qt-Based USB Host Application and Virtual Oscilloscope