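Acceleration principle
By default, a simulation job submitted to a machine runs on a single core. Binding the simulation threads to multiple CPU cores by setting their CPU affinity raises simulation speed.
A machine typically reports a number of physical cores, for example 16, each with its own workload. A core that is already heavily loaded cannot be selected for binding, and neither can a core that is already locked by other jobs.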
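The selection rule described here can be sketched in Python. This only illustrates the idea and is not how VCS picks cores internally: `pick_cores` and `pin_current_process` are hypothetical helper names, the set of busy cores is assumed to be known already, and `os.sched_setaffinity` is available on Linux only.

```python
import os


def pick_cores(allowed, busy, wanted):
    """Return `wanted` CPU ids from `allowed`, skipping the ones in `busy`.

    `busy` stands for cores that are already heavily loaded or locked by
    other jobs; how that set is measured is outside this sketch.
    """
    free = sorted(set(allowed) - set(busy))
    if len(free) < wanted:
        raise RuntimeError(
            f"need {wanted} free cores, only {len(free)} available"
        )
    return free[:wanted]


def pin_current_process(cores):
    """Bind the calling process to `cores` and return the resulting mask."""
    os.sched_setaffinity(0, set(cores))  # pid 0 means "this process"
    return os.sched_getaffinity(0)
```

For instance, `pick_cores(range(16), {0, 1, 2}, 4)` skips the three busy cores and returns the next four free ones.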
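What is FGP
- FGP (Fine-Grained Parallelism): a fine-grained parallel simulation technology. With FGP, VCS adjusts its algorithms dynamically to make full use of multi-core and many-core processor platforms and improve simulation performance.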
Two-step flow:
% vcs -fgp -full64 <otherOptions>
% simv -fgp=[FGP-OPTIONS]
Three-step flow:
% vlogan -full64 <otherOptions>
% vhdlan -full64 <otherOptions> (for Mixed-HDL design only)
% vcs -fgp -full64 <otherOptions>
% simv -fgp=num_threads:<value>
Notes:
- If -fgp is added at compile time but not at simulation time, simv runs on a single core.
- -fgp must be used together with -full64.
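As a rough illustration of the two notes, the two-step flow can be assembled programmatically. The sketch below is an assumption-laden helper, not part of VCS: `build_two_step` and `runs_single_core` are hypothetical names, and only the `-fgp`, `-full64` and `num_threads` spellings shown in the flows are used.

```python
def build_two_step(num_threads, other_compile_opts=(), other_sim_opts=()):
    """Return (compile_argv, simulate_argv) for the two-step flow."""
    # -fgp is always paired with -full64 at compile time.
    compile_argv = ["vcs", "-fgp", "-full64", *other_compile_opts]
    simulate_argv = ["simv", f"-fgp=num_threads:{num_threads}", *other_sim_opts]
    return compile_argv, simulate_argv


def runs_single_core(compile_argv, simulate_argv):
    """True when -fgp was compiled in but is missing at simulation time."""
    compiled_with_fgp = "-fgp" in compile_argv
    simulated_with_fgp = any(arg.startswith("-fgp") for arg in simulate_argv)
    return compiled_with_fgp and not simulated_with_fgp
```

The argument lists can be handed to `subprocess.run` on a machine where the tools are installed; `runs_single_core` flags the case where the runtime `-fgp` option was forgotten.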
FGP options
options | description |
num_threads:<value> | Uses <value>+1 cores (one master core + <value> child cores). If fewer cores are available than the requested number of threads plus one (N+1), VCS generates an error. |
multisocket | With this option, the memory binding for a thread takes into account the socket that is running the thread, which benefits performance. |
sync:<scheme_value> | Thread synchronization scheme. The default, tuned for performance, is -fgp=sync:busywait; mutex and serial are the other schemes. |
cpu_affinity | Example: simv -fgp=num_threads:8,cpu_affinity:(6-9,16-19) 9 cores are selected from the 10 available node1 CPUs |
single_socket_mode | Use this option to use all the available cores on the given socket |
num_cores:N | The number of cores. N cores are picked, including one master core and N-1 child cores. |
min_num_cores:P | Sets the minimum number of cores required by the simulation. If there are fewer vacant cores than min_num_cores, the simulation terminates. This option is used with the single_socket_mode option. |
allow_less_cores | Changes the hard limit of “N cores are required by the simulation” to the soft limit of “at most N cores are required by the simulation”. If fewer than the specified number of cores are available, the simulation does not quit with an error but continues to run on the cores that are available. |
num_fsdb_threads:M | This option is mandatory when FSDB dumping is enabled. VCS allocates as many cores as specified by M for dumping |
diag:ruse (dynamictoggle) | When there are no sufficient events for parallelization, simulation may slow down when VCS FGP is turned on. With this option, VCS FGP is turned off when simulation hits low activity region. This allows you to optimize the FGP engine while handling low event simulation. |
-Xdprof=timeline | Runtime profiling |
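The core-count rules in the table can be written down as a small Python sketch. It only restates the arithmetic from the descriptions and makes no claim about how VCS implements it; the function names and the sample CPU list are hypothetical.

```python
def expand_cpu_list(spec):
    """Expand a list such as "(0-3,8-11)" into individual CPU ids."""
    cpus = []
    for part in spec.strip("()").split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(x) for x in part.split("-"))
            cpus.extend(range(first, last + 1))  # ranges are inclusive
        else:
            cpus.append(int(part))
    return cpus


def cores_for_threads(num_threads):
    """num_threads:<value> uses one master core plus <value> child cores."""
    return num_threads + 1


def cores_granted(required, vacant, allow_less_cores=False):
    """Apply the hard limit, or the soft limit when allow_less_cores is set."""
    if vacant >= required:
        return required
    if allow_less_cores:
        return vacant  # soft limit: run on whatever is available
    raise RuntimeError(f"{required} cores required, only {vacant} vacant")
```

With `num_threads:8`, `cores_for_threads(8)` gives 9 cores. If only 6 cores are vacant, `cores_granted(9, 6)` raises an error, while `cores_granted(9, 6, allow_less_cores=True)` falls back to 6.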
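When to enable FGP
Run the simulation with -Xdprof=timeline and inspect dprof.txt.
Reading the report
- The EPC (events per cycle) log is written to a file named "dprof.txt".
- If the first three rows of the EPC table account for more than 80%, the design parallelizes well.
- If the EPC in the last two rows is higher than in the other rows, the design has very low activity. Running such a design with more threads actually slows FGP down.
- With the diag:ruse runtime option, switching activity is dumped to the report file. See the dynamictoggle runtime option for turning VCS FGP off dynamically in low-activity regions.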
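The two EPC rules of thumb can be expressed as a short Python check. The layout of dprof.txt is not described here, so this sketch assumes the per-row shares have already been read off the report by hand and are passed in as percentages, first row first; `assess_epc` and its return labels are hypothetical.

```python
def assess_epc(row_shares):
    """Classify a design from per-row EPC shares given in percent."""
    if len(row_shares) < 5:
        raise ValueError("need at least five rows to compare")
    top_three = sum(row_shares[:3])
    last_two = row_shares[-2:]
    others = row_shares[:-2]
    if top_three > 80:
        # Most events fall in the first three rows.
        return "good parallelism"
    if min(last_two) > max(others):
        # Both trailing rows dominate every other row.
        return "low activity"
    return "inconclusive"
```

A result of "low activity" suggests running with fewer threads or trying the dynamictoggle option rather than adding cores.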
Example
compile: -fgp
simv -fgp=num_threads:3,sync:busywait,auto_affinity,diag:ruse -Xdprof=timeline // uses 4 cores, sets affinity automatically, prints diagnostics, reports status