Achieving Simulation Performance

A few techniques I’ve implemented over the years to achieve good simulation performance.

No System Calls

For obvious reasons, the simulator shouldn’t make system calls during simulation, especially for I/O.
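One way to keep I/O out of the hot path is to append records to a pre-allocated in-memory buffer and write them out only once the simulation is over. The sketch below assumes a hypothetical `TraceBuffer` class and record layout; none of these names come from an actual simulator.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical trace record: one executed instruction.
struct TraceRecord {
    std::uint64_t cycle;
    std::uint32_t pc;
};

// Hypothetical in-memory trace. Storage is allocated once, before the
// simulation starts, so record() never calls into the operating system.
class TraceBuffer {
public:
    explicit TraceBuffer(std::size_t capacity) : records_(capacity), count_(0) {}

    // Hot path: a bounds check and a store. Records beyond the capacity are
    // dropped and counted rather than triggering a reallocation.
    void record(std::uint64_t cycle, std::uint32_t pc) {
        if (count_ < records_.size()) {
            records_[count_++] = TraceRecord{cycle, pc};
        } else {
            ++dropped_;
        }
    }

    // Cold path: called once after the simulation; all I/O happens here.
    void flush(std::FILE* out) const {
        for (std::size_t i = 0; i < count_; ++i) {
            std::fprintf(out, "%llu %08x\n",
                         static_cast<unsigned long long>(records_[i].cycle),
                         static_cast<unsigned>(records_[i].pc));
        }
    }

    std::size_t size() const { return count_; }
    std::size_t dropped() const { return dropped_; }

private:
    std::vector<TraceRecord> records_;
    std::size_t count_;
    std::size_t dropped_ = 0;
};
```

Whether dropping records on overflow is acceptable depends on the use case; a ring buffer is another option.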

No Polling

When simulating entire systems, never poll.

Use callbacks for event notifications between (simulated) system components instead.
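As an illustration, here is a minimal sketch of a callback-driven interrupt line. The `IrqLine` class and its methods are hypothetical names chosen for this example.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical interrupt line shared by simulated components.
class IrqLine {
public:
    using Listener = std::function<void(bool level)>;

    // Called at setup time by any component interested in this line.
    void subscribe(Listener listener) {
        listeners_.push_back(std::move(listener));
    }

    // Called by the peripheral driving the line. Listeners are notified only
    // when the level actually changes, so nobody needs to poll it.
    void set(bool level) {
        if (level == level_) return;
        level_ = level;
        for (auto& listener : listeners_) listener(level_);
    }

    bool level() const { return level_; }

private:
    bool level_ = false;
    std::vector<Listener> listeners_;
};
```

`std::function` keeps the sketch short; a plain function pointer plus a context pointer may be cheaper on a very hot path, which is worth measuring.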

Hash Tables

As much as possible, achieve O(1) decoding of memory addresses when translating from guest to host memory.

Don’t walk arrays of descriptors of memory banks, peripherals, etc…

Implement hash tables instead.

When in doubt, implement both, profile, and decide based on the measurements.
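A possible shape for such a lookup, assuming 4 KiB guest pages and a hypothetical `GuestMemory` class (both are assumptions made for this sketch):

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical guest memory map: guest pages are hashed to host storage, so
// decoding an address is one lookup instead of a walk over bank descriptors.
class GuestMemory {
public:
    static constexpr std::uint32_t kPageBits = 12;  // assumed 4 KiB pages
    static constexpr std::uint32_t kPageSize = 1u << kPageBits;

    // Setup time: back [base, base + size) with host memory, page by page.
    // base and size are assumed to be page-aligned.
    void map(std::uint32_t base, std::uint32_t size) {
        for (std::uint32_t off = 0; off < size; off += kPageSize) {
            pages_[(base + off) >> kPageBits].assign(kPageSize, 0);
        }
    }

    // Simulation time: average O(1) translation from guest address to host
    // pointer; returns nullptr when the address is unmapped.
    std::uint8_t* translate(std::uint32_t addr) {
        auto it = pages_.find(addr >> kPageBits);
        if (it == pages_.end()) return nullptr;
        return it->second.data() + (addr & (kPageSize - 1));
    }

private:
    std::unordered_map<std::uint32_t, std::vector<std::uint8_t>> pages_;
};
```

For a small, dense address space a direct page-indexed array may beat the hash table, which is exactly the kind of choice profiling should settle.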

Pre-Decode

Pre-decode once and for all.

For instance:

  • At binary download, pre-decode & pre-disassemble the code,
  • When setting up the Memory Protection Unit, pre-decode memory regions,
  • Etc.

Needless to say, pre-decode may re-occur *during* the simulation (if the setup of the MPU changes, for instance).

But as long as this is rare, pre-decoding pays off.

Set up some sort of callback to (re-)initiate a pre-decode.
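To make the idea concrete, here is a sketch of a pre-decode cache for a made-up 32-bit instruction encoding. The encoding, `DecodedInsn` and `DecodeCache` are all invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoded form of a made-up 32-bit instruction encoding:
// bits [31:24] opcode, [23:16] destination register, [15:0] immediate.
struct DecodedInsn {
    std::uint8_t opcode;
    std::uint8_t rd;
    std::uint16_t imm;
};

class DecodeCache {
public:
    // Called at binary download, and again from whatever callback signals
    // that the code has changed.
    void predecode(const std::vector<std::uint32_t>& code) {
        decoded_.clear();
        decoded_.reserve(code.size());
        for (std::uint32_t word : code) {
            decoded_.push_back(DecodedInsn{
                static_cast<std::uint8_t>(word >> 24),
                static_cast<std::uint8_t>((word >> 16) & 0xFF),
                static_cast<std::uint16_t>(word & 0xFFFF)});
        }
    }

    // Hot path: no bit extraction, only an indexed load.
    const DecodedInsn& at(std::size_t index) const { return decoded_[index]; }

    std::size_t size() const { return decoded_.size(); }

private:
    std::vector<DecodedInsn> decoded_;
};
```

Calling `predecode()` again simply rebuilds the cache, which is what a re-initiation callback would do.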

Synthesize Complex Conditions

Pre-compute complex and/or heavily used conditions, and memoize them.

Should the condition change during simulation, set up some sort of callback to (re-)initiate the pre-compute / memoization.
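For example, the condition "an interrupt must be taken" could be memoized as follows. This is a sketch with a hypothetical `InterruptState` class; the three inputs are assumptions chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical interrupt controller. The condition depends on three pieces
// of state, but the main loop only ever reads one pre-computed flag.
class InterruptState {
public:
    // Every setter acts as the callback that refreshes the memoized value.
    void set_global_enable(bool on)      { enabled_ = on;   recompute(); }
    void set_mask(std::uint32_t mask)    { mask_ = mask;    recompute(); }
    void set_pending(std::uint32_t bits) { pending_ = bits; recompute(); }

    // Hot path: tested once per simulated instruction.
    bool must_take_interrupt() const { return take_; }

private:
    // Runs only when an input changes, which is assumed to be rare compared
    // to how often the condition is tested.
    void recompute() { take_ = enabled_ && (pending_ & mask_) != 0; }

    bool enabled_ = false;
    std::uint32_t mask_ = 0;
    std::uint32_t pending_ = 0;
    bool take_ = false;
};
```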

State Machines

Represent the state of a complex system with a single state identifier.

Invoke callback methods when transitioning between states.
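A minimal sketch of such a state machine, assuming hypothetical core states and a function-pointer callback signature (both invented for this example):

```cpp
#include <array>
#include <cstddef>

// Hypothetical states of a simulated core.
enum class CoreState : std::size_t { Reset, Running, Sleeping, Halted, Count };

class CoreStateMachine {
public:
    using Callback = void (*)(CoreState from, CoreState to, void* context);

    // Setup time: register the callback fired when entering a state.
    void on_enter(CoreState s, Callback cb, void* context) {
        entries_[static_cast<std::size_t>(s)] = Entry{cb, context};
    }

    // The whole condition of the core is this one identifier.
    CoreState state() const { return state_; }

    // Changes state and fires the entry callback of the new state, if any.
    void transition(CoreState to) {
        if (to == state_) return;
        const CoreState from = state_;
        state_ = to;
        const Entry& e = entries_[static_cast<std::size_t>(to)];
        if (e.cb) e.cb(from, to, e.context);
    }

private:
    struct Entry {
        Callback cb;
        void* context;
    };
    // Value-initialized: every callback starts out as a null pointer.
    std::array<Entry, static_cast<std::size_t>(CoreState::Count)> entries_{};
    CoreState state_ = CoreState::Reset;
};
```

The main loop can then switch on `state()` alone instead of re-evaluating several flags on every iteration.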

Code Size

Depending on the guest architecture, factoring out common instruction behaviors helps limit the code size of the simulator, thus avoiding thrashing the host CPU cache.
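As a sketch of what this factoring can look like, the register and immediate forms of several ALU instructions below share one routine for the result and flag computation. The `Cpu` layout, the flag set and the operation list are assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical guest state.
struct Flags {
    bool zero;
    bool negative;
};

struct Cpu {
    std::uint32_t regs[16];
    Flags flags;
};

enum class AluOp { Add, Sub, And, Or };

// Shared behavior: the result and flag update exist once in the binary,
// instead of once per instruction handler.
std::uint32_t alu(AluOp op, std::uint32_t a, std::uint32_t b, Flags& flags) {
    std::uint32_t result = 0;
    switch (op) {
        case AluOp::Add: result = a + b; break;
        case AluOp::Sub: result = a - b; break;
        case AluOp::And: result = a & b; break;
        case AluOp::Or:  result = a | b; break;
    }
    flags.zero = (result == 0);
    flags.negative = (result >> 31) != 0;
    return result;
}

// Register form: rd = rn <op> rm.
void exec_reg(Cpu& cpu, AluOp op, unsigned rd, unsigned rn, unsigned rm) {
    cpu.regs[rd] = alu(op, cpu.regs[rn], cpu.regs[rm], cpu.flags);
}

// Immediate form: rd = rn <op> imm.
void exec_imm(Cpu& cpu, AluOp op, unsigned rd, unsigned rn, std::uint32_t imm) {
    cpu.regs[rd] = alu(op, cpu.regs[rn], imm, cpu.flags);
}
```

The trade-off against inlining everything is an extra call and a switch, so this is again something to benchmark.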

Inlining

For key resources (esp. registers & memory), consider inlining code.

Whenever possible, write several implementations, benchmark them all, and decide based on the results.

Take advantage of gcc optimization flags to control inlining.
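Besides command-line flags, GCC also offers the `always_inline` and `noinline` function attributes for per-function control. A sketch of a register file using them follows; the `SIM_*` macro names and the `RegisterFile` class are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper macros wrapping the GCC function attributes, with a
// plain fallback for other compilers.
#if defined(__GNUC__)
#define SIM_ALWAYS_INLINE inline __attribute__((always_inline))
#define SIM_NOINLINE __attribute__((noinline))
#else
#define SIM_ALWAYS_INLINE inline
#define SIM_NOINLINE
#endif

class RegisterFile {
public:
    // Hot path: accessed by nearly every simulated instruction.
    SIM_ALWAYS_INLINE std::uint32_t read(unsigned index) const {
        return regs_[index & 15u];
    }

    SIM_ALWAYS_INLINE void write(unsigned index, std::uint32_t value) {
        regs_[index & 15u] = value;
    }

    // Cold path: kept out of line so it does not bloat its callers.
    SIM_NOINLINE void reset() {
        for (auto& r : regs_) r = 0;
    }

private:
    std::uint32_t regs_[16] = {};
};
```

Compiling with `-Winline` makes GCC warn when a function declared inline could not be inlined, which helps when checking the result.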

Profile

Profile the simulation main loop, optimize it.

Identify the most used paths in the code of the simulator, profile them, optimize them.

These will be called several billion times 😉

Know your C++

During simulation, beware of default constructors, assignment operators, casts, etc…

These can hide a significant amount of code, including memory allocation.

The same goes for destructors that release memory.

This is especially true of the STL.
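With `std::vector`, for instance, the allocation can be moved to setup time with `reserve()`, and `clear()` keeps the storage for reuse. The `EventLog` class below is a hypothetical sketch of that pattern.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Event {
    std::uint64_t when;
    std::uint32_t id;
};

// Hypothetical event log that never allocates during simulation.
class EventLog {
public:
    // Setup time: the only allocation.
    explicit EventLog(std::size_t max_events) { events_.reserve(max_events); }

    // Hot path: the argument is taken by const reference, and the log
    // refuses to grow so push_back() can never reallocate.
    bool push(const Event& e) {
        if (events_.size() == events_.capacity()) return false;
        events_.push_back(e);
        return true;
    }

    // clear() destroys the elements but keeps the allocated storage.
    void clear() { events_.clear(); }

    const Event* data() const { return events_.data(); }
    std::size_t size() const { return events_.size(); }

private:
    std::vector<Event> events_;
};
```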

Setup

Implement some sort of init() or reset() method, responsible for setting up the simulation.

Perform all the parsing, configuration, memory allocations, etc. only once, at the beginning of the simulation.

Output

Likewise, have some sort of terminate() method responsible for outputting the results (e.g. profile, dumps, ...) of the simulation.

Plugin

Modularize the extra features (e.g. trace, debug, …) as plugins, keeping the core of the simulator lean and fit.

Implement these features in shared libraries, and load them dynamically on-demand.