LLVM API Documentation

MemorySanitizer.cpp File Reference
#include "llvm/Transforms/Instrumentation.h"
#include "llvm/ADT/DepthFirstIterator.h"
#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/ADT/Triple.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/InstVisitor.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/ValueMap.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Compiler.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"
#include "llvm/IR/Intrinsics.gen"

Defines

#define DEBUG_TYPE   "msan"
#define GET_INTRINSIC_MODREF_BEHAVIOR
#define ModRefBehavior   IntrinsicKind

Functions

 INITIALIZE_PASS (MemorySanitizer,"msan","MemorySanitizer: detects uninitialized reads.", false, false) FunctionPass *llvm
static GlobalVariable * createPrivateNonConstGlobalForString (Module &M, StringRef Str)
 Create a non-const global initialized with the given string.

Variables

static const uint64_t kShadowMask32 = 1ULL << 31
static const uint64_t kShadowMask64 = 1ULL << 46
static const uint64_t kOriginOffset32 = 1ULL << 30
static const uint64_t kOriginOffset64 = 1ULL << 45
static const unsigned kMinOriginAlignment = 4
static const unsigned kShadowTLSAlignment = 8
static const size_t kNumberOfAccessSizes = 4
static cl::opt< int > ClTrackOrigins ("msan-track-origins", cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden, cl::init(0))
 Track origins of uninitialized values.
static cl::opt< bool > ClKeepGoing ("msan-keep-going", cl::desc("keep going after reporting a UMR"), cl::Hidden, cl::init(false))
static cl::opt< bool > ClPoisonStack ("msan-poison-stack", cl::desc("poison uninitialized stack variables"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClPoisonStackWithCall ("msan-poison-stack-with-call", cl::desc("poison uninitialized stack variables with a call"), cl::Hidden, cl::init(false))
static cl::opt< int > ClPoisonStackPattern ("msan-poison-stack-pattern", cl::desc("poison uninitialized stack variables with the given patter"), cl::Hidden, cl::init(0xff))
static cl::opt< bool > ClPoisonUndef ("msan-poison-undef", cl::desc("poison undef temps"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClHandleICmp ("msan-handle-icmp", cl::desc("propagate shadow through ICmpEQ and ICmpNE"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClHandleICmpExact ("msan-handle-icmp-exact", cl::desc("exact handling of relational integer ICmp"), cl::Hidden, cl::init(false))
static cl::opt< bool > ClCheckAccessAddress ("msan-check-access-address", cl::desc("report accesses through a pointer which has poisoned shadow"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClDumpStrictInstructions ("msan-dump-strict-instructions", cl::desc("print out instructions with default strict semantics"), cl::Hidden, cl::init(false))
static cl::opt< int > ClInstrumentationWithCallThreshold ("msan-instrumentation-with-call-threshold", cl::desc("If the function being instrumented requires more than ""this number of checks and origin stores, use callbacks instead of ""inline checks (-1 means never use callbacks)."), cl::Hidden, cl::init(3500))
static cl::opt< std::string > ClWrapIndirectCalls ("msan-wrap-indirect-calls", cl::desc("Wrap indirect calls with a given function"), cl::Hidden)
static cl::opt< bool > ClWrapIndirectCallsFast ("msan-wrap-indirect-calls-fast", cl::desc("Do not wrap indirect calls with target in the same module"), cl::Hidden, cl::init(true))

Detailed Description

This file is a part of MemorySanitizer, a detector of uninitialized reads.

The algorithm of the tool is similar to Memcheck (http://goo.gl/QKbem). We associate a few shadow bits with every byte of the application memory, poison the shadow of the malloc-ed or alloca-ed memory, load the shadow bits on every memory read, propagate the shadow bits through some of the arithmetic instructions (including MOV), store the shadow bits on every memory write, and report a bug on some other instructions (e.g. JMP) if the associated shadow is poisoned.
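The steps above can be modelled in a few lines of plain C++. This is only an illustrative sketch, not the pass itself: the type Shadowed and the helpers freshlyAllocated, initialized, add and branchOn are hypothetical names, the OR of operand shadows is an assumed conservative propagation rule, and the exception stands in for the runtime's error report.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical model type: an application value paired with its shadow.
// A set shadow bit means the matching value bit is uninitialized.
struct Shadowed {
  uint32_t value;
  uint32_t shadow;  // 0 means fully initialized ("clean")
};

// Memory fresh from malloc/alloca: contents unknown, shadow fully poisoned.
inline Shadowed freshlyAllocated() { return {0u, 0xFFFFFFFFu}; }

// A constant or fully written value: clean shadow.
inline Shadowed initialized(uint32_t v) { return {v, 0u}; }

// Assumed propagation rule for arithmetic: the result is treated as
// poisoned wherever either operand is poisoned.
inline Shadowed add(Shadowed a, Shadowed b) {
  return {a.value + b.value, a.shadow | b.shadow};
}

// A strict use (such as a branch condition): report if any shadow bit is set.
inline bool branchOn(Shadowed cond) {
  if (cond.shadow != 0)
    throw std::runtime_error("use of uninitialized value");
  return cond.value != 0;
}
```

Adding two initialized values yields a clean result that can be branched on, while adding anything to freshly allocated memory yields a poisoned result, and branching on it triggers the report.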

But there are differences too. The first and major one is that msan uses compiler instrumentation instead of binary instrumentation. This gives us much better register allocation, possible compiler optimizations and a fast start-up. But it brings a major issue as well: msan needs to see all program events, including system calls and reads/writes in system libraries, so we either need to compile *everything* with msan or use a binary translation component (e.g. DynamoRIO) to instrument pre-built libraries. Another difference from Memcheck is that we use 8 shadow bits per byte of application memory and use a direct shadow mapping. This greatly simplifies the instrumentation code and avoids races on shadow updates. (Memcheck is single-threaded, so races are not a concern there. Memcheck uses 2 shadow bits per byte, with a slow-path storage that uses 8 bits per byte.)
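A direct mapping means the shadow address is computed from the application address with plain arithmetic, without any table lookup. The sketch below uses the 64-bit constants listed on this page; the exact formulas (clearing the mask bit, then adding the origin offset and aligning down to 4 bytes) and the helper names shadowAddress and originAddress are assumptions made for illustration.

```cpp
#include <cstdint>

// Constants as listed for the 64-bit configuration in this file.
static const uint64_t kShadowMask64 = 1ULL << 46;
static const uint64_t kOriginOffset64 = 1ULL << 45;

// Assumed mapping: the shadow address is the application address with
// the mask bit cleared.
inline uint64_t shadowAddress(uint64_t appAddr) {
  return appAddr & ~kShadowMask64;
}

// Assumed mapping: the origin cell sits at a fixed offset from the shadow
// address, aligned down to 4 bytes because origins are 4-byte values.
inline uint64_t originAddress(uint64_t appAddr) {
  return (shadowAddress(appAddr) + kOriginOffset64) & ~3ULL;
}
```

Under these assumptions the application address 0x7fff1234567a maps to shadow 0x3fff1234567a and to origin cell 0x5fff12345678.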

The default value of shadow is 0, which means "clean" (not poisoned).

Every module initializer should call __msan_init to ensure that the shadow memory is ready. On error, __msan_warning is called. Since parameters and return values may be passed via registers, we have a specialized thread-local shadow for return values (__msan_retval_tls) and parameters (__msan_param_tls).

Origin tracking.

MemorySanitizer can track origins (allocation points) of all uninitialized values. This behavior is controlled with a flag (msan-track-origins) and is disabled by default.

Origins are 4-byte values created and interpreted by the runtime library. They are stored in a second shadow mapping, one 4-byte value for 4 bytes of application memory. Propagation of origins is basically a bunch of "select" instructions that pick the origin of a dirty argument, if an instruction has one.
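The "select" chain can be pictured as follows for a two-operand instruction. This is a sketch under assumptions: Operand and combineOrigins are hypothetical names, and the choice that a later dirty operand overrides an earlier one is an assumption of the example rather than something stated above.

```cpp
#include <cstdint>

// Hypothetical pairing of an operand's shadow with its origin id.
struct Operand {
  uint32_t shadow;  // non-zero means some bits are uninitialized ("dirty")
  uint32_t origin;  // opaque 4-byte id owned by the runtime library
};

// Start from the first operand's origin and replace it when the second
// operand is dirty. If no operand is dirty, the result is meaningless
// but harmless, since the value itself is clean.
inline uint32_t combineOrigins(Operand a, Operand b) {
  uint32_t origin = a.origin;
  origin = (b.shadow != 0) ? b.origin : origin;  // the "select"
  return origin;
}
```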

Every 4 aligned, consecutive bytes of application memory have one origin value associated with them. If these bytes contain uninitialized data coming from 2 different allocations, the last store wins. Because of this, MemorySanitizer reports can show unrelated origins, but this is unlikely in practice.

Origins are meaningless for fully initialized values, so MemorySanitizer avoids storing an origin to memory when a fully initialized value is stored. This way it avoids needlessly overwriting the origin of the 4-byte region on a short (e.g. 1-byte) clean store, and it is also good for performance.
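The store policy described in the last two paragraphs can be sketched with a small model. OriginMap is a hypothetical class, a std::map stands in for the second shadow mapping, and the model assumes each store falls within a single 4-byte granule.

```cpp
#include <cstdint>
#include <map>

// Hypothetical model of the origin mapping: one 4-byte origin id per
// aligned 4-byte granule of application memory.
class OriginMap {
 public:
  // Called on every application store with the stored value's shadow
  // and origin.
  void onStore(uint64_t addr, uint32_t shadow, uint32_t origin) {
    if (shadow == 0)
      return;                       // clean store: keep the existing origin
    cells_[addr & ~3ULL] = origin;  // dirty store: the last store wins
  }

  // Returns the recorded origin for the granule, or 0 if none was stored.
  uint32_t lookup(uint64_t addr) const {
    auto it = cells_.find(addr & ~3ULL);
    return it == cells_.end() ? 0u : it->second;
  }

 private:
  std::map<uint64_t, uint32_t> cells_;
};
```

A dirty store to 0x1000 with origin 7 makes every byte of that granule report origin 7. A later clean 1-byte store to 0x1001 leaves it untouched, while a later dirty store to 0x1003 with origin 9 replaces it for the whole granule.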

Atomic handling.

Ideally, every atomic store of an application value should update the corresponding shadow location in an atomic way. Unfortunately, an atomic store to two disjoint locations cannot be done without severe slowdown.

Therefore, we implement an approximation that may err on the safe side. In this implementation, every atomically accessed location in the program may only change from (partially) uninitialized to fully initialized, but not the other way around. We load the shadow _after_ the application load, and we store the shadow _before_ the app store. Also, we always store clean shadow (if the application store is atomic). This way, if the store-load pair constitutes a happens-before arc, shadow store and load are correctly ordered such that the load will get either the value that was stored, or some later value (which is always clean).
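The ordering rule can be illustrated with std::atomic. This is a sketch, not the instrumentation the pass emits: TrackedAtomic, instrumentedStore and instrumentedLoad are hypothetical names, and the release/acquire orders on the application accesses are assumed in order to model a store-load pair that forms a happens-before arc.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical model: an application atomic and its shadow kept side by side.
struct TrackedAtomic {
  std::atomic<int> app{0};
  std::atomic<uint32_t> shadow{0xFFFFFFFFu};  // starts out poisoned
};

// Store side: write a clean shadow first, then do the application store.
inline void instrumentedStore(TrackedAtomic &loc, int value) {
  loc.shadow.store(0u, std::memory_order_relaxed);
  loc.app.store(value, std::memory_order_release);
}

struct LoadResult {
  int value;
  uint32_t shadow;
};

// Load side: do the application load first, then read the shadow.
inline LoadResult instrumentedLoad(TrackedAtomic &loc) {
  int v = loc.app.load(std::memory_order_acquire);
  uint32_t s = loc.shadow.load(std::memory_order_relaxed);
  return {v, s};
}
```

If the acquire load observes the value written by the release store, the shadow store is ordered before the shadow load, so the loaded shadow is the clean one or a later clean one. The location can only move from poisoned to clean, never back.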

This does not work very well with Compare-And-Swap (CAS) and Read-Modify-Write (RMW) operations. To follow the above logic, CAS and RMW must store the new shadow before the app operation, and load the shadow after the app operation. Computers don't work this way. The current implementation ignores the load aspect of CAS/RMW, always returning a clean value. It implements the store part as a simple atomic store by storing a clean shadow.

Definition in file MemorySanitizer.cpp.


Define Documentation

#define DEBUG_TYPE   "msan"

Definition at line 121 of file MemorySanitizer.cpp.

#define ModRefBehavior   IntrinsicKind

Function Documentation

static GlobalVariable * createPrivateNonConstGlobalForString (Module &M, StringRef Str) [static]

Create a non-const global initialized with the given string.

Creates a writable global for Str so that we can pass it to the run-time library. The runtime uses the first 4 bytes of the string to store the frame ID, so the string needs to be mutable.

Definition at line 304 of file MemorySanitizer.cpp.

References llvm::Module::getContext(), llvm::ConstantDataArray::getString(), llvm::Value::getType(), and llvm::GlobalValue::PrivateLinkage.

INITIALIZE_PASS (MemorySanitizer, "msan", "MemorySanitizer: detects uninitialized reads.", false, false)

Definition at line 291 of file MemorySanitizer.cpp.


Variable Documentation

cl::opt<bool> ClCheckAccessAddress("msan-check-access-address", cl::desc("report accesses through a pointer which has poisoned shadow"), cl::Hidden, cl::init(true)) [static]
cl::opt<bool> ClDumpStrictInstructions("msan-dump-strict-instructions", cl::desc("print out instructions with default strict semantics"), cl::Hidden, cl::init(false)) [static]
cl::opt<bool> ClHandleICmp("msan-handle-icmp", cl::desc("propagate shadow through ICmpEQ and ICmpNE"), cl::Hidden, cl::init(true)) [static]
cl::opt<bool> ClHandleICmpExact("msan-handle-icmp-exact", cl::desc("exact handling of relational integer ICmp"), cl::Hidden, cl::init(false)) [static]
cl::opt<int> ClInstrumentationWithCallThreshold("msan-instrumentation-with-call-threshold", cl::desc("If the function being instrumented requires more than ""this number of checks and origin stores, use callbacks instead of ""inline checks (-1 means never use callbacks)."), cl::Hidden, cl::init(3500)) [static]
cl::opt<bool> ClKeepGoing("msan-keep-going", cl::desc("keep going after reporting a UMR"), cl::Hidden, cl::init(false)) [static]
cl::opt<bool> ClPoisonStack("msan-poison-stack", cl::desc("poison uninitialized stack variables"), cl::Hidden, cl::init(true)) [static]
cl::opt<int> ClPoisonStackPattern("msan-poison-stack-pattern", cl::desc("poison uninitialized stack variables with the given patter"), cl::Hidden, cl::init(0xff)) [static]
cl::opt<bool> ClPoisonStackWithCall("msan-poison-stack-with-call", cl::desc("poison uninitialized stack variables with a call"), cl::Hidden, cl::init(false)) [static]
cl::opt<bool> ClPoisonUndef("msan-poison-undef", cl::desc("poison undef temps"), cl::Hidden, cl::init(true)) [static]
cl::opt<int> ClTrackOrigins("msan-track-origins", cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden, cl::init(0)) [static]

Track origins of uninitialized values.

Adds a section to MemorySanitizer report that points to the allocation (stack or heap) the uninitialized bits came from originally.

cl::opt<std::string> ClWrapIndirectCalls("msan-wrap-indirect-calls", cl::desc("Wrap indirect calls with a given function"), cl::Hidden) [static]
cl::opt<bool> ClWrapIndirectCallsFast("msan-wrap-indirect-calls-fast", cl::desc("Do not wrap indirect calls with target in the same module"), cl::Hidden, cl::init(true)) [static]

const unsigned kMinOriginAlignment = 4 [static]

Definition at line 127 of file MemorySanitizer.cpp.

const size_t kNumberOfAccessSizes = 4 [static]

Definition at line 131 of file MemorySanitizer.cpp.

const uint64_t kOriginOffset32 = 1ULL << 30 [static]

Definition at line 125 of file MemorySanitizer.cpp.

const uint64_t kOriginOffset64 = 1ULL << 45 [static]

Definition at line 126 of file MemorySanitizer.cpp.

const uint64_t kShadowMask32 = 1ULL << 31 [static]

Definition at line 123 of file MemorySanitizer.cpp.

const uint64_t kShadowMask64 = 1ULL << 46 [static]

Definition at line 124 of file MemorySanitizer.cpp.

const unsigned kShadowTLSAlignment = 8 [static]

Definition at line 128 of file MemorySanitizer.cpp.