LLVM API Documentation
//===-- X86Disassembler.h - Disassembler for x86 and x86_64 -----*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// The X86 disassembler is a table-driven disassembler for the 16-, 32-, and
// 64-bit X86 instruction sets. The main decode sequence for an assembly
// instruction in this disassembler is:
//
// 1. Read the prefix bytes and determine the attributes of the instruction.
//    These attributes, recorded in enum attributeBits
//    (X86DisassemblerDecoderCommon.h), form a bitmask. The table CONTEXTS_SYM
//    provides a mapping from bitmasks to contexts, which are represented by
//    enum InstructionContext (ibid.).
//
// 2. Read the opcode, and determine what kind of opcode it is. The
//    disassembler distinguishes four kinds of opcodes, which are enumerated in
//    OpcodeType (X86DisassemblerDecoderCommon.h): one-byte (0xnn), two-byte
//    (0x0f 0xnn), three-byte-38 (0x0f 0x38 0xnn), or three-byte-3a
//    (0x0f 0x3a 0xnn). Mandatory prefixes are treated as part of the context.
//
// 3. Depending on the opcode type, look in one of four ClassDecision structures
//    (X86DisassemblerDecoderCommon.h). Use the opcode class to determine which
//    OpcodeDecision (ibid.) to look the opcode up in. Look up the opcode to get
//    a ModRMDecision (ibid.).
//
// 4. Some instructions, such as escape opcodes or extended opcodes, or even
//    instructions that have ModRM*Reg / ModRM*Mem forms in LLVM, need the
//    ModR/M byte to complete decode. The ModRMDecision's type is an entry from
//    ModRMDecisionType (X86DisassemblerDecoderCommon.h) that indicates if the
//    ModR/M byte is required and how to interpret it.
//
// 5. After resolving the ModRMDecision, the disassembler has a unique ID
//    of type InstrUID (X86DisassemblerDecoderCommon.h). Looking this ID up in
//    INSTRUCTIONS_SYM yields the name of the instruction and the encodings and
//    meanings of its operands.
//
// 6. For each operand, its encoding is an entry from OperandEncoding
//    (X86DisassemblerDecoderCommon.h) and its type is an entry from
//    OperandType (ibid.). The encoding indicates how to read it from the
//    instruction; the type indicates how to interpret the value once it has
//    been read. For example, a register operand could be stored in the R/M
//    field of the ModR/M byte, the REG field of the ModR/M byte, or added to
//    the main opcode. This is orthogonal to its meaning (a GPR or an XMM
//    register, for instance). Given this information, the operands can be
//    extracted and interpreted.
//
// 7. As the last step, the disassembler translates the instruction information
//    and operands into a format understandable by the client - in this case, an
//    MCInst for use by the MC infrastructure.
//
// The disassembler is broken broadly into two parts: the table emitter that
// emits the instruction decode tables discussed above during compilation, and
// the disassembler itself. The table emitter is documented in more detail in
// utils/TableGen/X86DisassemblerEmitter.h.
//
// X86Disassembler.h contains the public interface for the disassembler,
//   adhering to the MCDisassembler interface.
// X86Disassembler.cpp contains the code responsible for step 7, and for
//   invoking the decoder to execute steps 1-6.
// X86DisassemblerDecoderCommon.h contains the definitions needed by both the
//   table emitter and the disassembler.
// X86DisassemblerDecoder.h contains the public interface of the decoder,
//   factored out into C for possible use by other projects.
// X86DisassemblerDecoder.c contains the source code of the decoder, which is
//   responsible for steps 1-6.
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_TARGET_X86_DISASSEMBLER_X86DISASSEMBLER_H
#define LLVM_LIB_TARGET_X86_DISASSEMBLER_X86DISASSEMBLER_H

#include "X86DisassemblerDecoderCommon.h"
#include "llvm/MC/MCDisassembler.h"
#include <memory> // std::unique_ptr

namespace llvm {

class MCInst;
class MCInstrInfo;
class MCSubtargetInfo;
class MemoryObject;
class raw_ostream;

namespace X86Disassembler {

/// X86GenericDisassembler - Generic disassembler for all X86 platforms.
/// All a platform-specific class should have to do is subclass this one and
/// provide a different disassemblerMode value in its constructor.
class X86GenericDisassembler : public MCDisassembler {
  std::unique_ptr<const MCInstrInfo> MII;

public:
  /// Constructor - Initializes the disassembler.
  ///
  X86GenericDisassembler(const MCSubtargetInfo &STI, MCContext &Ctx,
                         std::unique_ptr<const MCInstrInfo> MII);

  /// getInstruction - See MCDisassembler.
  DecodeStatus getInstruction(MCInst &instr, uint64_t &size,
                              const MemoryObject &region, uint64_t address,
                              raw_ostream &vStream,
                              raw_ostream &cStream) const override;

private:
  DisassemblerMode fMode;
};

} // namespace X86Disassembler

} // namespace llvm

#endif
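
The table-driven lookup described in steps 2 through 5 of the header comment can be sketched in miniature as follows. This is an illustration under stated assumptions, not the LLVM decoder: every name in it (ToyOpcodeType, ToyOpcode, ToyModRMDecisionType, ToyTableEntry, kToyTable, kToyNames, readToyOpcode, decodeToyName) is hypothetical, the two-entry table and the instruction labels are made up for the example, and prefixes, contexts, ClassDecision selection, and operand decoding (steps 1, 6, and 7) are left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are LLVM types.
enum class ToyOpcodeType { OneByte, TwoByte, ThreeByte38, ThreeByte3A };

struct ToyOpcode {
  ToyOpcodeType type;
  std::uint8_t opcode; // the final 0xnn byte
  std::size_t length;  // bytes consumed, escape bytes included
};

// Step 2: classify the opcode by its escape bytes (0x0f, 0x0f 0x38, 0x0f 0x3a).
// Returns false if the byte sequence ends before the opcode is complete.
bool readToyOpcode(const std::vector<std::uint8_t> &bytes, ToyOpcode &out) {
  if (bytes.empty())
    return false;
  if (bytes[0] != 0x0f) {
    out = ToyOpcode{ToyOpcodeType::OneByte, bytes[0], 1};
    return true;
  }
  if (bytes.size() < 2)
    return false;
  if (bytes[1] != 0x38 && bytes[1] != 0x3a) {
    out = ToyOpcode{ToyOpcodeType::TwoByte, bytes[1], 2};
    return true;
  }
  if (bytes.size() < 3)
    return false;
  const ToyOpcodeType type = bytes[1] == 0x38 ? ToyOpcodeType::ThreeByte38
                                              : ToyOpcodeType::ThreeByte3A;
  out = ToyOpcode{type, bytes[2], 3};
  return true;
}

// Steps 3 and 4: each table entry says whether the ModR/M byte is needed.
// OneEntry: one instruction regardless of ModR/M (assumed toy semantics).
// SplitRM: memory form at firstUID, register form at firstUID + 1.
enum class ToyModRMDecisionType { OneEntry, SplitRM };

struct ToyTableEntry {
  ToyOpcodeType type;
  std::uint8_t opcode;
  ToyModRMDecisionType modRMType;
  std::size_t firstUID; // index into kToyNames
};

const ToyTableEntry kToyTable[] = {
    {ToyOpcodeType::OneByte, 0x90, ToyModRMDecisionType::OneEntry, 0},
    {ToyOpcodeType::OneByte, 0x89, ToyModRMDecisionType::SplitRM, 1},
};

// Step 5: the resolved ID indexes a table of instruction descriptions.
const char *const kToyNames[] = {"nop", "mov (memory destination)",
                                 "mov (register destination)"};

// Returns the toy instruction label, or an empty string if decoding fails.
std::string decodeToyName(const std::vector<std::uint8_t> &bytes) {
  ToyOpcode op = {ToyOpcodeType::OneByte, 0, 0};
  if (!readToyOpcode(bytes, op))
    return "";
  for (const ToyTableEntry &entry : kToyTable) {
    if (entry.type != op.type || entry.opcode != op.opcode)
      continue;
    std::size_t uid = entry.firstUID;
    if (entry.modRMType == ToyModRMDecisionType::SplitRM) {
      if (bytes.size() <= op.length)
        return ""; // the required ModR/M byte is missing
      const std::uint8_t modRM = bytes[op.length];
      if ((modRM >> 6) == 0x3)
        ++uid; // mod field is binary 11: pick the register form
    }
    assert(uid < sizeof(kToyNames) / sizeof(kToyNames[0]));
    return kToyNames[uid];
  }
  return ""; // opcode not present in the toy table
}
```

In this sketch, decodeToyName({0x89, 0xd8}) takes the register branch because the top two bits of 0xd8 are 11, while decodeToyName({0x89, 0x18}) takes the memory branch. The point of the design is that the decoding logic stays small and generic, and all instruction-specific knowledge lives in the tables.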