Data lacks portability #204

@dyx2025

src/logstorage/log_store.cpp
If the data has to be migrated to another machine for processing, it runs into portability problems.

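  1. The size of int can differ across platforms. It would be better to change the type of iFileID and iOffset from int to uint32_t.
  2. Byte order can differ across machines. Data serialized on a little-endian machine deserializes to a different value on a big-endian machine. For example, if iFileID = 0x11223344 is serialized on a little-endian machine (phxpaxos serializes with memcpy) and then deserialized on a big-endian machine (phxpaxos also deserializes with memcpy), it becomes 0x44332211.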
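The byte swap in point 2 can be reproduced on any host with a small sketch. The helper names `LittleEndianImage` and `ReadAsBigEndianHost` are hypothetical and not part of phxpaxos; they only simulate what memcpy would write on a little-endian machine and read back on a big-endian one.

```cpp
#include <array>
#include <cstdint>

// Byte layout that memcpy produces for a 32-bit value on a little-endian
// host: least significant byte first. Built with shifts so the sketch
// yields the same bytes regardless of the machine it runs on.
std::array<uint8_t, 4> LittleEndianImage(uint32_t value) {
    return {{static_cast<uint8_t>(value),
             static_cast<uint8_t>(value >> 8),
             static_cast<uint8_t>(value >> 16),
             static_cast<uint8_t>(value >> 24)}};
}

// Value that memcpy into a uint32_t yields on a big-endian host:
// the first byte in memory is treated as the most significant one.
uint32_t ReadAsBigEndianHost(const std::array<uint8_t, 4>& bytes) {
    return (static_cast<uint32_t>(bytes[0]) << 24) |
           (static_cast<uint32_t>(bytes[1]) << 16) |
           (static_cast<uint32_t>(bytes[2]) << 8) |
           (static_cast<uint32_t>(bytes[3]));
}
```

Calling `ReadAsBigEndianHost(LittleEndianImage(0x11223344))` gives 0x44332211, which matches the example in point 2.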

Original code

void LogStore :: GenFileID(const int iFileID, const int iOffset, const uint32_t iCheckSum, std::string & sFileID)
{
    char sTmp[sizeof(int) + sizeof(int) + sizeof(uint32_t)] = {0};
    memcpy(sTmp, (char *)&iFileID, sizeof(int));
    memcpy(sTmp + sizeof(int), (char *)&iOffset, sizeof(int));
    memcpy(sTmp + sizeof(int) + sizeof(int), (char *)&iCheckSum, sizeof(uint32_t));

    sFileID = std::string(sTmp, sizeof(int) + sizeof(int) + sizeof(uint32_t));
}

void LogStore :: ParseFileID(const std::string & sFileID, int & iFileID, int & iOffset, uint32_t & iCheckSum)
{
    memcpy(&iFileID, (void *)sFileID.c_str(), sizeof(int));
    memcpy(&iOffset, (void *)(sFileID.c_str() + sizeof(int)), sizeof(int));
    memcpy(&iCheckSum, (void *)(sFileID.c_str() + sizeof(int) + sizeof(int)), sizeof(uint32_t));

    PLG1Debug("fileid %d offset %d checksum %u", iFileID, iOffset, iCheckSum);
}

The coding logic in leveldb already takes data portability into account, so its implementation can serve as a reference.
https://github.com/google/leveldb/blob/main/util/coding.cc

void PutFixed32(std::string* dst, uint32_t value) { // serialize a uint32_t value
  char buf[sizeof(value)];
  EncodeFixed32(buf, value);
  dst->append(buf, sizeof(buf));
}

https://github.com/google/leveldb/blob/main/util/coding.h

inline void EncodeFixed32(char* dst, uint32_t value) {  // serialize the value in little-endian byte order
  uint8_t* const buffer = reinterpret_cast<uint8_t*>(dst);

  // Recent clang and gcc optimize this to a single mov / str instruction.
  buffer[0] = static_cast<uint8_t>(value);
  buffer[1] = static_cast<uint8_t>(value >> 8);
  buffer[2] = static_cast<uint8_t>(value >> 16);
  buffer[3] = static_cast<uint8_t>(value >> 24);
}

https://github.com/google/leveldb/blob/main/util/coding.h

inline uint32_t DecodeFixed32(const char* ptr) { // decodes the uint32_t correctly whether the host is big-endian or little-endian
  const uint8_t* const buffer = reinterpret_cast<const uint8_t*>(ptr);

  // Recent clang and gcc optimize this to a single mov / ldr instruction.
  return (static_cast<uint32_t>(buffer[0])) |
         (static_cast<uint32_t>(buffer[1]) << 8) |
         (static_cast<uint32_t>(buffer[2]) << 16) |
         (static_cast<uint32_t>(buffer[3]) << 24);
}
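As a rough sketch of how the two suggestions could be combined, the functions below encode the three fields as fixed-width little-endian values in the style of the leveldb helpers quoted above. The names `GenFileIDPortable`, `ParseFileIDPortable`, and `kFileIDLen` are hypothetical, the length check in the parser is an added assumption, and the free-function form (rather than `LogStore` members) is only for illustration; this is not the phxpaxos implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Three 32-bit fields: always 12 bytes, independent of sizeof(int).
constexpr size_t kFileIDLen = 3 * sizeof(uint32_t);

// Local copies of the leveldb-style helpers: explicit little-endian layout.
inline void EncodeFixed32(char* dst, uint32_t value) {
    uint8_t* const buffer = reinterpret_cast<uint8_t*>(dst);
    buffer[0] = static_cast<uint8_t>(value);
    buffer[1] = static_cast<uint8_t>(value >> 8);
    buffer[2] = static_cast<uint8_t>(value >> 16);
    buffer[3] = static_cast<uint8_t>(value >> 24);
}

inline uint32_t DecodeFixed32(const char* ptr) {
    const uint8_t* const buffer = reinterpret_cast<const uint8_t*>(ptr);
    return (static_cast<uint32_t>(buffer[0])) |
           (static_cast<uint32_t>(buffer[1]) << 8) |
           (static_cast<uint32_t>(buffer[2]) << 16) |
           (static_cast<uint32_t>(buffer[3]) << 24);
}

// Hypothetical portable variant of GenFileID: fields are uint32_t and are
// written byte by byte, so the output does not depend on host byte order.
void GenFileIDPortable(uint32_t iFileID, uint32_t iOffset, uint32_t iCheckSum,
                       std::string& sFileID) {
    char sTmp[kFileIDLen];
    EncodeFixed32(sTmp, iFileID);
    EncodeFixed32(sTmp + sizeof(uint32_t), iOffset);
    EncodeFixed32(sTmp + 2 * sizeof(uint32_t), iCheckSum);
    sFileID.assign(sTmp, kFileIDLen);
}

// Hypothetical portable variant of ParseFileID. Returns false when the
// input is not exactly kFileIDLen bytes (an assumption of this sketch).
bool ParseFileIDPortable(const std::string& sFileID, uint32_t& iFileID,
                         uint32_t& iOffset, uint32_t& iCheckSum) {
    if (sFileID.size() != kFileIDLen) {
        return false;
    }
    const char* p = sFileID.data();
    iFileID = DecodeFixed32(p);
    iOffset = DecodeFixed32(p + sizeof(uint32_t));
    iCheckSum = DecodeFixed32(p + 2 * sizeof(uint32_t));
    return true;
}
```

On little-endian hosts where int is 4 bytes, this layout should be byte-for-byte identical to what the memcpy version writes, so data already written on such machines would presumably remain readable.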
