Skip to content

Conversation

@bengbengbalabalabeng
Copy link
Contributor

Purpose of the pull request

Related: #811

Incorrect matching logic in FileTypeUtils.

What's changed?

Changes the logic to use startsWith() to correctly match the file magic numbers.

Checklist

  • I have read the Contributor Guide.
  • I have written the necessary doc or comment.
  • I have added the necessary unit tests and all cases have passed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug in FileTypeUtils where image type recognition was failing due to incorrect matching logic. The original code used exact string matching on a 56-character hex string against short magic numbers (6-8 characters), which always failed. The fix changes to a prefix-based matching approach using startsWith().

Changes:

  • Modified getImageType() to use startsWith() for prefix matching instead of exact HashMap.get() lookup
  • Added IMAGE_TYPE_MARK_MIN_LENGTH constant and adjusted length validation logic
  • Added comprehensive unit tests for the FileTypeUtils class

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
fesod-sheet/src/main/java/org/apache/fesod/sheet/util/FileTypeUtils.java Fixed image type detection logic by implementing prefix matching and improving input validation
fesod-sheet/src/test/java/org/apache/fesod/sheet/util/FileTypeUtilsTest.java Added comprehensive test coverage for all code paths including edge cases

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 68 to 73
for (Map.Entry<String, ImageData.ImageType> entry : FILE_TYPE_MAP.entrySet()) {
String magicNumber = entry.getKey();
if (hexString.startsWith(magicNumber)) {
return entry.getValue();
}
}
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iteration order of HashMap entries is not guaranteed. While not an issue with the current two magic numbers, this could cause incorrect matches if future image types are added with overlapping prefixes. Consider using LinkedHashMap to maintain insertion order or implementing explicit ordering logic (e.g., longest prefix first) to ensure deterministic and correct matching behavior.

Copilot uses AI. Check for mistakes.
Comment on lines 68 to 73
for (Map.Entry<String, ImageData.ImageType> entry : FILE_TYPE_MAP.entrySet()) {
String magicNumber = entry.getKey();
if (hexString.startsWith(magicNumber)) {
return entry.getValue();
}
}
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iteration-based approach with startsWith() has O(n) time complexity for each lookup, compared to O(1) for the previous HashMap.get() approach. For better performance with the new logic, consider using a more efficient data structure like a Trie or maintaining a sorted list by magic number length for early exit optimization.

Copilot uses AI. Check for mistakes.
private static final char[] DIGITS = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
};
private static final int IMAGE_TYPE_MARK_LENGTH = 28;
private static final int IMAGE_TYPE_MARK_MIN_LENGTH = 4;
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IMAGE_TYPE_MARK_MIN_LENGTH is set to 4 bytes, but JPEG's magic number "ffd8ff" only requires 3 bytes. The test test_getImageType_tooShort expects null for a 3-byte JPEG magic number, which would incorrectly reject valid JPEG headers. Either IMAGE_TYPE_MARK_MIN_LENGTH should be 3 to accommodate JPEG, or the JPEG magic number definition should be extended to require 4 bytes.

Suggested change
private static final int IMAGE_TYPE_MARK_MIN_LENGTH = 4;
// Must be at least the length (in bytes) of the shortest magic number (currently JPEG: "ffd8ff" = 3 bytes).
private static final int IMAGE_TYPE_MARK_MIN_LENGTH = 3;

Copilot uses AI. Check for mistakes.

@Test
void test_getImageType_tooShort() {
byte[] input = new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test creates a 3-byte array with valid JPEG magic bytes (0xFF, 0xD8, 0xFF) and expects null, but JPEG's magic number is "ffd8ff" which is exactly 3 bytes. This test appears to validate incorrect behavior where valid JPEG images would be rejected. The test should either expect PICTURE_TYPE_JPEG or use a different magic number that genuinely requires 4 bytes.

Suggested change
byte[] input = new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
byte[] input = new byte[] {(byte) 0x00, (byte) 0x01};

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant