Swift strings look identical but aren't
I recently found myself trying to debug a most perplexing problem: two strings that looked identical were not equal. How could this be?
Consider this code:
print("String1: '\(string1)'")
print("String2: '\(string2)'")
print(string1 == string2)
This gives the following output:
String1: '123456'
String2: '123456'
false
Wut? I was reading the strings from a file, and I started doubting reality. The strings were the same, yet they were different.
I thought I’d try getting rid of any weird whitespace characters:
let trimmed1 = string1.trimmingCharacters(in: .whitespacesAndNewlines)
let trimmed2 = string2.trimmingCharacters(in: .whitespacesAndNewlines)
print("String1: '\(trimmed1)'")
print("String2: '\(trimmed2)'")
print(trimmed1 == trimmed2)
Nope:
String1: '123456'
String2: '123456'
false
I finally did some spelunking and discovered the joys of the byte order mark (BOM), \u{FEFF}, which is invisible. Its Unicode name is ZERO WIDTH NO-BREAK SPACE, so printing it shows nothing at all.
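One way to do that kind of spelunking is to dump every Unicode scalar in each string as a code point, so invisible characters stand out. This is a minimal sketch; the helper name `scalarDump` is hypothetical and not from the original investigation.

```swift
// Hypothetical helper: renders every Unicode scalar in a string as "U+XXXX",
// so invisible characters such as the BOM become visible.
func scalarDump(_ s: String) -> String {
    s.unicodeScalars.map { scalar -> String in
        let hex = String(scalar.value, radix: 16, uppercase: true)
        // Pad to at least four hex digits, the usual U+ notation.
        let padding = String(repeating: "0", count: max(0, 4 - hex.count))
        return "U+" + padding + hex
    }.joined(separator: " ")
}

print(scalarDump("\u{FEFF}12")) // U+FEFF U+0031 U+0032
print(scalarDump("12"))         // U+0031 U+0032
```

Printed side by side, the extra leading U+FEFF is impossible to miss.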
My strings actually contained this (although since I was reading them from a file it wasn’t obvious):
let string1 = "\u{FEFF}123456"
let string2 = "123456"
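Why reading from a file hides it: a file saved as "UTF-8 with BOM" starts with the bytes EF BB BF, which is the UTF-8 encoding of U+FEFF. The sketch below assumes such file contents (as a hard-coded byte array rather than a real file) and decodes them with the standard library's String(decoding:as:), which keeps the BOM as an ordinary first scalar instead of stripping it. Foundation's file-reading initializers are not shown here and may behave differently.

```swift
// Assumed file contents: a UTF-8 BOM (EF BB BF) followed by "123456".
let bytesWithBOM: [UInt8] = [0xEF, 0xBB, 0xBF] + Array("123456".utf8)
let decoded = String(decoding: bytesWithBOM, as: UTF8.self)

print(decoded == "123456")                            // false
print(decoded.unicodeScalars.first!.value == 0xFEFF)  // true
print(decoded.unicodeScalars.count)                   // 7
```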
Now I have a handy extension:
extension String {
    /// Returns the string without a leading byte order mark (U+FEFF), if it has one.
    var withoutBOM: String {
        let bom = "\u{FEFF}"
        if hasPrefix(bom) {
            return String(dropFirst(bom.count))
        }
        return self
    }
}
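If you control the code that reads the file, another option is to drop the BOM at the byte level before decoding. This sketch assumes the data is UTF-8; the function name `decodeUTF8DroppingBOM` is hypothetical. Like withoutBOM, it only removes a BOM at the very start.

```swift
// Hypothetical helper: decodes UTF-8 bytes, skipping a leading BOM if present.
func decodeUTF8DroppingBOM(_ bytes: [UInt8]) -> String {
    let utf8BOM: [UInt8] = [0xEF, 0xBB, 0xBF]
    // starts(with:) checks whether the first bytes match the BOM sequence.
    let body = bytes.starts(with: utf8BOM) ? bytes.dropFirst(utf8BOM.count) : bytes[...]
    return String(decoding: body, as: UTF8.self)
}

let withBOM: [UInt8] = [0xEF, 0xBB, 0xBF] + Array("123456".utf8)
print(decodeUTF8DroppingBOM(withBOM) == "123456")              // true
print(decodeUTF8DroppingBOM(Array("123456".utf8)) == "123456") // true
```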
trimmingCharacters(in: .controlCharacters) does the trick too:
import Foundation
let string1 = "\u{FEFF}123456"
let string2 = "123456"
let trimmed1 = string1.trimmingCharacters(in: .controlCharacters)
let trimmed2 = string2.trimmingCharacters(in: .controlCharacters)
print("String1: '\(trimmed1)'")
print("String2: '\(trimmed2)'")
print(trimmed1 == trimmed2)
Which prints:
String1: '123456'
String2: '123456'
true
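Why this trim works when the whitespace one did not: Apple's documentation defines CharacterSet.whitespacesAndNewlines as Unicode general category Z* plus U+000A through U+000D and U+0085, while CharacterSet.controlCharacters covers categories Cc and Cf. U+FEFF is a format character (Cf), so only the second set contains it. A quick check, which also shows a caveat worth knowing:

```swift
import Foundation

let bom: Unicode.Scalar = "\u{FEFF}"

// U+FEFF is in general category Cf (format), not Z* (separators).
print(CharacterSet.whitespacesAndNewlines.contains(bom)) // false
print(CharacterSet.controlCharacters.contains(bom))      // true

// Caveat: .controlCharacters also includes Cc characters such as "\n" and "\t",
// so those get trimmed from the ends too, but ordinary spaces (Zs) do not.
let padded = "\u{FEFF} 123456\n"
print(padded.trimmingCharacters(in: .controlCharacters) == " 123456") // true
```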
Whew! The universe is still internally consistent (I think).