Check if a String matches GSM7 or ISO-8859-1 (Latin 1) in Java

The java.nio.charset package has a couple of really neat methods for checking whether a String can be encoded in a given Charset. The only problem is that when the check needs to be done on the front end (GWT), you'll run into problems using this package. If you only need to check on the backend whether a String fits a particular charset, you can use this:

Charset.forName(CharEncoding.ISO_8859_1).newEncoder().canEncode("Some string")

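This checks whether every character of "Some string" can be encoded in the ISO-8859-1 (Latin 1) charset. If CharEncoding (a constants class that ships with Apache Commons) is not on your classpath, Charset.forName("ISO-8859-1") does the same thing.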
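For reference, here is a minimal backend-only sketch of the same check that uses nothing but the JDK. The class name BackendCharsetCheck and the method fitsLatin1 are made up for illustration; StandardCharsets is available from Java 7 onwards.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original post.
final class BackendCharsetCheck {

    // Returns true if every character of the text can be encoded as ISO-8859-1.
    static boolean fitsLatin1(String text) {
        // Encoders are stateful and not thread-safe, so create a fresh one per call.
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsLatin1("Some string"));       // true
        System.out.println(fitsLatin1("Gr\u00FC\u00DFe"));   // true: ü and ß are Latin 1
        System.out.println(fitsLatin1("Price: 5 \u20AC"));   // false: the euro sign is not Latin 1
    }
}
```

The non-ASCII characters are written as Unicode escapes so the result does not depend on the encoding the source file is compiled with.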

If, like me, you need to do the check on the frontend, you can use the methods I wrote below to check whether a String fits in either GSM7 or ISO-8859-1.

The declarations below define the characters of the GSM7 extension table as well as the printable characters of the Latin 1 charset.

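    // The backslash has to be escaped inside a char literal.
    private static final char[] extendedGsmChars = {'^','{','}','\\','[','~',']','|','€'};
    // The space is included so that ordinary sentences pass; the apostrophe has to be escaped.
    private static final Character[] latin1Chars = {' ', '!', '"', '#', '$', '%', '&', '\'', '(', ')', '*', '+', ',', '-', '.', '/',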
            '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', 'A', 'B', 'C', 'D', 'E',
            'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[',
            '\\', ']', '^', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q',
            'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~', '¡', '¢', '£', '¤', '¥', '¦', '§', '¨', '©',
            'ª', '«', '¬', '\u00AD', '®', '¯', '°', '±', '²', '³', '´', 'µ', '¶', '·', '¸', '¹', 'º', '»', '¼', '½', '¾', '¿',
            'À', 'Á', 'Â', 'Ã', 'Ä', 'Å', 'Æ', 'Ç', 'È', 'É', 'Ê', 'Ë', 'Ì', 'Í', 'Î', 'Ï', 'Ð', 'Ñ', 'Ò', 'Ó', 'Ô', 'Õ',
            'Ö', '×', 'Ø', 'Ù', 'Ú', 'Û', 'Ü', 'Ý', 'Þ', 'ß', 'à', 'á', 'â', 'ã', 'ä', 'å', 'æ', 'ç', 'è', 'é', 'ê', 'ë',
            'ì', 'í', 'î', 'ï', 'ð', 'ñ', 'ò', 'ó', 'ô', 'õ', 'ö', '÷', 'ø', 'ù', 'ú', 'û', 'ü', 'ý' ,'þ', 'ÿ'};

No surprise as to what the method below achieves 🙂

public static boolean isContainedInLatin1Charset(String message) {
	// Wrap the array in a List so we can use contains(); needs java.util.Arrays and java.util.List.
	List<Character> latin1CharsetList = Arrays.asList(latin1Chars);
	for (int i = 0; i < message.length(); i++) {
		char character = message.charAt(i);
		// The char is autoboxed to a Character for the lookup.
		if (!latin1CharsetList.contains(character)) {
			return false;
		}
	}
	return true;
}
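Because ISO-8859-1 maps one-to-one onto the first 256 Unicode code points, the same check can also be sketched as a plain range comparison, with no lookup table at all. This is an alternative to the method above, not a copy of it: the class and method names are hypothetical, and it assumes "printable Latin 1" means U+0020 to U+007E plus U+00A0 to U+00FF, so the space and the no-break space are accepted while control characters such as line breaks are rejected.

```java
// Hypothetical alternative to the table lookup, not part of the original post.
final class Latin1RangeCheck {

    // Assumption: printable Latin 1 is U+0020..U+007E and U+00A0..U+00FF.
    static boolean isPrintableLatin1(char c) {
        return (c >= 0x20 && c <= 0x7E) || (c >= 0xA0 && c <= 0xFF);
    }

    // Returns true only if every character of the message is printable Latin 1.
    static boolean isContainedInLatin1Range(String message) {
        for (int i = 0; i < message.length(); i++) {
            if (!isPrintableLatin1(message.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isContainedInLatin1Range("Some string"));   // true
        System.out.println(isContainedInLatin1Range("\u00E9t\u00E9")); // true: é is U+00E9
        System.out.println(isContainedInLatin1Range("5 \u20AC"));      // false: the euro sign is U+20AC
    }
}
```

It only uses char comparisons and String methods, so it should not need anything from java.nio.charset.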

Surprise surprise 🙂


public static boolean isContainedInGsm7Charset(CharSequence str0) {
        if(str0 == null) {
            return true;
        } else {
            int len = str0.length();

            for(int i = 0; i < len; ++i) {
                char c = str0.charAt(i);
                if((c < 32 || c > 95) && (c < 97 || c > 126)) {
                    switch(c) {
                        case '\n':
                        case '\f':
                        case '\r':
                        case '¡':
                        case '£':
                        case '¤':
                        case '¥':
                        case '§':
                        case '¿':
                        case 'Ä':
                        case 'Å':
                        case 'Æ':
                        case 'Ç':
                        case 'É':
                        case 'Ñ':
                        case 'Ö':
                        case 'Ø':
                        case 'Ü':
                        case 'ß':
                        case 'à':
                        case 'ä':
                        case 'å':
                        case 'æ':
                        case 'è':
                        case 'é':
                        case 'ì':
                        case 'ñ':
                        case 'ò':
                        case 'ö':
                        case 'ø':
                        case 'ù':
                        case 'ü':
                        case 'Γ':
                        case 'Δ':
                        case 'Θ':
                        case 'Λ':
                        case 'Ξ':
                        case 'Π':
                        case 'Σ':
                        case 'Φ':
                        case 'Ψ':
                        case 'Ω':
                        case '€':
                            break;
                        default:
                            return false;
                    }
                }
            }

            return true;
        }
    }
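One place the extended GSM7 characters matter is message length: each character from the extension table is sent as an escape code followed by the character code, so it takes up two septets instead of one. The sketch below shows how that could be counted. Gsm7Length and countSeptets are hypothetical names, the form feed is added to the list because it is also an extension character even though the extendedGsmChars array above omits it, and the message is assumed to have passed the GSM7 check already.

```java
// Hypothetical helper, not part of the original post.
final class Gsm7Length {

    // GSM7 extension-table characters: ^ { } \ [ ~ ] | euro sign and form feed.
    // Assumption: this mirrors extendedGsmChars, plus the form feed.
    private static final String EXTENDED_GSM_CHARS = "^{}\\[~]|\u20AC\f";

    // Counts the septets needed for a message that is already known to be GSM7.
    static int countSeptets(CharSequence message) {
        int septets = 0;
        for (int i = 0; i < message.length(); i++) {
            boolean extended = EXTENDED_GSM_CHARS.indexOf(message.charAt(i)) >= 0;
            // Extended characters cost an escape septet plus the character itself.
            septets += extended ? 2 : 1;
        }
        return septets;
    }

    public static void main(String[] args) {
        System.out.println(countSeptets("Hello"));    // 5
        System.out.println(countSeptets("[ok]"));     // 6: both brackets count twice
        System.out.println(countSeptets("5 \u20AC")); // 4: the euro sign counts twice
    }
}
```

A single SMS holds 160 septets, so a message full of brackets or euro signs fills up faster than its character count suggests.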