Thoughts on String in Android Development (Source Code Analysis)

1. Thoughts on the usual way of creating a String:

String text = "this is a test text ";

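This single statement actually does three things:
1. Declares the variable String text;
2. Allocates space in memory for the string (memory space one)
3. Points the reference held by the variable text at that memory space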

When we then write

text = "this is a change text";

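this statement does two things, and leaves one thing unchanged:
1. Allocates a new space in memory
2. Points the reference held by the variable text at the newly allocated space
3. Memory space one still exists and its content has not changed, which is exactly what the immutability of String means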

Test one:

String text = "oo";
 //Concatenate strings inside a loop
for (int i = 0; i < 90000; i++) {
      text+=i;
}

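This small snippet took about 8 s to run in my app, which is far too long. The reason is simple: every += creates a new String object with its own memory and copies everything accumulated so far, while the discarded intermediate objects have to be garbage-collected along the way.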

Test two:

String text = "oo";
//Create a string buffer
StringBuilder builder = new StringBuilder(text);
//Concatenate strings inside a loop
for (int i = 0; i < 100000; i++) {
      builder.append(i);
 }

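This snippet runs more iterations than test one, yet it took only about 30 ms, a large improvement. The reason is that StringBuilder keeps a single internal buffer and appends each new piece of data to it, enlarging the buffer only when it fills up, instead of creating a new String on every iteration.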
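The two tests can be put side by side in one small program. This is only a sketch: the class name ConcatDemo, the method names, and the loop size are made up for illustration, and the timings it prints will differ from the 8 s and 30 ms measured above depending on device and runtime.

```java
final class ConcatDemo {
    // Builds "oo0123...(n-1)" with +=, creating a new String on every iteration.
    static String concatWithPlus(int n) {
        String text = "oo";
        for (int i = 0; i < n; i++) {
            text += i;
        }
        return text;
    }

    // Builds the same string by appending to a single StringBuilder.
    static String concatWithBuilder(int n) {
        StringBuilder builder = new StringBuilder("oo");
        for (int i = 0; i < n; i++) {
            builder.append(i);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        int n = 20000; // illustrative size, smaller than in the tests above
        long t0 = System.nanoTime();
        String a = concatWithPlus(n);
        long t1 = System.nanoTime();
        String b = concatWithBuilder(n);
        long t2 = System.nanoTime();
        System.out.println("+=            : " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("StringBuilder : " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("same result   : " + a.equals(b));
    }
}
```

Both methods produce the same text; only the amount of allocation and copying differs.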

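2. A brief look at the String constructors

2.1 String string = new String();

This creates an empty String: an object with empty content is allocated in memory. It is of little practical use, and the source code says as much: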

/**
     * Initializes a newly created {@code String} object so that it represents
     * an empty character sequence.  Note that use of this constructor is
     * unnecessary since Strings are immutable.
     */
    public String() {
        this.value = "".value;
    }

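Although the String is empty (it is "" rather than null; "" still occupies memory, it just has no content), the constructor still assigns the value field, reusing the value array of the empty string literal.
The value field is declared as: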

 /** The value is used for character storage. */
    private final char value[];

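2.2 String string = new String(" test ");

Creating a String this way follows the same idea as in the analysis of the usual creation approach: declare a variable, allocate space, and assign the reference. The difference is that new always creates an additional String object, separate from the literal passed in.

The source code does this: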

public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

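The constructor does not convert or copy anything: the new String simply shares the char array (value) of the original, which is safe because a String never modifies that array.
It also copies the original's cached hash value, which is stored in this field: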

 /** Cache the hash code for the string */
    private int hash; // Default to 0

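2.3 Constructing a String from the contents of a char array

 char[] chars = new char[]{'q','e','e','q','e','e'};
        //Construct a String from the contents of the char array
        String string = new String(chars);

        //The resulting string is qeeqee

Here the String is built from a char array. In the source code, the corresponding constructor copies the array that was passed in: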

 public String(char value[]) {
            this.value = Arrays.copyOf(value, value.length);
        }

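As you can see, the array passed in is not assigned to this.value directly. Arrays.copyOf creates a new char array, and that copy is what gets assigned to this.value, so later changes to the caller's array cannot affect the String.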

public static char[] copyOf(char[] original, int newLength) {
            char[] copy = new char[newLength];
            System.arraycopy(original, 0, copy, 0,
                            Math.min(original.length, newLength));
            return copy;
        }
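The effect of that copy can be checked with a short experiment. This is a minimal sketch; the class name DefensiveCopyDemo and its method are hypothetical names used only for illustration.

```java
final class DefensiveCopyDemo {
    // Builds a String from a char array and then modifies the array.
    // Because the constructor copied the array, the String keeps its content.
    static String buildThenMutate() {
        char[] chars = {'q', 'e', 'e'};
        String s = new String(chars); // the constructor copies the array
        chars[0] = 'X';               // mutate the caller's array afterwards
        return s;
    }

    public static void main(String[] args) {
        System.out.println(buildThenMutate());
    }
}
```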

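2.4 Constructing a String from part of a char array

 char[] chars = new char[]{'q','e','e','q','e','e'};
        //Construct the string
        String string = new String(chars,0,2);

        //The resulting string is qe
        //In other words, part of the char array is built into a String

From the constructor's source code: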

public String(char value[], int offset, int count) {
            if (offset < 0) {
                throw new StringIndexOutOfBoundsException(offset);
            }
            if (count <= 0) {
                if (count < 0) {
                    throw new StringIndexOutOfBoundsException(count);
                }
                if (offset <= value.length) {
                    this.value = "".value;
                    return;
                }
            }
            // Note: offset or count might be near -1>>>1.
            if (offset > value.length - count) {
                throw new StringIndexOutOfBoundsException(offset + count);
            }
            this.value = Arrays.copyOfRange(value, offset, offset+count);
        }

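In other words, when a String is constructed this way, a negative start position offset throws an exception, and so does a negative character count count.

If count is 0 (and offset is no larger than the array length), no characters are taken and the result is the empty string "", which reuses the empty string's char array.

When both values are non-negative but the combination does not fit (for example the array length is 5, the start position is 3 and the count is 5: taking 5 values starting from index 3 would run past the end of the array), an exception is thrown.

When the arguments are valid, Arrays.copyOfRange creates a new char array, which is assigned to this.value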

public static char[] copyOfRange(char[] original, int from, int to) {
            int newLength = to - from;
            if (newLength < 0)
                throw new IllegalArgumentException(from + " > " + to);
            char[] copy = new char[newLength];
            System.arraycopy(original, from, copy, 0,
                            Math.min(original.length - from, newLength));
            return copy;
        }
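The branches described above can be exercised one by one. The sketch below uses a hypothetical helper class SubArrayDemo; it catches the general IndexOutOfBoundsException so that it does not depend on the exact exception subclass of a particular JDK or Android version.

```java
final class SubArrayDemo {
    // Returns the constructed string, or the word "exception" when the
    // offset/count combination is rejected by the constructor.
    static String describe(char[] chars, int offset, int count) {
        try {
            return new String(chars, offset, count);
        } catch (IndexOutOfBoundsException e) {
            return "exception";
        }
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b', 'c', 'd', 'e'};
        System.out.println(describe(chars, 3, 2));  // valid range
        System.out.println(describe(chars, 5, 0));  // count 0: empty string
        System.out.println(describe(chars, -1, 2)); // negative offset
        System.out.println(describe(chars, 0, -1)); // negative count
        System.out.println(describe(chars, 3, 5));  // runs past the end
    }
}
```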

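2.6 Constructing a String from an array of Unicode code points

int[] charIntArray = new int[]{67,68,69,70};

String  string = new String (charIntArray,0,charIntArray.length);

//The resulting string is CDEF

From the source code (Unicode itself is covered briefly later in this article):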

public String(int[] codePoints, int offset, int count) {
                if (offset < 0) {
                    throw new StringIndexOutOfBoundsException(offset);
                }
                if (count <= 0) {
                    if (count < 0) {
                        throw new StringIndexOutOfBoundsException(count);
                    }
                    if (offset <= codePoints.length) {
                        this.value = "".value;
                        return;
                    }
                }
                // Note: offset or count might be near -1>>>1.
                if (offset > codePoints.length - count) {
                    throw new StringIndexOutOfBoundsException(offset + count);
                }

                final int end = offset + count;

                // Pass 1: Compute precise size of char[]
                int n = count;
                for (int i = offset; i < end; i++) {
                    int c = codePoints[i];
                    if (Character.isBmpCodePoint(c))
                        continue;
                    else if (Character.isValidCodePoint(c))
                        n++;
                    else throw new IllegalArgumentException(Integer.toString(c));
                }

                // Pass 2: Allocate and fill in char[]
                final char[] v = new char[n];

                for (int i = offset, j = 0; i < end; i++, j++) {
                    int c = codePoints[i];
                    if (Character.isBmpCodePoint(c))
                        v[j] = (char)c;
                    else
                        Character.toSurrogates(c, v, j++);
                }

                this.value = v;
            }
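The two passes in this constructor matter only for code points outside the BMP, which need two chars each. A minimal sketch, with the class name CodePointCtorDemo and the sample code point U+1F600 chosen purely for illustration:

```java
final class CodePointCtorDemo {
    // Builds a String from all of the given code points.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String bmp = fromCodePoints(67, 68, 69, 70);
        String supplementary = fromCodePoints(0x1F600);
        System.out.println(bmp + " has length " + bmp.length());
        // One supplementary code point is stored as a surrogate pair.
        System.out.println("U+1F600 has length " + supplementary.length());
    }
}
```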

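2.7 Constructing a String from a byte array

byte[] newByte = new byte[]{4,5,6,34,43};
        //Construct the string
        String string = new String (newByte,0,newByte.length,"UTF-8");
        //Argument 1: the byte array
        //Arguments 2 and 3: the range of bytes (offset and length) used to build the string
        //Argument 4: the charset used to decode the bytes, UTF-8 here

From the source code: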

public String(byte bytes[], int offset, int length, String charsetName)
                throws UnsupportedEncodingException {
                    if (charsetName == null)
                        throw new NullPointerException("charsetName");
                    checkBounds(bytes, offset, length);
                    this.value = StringCoding.decode(charsetName, bytes, offset, length);
            }

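If the charset name passed in is null, a NullPointerException is thrown immediately.

The checkBounds method below only performs some bounds checks on the arguments: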

private static void checkBounds(byte[] bytes, int offset, int length) {
                if (length < 0)
                    throw new StringIndexOutOfBoundsException(length);
                if (offset < 0)
                    throw new StringIndexOutOfBoundsException(offset);
                if (offset > bytes.length - length)
                    throw new StringIndexOutOfBoundsException(offset + length);
            }

Then StringCoding.decode creates a new char array, which is assigned to this.value.

2.8 Constructing a String from an entire byte array

        byte[] newByte = new byte[]{4,5,6,34,43};
        //Construct the string
        String string = new String (newByte,"UTF-8");
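As a side note, the overloads that take a java.nio.charset.Charset instead of a charset name do not declare UnsupportedEncodingException, which can make calling code shorter. The sketch below assumes that style; the class name BytesDecodeDemo and the sample bytes are illustrative only.

```java
final class BytesDecodeDemo {
    // Decodes the whole byte array as UTF-8.
    static String decodeAll(byte[] bytes) {
        return new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
    }

    // Decodes only `length` bytes starting at `offset` as UTF-8.
    static String decodeRange(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, java.nio.charset.StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = {72, 101, 108, 108, 111}; // ASCII codes of "Hello"
        System.out.println(decodeAll(bytes));
        System.out.println(decodeRange(bytes, 1, 3));
    }
}
```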

3. Analysis of commonly used String methods

3.1 Getting the character at a given index of a String

 String text = "thisisapicture";
        //Get the character at index 0
        char charString = text . charAt(0);

        //The character obtained here is t

From the source code:

public char charAt(int index) {
            if ((index < 0) || (index >= value.length)) {
                throw new StringIndexOutOfBoundsException(index);
            }
            return value[index];
        }

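As the source shows, the content of a String is held in a char array from the moment the object is created, and charAt, after a bounds check, simply returns the character at the given index of that array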

3.2 Splitting a string

3.2.1 Splitting with the split method

 String textString = "q,q,w,e,e,r,f,g3,g";
        //Split the string on ","
        String[] stringArray = textString.split(",");
        //The resulting string array is
        // [q, q, w, e, e, r, f, g3, g]

From the source code:

public String[] split(String regex) {
            return split(regex, 0);
        }

It actually calls the two-argument overload:

public String[] split(String regex, int limit) {
            /* fastpath if the regex is a
             (1)one-char String and this character is not one of the
                RegEx's meta characters ".$|()[{^?*+\\", or
             (2)two-char String and the first char is the backslash and
                the second is not the ascii digit or ascii letter.
             */
            char ch = 0;
            if (((regex.value.length == 1 &&
                     ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
                     (regex.length() == 2 &&
                      regex.charAt(0) == '\\' &&
                      (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
                      ((ch-'a')|('z'-ch)) < 0 &&
                      ((ch-'A')|('Z'-ch)) < 0)) &&
                    (ch < Character.MIN_HIGH_SURROGATE ||
                     ch > Character.MAX_LOW_SURROGATE))
                {
                    int off = 0;
                    int next = 0;
                    boolean limited = limit > 0;
                    ArrayList<String> list = new ArrayList<>();
                    while ((next = indexOf(ch, off)) != -1) {
                        if (!limited || list.size() < limit - 1) {
                            list.add(substring(off, next));
                            off = next + 1;
                        } else {    // last one
                            //assert (list.size() == limit - 1);
                            list.add(substring(off, value.length));
                            off = value.length;
                            break;
                        }
                    }
                    // If no match was found, return this
                    if (off == 0)
                        return new String[]{this};

                    // Add remaining segment
                    if (!limited || list.size() < limit)
                        list.add(substring(off, value.length));

                    // Construct result
                    int resultSize = list.size();
                    if (limit == 0) {
                        while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                            resultSize--;
                        }
                    }
                    String[] result = new String[resultSize];
                    return list.subList(0, resultSize).toArray(result);
                }
                return Pattern.compile(regex).split(this, limit);
            }

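At first sight this source code is a little daunting because it is long, but on closer reading there is not much to it.

The method contains one large if. When its condition is true, the method loops using indexOf() and cuts out each piece with substring().
//Part one: true when regex has length 1 and is not one of the characters in ".$|()[{^?*+\\"
(regex.value.length == 1 && ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1)
//Part two: true when regex has length 2, its first character is the escape character "\", and its second character is not 0-9, a-z or A-Z
//In both cases the character must also lie outside the UTF-16 surrogate range

So the if shows that when regex is a single character that is not a regex metacharacter, or an escaped special character, the split is done with indexOf() + substring().

Otherwise the regular expression path Pattern.compile(regex).split(this, limit) is used.

In other words, split can cut a String not only on an ordinary character but also on a regular expression.

When the delimiter happens to be one of the regex metacharacters, it must be escaped to get the correct result.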
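The need for escaping, and the way limit == 0 drops trailing empty strings, are easy to get wrong, so here is a small sketch. The class name SplitDemo and its helpers are illustrative, not part of the JDK.

```java
final class SplitDemo {
    // Number of pieces produced by split(regex), i.e. with limit 0.
    static int count(String text, String regex) {
        return text.split(regex).length;
    }

    // Number of pieces produced by split(regex, limit).
    static int count(String text, String regex, int limit) {
        return text.split(regex, limit).length;
    }

    public static void main(String[] args) {
        // "." is a metacharacter that matches any character, so every
        // character is a delimiter and only empty pieces remain.
        System.out.println(count("a.b.c", "."));
        // Escaped, the dot is an ordinary delimiter.
        System.out.println(count("a.b.c", "\\."));
        // With limit 0 trailing empty strings are removed ...
        System.out.println(count("a,b,,", ","));
        // ... and with a negative limit they are kept.
        System.out.println(count("a,b,,", ",", -1));
    }
}
```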

3.3 Getting the Unicode code point of the character at a given index

String text = "ABCD";
    //Get the code point of the character at index 0, that is 'A'
    int uniCode = text.codePointAt(0);

From the source code:

public int codePointAt(int index) {
        if ((index < 0) || (index >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return Character.codePointAtImpl(value, index, value.length);
    }

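The real work is done by the Character class: Character.codePointAtImpl(value, index, value.length)

That is, the char array behind String text, the index of the character to read, and the length of the char array are passed in directly.

Unicode assigns a number to every character in the world. The numbers range from 0x000000 to 0x10FFFF.

Characters numbered from 0x0000 to 0xFFFF form the commonly used set and are called BMP (Basic Multilingual Plane) characters.

Characters numbered from 0x10000 to 0x10FFFF are called supplementary characters.

Unicode mainly defines the numbers, but not how a number is mapped to binary. UTF-16 is one encoding, or mapping, scheme.

It maps a number to two or four bytes. A BMP character is represented directly with two bytes.

A supplementary character uses four bytes. The first two bytes are called the high surrogate, in the range 0xD800 to 0xDBFF,

and the last two bytes are called the low surrogate, in the range 0xDC00 to 0xDFFF. UTF-16 defines a formula for converting between a number and its four-byte representation.

Java uses UTF-16 internally. A char represents one character, but only a BMP character; a supplementary character needs two chars, one for the high surrogate and one for the low surrogate.

An int can represent any Unicode character: the low 21 bits hold the Unicode number and the high 11 bits are 0. In Unicode the integer number is usually called a code point and represents one Unicode character; by contrast, the term code unit refers to one char.

In the Character class, the corresponding static method is: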

static int codePointAtImpl(char[] a, int index, int limit) {
        char c1 = a[index];
        if (isHighSurrogate(c1) && ++index < limit) {
            char c2 = a[index];
            if (isLowSurrogate(c2)) {
                return toCodePoint(c1, c2);
            }
        }
        return c1;
    }

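The first step in this method is to read the char at the given index from the char array of text. If the if condition is not met, that char itself is returned,

because for a BMP character the code point is simply the char value.
The if condition checks whether the char falls in the range defined by isHighSurrogate, and whether another char follows it within the limit: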

public static boolean isHighSurrogate(char ch) {
        // Help VM constant-fold; MAX_HIGH_SURROGATE + 1 == MIN_LOW_SURROGATE
        return ch >= MIN_HIGH_SURROGATE && ch < (MAX_HIGH_SURROGATE + 1);
    }

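MIN_HIGH_SURROGATE: the minimum value of a Unicode high-surrogate code unit in UTF-16. A high surrogate is also called a leading surrogate.
MAX_HIGH_SURROGATE: the maximum value of a Unicode high-surrogate code unit in UTF-16

If it is, the next char is checked with isLowSurrogate: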

public static boolean isLowSurrogate(char ch) {
        return ch >= MIN_LOW_SURROGATE && ch < (MAX_LOW_SURROGATE + 1);
    }

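This method determines whether the given char value is a Unicode low-surrogate code unit,
that is, whether it lies in the low-surrogate range 0xDC00 to 0xDFFF.
If it does, toCodePoint computes the corresponding Unicode value, that is, it builds the code point from the high surrogate high and the low surrogate low: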

public static int toCodePoint(char high, char low) {
        // Optimized form of:
        // return ((high - MIN_HIGH_SURROGATE) << 10)
        //         + (low - MIN_LOW_SURROGATE)
        //         + MIN_SUPPLEMENTARY_CODE_POINT;
        return ((high << 10) + low) + (MIN_SUPPLEMENTARY_CODE_POINT
                                       - (MIN_HIGH_SURROGATE << 10)
                                       - MIN_LOW_SURROGATE);
    }
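To see the surrogate logic at work, the sketch below applies the unoptimized formula from the comment in toCodePoint by hand and compares it with the library call. The class name SurrogateDemo and the sample code point U+1F600 are assumptions made for illustration.

```java
final class SurrogateDemo {
    // A string holding the single supplementary code point U+1F600,
    // stored as the surrogate pair 0xD83D 0xDE00.
    static final String SMILE = new String(Character.toChars(0x1F600));

    // The plain (unoptimized) formula quoted in the JDK comment.
    static int manualToCodePoint(char high, char low) {
        return ((high - Character.MIN_HIGH_SURROGATE) << 10)
                + (low - Character.MIN_LOW_SURROGATE)
                + Character.MIN_SUPPLEMENTARY_CODE_POINT;
    }

    public static void main(String[] args) {
        char high = SMILE.charAt(0);
        char low = SMILE.charAt(1);
        System.out.println(Integer.toHexString(manualToCodePoint(high, low)));
        System.out.println(Integer.toHexString(SMILE.codePointAt(0)));
        // Starting in the middle of a pair yields the low surrogate itself.
        System.out.println(Integer.toHexString(SMILE.codePointAt(1)));
    }
}
```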

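3.4 Getting the code point of the element before a given index

String textString = "ABCDEF";
        //Get the code point of the character just before index 3, that is C at index 2
        int uniCode = textString.codePointBefore(3);

From the source code: the work is again done on the char array behind the string, by a static method defined in the Character class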

public int codePointBefore(int index) {
            int i = index - 1;
            if ((i < 0) || (i >= value.length)) {
                throw new StringIndexOutOfBoundsException(index);
            }
            return Character.codePointBeforeImpl(value, index, 0);
        }

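The method in the Character class reads the char just before the index directly from the char array,

then checks whether it is a low surrogate preceded by a high surrogate. If so, the code point of the pair is computed and returned; otherwise the char itself is returned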

static int codePointBeforeImpl(char[] a, int index, int start) {
        char c2 = a[--index];
        if (isLowSurrogate(c2) && index > start) {
            char c1 = a[--index];
            if (isHighSurrogate(c1)) {
                return toCodePoint(c1, c2);
            }
        }
        return c2;
        }

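3.5 Counting the Unicode code points in a given range of a string

String textString = "ABCDEF";
        //Here we get the number of code points in the whole string
        int uniCodeCount = textString.codePointCount(0, textString.length());

From the source code, the operation again works on the char array behind the String, and the Character class does the work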

public int codePointCount(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException();
        }
            return Character.codePointCountImpl(value, beginIndex, endIndex - beginIndex);
        }

Here the char array behind the string, the start of the range, and the length of the range are passed in directly

static int codePointCountImpl(char[] a, int offset, int count) {
            int endIndex = offset + count;
            int n = count;
            for (int i = offset; i < endIndex; ) {
                if (isHighSurrogate(a[i++]) && i < endIndex &&
                    isLowSurrogate(a[i])) {
                    n--;
                    i++;
                }
            }
                return n;
        }

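As you can see, this is a loop that starts with n equal to the number of chars and subtracts one for every high surrogate immediately followed by a low surrogate, so each surrogate pair is counted as a single code point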
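With plain ASCII text such as "ABCDEF" the code point count equals length(), so the difference only shows up once a supplementary character is involved. A small sketch, using a hypothetical class CodePointCountDemo and a made-up sample string:

```java
final class CodePointCountDemo {
    // "A", then the supplementary code point U+1F600, then "B":
    // four chars but only three code points.
    static String sample() {
        return "A" + new String(Character.toChars(0x1F600)) + "B";
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println("chars:       " + s.length());
        System.out.println("code points: " + s.codePointCount(0, s.length()));
        // Index 3 is 'B'; the code point before it is the whole pair.
        System.out.println(Integer.toHexString(s.codePointBefore(3)));
    }
}
```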

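3.6 Comparing two strings lexicographically (compareTo)

        String text1 = "ABCDEF";
        String text2 = "ABCDEFG";
        //Compare
        int compareNumber = text1.compareTo(text2);

This compares the character values (UTF-16 code units, which coincide with ASCII codes for ASCII text) of the two strings one position at a time.

If the two characters at a position are equal, the comparison moves on to the next position; otherwise the difference between the two character values is returned immediately.

If one string is a prefix of the other, the difference between the lengths is returned, so two identical strings give 0.

From the source code: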

 public int compareTo(String anotherString) {
            int len1 = value.length;
            int len2 = anotherString.value.length;
            int lim = Math.min(len1, len2);
            char v1[] = value;
            char v2[] = anotherString.value;

            int k = 0;
            while (k < lim) {
                char c1 = v1[k];
                char c2 = v2[k];
                if (c1 != c2) {
                    return c1 - c2;
                }
                k++;
            }
            return len1 - len2;
        }

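Here the method first takes the length of this string's char array, then gets the length of the string being compared through anotherString.value.length, and compares characters only up to the shorter of the two lengths. If no difference is found in that range, it returns len1 - len2.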
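The return values follow directly from the two return statements in the source. A short sketch with illustrative inputs; the class name CompareDemo is hypothetical:

```java
final class CompareDemo {
    // Thin wrapper so the calls below read like a table.
    static int cmp(String a, String b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        System.out.println(cmp("ABCDEF", "ABCDEFG")); // prefix: length difference
        System.out.println(cmp("ABCE", "ABCD"));      // first differing chars: 'E' - 'D'
        System.out.println(cmp("a", "B"));            // 'a' (97) - 'B' (66)
        System.out.println(cmp("ABC", "ABC"));        // identical strings
    }
}
```

Note that "a" sorts after "B" here because lowercase letters have larger char values than uppercase ones, which is one reason compareToIgnoreCase exists.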

3.7 Comparing two strings ignoring case

String text1 = "ABCDEF";
        String text2 = "ABCDEFG";
        //Compare
        int compareNumber = text1.compareToIgnoreCase(text2);

From the source code:

public int compareToIgnoreCase(String str) {
            return CASE_INSENSITIVE_ORDER.compare(this, str);
        }

This directly uses the compare method of CASE_INSENSITIVE_ORDER to compare the two strings.

CASE_INSENSITIVE_ORDER is a comparator defined in the String class
(available since version 1.2)
public static final Comparator<String> CASE_INSENSITIVE_ORDER

3.8 Concatenating two strings into one

 String text1 = "ABCDEF";
        String text2 = "ABCDEFG";
        //Append text2 to the end of text1
        String newString  = text1.concat(text2);

Two strings can also be joined with text1+text2, though that form may cost a little more in performance

From the source code:

public String concat(String str) {
            int otherLen = str.length();
            if (otherLen == 0) {
                return this;
            }
            int len = value.length;
            char buf[] = Arrays.copyOf(value, len + otherLen);
            str.getChars(buf, len);
            return new String(buf, true);
        }

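Step one checks the argument: if the string to append is "", the result is the same as the original, so this is returned directly.
Step two uses Arrays.copyOf to create a new array of the required size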

 public static char[] copyOf(char[] original, int newLength) {
                char[] copy = new char[newLength];
                System.arraycopy(original, 0, copy, 0,
                                 Math.min(original.length, newLength));
                return copy;
            }
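            //This creates an array with the new length and copies the original char array into it

Step three uses String's getChars method to copy the characters of str into the new array, starting right after the characters of the original string
Step four constructs the new string from that array and returns it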
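Step one has a visible consequence: concatenating an empty string hands back the very same object. A minimal sketch, with the class name ConcatMethodDemo chosen for illustration:

```java
final class ConcatMethodDemo {
    // True when concat returned the receiver itself rather than a new String.
    static boolean returnsSameObject(String s, String suffix) {
        return s.concat(suffix) == s;
    }

    public static void main(String[] args) {
        String text1 = "ABCDEF";
        System.out.println(returnsSameObject(text1, ""));        // empty suffix
        System.out.println(returnsSameObject(text1, "ABCDEFG")); // new String built
        System.out.println(text1.concat("ABCDEFG"));
    }
}
```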

3.9 Finding the first occurrence of a character in a string

 String text = "ABCD ";
        //Find the position where "A" first occurs in text
        int index = text.indexOf("A");

From the source code:

 public int indexOf(String str) {
            return indexOf(str, 0);
        }

        //Now look at the indexOf(String str, int fromIndex) method

 public int indexOf(String str, int fromIndex) {
         return indexOf(value, 0, value.length,
                    str.value, 0, str.value.length, fromIndex);
        }

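The indexOf(String str) method above calls indexOf(String str, int fromIndex) with 0, so the search for the given string starts at the beginning of the String.

indexOf(String str, int fromIndex) in turn calls the multi-argument indexOf, passing
            value: the char array of the parent string
            0: the index of the first usable character of the parent string; passing 0 means the whole parent string can be searched
            value.length: the length of the parent string's char array
            str.value: the char array of the substring
            0: the index of the first usable character of the substring
            str.value.length: the length of the substring's char array
            fromIndex: the position in the parent string at which the search starts

Now look at the multi-argument method: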

static int indexOf(char[] source, int sourceOffset, int sourceCount,
                char[] target, int targetOffset, int targetCount,
                int fromIndex) {
            if (fromIndex >= sourceCount) {
                return (targetCount == 0 ? sourceCount : -1);
            }
            if (fromIndex < 0) {
                fromIndex = 0;
            }
            if (targetCount == 0) {
                return fromIndex;
            }

            char first = target[targetOffset];
            int max = sourceOffset + (sourceCount - targetCount);

            for (int i = sourceOffset + fromIndex; i <= max; i++) {
                /* Look for first character. */
                if (source[i] != first) {
                    while (++i <= max && source[i] != first);
                }

                /* Found first character, now look at the rest of v2 */
                if (i <= max) {
                    int j = i + 1;
                    int end = j + targetCount - 1;
                    for (int k = targetOffset + 1; j < end && source[j]
                            == target[k]; j++, k++);

                    if (j == end) {
                        /* Found whole string. */
                        return i - sourceOffset;
                    }
                }
            }
            return -1;
        }

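This method takes the most arguments, which tells us the real processing happens here.
First, the parameters:
char[] source: the char array of the parent String
int sourceOffset: the index at which the usable part of the parent String starts
int sourceCount: the length of the parent String's char array
char[] target: the char array of the sub String
int targetOffset: the index at which the usable part of the sub String starts
int targetCount: the length of the sub String's char array
int fromIndex: the position at which the search starts

From top to bottom: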
if (fromIndex >= sourceCount) {
return (targetCount == 0 ? sourceCount : -1);
}
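If the search start position is greater than or equal to the length of the parent String,
the search stops and returns either sourceCount or -1.
When the length of the sub string is 0, that is, the sub string is the empty string "", it returns sourceCount (the length of the parent String)
When the length of the sub string is not 0, it returns -1, meaning nothing was found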

if (fromIndex < 0) {
fromIndex = 0;
}
When the search start position is less than 0, it is reset to 0

if (targetCount == 0) {
return fromIndex;
}
When the length of the sub String's char array is 0, the search start position is returned

char first = target[targetOffset];
Take the first character of the sub String being searched for

int max = sourceOffset + (sourceCount - targetCount);
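Start index of the usable part of the parent String + (length of the parent String's char array - length of the sub String's char array), that is, the last position at which a match could still begin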

for (int i = sourceOffset + fromIndex; i <= max; i++) {
    ...
}
If the loop finishes without returning, no match was found and -1 is returned

Inside the loop:
/* Look for first character. */
if (source[i] != first) {
while (++i <= max && source[i] != first);
}
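The snippet above looks odd at first, but it simply advances i to the next position where the first character of the sub String occurs (or past max if there is none). The snippet below then compares the remaining characters and returns the match position if all of them agree: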
/* Found first character, now look at the rest of v2 */
if (i <= max) {
int j = i + 1;
int end = j + targetCount - 1;
for (int k = targetOffset + 1; j < end && source[j]
== target[k]; j++, k++);

if (j == end) {
/* Found whole string. */
return i - sourceOffset;
}
}
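The same search can be written more plainly. The sketch below is a simplified re-implementation for illustration only, not the JDK code: it drops the sourceOffset and targetOffset parameters and the first-character shortcut, but keeps the three early returns discussed above. The class and method names are made up.

```java
final class IndexOfDemo {
    // Naive substring search that mirrors the edge cases of the JDK method.
    static int naiveIndexOf(String source, String target, int fromIndex) {
        char[] s = source.toCharArray();
        char[] t = target.toCharArray();
        if (fromIndex >= s.length) {
            return t.length == 0 ? s.length : -1;
        }
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (t.length == 0) {
            return fromIndex;
        }
        int max = s.length - t.length; // last index where a match can start
        for (int i = fromIndex; i <= max; i++) {
            int k = 0;
            while (k < t.length && s[i + k] == t[k]) {
                k++;
            }
            if (k == t.length) {
                return i; // every character of the target matched
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(naiveIndexOf("ABCD ", "A", 0));
        System.out.println(naiveIndexOf("abcabc", "bc", 2));
        System.out.println(naiveIndexOf("abc", "", 10));
        System.out.println(naiveIndexOf("abc", "d", 0));
    }
}
```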

3.10 Checking whether a string contains a given character sequence

String text = "ABCE";
        //Check whether text contains "A"
        boolean isContans = text.contains("A");

From the source code:

public boolean contains(CharSequence s) {
            return indexOf(s.toString()) > -1;
        }

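As you can see, String's indexOf does the real work here, through the indexOf(String str) method. When the sequence is found, indexOf returns its index in the string; indexes start at 0, so the value is always greater than -1 and contains returns true.
If it is not found, indexOf returns -1 and contains returns false. If the empty string "" is passed in, indexOf returns 0 and contains returns true.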

3.11 Checking whether a string has the same content as a given CharSequence

(Updated 2016-9-22 18:43)

 String text1 = "ABCDEF";
        //Compare the sequences
        boolean isExit = text1.contentEquals("AABCDEF");

From the source code:

        public boolean contentEquals(StringBuffer sb) {
            return contentEquals((CharSequence)sb);
        }
        public boolean contentEquals(CharSequence cs) {
            // Argument is a StringBuffer, StringBuilder
            if (cs instanceof AbstractStringBuilder) {
                if (cs instanceof StringBuffer) {
                    synchronized(cs) {
                       return nonSyncContentEquals((AbstractStringBuilder)cs);
                    }
                } else {
                    return nonSyncContentEquals((AbstractStringBuilder)cs);
                }
            }
            // Argument is a String
            if (cs instanceof String) {
                return equals(cs);
            }
            // Argument is a generic CharSequence
            char v1[] = value;
            int n = v1.length;
            if (n != cs.length()) {
                return false;
            }
            for (int i = 0; i < n; i++) {
                if (v1[i] != cs.charAt(i)) {
                    return false;
                }
            }
            return true;
        }

There are two overloads here that differ in parameter type: CharSequence and StringBuffer.

The one that does the actual work is contentEquals(CharSequence cs).

In this method:

if (cs instanceof AbstractStringBuilder) {
                     if (cs instanceof StringBuffer) {
                          synchronized(cs) {
                            return nonSyncContentEquals((AbstractStringBuilder)cs);
                       }
            } else {
                 return nonSyncContentEquals((AbstractStringBuilder)cs);
          }
    }

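As you can see, when the argument cs is an instance of AbstractStringBuilder, the body of the if runs, and if it is also an instance of StringBuffer, the comparison is done while holding a lock on cs.

In the nonSyncContentEquals method: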

private boolean nonSyncContentEquals(AbstractStringBuilder sb) {
                    char v1[] = value;
                    char v2[] = sb.getValue();
                    int n = v1.length;
                    if (n != sb.length()) {
                        return false;
                    }
                    for (int i = 0; i < n; i++) {
                        if (v1[i] != v2[i]) {
                            return false;
                        }
                    }
                    return true;
                }

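This method is straightforward. It first compares the length of the argument with the length of this string's char array. If the lengths differ, it returns false immediately, since the two character sequences cannot be equal.
If the lengths are the same, it loops and compares the characters one by one.

If the first if is not satisfied, execution continues downwards: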

// Argument is a String
                if (cs instanceof String) {
                    return equals(cs);
                }

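If the argument is an instance of String,
String's equals method is called to do the comparison.

If that condition is not met either, execution continues downwards: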

 // Argument is a generic CharSequence
                    char v1[] = value;
                    int n = v1.length;
                    if (n != cs.length()) {
                        return false;
                    }

This step compares the lengths directly. If the lengths differ, the character sequences cannot be the same.
If the lengths are equal, execution continues:

   for (int i = 0; i < n; i++) {
                        if (v1[i] != cs.charAt(i)) {
                            return false;
                        }
                    }

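Continuing downwards, each character is compared in turn, and false is returned at the first difference.
If no difference is found, true is returned at the end.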
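The practical difference from equals is that contentEquals accepts any CharSequence, whereas equals is only true for another String. A short sketch; the class name ContentEqualsDemo and its helpers are hypothetical:

```java
final class ContentEqualsDemo {
    // Compares character content, whatever kind of CharSequence is given.
    static boolean sameContent(String s, CharSequence cs) {
        return s.contentEquals(cs);
    }

    // equals is false for any argument that is not itself a String.
    static boolean sameByEquals(String s, Object other) {
        return s.equals(other);
    }

    public static void main(String[] args) {
        StringBuilder builder = new StringBuilder("ABCDEF");
        System.out.println(sameContent("ABCDEF", builder));
        System.out.println(sameByEquals("ABCDEF", builder));
        System.out.println(sameContent("ABCDEF", "AABCDEF"));
    }
}
```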

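3.12 The underwhelming copyValueOf method

(Updated 2016-9-22 18:43)

        String string = "4444";

        char[] charTest = new char[]{'d','r','w','q'};

        String newString = string.copyValueOf(charTest);

Surprisingly, the result is: drwq

The reason is that copyValueOf is a static method, so the content of string ("4444") plays no part in the call; String.copyValueOf(charTest) is the clearer way to write it. A look at the source code confirms how little it does: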

         /**
         * Equivalent to {@link #valueOf(char[])}.
         *
         * @param   data   the character array.
         * @return  a {@code String} that contains the characters of the
         *          character array.
         */
        public static String copyValueOf(char data[]) {
            return new String(data);
        }

It simply calls one of String's constructors to build a new String

3.13 Checking whether a string starts with a given character or string

(Updated 2016-9-23 8:33)

 String string = "andThisIs4";
        //Check whether it starts with a
        boolean flag = string.startsWith("a");
        //The result is true

From the source code:

        public boolean startsWith(String prefix) {
            return startsWith(prefix, 0);
        }

This calls the overloaded method,
which takes two arguments:
String string = "thisIsAndAnd";
//Check whether the string starts with this when read from index 2
boolean flag = string.startsWith("this",2);
//The result is false

public boolean startsWith(String prefix, int toffset) {
            char ta[] = value;
            int to = toffset;
            char pa[] = prefix.value;
            int po = 0;
            int pc = prefix.value.length;
            // Note: toffset might be near -1>>>1.
            if ((toffset < 0) || (toffset > value.length - pc)) {
                return false;
            }
            while (--pc >= 0) {
                if (ta[to++] != pa[po++]) {
                    return false;
                }
            }
            return true;
        }

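From top to bottom:
//Get the char array of the parent String
char ta[] = value;
//The position in the parent String at which the check starts
int to = toffset;
//Get the char array of the sub String (the prefix)
char pa[] = prefix.value;
//Index into the prefix's char array
int po = 0;
//Get the length of the prefix's char array
int pc = prefix.value.length;
//If the start position is less than 0, return false
//If fewer than pc characters remain from the start position, return false
if ((toffset < 0) || (toffset > value.length - pc)) {
return false;
}
//Compare character by character
while (--pc >= 0) {
if (ta[to++] != pa[po++]) {
return false;
}
}
return true;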

3.14 Checking whether a string ends with a given character or string

(Updated 2016-9-23 8:33)

String string = "andrThisIs4";
        //Check whether it ends with 4
        boolean flag = string.endsWith("4");
        //The result is true

From the source code:

public boolean endsWith(String suffix) {
                return startsWith(suffix, value.length - suffix.value.length);
            }

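Interestingly, it simply calls startsWith to do the check, with the offset set to the position where the suffix would have to begin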
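The delegation can be reproduced with the public API. The sketch below uses a hypothetical class PrefixSuffixDemo; it also shows why a suffix longer than the string is handled safely: the computed offset is negative, and startsWith returns false for a negative offset.

```java
final class PrefixSuffixDemo {
    // endsWith expressed through startsWith, as in the JDK source.
    static boolean endsWithViaStartsWith(String s, String suffix) {
        return s.startsWith(suffix, s.length() - suffix.length());
    }

    public static void main(String[] args) {
        System.out.println(endsWithViaStartsWith("andrThisIs4", "4"));
        System.out.println(endsWithViaStartsWith("a", "abc")); // offset is -2
        System.out.println("thisIsAndAnd".startsWith("this", 2));
        System.out.println("thisIsAndAnd".startsWith("is", 2));
    }
}
```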

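3.15 Comparing whether two strings have the same content

(Updated 2016-9-23 8:33)

        String text1 = "ABCD";
        String text2 = "ABCE";
        //Compare whether the two strings are equal
        boolean flage = text1.equals(text2);

text1 and text2 point to two different objects in heap memory.
Using text1 == text2 compares whether the references held by the two variables are equal, that is, whether they point to the same memory address.

The source code of the equals method: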

public boolean equals(Object anObject) {
             if (this == anObject) {
                    return true;
                }
            if (anObject instanceof String) {
                String anotherString = (String)anObject;
                int n = value.length;
                if (n == anotherString.value.length) {
                    char v1[] = value;
                    char v2[] = anotherString.value;
                    int i = 0;
                    while (n-- != 0) {
                        if (v1[i] != v2[i])
                            return false;
                        i++;
                    }
                    return true;
                }
            }
            return false;
        }

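The parameter is required to be of type Object.

The method first compares whether the two variables point to the same address in heap memory. If they do, it returns true immediately.
With String text1 = "ABC"; space is allocated in memory and the reference is assigned to the variable text1.
When String text2 = "ABC"; is then defined, the virtual machine first looks in the constant pool for an entry with the same content and, if one exists, points to it directly. Here the content of text1 and text2 is identical, so when text2 is created its reference points straight at the memory that text1 points to.

In the next step, if the argument is a String instance, it is cast to String,
the lengths are compared, and then a loop compares the characters one by one.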
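The constant pool behaviour described above can be observed with ==. A minimal sketch; the class name EqualityDemo and its method names are invented for illustration:

```java
final class EqualityDemo {
    // Two identical literals refer to the same pooled object.
    static boolean literalsShareObject() {
        String text1 = "ABC";
        String text2 = "ABC";
        return text1 == text2;
    }

    // new String(...) always creates a separate object with equal content.
    static boolean newCreatesDistinctButEqual() {
        String text1 = "ABC";
        String text2 = new String("ABC");
        return text1 != text2 && text1.equals(text2);
    }

    // intern() maps the separate object back to the pooled one.
    static boolean internReturnsPooled() {
        String text1 = "ABC";
        String text2 = new String("ABC").intern();
        return text1 == text2;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareObject());
        System.out.println(newCreatesDistinctButEqual());
        System.out.println(internReturnsPooled());
    }
}
```

This is why content should be compared with equals: == only happens to work when both references come from the pool.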

3.16 Getting the bytes of a String in the default charset

(Updated 2016-9-27 14:33)

        String text1 = "ABCDEF";
        //Get the byte array in the default charset
        byte[] bytes = text1.getBytes();

From the source code:

        public byte[] getBytes() {
            return StringCoding.encode(value, 0, value.length);
        }

This delegates to the static encode method of the StringCoding class

        static byte[] encode(char[] ca, int off, int len) {
            String csn = Charset.defaultCharset().name();
            try {
                // use charset name encode() variant which provides caching.
                return encode(csn, ca, off, len);
            } catch (UnsupportedEncodingException x) {
                warnUnsupportedCharset(csn);
            }
            try {
                return encode("ISO-8859-1", ca, off, len);
            } catch (UnsupportedEncodingException x) {
                // If this code is hit during VM initialization, MessageUtils is
                // the only way we will be able to get any kind of error message.
                MessageUtils.err("ISO-8859-1 charset not available: "
                                 + x.toString());
                // If we can not find ISO-8859-1 (a required encoding) then things
                // are seriously wrong with the installation.
                System.exit(1);
                return null;
            }
        }

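This method first obtains the name of the default charset through String csn = Charset.defaultCharset().name(); (UTF-8 on Android; on other platforms the default may differ),
then calls its overload to do the encoding.
If that throws, it falls back to encoding with ISO-8859-1, and if that fails as well, the process exits with an error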

3.17 Getting the byte array of a String in a specified charset

(Updated 2016-9-27 14:33)

        String text1 = "abef";
        byte[] bytes = null;
        
        try {
            bytes = text1.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }

This method is an overload of the method that returns the byte array in the default charset

From the source code:

        public byte[] getBytes(String charsetName)
            throws UnsupportedEncodingException {
                if (charsetName == null) throw new NullPointerException();
                return StringCoding.encode(charsetName, value, 0, value.length);
        }

As you can see, this directly calls the four-argument overload of StringCoding.encode (the getBytes() method calls the three-argument encode)
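Because the default charset depends on the platform, naming the charset explicitly gives predictable results. The sketch below uses the getBytes(Charset) overload with java.nio.charset.StandardCharsets, which avoids the checked exception; the class name GetBytesDemo and the sample strings are assumptions for illustration.

```java
final class GetBytesDemo {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(java.nio.charset.StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string occupies when encoded as ISO-8859-1.
    static int latin1Length(String s) {
        return s.getBytes(java.nio.charset.StandardCharsets.ISO_8859_1).length;
    }

    // True when getBytes(String) accepts the given charset name.
    static boolean isSupported(String charsetName) {
        try {
            "x".getBytes(charsetName);
            return true;
        } catch (java.io.UnsupportedEncodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("abef"));
        System.out.println(utf8Length("\u00e9"));   // e with acute accent
        System.out.println(latin1Length("\u00e9"));
        System.out.println(isSupported("no-such-charset"));
    }
}
```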

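3.18 Copying some of the characters of a String into a given char array

(Updated 2016-9-27 14:33)

        String text1  ="ABCDEF";
        //Destination char array
        char[] chars = new char[10];
        //Copy
        text1.getChars(0,text1.length(),chars,0);
        //Arguments 1 and 2: the range of characters in the String to copy
        //Argument 3: the destination char array
        //Argument 4: the start position in the destination char array

From the source code: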

public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
            if (srcBegin < 0) {
                throw new StringIndexOutOfBoundsException(srcBegin);
            }
            if (srcEnd > value.length) {
                throw new StringIndexOutOfBoundsException(srcEnd);
            }
            if (srcBegin > srcEnd) {
                throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
            }
            System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
        }

It directly calls System's array-copy method to copy the data

        public static native void arraycopy(Object src,  int  srcPos,
                                        Object dest, int destPos,
                                        int length);

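System.arraycopy itself is declared native, so the actual copy is implemented in native code through the JNI layer

The call to System.arraycopy above passes these arguments:
Argument 1 (value): the char array backing the source String
Argument 2 (srcBegin): the position in the source String where copying starts
Argument 3 (dst): the destination array
Argument 4 (dstBegin): the position in the destination array where the copied data starts
Argument 5 (srcEnd - srcBegin): the number of characters to copy from the source String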
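
The argument mapping above can be tried out with a short sketch. GetCharsDemo and copyInto are hypothetical names for this example only. It copies a sub-range of the String into the middle of a larger array, so it is easy to see which slots were written:

```java
class GetCharsDemo {

    // Copies text[srcBegin, srcEnd) into a new array of the given size, starting at dstBegin.
    static char[] copyInto(String text, int srcBegin, int srcEnd, int size, int dstBegin) {
        char[] dst = new char[size];
        text.getChars(srcBegin, srcEnd, dst, dstBegin);
        return dst;
    }

    public static void main(String[] args) {
        char[] chars = copyInto("ABCDEF", 2, 5, 10, 1);
        // chars[1..3] now hold 'C', 'D', 'E'; every other slot keeps the default '\u0000'
        System.out.println(new String(chars, 1, 3)); // CDE
    }
}
```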

3.19 Converting other data types to String (String's static valueOf() method)

(Updated 2016-10-8 14:00)

3.19.1

    // Convert a boolean to a String
    String newString = String.valueOf(true);

The source looks like this:

    public static String valueOf(boolean b) {
        return b ? "true" : "false";
    }

That is the entire implementation. I have to admit its bluntness made me laugh a little

3.19.2

    // Convert a char to a String
    char textChar = 'd';

    String newString = String.valueOf(textChar);

The source looks like this:

    public static String valueOf(char c) {
        char data[] = {c};
        return new String(data, true);
    }

valueOf builds a one-element char array and then passes it to a String constructor to create the new String. The particular constructor used here left me a bit speechless at first:

   /*
    * Package private constructor which shares value array for speed.
    * this constructor is always expected to be called with share==true.
    * a separate constructor is needed because we already have a public
    * String(char[]) constructor that makes a copy of the given char[].
    */
    String(char[] value, boolean share) {
        // assert share : "unshared not supported";
        this.value = value;
    }

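Note that this constructor is package-private (it has no access modifier), not private, so code inside java.lang such as valueOf can call it while application code cannot. The second parameter is never read; it exists only to give this constructor a different signature from the public String(char[]), which copies the array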
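
The package-private constructor cannot be called from application code, so only the public side can be demonstrated. The sketch below (CharArrayCopyDemo and buildThenMutate are hypothetical names) shows why the public String(char[]) has to copy: the caller still holds the array and could change it afterwards. In valueOf(char) the array is a fresh local that never escapes, which, as far as I can tell, is what makes sharing it safe there:

```java
class CharArrayCopyDemo {

    // Builds a String from the array, then mutates the array afterwards.
    static String buildThenMutate(char[] source) {
        String built = new String(source); // the public constructor copies the array
        source[0] = 'X';                   // changing the caller's array later ...
        return built;                      // ... does not change the String
    }

    public static void main(String[] args) {
        char[] chars = {'q', 'e', 'e'};
        System.out.println(buildThenMutate(chars)); // qee
        System.out.println(chars[0]);               // X
    }
}
```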

3.19.3

    // Convert an int to a String
    int textInt = 2345;
    String newString = String.valueOf(textInt);

From the source code:

    public static String valueOf(int i) {
        return Integer.toString(i);
    }

So the real work is done by the Integer class,
and from this we can infer how the other numeric overloads behave

3.19.4

Converting a float to a String is really done by Float.toString(f).
The source looks like this:

    public static String valueOf(float f) {
        return Float.toString(f);
    }
    // Convert a double to a String
    public static String valueOf(double d) {
        return Double.toString(d);
    }
    // Convert a long to a String
    public static String valueOf(long l) {
        return Long.toString(l);
    }

In other words, for these numeric types the toString method of the corresponding wrapper class does the actual work
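
A quick way to check this delegation is to compare each valueOf overload with the wrapper method it forwards to. ValueOfDemo and matchesWrappers are hypothetical names for this sketch:

```java
class ValueOfDemo {

    // True when each numeric valueOf overload agrees with the wrapper's toString.
    static boolean matchesWrappers(int i, float f, double d, long l) {
        return String.valueOf(i).equals(Integer.toString(i))
                && String.valueOf(f).equals(Float.toString(f))
                && String.valueOf(d).equals(Double.toString(d))
                && String.valueOf(l).equals(Long.toString(l));
    }

    public static void main(String[] args) {
        System.out.println(String.valueOf(true));  // true
        System.out.println(String.valueOf('d'));   // d
        System.out.println(String.valueOf(2345));  // 2345
        System.out.println(String.valueOf(1.5f));  // 1.5
        System.out.println(matchesWrappers(2345, 1.5f, 2.0d, 100L)); // true
    }
}
```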

3.20 Removing leading and trailing whitespace (the trim() method)

(Updated 2016-10-9 14:00)

    String text = "  abce  ";
    // Strip the whitespace
    String newString = text.trim();
    // Produces the new string "abce"

In the source:

    public String trim() {
        int len = value.length;
        int st = 0;
        char[] val = value;    /* avoid getfield opcode */

        while ((st < len) && (val[st] <= ' ')) {
            st++;
        }
        while ((st < len) && (val[len - 1] <= ' ')) {
            len--;
        }
        return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
    }

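The basic idea is to keep two index markers, int st and int len.

Two loops follow: the first advances st past the leading whitespace, and the second moves len back past the trailing whitespace. The comparison is <= ' ', so every character at or below U+0020 (tabs, newlines and other control characters, not just spaces) is stripped. The final return statement then decides what to hand back.

If st == 0 and len == value.length, the string has no whitespace at its head or tail and the method returns the string itself. Otherwise, st > 0 means there is leading whitespace and len < value.length means there is trailing whitespace, and substring is called to cut out the part in between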
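
The consequences of the <= ' ' comparison and of the final return statement are easy to observe. This is only a sketch; TrimDemo and returnsSameInstance are hypothetical names:

```java
class TrimDemo {

    // True when trim() had nothing to strip and handed back the very same object.
    static boolean returnsSameInstance(String s) {
        return s.trim() == s;
    }

    public static void main(String[] args) {
        System.out.println("[" + "  abce  ".trim() + "]");    // [abce]

        // Tab and newline are also <= ' ', so trim() strips them too
        System.out.println("[" + "\t abce \n".trim() + "]");  // [abce]

        // Nothing to strip: the same instance comes back
        System.out.println(returnsSameInstance("abce"));       // true

        // The full-width space U+3000 is greater than ' ', so it is kept
        System.out.println("\u3000abce".trim().length());      // 5
    }
}
```

The last case matters for text typed with a CJK input method: a full-width space survives trim(), so it needs separate handling if it should be removed.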

3.21 Extracting a substring by index range (the substring() method)

(Updated 2016-10-10 18:00)

    String text = "dfdfabcef";
    // Extract "dfdf"
    String newString = text.substring(0,4);

From the source code:

    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }

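3.21.1 substring(int beginIndex,int endIndex) takes beginIndex (where the substring starts) and endIndex (where the substring ends)

3.21.2 Inside the method, the two arguments are range-checked first: beginIndex cannot be less than 0, since there are no negative positions, and endIndex can be at most the length of the source string (value.length, where value is the char array backing the source String), since you cannot read past its end

3.21.3 The length of the substring is then int subLen = endIndex - beginIndex; which is the familiar rule that the start index is included and the end index is excluded

3.21.4 If subLen < 0, the start and end positions passed in are inconsistent (the start lies after the end)

3.21.5 Finally, if the range covers the whole string the method returns this; otherwise it builds the new string with the three-argument String constructor and returns it

There is also a single-argument overload, substring(int beginIndex), which runs from beginIndex to the end of the string: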

    String text = "abcdefg";
    //截取 "bcdefg"
    String newText = text.substring(1);

From the source code:

    public String substring(int beginIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        int subLen = value.length - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
    }

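As you can see, both substring(int beginIndex,int endIndex) and substring(int beginIndex)
end up delegating to the three-argument constructor String(char value[], int offset, int count),
which in essence takes count characters from a char array, starting at offset, and builds a new String from them

To be continued...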
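
The checks in both overloads can be exercised with a short sketch. SubstringDemo and isRejected are hypothetical names used only here:

```java
class SubstringDemo {

    // Returns true when substring(begin, end) rejects the given range.
    static boolean isRejected(String text, int begin, int end) {
        try {
            text.substring(begin, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String text = "dfdfabcef"; // length 9
        System.out.println(text.substring(0, 4));                     // dfdf
        System.out.println(text.substring(4));                        // abcef
        // Covering the whole string returns the same instance, not a copy
        System.out.println(text.substring(0, text.length()) == text); // true
        // beginIndex == length is allowed and yields an empty string
        System.out.println(text.substring(9).isEmpty());              // true
        System.out.println(isRejected(text, 5, 2));                   // true (start after end)
        System.out.println(isRejected(text, 0, 10));                  // true (end past length)
    }
}
```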

posted @ 2017-08-20 20:18 lytwajue
