for Robot Artificial Inteligence

75. Add and Search Word - Data structure design[Tries, Difficult]

|

  • The core idea is to solve this problem with a Trie, a tree-like structure that allows us to navigate across a vocabulary of possible words, so that its root store as children all the possible first letters of all the words in the vocabulary and they in turn point to the possible second letters and so on - check the wiki link above for more info and a visual representation.

  • The advantages of this approach are clear when you start to think about navigating the tree structure only once to verify if a word is valid: does it start with ‘t’? If so, you go down that route, otherwise you do not waste time checking for all the other possible combinations; and for larger vocabularies and/or longer word the difference in performances might be massive.

  • So, in order to implement it, first of all we need a base node with at least 2 properties: children (ie: all the other nodes that start from there and lead to other words) and a boolean telling us if a word was ending there or not (eow, in my code, as in “end of word”).

  • Also note that since an array can vastly improve performance over a vector, but it is not initialised by default, we will have to do so in our constructor for our trie, WordDictionary.

  • Since we know we will only get lowercase letters and we really want (and love!) to save resources whenever it is possible, our array of pointers is going to be only 26 in size and the helper function convertChar will help us turning the characters we get in matching indexes in the 0 - 25 range.

  • And, well, that was the easy part - I am not gonna lie here -, but if you grasp it properly, the rest is definitely going to come down much more easily.

  • So we also need to implement the add method that will gradually populate our trie, building graphs out of new words: we loop through each character, check if we already have a matching node for it in the current children, we create one if not and we update our pointer root until we are done, being mindful of setting eow to true for the last word, without touching it otherwise: if for example we add, in order “pip” and “pippi”, we will have eow set to true for both the second ‘p’ and the second ‘i’ nodes.

  • And that was the intermediate part, so now catch your breath and be ready for the hard part (including looking for wild card characters, the real challenge of this kata): the search method is meant to take initially just one word, but we might be better off allowing it to receive different optional parameters too: an index of where to start to look in the string (defaulted to 0) and its companion *WordDictionary pointer root telling us what tree to use for our search (defaulted to this).

  • et’s start with the hardest part: when the character is ‘.’, the wild card option, we do not proceed through a child linearly, but we ask our Trie to check for all the children: we return any_of taking all of them and returning false first of all if the node is not there (meaning: no child present and no letter matches that path), otherwise we return true if either node->eow (it is a valid, end of word) and i == word.size() - 1 (it is actually the last character).

  • Alternatively, we just invoke search recursively (still provided node exist, otherwise we would be using root at each call and I had to debug a bit before noticing that!): same string, increased index, updated root.

  • The other cases are a bit easier: if we find a character matching no element in the current children, we return false; otherwise we update root to be root->children[c] and we either return when it is the last character or proceed with the next iteration of the query string.

  • If all else fails, we just return false :)

memset 함수를 사용하는 이유

  • 대체로 memset함수는 특정 범위에 있는 연속된 메모리에 값을 지정하고 싶을 때 사용하는데 for문보다 더 빠른 속도가 나올수가 있다.
  • 여기서 나올수가 있다라고 표현한 이유는 컴파일러 그리고 컴퓨터 아키텍처에 따라서 다르기 때문이다.
class TrieNode{
public:
    bool word;
    TrieNode* children[26];
    TrieNode()
    {
        word = false;
        memset(children, NULL,sizeof(children));
    }

};

class WordDictionary {
private:
    TrieNode* root = new TrieNode();
    bool search(const char* word, TrieNode* node)
    {
        for(int i = 0; word[i] && node; i++)
        {
            if(word[i] != '.')
            {
                node = node ->children[word[i]-'a'];
            }
            else
            {
                TrieNode* tmp = node;
                for(int j=0; j<26; j++)
                {
                    node = tmp->children[j];
                    if(search(word +i +1, node))
                    {
                        return true;
                    }
                }
            }

        }
        return node && node -> word;

    }
public:
    /** Initialize your data structure here. */
    WordDictionary() {

    }

    /** Adds a word into the data structure. */
    void addWord(string word) {
        TrieNode* node = root;
        for(char c : word)
        {
            if(!node->children[c-'a']) // 소문자 만들기
            {
                node->children[c-'a'] = new TrieNode();
            }
            node = node->children[c-'a'];
        }
        node->word = true;

    }

    /** Returns if the word is in the data structure. A word could contain the dot character '.' to represent any one letter. */
    bool search(string word) {
        return search(word.c_str(), root);
        //문자열 전체에 저장된 문자열들과 같은 내용을 담고 있는 널 종료 문자 배열 을 가리키는 포인터를 리턴한다.

        //쉽게 말해서 c_str() 부터 c_str() + size() 전 까지 우리의 문자열이 저장되어 있고, 맨 마지막 위치에는 NULL 문자가 오게 됩니다. string 객체를 널 종료 문자 배열을 받는 함수에 전달할 때 유용하게 사용할 수 있습니다.

    }
};

/**
 * Your WordDictionary object will be instantiated and called as such:
 * WordDictionary* obj = new WordDictionary();
 * obj->addWord(word);
 * bool param_2 = obj->search(word);
 */

Comments