How tokenizers convert human language into the numbers LLMs understand. A deep dive into Byte Pair Encoding (BPE) and why character-level tokenization fails.
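
To make the BPE idea concrete, here is a minimal sketch of the training loop in Python. It is an illustration under simplifying assumptions, not the implementation used by any particular tokenizer library: text is split on whitespace, symbols start as single characters, and ties between equally frequent pairs go to whichever pair was seen first. The function names (`get_pair_counts`, `merge_pair`, `train_bpe`) are hypothetical and chosen only for this example.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    counts = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(words, pair):
    """Return a new vocabulary where every occurrence of `pair` is one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus.

    Assumption: each word starts as a tuple of single characters.
    """
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(words)
        if not counts:
            break  # every word is already a single symbol
        best = max(counts, key=counts.get)  # most frequent pair, first seen wins ties
        words = merge_pair(words, best)
        merges.append(best)
    return merges, words


if __name__ == "__main__":
    merges, words = train_bpe("low low low lower lowest", 2)
    print(merges)
    print(words)
```

In this toy corpus, two merges are enough to turn the frequent word "low" into a single symbol, while "lowest" shrinks from six character-level symbols to four. That shorter sequence length is the practical gain over character-level tokenization.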