I’ve got a dilemma tonight. With every post, I would like to provide the most entertainment possible for as many readers of this blog as possible, present and future. At the time I post this, maybe 6 people read this blog. I anticipate that in the (hopefully) not-too-distant future I’ll have thousands or more. It could happen. Currently, catering to my audience is very easy since I know them somewhat personally. But in the future, readers could be visiting from anywhere, because this is the internet.
Anyway, none of this is my dilemma. The dilemma is, what exactly is entertainment?
My oldest brother, Corey, is a very smart individual. He’s technically my half-brother but has long since been accepted as the oldest of five male siblings, and he works for IBM in a major role that utilizes smart individuals. Whenever we get together, conversation turns pseudo-geek philosopher as we discuss science, math, the universe, religion, computers, life, and death. No topic is touched on lightly, and we can both keep this up for hours since we are overly analytical…and nerds. We will thoroughly destroy a topic as we dig deep looking for the purpose and meaning of every last bit of minutiae. Typically I’m forced into the ‘devil’s advocate’ role in certain conversations because I feel it provokes Corey’s more expressive and argumentative analysis. He’s a natural-born teacher and I’m usually the one being schooled.
Don’t mistake my blog flow. Clearly, this post has yet to be entertaining. I get that. But I brought this bit up about my brother for a reason.
Earlier tonight we talked about how our father invented phonetic text-to-speech algorithms. Actually, the first part of the chat was just laughing at the idea our father invented anything. The second part was discussing how Google search knows that when you search the word “Nucular” you’re really trying to search for the proper “Nuclear”. Normally, one would just go to Google.com and read about how they do it. That’s not how we roll. Now stay with me, since I feel an aside is necessary here. Computers pull off some magical shit. How they do this magic will in turn explain how Google implemented the feature. And that explanation, in turn, will either detail what entertainment is or not. I know this because we’ve analyzed it to the hilt.
Corey and I have both been employed professionally in a computer software programming capacity. He still does this daily; I, as a customer service rep, not so much lately. However, we’ve both picked up a few facts as we went along. Fact: Computer programs will only do what a programmer codes. Fact: Computer programs only correlate and follow rules as delegated by the programmer. Fact: Computers are not sentient (Sorry Anthony). Take what you’re reading this moment: you only get to see what I’ve typed out, not something generated by the program that allows typing in this blog. If a programmer somewhere decided that last sentence should be generated, coding it in would be absolutely necessary. Usually a programmer’s first effort with a scripting language is to decide on and implement the code that generates the words ‘Hello World’, since displaying text is the most basic thing you could do. The edit box I’m typing into right now and the generation of this text is far more complicated. In no conceivable manner would this program ever veer off and do anything beyond what I expect it to do, allowing me to finish this paragraph.
Unfortunately, the nature of programming really makes it hard for me to achieve my real dream in life. I want a robot that does my laundry. Think of an extremely sophisticated Roomba. Menial tasks really suck, and I would love to create a machine that would do those tasks for us humans. Robots aren’t alive, so no task would be too much. My menial task robot would have to do some very complex routines. Depressingly, getting to that end will require some carpal-tunnel-inducing coding. As I mentioned earlier, my robot won’t do my laundry until I’ve put that code into its hard drive.
Google search has evolved over the years and grown a huge database thanks to millions or even billions of users. Originally, Google and other internet search engines weren’t that impressive. Some kid in his mom’s basement somewhere coded a program called a web crawler. Web crawlers would connect to an internet address, usually download content or copy hyperlink data, and return that information. Probably invented to make it easier to grab porn images, web crawlers produced the data that search engines would eventually rely on for indexing: each web page was linked to an address that would be indexed, making it easily searchable. The problem was, web crawlers didn’t grab the entire internet. Nobody could actually download the entire internet even when it was relatively small. Some addresses wouldn’t resolve, would time out, or would be entirely unreachable. Indexing the entire internet wasn’t realistic.
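Here is a rough sketch of what that kid’s web crawler boils down to. To keep it runnable anywhere, it crawls a pretend internet stored in a dictionary; `FAKE_WEB`, the `.example` addresses, and the `fetch` and `crawl` names are all made up for illustration and have nothing to do with any real crawler.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical stand-in for the internet: address -> page HTML.
FAKE_WEB = {
    "a.example": '<a href="b.example">B</a> <a href="c.example">C</a>',
    "b.example": '<a href="a.example">A</a> <a href="dead.example">gone</a>',
    "c.example": "no links here",
}


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(address):
    """Pretend download; returns None when the address is unreachable."""
    return FAKE_WEB.get(address)


def crawl(start):
    """Follow links breadth-first, recording each page's outgoing links."""
    index = {}           # address -> links found on that page
    unreachable = set()  # addresses that never resolved
    seen = {start}
    queue = deque([start])
    while queue:
        address = queue.popleft()
        html = fetch(address)
        if html is None:
            unreachable.add(address)
            continue
        parser = LinkParser()
        parser.feed(html)
        index[address] = parser.links
        for link in parser.links:
            if link not in seen:  # never visit the same address twice
                seen.add(link)
                queue.append(link)
    return index, unreachable
```

Calling `crawl("a.example")` walks all three pages and reports `dead.example` as unreachable, which is the "some addresses wouldn’t resolve" problem in miniature.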
Google’s creators at the time realized this and took it one step further. They realized that if a link keeps coming back multiple times, that link is significantly relevant. If they set the web crawler to follow that link, they could repeat the relevant-link process over and over until a healthy majority of the internet was being returned and resolved properly. As the internet’s user base grew, Google made another good decision. They realized they could rely on human beings to provide information about which web pages were useful, even if the humans didn’t realize they were doing this. PageRank, named after Google’s co-creator Larry Page, is an algorithm that scores a web page’s relevance by the links pointing to it, so every person who linked to a page was casting a vote for it. On top of that, internet users could manually rank a page for its relevance or, more commonly, have their browsing habits monitored, with the visited sites reported back to Google. While this practice was nasty, it was all legal and we wouldn’t have the Google we know and love today without it. Fun side note: Google didn’t own the patent to the PageRank algorithm, and had to give Stanford University 1.8 million shares of Google stock for its use. Stanford University sold those shares a while back for $336 million. Meanwhile, Larry Page is worth $19 billion. See kids, math IS useful!
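For the curious, the "a link that keeps coming back is relevant" idea can be sketched in a few lines. This is a simplified take on the published PageRank idea of scoring pages by their incoming links, not Google’s actual code; the tiny graph, the page names, the damping value of 0.85, and the iteration count are all my assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by incoming links.

    `links` maps each page to the pages it links to; every link target
    must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no links spreads its score over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page internet.
TINY_WEB = {
    "fan_site": ["wiki"],
    "blog": ["wiki"],
    "wiki": ["fan_site"],
    "orphan": [],
}
ranks = pagerank(TINY_WEB)
```

In this toy web, `wiki` ends up on top because two pages link to it, and `fan_site` beats `blog` because the well-linked `wiki` vouches for it.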
All this data was so much more than just feeding back links to Star Wars fan sites; it was developing the brain of the Google Search engine. More users meant more data. And Google had such a cute, silly name, and provided links to Star Wars fan sites that didn’t suck, so its popularity kept growing. Eventually the coding for the search engine developed to include this data in a meaningful way that provided the most intelligent-seeming results. Programmers took this gigantic resource of accumulated data and made rules that would parse it for specific patterns. That let them further correlate search terms with search results, which in turn allowed other rules to be implemented for generating some awesome magical shit. A programming rule is something like “If A then B else C”. Correlation is the magic you can pull off with one of these rules. For example, a cat is also a kitten. If I search for cat, please give me results with kittens, too. Then the programmer could add this rule: “If A then B else C, but A could also be D and if D then B else C, but oh shit, B might be A, and if B is A then omg I just invented Pong”.
Programs don’t know that cats are like kittens. Programs don’t know anything. Someone out there in Text Searching Land developed a line or two that essentially did the following: The term Cat comes up a lot when the term kitten is also used. Dog and Cat are used a lot together, but not as much as Kitten. Kitten and Dog are used a lot, but not as much as Cat and Dog are, and definitely still not as much as Cat and Kitten. Therefore, it’s reasonable to consider Cat could just as easily be replaced by the word Kitten, but not if the term Dog is being used. This bit of code alone makes search results really decent. But, you need a lot of people typing Cat and Kitten together and people NOT typing Cat and Kitten together, put up against all the other millions of terms people would type into a search box to really nail it down. Google has this kind of information to draw from. Over time, every word became linked to every other word, and weighted values could be added in the mix to ensure certainty. Using my example above, if I type “baby cat” into a search field, I want it to reply with “kitten” and not “dog”.
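The "line or two" about Cat, Kitten, and Dog might look something like the sketch below. It just counts how often two terms show up in the same search and picks the strongest partner. The `QUERIES` list is invented for the example, and the function names are mine; a real search engine would have billions of queries and weighting far beyond a raw count.

```python
from collections import Counter
from itertools import combinations

# Made-up search log; the counts mirror the Cat/Kitten/Dog example.
QUERIES = [
    "cat kitten",
    "kitten cat",
    "cat kitten",
    "cat dog",
    "dog cat",
    "kitten dog",
]


def cooccurrence(queries):
    """Count how many queries contain each pair of terms."""
    counts = Counter()
    for query in queries:
        terms = sorted(set(query.lower().split()))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts


def closest_term(term, counts):
    """Return the term that most often appears alongside `term`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    if not scores:
        return None
    return scores.most_common(1)[0][0]


COUNTS = cooccurrence(QUERIES)
```

With this log, `closest_term("cat", COUNTS)` comes back as `"kitten"`, while `"dog"` is most strongly tied to `"cat"`. The program still doesn’t know what a cat is; it only knows which words keep showing up together.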
Since most of the data coming in was input by humans, and humans are naturally stupid, there is also a lot of useless fluff to sort through. If I wanted to find out more on “fucking” I don’t want to know about “assholes”. But it’s highly probable that millions of users typed “Tom Cruise is a fucking asshole” into search, and a programmer would have to be clever to filter this out. Add in the problem that most people were also typing “Tom Cruise is fucking gay” and now you’ve got a lot of data that links “Tom Cruise” to “gay” and “fucking” and “assholes”. While a human could look at that data and say “Yeah, makes sense”, a computer doesn’t KNOW what a Tom Cruise is or why he’s so fucking gay. So how does someone make a computer understand this concept?
Turns out making a computer understand is impossible. Humans had to evolve over several millennia for our computer-processor-like brains to recognize patterns and interpret them into something, a.k.a. understanding. A simple game like human-versus-computer Tic Tac Toe isn’t coded to think the way a human would; instead it calculates the possible victories and works backward from them to choose its current move. A beginner programmer could use rules like “If player puts an X into square one, have computer put an O in square 5,” and, for millions more lines of code, could put in a rule for each possible move a player and computer could make. However, to make a computer play like a human, you have to code the game to think like a computer. Make sense? A computer can calculate millions of possible moves from the data available so much faster than a human could. Given that, it’s a better idea to realize that Tic Tac Toe only has so many possible outcomes that result in a victory for the computer, far fewer than the millions of rules you would have to write. A developer could then code a “loop” that forces the program to calculate these victory conditions, recognize those patterns, interpret them, correlate them with the current state of the board and, in what really amounts to cheating, choose the next move based on the pattern that leads to victory. Oh, and don’t forget to program the game to know that victory is the result it wants in the end.
Summing up, correlating and rules lead to interpreting patterns, resulting in what humans might just confuse for understanding. Add in another math algorithm, such as phonetics of speech, and you can match enunciation with actual words that exist. So now, if George Bush goes to Google and types in “how do I find the nucular weapons in Iraq”, the search engine recognizes that most of those things are actual words, but nucular isn’t one. “‘Nu-cu-lar’ could be ‘new-queler’, but those aren’t words either. How about ‘nuke-ular’? No, those aren’t words. How about ‘nuke-clear’? Now we’re getting warmer. I know! The word should be ‘nuclear’, an actual fucking word.” Of course, it does this practically faster than you could type the word out, especially since nucular has been matched with nuclear a million times over, so Google can simply ignore the idiots who spell it as nucular. Much like they can match Cat with the term Kitten.
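To make the nucular trick a little less magical, here is a minimal sketch. Fair warning: it swaps the phonetic matching for plain edit distance (how many single-letter changes separate two words), because that is easy to show in a few lines, and it breaks ties with how often each real word has been seen. The `VOCABULARY` counts are invented, and none of this is claimed to be how Google really does it.

```python
def edit_distance(a, b):
    """Levenshtein distance: single-letter inserts, deletes, and swaps."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # drop char_a
                current[j - 1] + 1,                    # add char_b
                previous[j - 1] + (char_a != char_b),  # swap, or free match
            ))
        previous = current
    return previous[-1]


# Made-up counts of how often each real word has been typed.
VOCABULARY = {
    "nuclear": 900,
    "unclear": 400,
    "ocular": 50,
    "weapons": 700,
    "iraq": 600,
}


def correct(word, vocabulary, max_distance=2):
    """Return the nearest known word, preferring more common ones on ties."""
    if word in vocabulary:
        return word
    candidates = [
        (edit_distance(word, known), -count, known)
        for known, count in vocabulary.items()
    ]
    candidates = [c for c in candidates if c[0] <= max_distance]
    if not candidates:
        return word  # nothing close enough, leave it alone
    return min(candidates)[2]
```

Both "nuclear" and "ocular" are two letter-changes away from "nucular", so `correct("nucular", VOCABULARY)` falls back on popularity and picks "nuclear". That tie-break is the "matched a million times over" part doing the real work.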
My laundry may never be done by computers, because computers will never develop reasoning. If I’m going to have a computerized robot mucking about my house working with electrical equipment, or maybe even being the electrical equipment, it could very well kill me at some point. Not like Terminator tried to kill Sarah Connor, but like accidentally setting the house on fire while I’m sleeping. Much like the ability to understand things, humans have evolved a device in our brains to quickly calculate and discriminate between right and wrong. Sadly, not every human is born with this ability, but a good majority of us know that setting a house on fire while its owner is sleeping is a bad thing. As a programmer of artificial intelligence, I would need to make sure my laundrybot knows not to start fires. But, again, how do I make that happen? I can’t just put code in that says “If fire starting is a likely result of doing this laundry, don’t do the laundry”. Well, I could. But that would mean instead of writing millions of lines of code giving the program the ability to win at Tic Tac Toe, I’m now writing a million billion million lines of code to decipher conditions in which fires wouldn’t be started by doing this load of laundry.
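For contrast with that million billion million, here is roughly how small the Tic Tac Toe “loop” from earlier can be. This sketch uses minimax search, which is one common way to work backward from every possible ending; that choice is my assumption about what the loop would look like, and the board layout (a list of nine squares holding "X", "O", or a space) and function names are made up for the example.

```python
# The eight ways to get three in a row, as square indexes 0 to 8.
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return "X" or "O" if someone has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def score(board, player, me):
    """+1 if `me` can force a win, 0 for a draw, -1 for a forced loss.

    `player` is whoever moves next.
    """
    won = winner(board)
    if won:
        return 1 if won == me else -1
    if " " not in board:
        return 0
    other = "O" if player == "X" else "X"
    results = []
    for square in range(9):
        if board[square] == " ":
            board[square] = player  # try the move
            results.append(score(board, other, me))
            board[square] = " "     # take it back
    # I pick my best outcome; my opponent picks my worst.
    return max(results) if player == me else min(results)


def best_move(board, me):
    """Return the empty square that leads to the best outcome for `me`."""
    other = "O" if me == "X" else "X"
    best_square, best_score = None, -2
    for square in range(9):
        if board[square] == " ":
            board[square] = me
            result = score(board, other, me)
            board[square] = " "
            if result > best_score:
                best_square, best_score = square, result
    return best_square
```

Hand it `list("XX  O    ")` as O and it blocks the top row by taking square 2, without a single “if player puts an X into square one” rule. Notice that the only reason it wants to win is the `+1` I typed in for victory. Nothing like that exists yet for “don’t burn the house down”.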
Fortunately, I’m a dreamer. And I want to one day make a program I could put in a computer so it will do menial tasks for me. In fact, forget laundry. I’m going to make a program that writes entertaining blogs, because, well, I’m not sure I figured out how to do that, yet. :)