Misunderstanding Computers

Why do we insist on seeing the computer as a magic box for controlling other people?
Why do we want so much to control others when we won't control ourselves?

Computer memory is just fancy paper, CPUs are just fancy pens with fancy erasers, and the network is just a fancy backyard fence.

(original post -- defining computers site)

Friday, May 2, 2014

What is a user-id?

Digging through posts I have started and failed to finish, I found a sub-thread I started on the Fedora list, trying to explain the difference between a system user, as represented by a user-id, and the human whose hands are on the keyboard:
http://lists.fedoraproject.org/pipermail/users/2012-April/416380.html
When you hear/see the word "user" it's only natural to think in terms of a human user. So, when you think of a user-id, there is a tendency to think of your school ID card or your driver's license certificate, or such.

But there are lots of user-ids in use on your computer. Whose are they all? Who let them in there? Why?

Maybe we should have used Latin words for the computer jargon, instead of borrowing from living languages. Dead languages are convenient. Or maybe we just could have made up new words. In this case, virtual-user-role-to-manage-a-related-collection-of-tasks, or VURTMARCOT. Shoot, just the acronym is long enough that we want to take it out behind the barn and shoot it. I don't want to be talking about a VURTMARCOT-identifier-number every time I try to figure out what is happening in my system.

So, let's just say that most of the user-ids on your computer are VURTMARCOTs and keep using the term, "user-id", instead.

No?

Uhm, okay, first we have to explain. When your system is running, there are lots of different things going on inside it. Hundreds and thousands of things. The impression that you have that it is complicated is not incorrect. That means that there have to be programs in there to manage the complexity, so that you don't have to. Lots of programs. So many that even the programs to manage the complexity have to be managed.

(Did I really have to say that? I think so.)

So we collected some of those things according to how they are related, and we gave that collection an id number, and we called it a user-id, and we invented the concept of a (virtual?) system user to manage those things. And we ended up with lots of those system user-ids.

Now, if the system has lots of user-ids for its use, why shouldn't you?

One login user-id for when you are playing around, one for when you are working at your new job, one for the old job, one for when you are going to the bank, one for when you are keeping track of your family records, ...

Well, it's a little inconvenient, because you really don't want to log out and back in every time you shift gears. It's not just a hassle, either, because sometimes what you were playing with turns out to be useful for work, and you want to copy-and-paste from your play area to your work area.

I've blogged about a way to do that. The information is a little old, and I should post an update, with some of the options that I've found to (sort-of) work. But not today.

So, what is a user-id? It's an identifier (both a number and a mnemonic name) that you use to keep track of what's going on in your computer. You may have more than one user-id per human user, and you have lots of user-ids that have no human user.

And you should, in fact, have at least one login user-id for administration tasks, separate from the one you usually log in on to work or whatever.
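
To make the distinction a little more concrete, here is a minimal sketch in Python that reads records in the classic passwd format and sorts them into system user-ids and login user-ids. The cutoff of 1000 for system accounts and the list of no-login shells are assumptions (common conventions on Linux distributions, not rules), and the sample accounts, including alice and alice-admin, are made up for illustration.

```python
# Sketch: split passwd-style records into "system" and "login" user-ids.
# Assumptions: UIDs below 1000 are system accounts (a convention, not a rule),
# and an account whose shell is nologin/false is not meant for a human login.
from typing import List, NamedTuple

SYSTEM_UID_LIMIT = 1000  # assumed cutoff; some systems use other values
NO_LOGIN_SHELLS = {"/sbin/nologin", "/usr/sbin/nologin", "/bin/false"}


class Account(NamedTuple):
    name: str   # the mnemonic name
    uid: int    # the number the system actually works with
    shell: str  # the program started at login, if any


def parse_passwd(text: str) -> List[Account]:
    """Parse lines of the form name:passwd:uid:gid:gecos:home:shell."""
    accounts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            continue  # skip malformed lines rather than guess at them
        accounts.append(Account(fields[0], int(fields[2]), fields[6]))
    return accounts


def is_system_account(account: Account) -> bool:
    """True when the account looks like a task-managing role, not a person."""
    return account.uid < SYSTEM_UID_LIMIT or account.shell in NO_LOGIN_SHELLS


# Hypothetical sample data, not taken from any real machine.
SAMPLE = """\
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
alice-admin:x:1001:1001:Alice (admin tasks):/home/alice-admin:/bin/bash
"""

LOGIN_NAMES = [a.name for a in parse_passwd(SAMPLE) if not is_system_account(a)]
```

With the sample above, root, daemon, and mail come out as system accounts, and alice and alice-admin come out as login accounts: two login user-ids for one human, and three user-ids with no human behind them at all.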

Who was the first programmer?

My wife asked me and my son to change some carpet that had two desks and a chest of drawers on it. I was not enthusiastic, my son even less so. Took two hours or so, moving the furniture around in a Japanese apartment with limited space.

My son was making quite a bit of fuss about the hassle of moving desks and chairs and other things this way and then that way and then back again, just to get the carpet in place, so I mentioned that computer programming often has similar problems to solve -- updating or changing software underneath live data in a limited space without damaging things, with minimum interruption to access.

My wife heard that and asked a very interesting question:

Who was the first programmer?

Well, that question has many answers.

Ada Lovelace, who wrote a program for Charles Babbage's analytical engine, is generally considered to be the first (modern) programmer, but she was never able to see her program run on actual hardware.

Konrad Zuse is the first person generally known to have programmed a modern computer. (His work in wartime Germany was overshadowed by the more publicly known work on the ENIAC in the US.)

The members of the ENIAC programming team are often considered to be the first group to work regularly as computer programmers.

The job of a programmer would be most accurately described (in my opinion) as implementing abstract descriptions of processes as (or in) functioning (real) systems. Since the goal is to obtain a functioning system, debugging (fixing errors in the implementation) is part of the job.

Refining or optimizing the implementation would be an optional part of the job.

The computer industry has a habit of trying to limit the definition of a computer system to whatever can be currently manufactured and sold, but they also have a habit of trying to be the first to expand that definition into new areas. (Call the other guys' work irrelevant until you have duplicated it, then claim you are first, faster, or some other way more deserving of customers' money.)

Anyway, I see no particular reason to limit the definition of a system to whatever the computer industry is currently manufacturing and selling.

Expanding the definition of a system a bit, it becomes clear that teaching is part of the programming process. I don't want to call it an example of programming, because teachers cannot directly do the implementation part. The students themselves have to do that. (Or, rather, if the teachers don't allow the students to do the actual work of implementation, they impede, rather than assist, the processes of learning.)

Learning, on the other hand, is a programming process. So is invention. Practically everything we do involves programming.

And now we see why my wife asked the question.

So, who were the first programmers? Can I suggest it goes back as far as human history?

Maybe even before human history, back to God? (Or, back to the parameters of the big bang, if you think that invoking gods is an anthropomorphic activity?)

Thursday, May 1, 2014

Things to Fix in E-mail, Newsgroups, and Mailing Lists

I've posted a bit of soapbox-ing on the Debian User list over the last few weeks. One of my rants was about e-mail and mailing lists and the fact that they must change in both form and function. I was asked off-list about what I had in mind, and I guess it's just as well that I should post here, so that the answer is public, but not cluttering up the list further.

(More of the thoughts that have led me to my current opinions here.)

First off, current e-mail is actually pretty good, for people who are willing to understand the interface and take responsibility for their own use. Both IMAP and POP provide methods for checking message headers and deleting messages without actually downloading messages. At the GUI level, Sylpheed, for instance, has the
message -> receive -> remote mailbox 
menu item, which allows you to sort by subject, sender, and date, in addition to deleting or downloading individual messages.
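
For readers who want to see what checking headers without downloading looks like underneath a GUI, here is a rough sketch using Python's standard poplib module. It relies on the POP3 TOP command, which is optional in the protocol, so the assumption is that the server supports it. The function names are invented for this example, and the host, user name, and password are placeholders to be filled in.

```python
# Sketch: summarize a remote POP3 mailbox from headers only.
# Assumption: the server implements the optional TOP command.
import poplib
from email.parser import BytesParser
from typing import List, Tuple


def open_mailbox(host: str, user: str, password: str) -> poplib.POP3_SSL:
    """Log in over SSL; host, user and password are supplied by the caller."""
    conn = poplib.POP3_SSL(host)
    conn.user(user)
    conn.pass_(password)
    return conn


def fetch_header_summary(conn, msg_num: int) -> Tuple[str, str, str]:
    """Return (from, subject, date) for one message.

    top(n, 0) asks for the headers plus zero lines of the body,
    so the message itself stays on the server.
    """
    _response, lines, _octets = conn.top(msg_num, 0)
    headers = BytesParser().parsebytes(b"\r\n".join(lines), headersonly=True)
    return (
        headers.get("From", ""),
        headers.get("Subject", ""),
        headers.get("Date", ""),
    )


def list_headers(conn) -> List[Tuple[int, str, str, str]]:
    """Summarize every message in the mailbox as (number, from, subject, date)."""
    count, _size = conn.stat()
    return [(n, *fetch_header_summary(conn, n)) for n in range(1, count + 1)]


def delete_unread(conn, msg_nums: List[int]) -> None:
    """Mark messages for deletion without ever retrieving their bodies."""
    for n in msg_nums:
        conn.dele(n)
    conn.quit()  # deletions take effect when the session ends cleanly
```

A session would look something like conn = open_mailbox(host, user, password), then list_headers(conn) to look things over, then delete_unread(conn, [...]) with the numbers of the messages that are not wanted. IMAP can do the same kind of thing with a header-only fetch, which is not shown here.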

Since spam tends to clump together under a sort, sorting helps greatly at handling spam without actually setting up complex filtering systems. I can clear about a thousand spam messages in about fifteen minutes to a half-hour and not worry about false positives, etc. (This won't work for everyone -- it took me several years to tune my mental filters and visual scanning techniques.)
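
The clumping effect is easy to show with a small sketch. The Python below is only an illustration of the idea, not a filter: it assumes the header summaries are already in hand as (message number, sender, subject) tuples, and the normalization rules (lower-casing, dropping Re: and Fwd: prefixes) are a guess at what makes similar subjects line up.

```python
# Sketch: sort header summaries so look-alike subjects land next to each other.
# The tuple layout and the normalization rules are assumptions for illustration.
import re
from itertools import groupby
from typing import Dict, List, Tuple

Summary = Tuple[int, str, str]  # (message number, sender, subject)


def normalize_subject(subject: str) -> str:
    """Lower-case, strip leading Re:/Fw:/Fwd: prefixes, collapse whitespace."""
    subject = re.sub(r"^\s*((re|fwd?)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return " ".join(subject.lower().split())


def clump_by_subject(summaries: List[Summary]) -> Dict[str, List[int]]:
    """Group message numbers by normalized subject, largest clumps first."""
    ordered = sorted(summaries, key=lambda s: normalize_subject(s[2]))
    clumps = {
        subject: [num for num, _sender, _subject in group]
        for subject, group in groupby(ordered, key=lambda s: normalize_subject(s[2]))
    }
    # Big clumps from many different senders are the ones worth eyeballing first.
    return dict(sorted(clumps.items(), key=lambda item: -len(item[1])))
```

The human judgment is still the important part. The sort only puts the candidates side by side so that the eye can do the rest.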

But, speaking of spam, there you have it. Senders of unsolicited commercial messages are on questionable moral and ethical ground, and the unsolicited pseudo-commercial messages we call spam, mostly fraud or worse, are definitely cases of irresponsible use. The tendency to respond to messages that shouldn't be responded to is also a case of not-really responsible use, whether answering a message about some unknown wanting to give you money or joining in a mail-list flame war.

If current e-mail protocols and usage were sufficient for mail lists, I think we would have no need for either Twitter or Facebook. The biggest problem with e-mail and mailing lists is that there are always going to be irresponsible users. Even in the early days of the internet, when the users were all military researchers and academics, you'd have (for example) the occasional professor deciding he needed to get the broadest possible audience for something he was doing and address-span mailing every address in his address files.

It's not so much the volume as the lack of human judgement about which addresses to use.

With physical mail, the volume issue is significantly offset by the cost of sending physical junk mail. That's the biggest reason it usually takes more than a week of failing to empty your mailbox to cause it to explode. But if we think of a way to make mass e-mail cost, e-mail is suddenly less valuable because we then start worrying about the cost of our daily conversations.

And then there's the problem of knowing who it really is you are talking to.

We have these things called certificates, that are supposed to provide us assurance of the identity of the other guy. But they don't really work because of companies that would rather make money than provide a service, and we can't use them to tell for sure whether the message we just got really is from the person it says it is from. Without such methods, all we have to work from is the contents of the "From" header and the contents of other headers that ostensibly describe the path that the message took on its way to our in-box. And all those headers are easily forged.

With a physical envelope, we really have no way of knowing that the return address on the envelope is for real, unless the letter is sent as registered mail. (And even then we aren't quite sure.) But the content of the letter has out-of-band clues, like the hand-writing, that can help us be sure.

In current e-mail, we have neither registered mail nor handwriting. The closest thing we have to registered mail is the logs on the servers that the message has passed through. If you don't trust any of the servers on the path, you can't trust the path itself.
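
As a rough illustration of how thin that evidence is, the sketch below uses Python's standard email package to read the path a message claims to have taken. Each server adds a Received header on top of the ones already there, so the only hops worth believing are the ones at the top that were recorded by servers you trust. Everything under the first untrusted hop is just text that somebody supplied. The sample message, the server names, and the simple "by server" match are all assumptions made up for this example.

```python
# Sketch: keep only the part of the claimed delivery path recorded by trusted servers.
# The sample message and host names are invented; the " by <server> " match is a
# simplification of the real Received header syntax.
from email import message_from_string
from typing import List

RAW_MESSAGE = """\
Received: from mx.example.net (mx.example.net [192.0.2.10])
\tby mail.example.org with ESMTP; Thu, 1 May 2014 10:00:05 +0900
Received: from unknown (HELO friendly-bank.example) (198.51.100.7)
\tby mx.example.net with SMTP; Thu, 1 May 2014 10:00:01 +0900
From: "Your Bank" <service@friendly-bank.example>
Subject: Please confirm your account

Body text.
"""


def trusted_prefix(raw: str, trusted_servers: List[str]) -> List[str]:
    """Walk the Received headers newest first and stop at the first hop
    that was not recorded by a server on the trusted list."""
    msg = message_from_string(raw)
    kept = []
    for hop in msg.get_all("Received", []):
        flat = " " + " ".join(hop.split()) + " "  # undo header folding
        if not any(f" by {server} " in flat for server in trusted_servers):
            break  # everything from here down could have been written by anyone
        kept.append(flat.strip())
    return kept


def claimed_sender(raw: str) -> str:
    """The From header is whatever the sender typed; nothing here verifies it."""
    return message_from_string(raw).get("From", "")
```

If the only server you trust is your own (mail.example.org in the sample), the believable part of the path is a single hop, and the From line stays an unverified claim.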

Other out-of-band stuff, like pictures, requires HTML-format messages, which are dead easy targets for a variety of forgery techniques. The problems are inherent in the methods we use to encode the data in the messages. ASCII and its descendants, including Unicode, don't really provide good, standardizable methods for burying identifying information in the messages. There are cases where we don't want identifying information in our messages, but there are cases when we very much want to know who we are talking with and want them to know who they are talking with.

Back to the size issues, individual messages are generally not all that large, but when you have a lot of messages from a mail list or newsgroup, the size adds up quickly. Non-requested advertisements add up even more quickly.

Mailing list and newsgroup browsers (or the mailing list mode of your MUA) should not download the thread headers unless you request a thread listing. (And they should respect the thread-related headers, to avoid breaking threads.) And they should not download a message unless you actually request the message.
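
The thread-related headers in question are Message-ID, In-Reply-To, and References. Here is a simplified sketch of how a reader program might use them to rebuild threads from headers alone, again in Python with the standard email package. Real threading algorithms handle missing and conflicting headers with more care. The rule used here (In-Reply-To first, otherwise the last entry in References) is an assumption chosen to keep the example short.

```python
# Sketch: rebuild reply relationships from thread-related headers only.
# Simplification: In-Reply-To wins; otherwise the last References entry is the parent.
from collections import defaultdict
from email import message_from_string
from typing import Dict, List, Optional


def parent_id(msg) -> Optional[str]:
    """Return the Message-ID this message replies to, or None for a thread root."""
    in_reply_to = (msg.get("In-Reply-To") or "").split()
    if in_reply_to:
        return in_reply_to[0]
    references = (msg.get("References") or "").split()
    return references[-1] if references else None


def build_threads(raw_headers: List[str]) -> Dict[Optional[str], List[str]]:
    """Map each parent Message-ID to the Message-IDs of its direct replies.

    Thread roots are filed under the key None. Only header text is needed,
    so no message body has to be downloaded to draw the thread tree.
    """
    children = defaultdict(list)
    for raw in raw_headers:
        msg = message_from_string(raw)
        message_id = (msg.get("Message-ID") or "").strip()
        if message_id:
            children[parent_id(msg)].append(message_id)
    return dict(children)
```

A mail program that rewrites or drops these headers when replying leaves nothing to work from, which is how threads get broken.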

But it's often hard to tell whether you want to download a message until you've read it and decided you know who sent it, or decide you are interested in it. Since you can't read it without downloading, you're often stuck with downloading anyway. (I'm only successful in my methods of checking the headers from years of practice. If I try that with a new newsgroup or mail list, however, it's going to take a little while to learn that group/list's patterns.)

If we can put reasonably useful identifying headers in a message, and if our MUA can read those headers, we can at least make meaningful judgments about who wrote the message. And that can help us decide whether to download a message, and help us reduce our bandwidth use. (And save us time.)

The more I've used e-mail, the more I find myself storing it the same way I store newsgroup and mailing list messages -- by thread. (And that is one of the reasons I can generally identify spam just by the headers fairly quickly.)

These identifying headers require the cooperation of the mail servers and internet service providers. But the providers and servers are not interested in their users' efficiency. That does nothing to help their bottom line, and in many cases (think wireless) what is inefficient for users makes providers money. Counter-motivation here.

Until we start serving our own mail, and managing our own connections to the internet more directly, e-mail, newsgroups, and mailing lists will remain as they are, rivers where users are dragged along in the flow, instead of tools for the benefit of users. But the technology to allow ordinary users to do so is still not there.

(I think this post is getting a little closer to what I've been trying to say about the internet, and computers, for a long time, but I'm still not quite there.)