Part 5: Make your own data types / 自分自身のデータ型を作る

What is a data type? / データ型とは？

A data type is a specific collection of data with a specific set of operations. For example, an integer is a collection of bits with arithmetic operations, and a vector is a collection of three real numbers with operations like addition, cross product, or normalization.

データ型は、ある決まった一組の演算に従う特定のデータの集まりです。例えば、整数は算術演算に従うビットの集まりであり、ベクトルは和、外積、規格化などの演算に従う３実数の集まりです。

Python allows the definition of new data types that can do anything that the built-in types can do. From a user's point of view there is no difference. In fact, many of the data types used in this course are defined in Python: vectors, text files, interpolating functions, etc. It is also possible to define new data types in C; the array data type from the module Numeric is an example.

Python では、新しいデータ型を定義することが許されており、組込み型に可能な事であれば何でもすることができます。使用者から見れば差異はありません。実際、このコースで使われるデータ型の多くは Python 中で定義されています：ベクトル、テキストファイル、内挿関数、などがそうです。新しいデータ型を C 言語で定義することも可能です； Numeric モジュールの配列データ型がその一例です。

Python provides a set of standard operations that any data type can use if appropriate. These include arithmetic, indexing, function call, etc. In addition, data types can provide arbitrary operations in the form of methods. Methods are like functions, but they depend on a specific data object. You have already seen many examples: append is a method defined for lists, close is a method defined for files, etc.

Python には、妥当である限り任意のデータ型によって使用可能な、一組の標準演算があります。算術演算、添字付け、関数呼び出し、などです。さらに、データ型はメソッド (method) という形式で、任意の演算を提供することができます。メソッドは関数に似ていますが、特定のデータオブジェクトに依存する点が異なります。実はすでに多くの例を見てきました： append はリストに定義されたメソッド、close はファイルに定義されたメソッド、などです。

There are many reasons for defining new data types. They help to keep programs readable - it is much clearer to write (a+b)/2 for two vectors a and b than to write something like vector_scale(vector_add(a, b), 0.5), as many programming languages require. Type definitions also help to reduce the dependence between different parts of a program. A program using vectors doesn't have to know how vectors store their data (they could use a list, a tuple, an array, or three separate variables). So if the storage method is changed for some reason, other modules will not be affected.

新しいデータ型を定義する理由はたくさんあります。それによって、プログラムを読み易くができます -- 二つのベクトル a と b について、(a+b)/2 と書く方が、多くのプログラム言語が要求するように vector_scale(vector_add(a, b), 0.5) と書くよりもずっと明解です。型定義によって、プログラムの異なる部分間の相互依存を小さくすることもできます。ベクトルを使用するプログラムは、ベクトルがデータを格納する方法（リスト、タプル、配列、別々の三変数、のいずれも使えます）を知る必要はありません。格納方法が何らかの理由で変更された場合でも、他のモジュールは影響を受けずに済む訳です。

Object-oriented programming / オブジェクト指向プログラミング

It is possible to write a complete program exclusively by defining data types, ranging from low-level general-purpose data types like vectors to high-level data types describing application-specific objects (e.g. molecules, force fields, wavefunctions, etc.). This technique is known as object-oriented programming. Experience has shown that it is generally better than the traditional procedural style (structuring code according to functions and subroutines), because it results in code that is easier to understand and easier to extend and modify due to a greater independence between the different parts of a complete program. Python supports most of the techniques that are commonly used in object-oriented programming.

ベクトルのような低級で汎用のデータ型から、応用が特定されたオブジェクトを表す高級のデータ型（例えば、分子、力場、波動関数など）まで、データ型を定義することのみによってプログラム全体を書くことも可能です。この技法は、オブジェクト指向プログラミングとして知られています。これは、（コードを関数とサブルーチンから構成する）伝統的な手続き型の流儀よりも一般に優れていることが経験的に分かっています。プログラム全体中の異なる部分間がより独立しているために、理解し易く拡張と変更が容易なコードが得られるからです。 Python は、オブジェクト指向プログラミングで共通に使われる技術の大部分に対応しています。

The most difficult part of writing large object-oriented software systems is deciding on the data types to be used. As a rule of thumb, data types representing mathematical entities (such as arrays or functions) and data types representing physical entities (molecules, wavefunctions, etc.) are a good choice. But more abstract data types are equally important, for examples data types that represent common data structures (lists, stacks, etc.) or algorithms. Object-oriented design is best learned by experience; whenever you write a non-trivial program (i.e. more than a few lines), consider doing it in an object-oriented way. There is also an extensive literature on the subject.

大きなオブジェクト指向ソフトウエアシステムを書く際に最も難しいのは、使用するデータ型を決定する段階です。経験則によれば、数学的なもの（配列や関数など）や、物理的なもの（分子や波動関数など）は、データ型で表すのに適しています。しかし、もっと抽象的なデータ型、例えば汎用のデータ構造（リスト、スタックなど）またはアルゴリズムを表すデータ型、も同様に重要です。オブジェクト指向デザインを学ぶには、経験を積むのが一番です；書き方が自明でない（すなわち、２〜３行以上の）プログラムを書く時は常に、オブジェクト指向に則って書いてみることを考えましょう。また、この話題に関しては沢山の文献もあります。

Classes / クラス

A definition of a new data type is called a class. A class defines all the operations of a type in the form of methods (standard operations like arithmetic are mapped to methods with special names). It also defines the initialization of a new object.

新しいデータ型の定義は、クラスと呼ばれます。クラスは、データ型の持つ全ての演算をメソッドとして定義します。（算術演算のような標準的演算は、特別な名前を持つメソッドに割り当てられます。）新しいオブジェクトの初期化も定義します。

The following example shows a part of the definition of the class Vector. Only initialization, addition, and length calculation are shown explicitly, and some operations are simplified (less general). Look at the source code of the module Scientific.Geometry to see the complete definition.

次の例は、Vector クラスの定義の一部です。ここに示すのは、初期化、加法、長さ計算だけであり、なかには簡単化されている（一般性を減じている）演算もあります。完全な定義については、Scientific.Geometry モジュールのソースコードを見てください。

import Numeric

class Vector:

    def __init__(self, x, y, z):
        self.array = Numeric.array([x,y,z])

    def __add__(self, other):
	sum = self.array+other.array
	return Vector(sum[0], sum[1], sum[2])

    def length(self):
	return Numeric.sqrt(Numeric.add.reduce(self.array*self.array))

Methods are defined like functions, and also behave much like functions. However, their first argument has a special meaning: it stands for the object on which the method is called. This argument is by convention called self, but you could use any other name instead. In the method call v.length() (assuming v is a vector), the variable self gets the value of v.

メソッドは関数の様に定義され、しかも振る舞いも関数と殆ど同様です。ただし、メソッドの第一引数は特別な意味を持っています：それが表すのは、そのメソッドが呼び出される対象となるオブジェクトです。この引数は習慣的に self と書かれますが、代りにどんな名前を使っても構いません。 v.length()（v はベクトルとします）というメソッド呼び出しの際には、変数 self は v の値を受取ります。

The methods whose names begin and end with a double underscore have a special meaning; they are not normally called explicitly (although they can). The most important special method is __init__, which is called immediately after an object has been created. The expression Vector(1., 0., 1.) creates a new vector object and then calls its method __init__, which stores the three coordinates in an array that is assigned to a local variable of the new object.

名前の始めと終りが二重下線のメソッドは、特殊な意味を持ちます；それらは普通は明示的には呼び出されません（可能ではありますが）。最も重要な特殊メソッドは __init__ で、オブジェクトが生成された直後に呼ばれます。 Vector(1., 0., 1.) と書くと、それは新しいベクトルオブジェクトを生成し、次にそのメソッドである __init__ を呼び、その新しいオブジェクトの局所変数に割当てられた配列に３つの座標を格納します。

The arithmetic operations are also implemented as special methods. The expression a+b is equivalent to a.__add__(b), and the other operations have similar equivalents; see the Python Language Reference for details.

算術演算も、特殊メソッドとして使うようになっています。 a+b という表現は、a.__add__(b) と等価ですし、他の演算も類似の等価表現を持ちます；詳細は、 Python Language Reference を参照して下さい。

There are more special methods that implement indexing, copying, printing, etc. Only the methods that make sense must be implemented, and only if the default behaviour is not sufficient. For vectors, for example, it makes sense to define a printed representation that shows the values of the coordinates. This is achieved by adding another special method:

まだ他にも、添字付け、コピー、印字などに関する特殊メソッドがあります。意味のあるメソッドのみを作るべきですし、それもデフォルトの振舞いが十分でない場合だけにするべきです。例えばベクトルについては、座標値を印字する記法を定義するのは有意義です。これは、特殊メソッドを一つ追加すればできます：

    def __repr__(self):
        return 'Vector(%s,%s,%s)' % (`self.array[0]`,
    				     `self.array[1]`,`self.array[2]`)

Attributes / 属性

An object in Python can have any number of attributes, which are much like variables, except that they are attached to a specific object, whereas variables are attached to modules or functions. In fact, variables defined in modules are nothing but attributes of module objects. The notation for accessing attributes is always object.attribute. Method names are attributes too, just like function names are variables.

Python のオブジェクトは、任意の数の属性を持てます。属性は変数とよく似ていますが、属性が特定のオブジェクトに付随するのに対し、変数はモジュールあるいは関数に付随する点が異なります。実際、モジュール中で定義される変数は、モジュールオブジェクトの属性に他ならないのです。属性にアクセスするときには、object.attribute という記法をいつも用います。メソッド名も属性です。これは関数名が変数であることと同様です。

Unlike many other object-oriented languages, Python does not provide access control to attributes. Any code can use and even change any attribute in any object. For example, you can run

import Numeric
Numeric.sqrt = Numeric.exp

to make sqrt behave like exp. Obviously this is not a good idea, but Python does not try to protect you against your own stupidity. Of course there is a certain chance of accidentally changing an attribute, but in practice this is not a problem.

他の多くのオブジェクト指向言語とは違って、Python には属性へのアクセス制限がありません。どんなコードでも、任意のオブジェクト中のあらゆる属性を使用し、変更することさえも可能です。例えば、

import Numeric
Numeric.sqrt = Numeric.exp

を実行して、sqrt が exp のように振る舞うようにすることもできます。明らかにこれは良い考えではありません。しかし、Python はあなた自身の愚かさからあなたを保護してくれようとはしません。もちろん、たまたま間違って属性を変更してしまう可能性もいくらかはありますが、実際上は問題にはなりません。

"Everything is an object": the Python universe / "全てがオブジェクト"：Python の世界

Python is a very consistent language. Its world view consists of nothing but objects, names, and name spaces. All data is kept in objects, but modules, functions, classes, and methods are also objects. Objects can be assigned to names, and names reside in name spaces. Every object has an associated name space for its attributes. In addition, functions and methods provide a temporary name space during execution (for local variables).

Python は、とても一貫性のある言語です。その全体像は、オブジェクト、名前、名前空間だけから成っています。全てのデータはオブジェクトに保管されます。ただし、モジュール、関数、クラス、メソッドもオブジェクトです。オブジェクトには名前を付与することができ、名前は名前空間中に存在しています。あらゆるオブジェクトには、その属性用の名前空間が附随しています。さらに、関数とメソッドは、（局所変数用の）一時的な名前空間を、その実行中に持ちます。

There are a few rules that decide in which name space a certain name resides. Definitions in modules end up in the attribute name space of the module object. Definitions within functions and methods are made in the temporary execution name space, but code in a function can also use (but not assign to) names in the surrounding module. Definitions in classes go to the attribute name space of the class. Finally, an object that is constructed from a class (a class instance) has of course its own attribute name space, in which all assignments happen, but when an attribute is requested that is not in this name space, it is searched for in the class name space; this is how methods are normally found.

ある名前がどの名前空間に置かれるかについての規則が２、３あります。モジュール中で定義されたものは、モジュールオブジェクトの属性名前空間に置かれます。関数とメソッド中の定義は、一時的な実行名前空間に置かれますが、関数中のコードはそれを取り囲むモジュール中の名前も使うことができます（ただしそれを割り付けることはできません）。クラス中の定義は、そのクラスの属性名前空間に置かれます。最後に、クラスから作られたオブジェクト（クラスインスタンス）は、当然ながら自身の属性名前空間を持ち、あらゆる割当はその中でなされます。しかし、この名前空間に無い属性が要求された時は、クラス名前空間が検索されます；メソッドは通常このようにして見つけられます。

Specializing and extending classes / クラスの特殊化と拡張

Often there are several data types that have something in common. One might be a specialization of another one; one could for example introduce normalized vectors as a special kind of vector. Or several data types could share many operations, but differ in certain features. For example, one could define data types representing scalar and vector fields, which would share some properties (e.g. being defined on a grid) but have specific operations like "gradient" for scalar fields and "divergence" for vector fields. Both data types would be implemented as specializations of a data type "field", which would define the common behaviour but not be used directly in programs (this is sometimes called an abstract class).

複数のデータ型が、何らかの共通性を持つということは多くあります。あるデータ型が他の型の特殊化されたものである場合もあるでしょう；例えば、特殊なベクトルとして、規格化されたベクトルを導入することもできます。あるいは、いくつかのデータ型が多くの演算を共有し、しかしある側面では異なっているという場合もあり得ます。例えば、スカラー場とベクトル場を表すデータ型を定義することができます。それらは共通の性質（例えば、格子点上で定義されること）を持つ一方で、スカラー場には「勾配」、ベクトル場には「発散」という様に、各々が特定の演算を持ちます。両者のデータ型は、共通の動作を定義しているけれどもプログラム中で直接は使用されない様な、「場」というデータ型の特殊化されたものとして作ることができます。（これは、抽象クラスと呼ばれることがあります。）

The technique for treating specialization is called inheritance. A class can inherit methods from another class, substitute those that require modification, and add some of its own. The main advantage is avoiding redundant code, which is an important source of mistakes and of course also a waste of memory.

特殊化の技法は継承と呼ばれます。あるクラスが、他のクラスからメソッドを継承し、変更を要するものは置き換え、自身のメソッドを追加する、ということが可能です。これの主な利点は、コードの重複を避けることにあります。重複は間違いの主因ですし、もちろんメモリの無駄でもあります。

The following code defines a class representing directions in space, i.e. vectors with length one. based on the vector class defined above:

次のコードは、空間内における方向、すなわち長さ１のベクトル、を表すクラスを、上で定義したベクトルクラスに基づいて定義します：

class Direction(Vector):

    def __init__(self, x, y, z):
        Vector.__init__(self, x, y, z)
        self.array = self.array/self.length()

The only method being redefined is initialization, which now normalizes the vector. Note that the initialization method first calls the initialization method of the class Vector and then applies the normalization.

再定義されているメソッドは初期化だけで、それがベクトルを規格化しています。この初期化メソッドが、まず Vector クラスの初期化メソッドを呼んでから規格化を行っていることに注意して下さい。

The class Direction inherits all the operation from Vector, which act as if their code were repeated in the new class. In particular, the sum of two directions will be a vector, not another direction. To obtain a normalized result, the method __add__ would have to be redefined as well. But since addition of directions is not such a useful operation, it might not be worth the effort.

この Direction メソッドは、Vector から全ての演算を継承しています。これは、新しいクラスにおいて後者のコードが繰り返されているかの様に振る舞います。特に、二つの "方向" の和は、別の規格化された "方向" ではなく、単なるベクトルになります。この場合に規格化された結果を得るためには、__add__ メソッドを同様に定義し直さなくてはならないでしょう。しかし、方向の加法はそれ程有用な演算では無いので、これは労力に値しないかも知れません。

Error handling / エラー処理

When an error occurs, Python prints a stack trace (a list of all active functions at the time the error occurred) and stops. This is often useful, but not always. You might want to deal with errors yourself, e.g. print a warning, ask for user input, or do some different calculation. Python allows any code to catch specific error conditions and deal with them in whatever way necessary.

エラーが起こった時には、Python はスタックトレース（エラーが起こった時に実行されていた全関数のリスト）を印字して停止します。これは多くの場合に有用ですが、常にそうとは限りません。エラーを自分自身で処理したいこともあるでしょう。例えば、警告を印字したり、利用者による入力を促したり、別の計算を行なったりです。 Python では、任意のコードから個々のエラー状況を捕獲できますし、どんな方法ででも必要に応じてそれを処理することができます。

To identify an error type, Python has several built-in error objects, for example ValueError (indicating that a value is unsuitable for an operation, e.g. negative numbers for a square root) or TypeError (indicating an unsuitable data type, e.g. when asking for the logarithm of a character string). A program can catch a specific error object, a specific collection, or any error.

エラーの型を指定するために、Python はいくつかの組込みエラーオブジェクトを持っています。例えば、ValueError（値が演算に対して不適。例えば平方根に対する負の数）あるいは TypeError（データ型が不適。例えば、文字列の対数を求めたりした場合）などです。プログラムは特定のエラーオブジェクトやコレクション、または任意のエラーを捕獲できます。

The general form of error catching is
エラーの捕獲の一般形は次の様になります。

try:
    x = someFunction(a)
    anotherFunction(x)
except ValueError:
    print "Something's wrong"
else:
    print "No error"

The code after try: is executed first. If a ValueError occurs, then the code after except ValueError: is executed, otherwise the code after else:; this part is optional. To catch several error types, use a tuple (e.g. except (ValueError, TypeError).) To catch all error, use a blank except:. To deal with several error types in a different way, use several except ...: code parts. For further details and possibilities, see the language reference manual

まず最初に try: 以下のコードが実行されます。 ValueError が発生した場合には except ValueError: 以下のコード、それ以外の場合には else: 以下のコードが実行されます；ただし後者は無くてもかまいません。複数のエラー型を捕獲するにはタプルを使います（例えば except (ValueError, TypeError)）。エラーを全て捕獲するには、単にexcept: とします。複数のエラー型を個別に処理するには、except ...: code を繰り返し使います。より詳しい説明と可能性については、 language reference manual を参照して下さい。

Of course Python programs can also generate errors, by using the statement raise ErrorObject, or raise ErrorObject, "Explanation" to add an explanation for the user. The ErrorObject can be any of the predefines error objects, but also a string. Many modules define their own error types as strings and let other modules import them:

もちろん Python プログラムはエラーを発生させることもできます。それには、raise ErrorObject を使います。または、利用者のために説明を付け加えるときには、raise ErrorObject, "Explanation" とします。 ErrorObject としては、あらかじめ定義されたエラーオブジェクトの何れもが使えますし、文字列も使えます。多くのモジュールは、自身のエラー型を文字列で定義し、他のモジュールにそれらを import させます：

# Module A
# モジュール A

AError = "Error in module A"

def someFunction(x):
    raise AError

# Module B
# モジュール B

import A

try:
    A.someFunction(0)
except A.AError:
    print "Something went wrong"

Exercises / 練習問題

Write a class that represents a matrix. A matrix is constructed from a two-dimensional array and has the following operations:
- Addition and subtraction (element-by-element)
- Multiplication (matrix product)
- Generalized inverse (returning another matrix object)
Write a specialized subclass that represents a square matrix. It should verify that it is constructed from a square array. In addition to the general matrix operation, it should provide:
- Inversion (returning another matrix object)
- Eigenvalues (returning an array)
Rewrite the "mass calculation" exercise from part 3 in an object-oriented fashion. Define a class representing a residue with a method "mass" that returns its mass. Write another class representing a peptide chain with a method "mass" that returns the total mass of the chain.
行列を表すクラスを書きましょう。行列は、二次元の配列から作られ、以下の演算を持ちます：
- 加法と減法（要素毎に計算）
- 乗法（行列積）
- 一般化された逆元（別の行列オブジェクトを返す）
正方行列を表す様に特殊化されたサブクラスを書きましょう。正方配列から作られることを検証するようにすべきです。さらに、一般的な行列演算に加え、以下を提供するようにします：
- 逆行列（別の行列オブジェクトを返す）
- 固有値（配列を返す）
Part 3 の "質量計算" の練習問題を、オブジェクト指向のやり方で書き直しましょう。アミノ酸残基を表すクラスを定義し、"mass" というメソッドによって質量を返す様にしましょう。さらに、ペプチド鎖を表す別のクラスを書き、"mass" メソッドが鎖の全質量を返す様にしましょう。

Table of Contents / 目次