Hierarchical clustering and 3D mapping
 

Level_0 clustering

Given a table of records of the form Document_path|Vector|Word, we must group the records into clusters according to the distance between their vectors (elsewhere we have sometimes called such a vector the "document fingerprint").
We want to perform this process as fast as possible, using the recognition power of a hardware implementation of a Radial Basis Function network.
The ZISC neural chip (Zero Instruction Set Computer) was chosen for its easy-to-understand behavior and programming interface and, above all, for its "unlimited" expandability at no cost in performance.
Clustering is a hybrid situation in which aspects of supervised and unsupervised learning work together. Learning must be performed without knowing a priori the classes associated with the patterns, yet a class must be associated with every cluster. Our choice is to use as the class a serial number assigned when a new prototype neuron is committed in the Radial Basis Function neural network. MIF (Minimum Influence Field) and MAF (Maximum Influence Field) should be selected according to the relations:

MIF = f ( DB_SIZE, NN_MAX_SIZE )
MAF = f ( DB_SIZE, NN_MAX_SIZE )

When a new prototype neuron is committed because its pattern does not fall within the influence field of any existing prototype, it is assigned MAF directly as its influence field.
The MIF threshold becomes meaningful when the number of documents inside a specific cluster exceeds a fixed threshold MCS (Maximum Cluster Size), which serves as the measure of clustering coarseness.
When a pattern is learned, both the URL (Uniform Resource Locator) of the document and its most used word (MUW) are stored in a database record whose key is simply the cluster number. Note that this is a one-to-many relation, because a cluster can contain many URLs, each with its associated MUW.
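The commit rule and the one-to-many record store described above can be sketched in software. This is only an illustration under stated assumptions, not the ZISC programming interface: the names Level0Clusterer, l1_distance and learn are hypothetical, the Manhattan (L1) distance and the strict "distance < influence field" test are assumptions, and the MIF/MCS handling is left out because the text does not specify it. The third URL in the demo is also made up.

```python
from collections import defaultdict


def l1_distance(a, b):
    """Manhattan distance between two equal-length vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


class Level0Clusterer:
    """Software stand-in for the prototype-commit behavior (hypothetical)."""

    def __init__(self, maf):
        self.maf = maf                      # influence field given to every new prototype
        self.prototypes = []                # (cluster_number, vector, influence_field)
        self.records = defaultdict(list)    # cluster_number -> [(url, muw), ...]
        self._next_serial = 0               # serial number used as the class

    def learn(self, vector, url, muw):
        # Look for the nearest prototype whose influence field contains the vector.
        best = None
        for number, proto, field in self.prototypes:
            d = l1_distance(vector, proto)
            if d < field and (best is None or d < best[0]):
                best = (d, number)
        if best is None:
            # No prototype recognizes the pattern: commit a new one with MAF.
            number = self._next_serial
            self._next_serial += 1
            self.prototypes.append((number, list(vector), self.maf))
        else:
            number = best[1]
        # One cluster number maps to many (URL, MUW) pairs.
        self.records[number].append((url, muw))
        return number


clusterer = Level0Clusterer(maf=50)
clusterer.learn([120, 30, 240], "http://www.alpha/beta/gamma.html", "neural")   # commits cluster 0
clusterer.learn([118, 33, 236], "http://www.beta/alpha/gamma.html", "kohonen")  # distance 9, joins cluster 0
clusterer.learn([10, 200, 15], "http://www.example/other.html", "other")        # distance 505, commits cluster 1
```

Here the cluster numbers are small serial values starting at 0; in the database the same role is played by keys such as 23403.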
 
 

CLUSTER_0 | VECTOR             | URL                              | MUW
23403     | [120][030]...[240] | http://www.alpha/beta/gamma.html | neural
"         | "                  | ...                              | ...
"         | "                  | http://www.beta/alpha/gamma.html | kohonen

TABLE 1: example of records for the key CLUSTER = 23403 in DB_CLUSTER_0
 


[Figure: Clustering process from documents to level_0 clusters]
 
 

Level_N clustering

The upper clustering levels are needed to provide a hierarchical structure for navigating the database.
The number of clustering levels is a function of the database size and of the degree of coarseness that best supports meaningful navigation. From Level_1 upward, a Level_n cluster does not contain URLs but the numbers of Level_(n-1) clusters. A MUW list is associated with each Level_(n-1) cluster.
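As an illustration of how a Level_n record can reference Level_(n-1) clusters, the following sketch groups lower-level clusters by their vectors. It rests on assumptions that the text does not confirm: that the upper level is built by applying the same commit rule to the Level_(n-1) vectors with a wider influence field, and that the distance is Manhattan (L1). The function build_level, its field parameter, the child cluster 777 and the words "rbf" and "misc" are hypothetical.

```python
def l1_distance(a, b):
    """Manhattan distance between two equal-length vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_level(children, field):
    """Group Level_(n-1) clusters into Level_n clusters.

    children: dict child_number -> (vector, muw_list)
    returns:  dict parent_number -> {"vector": [...],
                                     "members": [(child_number, muw_list), ...]}
    """
    parents = {}
    for child_number, (vector, muw_list) in children.items():
        # Find the nearest parent whose influence field contains the child vector.
        best = None
        for parent_number, parent in parents.items():
            d = l1_distance(vector, parent["vector"])
            if d < field and (best is None or d < best[0]):
                best = (d, parent_number)
        if best is None:
            # Commit a new parent cluster, numbered serially.
            parent_number = len(parents)
            parents[parent_number] = {"vector": list(vector), "members": []}
        else:
            parent_number = best[1]
        # A parent stores child cluster numbers and their MUW lists, not URLs.
        parents[parent_number]["members"].append((child_number, list(muw_list)))
    return parents


level_0 = {
    23403: ([120, 30, 240], ["neural", "kohonen"]),
    32240: ([125, 28, 238], ["rbf"]),
    777: ([0, 0, 0], ["misc"]),
}
level_1 = build_level(level_0, field=100)
for number, record in level_1.items():
    print(number, record["members"])
```

With these illustrative values, clusters 23403 and 32240 fall into the same Level_1 cluster and 777 commits a second one.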
To enable 3D navigation we need to add to each Level_n record some information representing the x-y-z position of the cluster in a 3D space. We obtain it by running a recognition operation on the vector split into 3 interleaved sub-vectors, using a ZISC trained with predefined RANDOM patterns:

X = CLASS(k = 0 - m STEP 3) { V[k] }
Y = CLASS(k = 1 - m STEP 3) { V[k] }
Z = CLASS(k = 2 - m STEP 3) { V[k] }

The picture at the bottom of this chapter explains the x-y-z calculation process.
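The interleaved split behind the three CLASS expressions can be sketched as follows. This is a software approximation under assumptions: recognition is replaced by a nearest-prototype search with Manhattan (L1) distance, the same set of reference patterns is used for all three axes, the class of a pattern is simply its index, and the vector length is a multiple of 3. The names classify and xyz_position are hypothetical, and the small hand-picked prototypes stand in for the predefined RANDOM patterns so that the result is easy to follow.

```python
def l1_distance(a, b):
    """Manhattan distance between two equal-length vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(sub_vector, prototypes):
    """Return the class (here: the index) of the nearest reference pattern."""
    return min(range(len(prototypes)),
               key=lambda i: l1_distance(sub_vector, prototypes[i]))


def xyz_position(vector, prototypes):
    """Map a cluster vector to (x, y, z) by classifying three interleaved slices."""
    if len(vector) % 3 != 0:
        raise ValueError("vector length must be a multiple of 3 in this sketch")
    x = classify(vector[0::3], prototypes)  # V[0], V[3], V[6], ...
    y = classify(vector[1::3], prototypes)  # V[1], V[4], V[7], ...
    z = classify(vector[2::3], prototypes)  # V[2], V[5], V[8], ...
    return x, y, z


# Stand-ins for the predefined RANDOM patterns (each as long as one slice).
reference_patterns = [[0, 0], [100, 100], [200, 200]]
position = xyz_position([0, 100, 200, 5, 95, 210], reference_patterns)
print(position)
```

Because similar sub-vectors are recognized by the same reference pattern, clusters with similar vectors tend to receive the same coordinates.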
 
 

CLUSTER_n | VECTOR             | CLUSTER_(n-1) | MUW-LIST   | X    | Y    | Z
2366      | [120][030]...[240] | 23403         | WORD1,...n | 2404 | 1230 | 240
"         | "                  | ...           | WORD1,...n | ...  | ...  | ...
"         | "                  | 32240         | WORD1,...n | 2400 | 1040 | 200

TABLE 2: example of records for the key CLUSTER = 2366 in DB_CLUSTER_n
 


[Figure: Clustering process from level n-1 to level n clusters]
 
 
 
 

[Figure: X-Y-Z calculation process]
 
 
