Saturday, October 11, 2014

Edit Distance in Java

This is a simple implementation of edit distance, which is used in spell error correction in NLP.

public class MED{
public static void main(String[] args){
MED med = new MED();
med.min_edit_dic("abc", "cabc");
}
public int min_edit_dic( String target ,String source){
int n = target.length();
int m = source.length();
int[][] distance = new int[ n+1 ][ m+1 ];
distance[ 0 ][ 0 ] = 0;
for ( int i = 1; i <= n; i++){
distance[ i ][ 0 ] = distance[ i-1 ][ 0 ] + ins_cost(target.charAt(i-1));
}
for ( int j = 1; j <= n; j++){
distance[ 0 ][ j ] = distance[ 0 ][ j-1 ] + ins_cost(target.charAt(j-1));
}
for ( int i = 1; i <= n; i++){
for ( int j = 1; j <= m; j++){
int ins = distance[ i-1 ][ j ] +ins_cost(target.charAt(i-1));
int sub = distance[ i-1 ][ j-1 ] + subs_cost(target.charAt(i-1),source.charAt(j-1));
int del = distance[ i ][ j -1 ] + del_cost(source.charAt(j-1));
distance[i][j] =  min( ins, min(sub,del));
}
}

for ( int i = 0; i <= n; i++){
for ( int j = 0; j <= m; j++){
System.out.print(distance[i][j]+"\t");
}
System.out.println();
}

return 1;
}

private int min(int d1, int d2){
return d1 < d2 ? d1: d2;
}

private int ins_cost(char c){
return 1;
}

private int del_cost(char c){
return 1;
}

private int subs_cost(char c1 , char c2){
return c1 != c2 ? 2 : 0;
}
}

Comparison of StringBuilder and StringBuffer

StringBuilder and StringBuffer are very common classes in Java.
Here, I am comparing with these two classes.
Generally speaking,
StringBuilder is
1) non-synchronized
2) be careful when using multithreads
3) faster  

StringBuffer 
1) synchronized.
2) safer but slower

>>Java test codes
public class StringDemo{
public static void main(String[] args){
int N = 100000;
StringBuffer buffer = new StringBuffer();
StringBuilder builder = new StringBuilder();
long begin = System.currentTimeMillis();
for(int i = 0; i < N; i++){
buffer.append(" ");
}
System.out.println("StringBuffer takes " + (System.currentTimeMillis() - begin) + " ms");
begin = System.currentTimeMillis();
for(int i= 0; i < N; i++){
builder.append(" ");
}
System.out.println("StringBuilder takes " + (System.currentTimeMillis() - begin) + " ms");
}
}

output >>
StringBuffer takes 8 ms
StringBuilder takes 1 ms