Provide a convenient way to dynamically set writer attribute width according to

Related products: FME Form

***Note from Migration:***

Original title was: Provide a convenient way to dynamically set writer attribute width according to incoming data

Sometimes we need to be economical with output string attribute widths to avoid creating large datasets, especially when the dataset already has a large number of attributes and features.

It would be really helpful to be able to set the output string attribute width to accommodate the longest string in each field, but no longer.

Currently, the only way I can think of is the method suggested by david_r here, which involves exploding all attributes, calculating the string lengths, and finding the maximum length for each attribute. This process is quite slow on a large dataset.

Any way to make this easier will be appreciated.
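
For comparison, the per-attribute maximum can also be tracked in a single pass, without exploding the attributes first. This is only a rough sketch: it models features as plain Python dicts, and `max_string_widths` and the sample records are made up for illustration, not part of FME.

```python
from collections import defaultdict


def max_string_widths(records):
    """Return the longest string length seen for each attribute, in one pass."""
    widths = defaultdict(int)
    for record in records:
        for name, value in record.items():
            # Only string values count towards the string width
            if isinstance(value, str):
                widths[name] = max(widths[name], len(value))
    return dict(widths)


# Hypothetical sample records standing in for features
records = [
    {"name": "Oslo", "code": "NO", "population": 700000},
    {"name": "Christchurch", "code": "NZ", "population": 380000},
]
widths = max_string_widths(records)
print(widths)
```

Note that `len` counts characters. If the target format measures width in bytes, the values would need to be encoded before measuring.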

Since the CSV2 reader already scans the file for data types, it wouldn't be a stretch to also scan for the field widths of string values.

I have an awk tool that scans a CSV in milliseconds; perhaps call that? It isn't embedded into the schema, it just gives a text report.

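# scan.awk
# Scan a delimited text file and report the maximum width of each field.
# Pass the separator with -F, e.g.:  awk -F, -f scan.awk data.csv
# Does not handle embedded separators inside quoted strings.
# Skips the header line.
# 5 June 2004
# Ollivier & Co

BEGIN { mf = 0 }    # mf = highest field count seen so far

NR == 1 { next }    # skip header

{
    # track the longest value seen in each field
    for (i = 1; i <= NF; i++) {
        if (length($i) > m[i])
            m[i] = length($i)
    }
    if (NF > mf)
        mf = NF
}

END {
    print "field : max width"
    for (i = 1; i <= mf; i++)
        print i, ":", m[i] + 0    # + 0 prints 0 for fields that were always empty
}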
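
If embedded separators inside quoted strings matter, the same scan could be done with Python's standard `csv` module, which parses quoted fields. This is a sketch rather than a drop-in tool; `scan_csv_widths` and the sample data are invented for the example.

```python
import csv
import io


def scan_csv_widths(lines, delimiter=","):
    """Return (header, max width per column), honouring quoted fields."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, [])  # first row is treated as the header
    widths = [0] * len(header)
    for row in reader:
        # Grow the list if a row has more fields than seen so far
        if len(row) > len(widths):
            widths.extend([0] * (len(row) - len(widths)))
        for i, value in enumerate(row):
            widths[i] = max(widths[i], len(value))
    return header, widths


# Hypothetical sample with an embedded comma and escaped quotes
sample = io.StringIO('id,name,note\n1,"Smith, John",ok\n2,Li,"a ""quoted"" word"\n')
header, widths = scan_csv_widths(sample)
print("field : max width")
for i, width in enumerate(widths):
    label = header[i] if i < len(header) else str(i + 1)
    print(label, ":", width)
```

For a real file, open it with `open(path, newline="")` and pass the file object in place of the `StringIO` sample.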

... and why not add a field type detector as well? It could be that you're better off with a number field than a char, so you could set integer/decimal/precision/double/... according to your data.
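
A type detector along those lines might look roughly like the sketch below. It is an assumption-laden illustration, not how any FME reader works: `suggest_type`, its return tuples, and the rule that empty values are ignored are all choices made for the example.

```python
import re

INT_RE = re.compile(r"[+-]?\d+")
DEC_RE = re.compile(r"[+-]?(\d+)\.(\d+)")


def suggest_type(values):
    """Suggest a field type for a column of text values.

    Returns ('integer', digits), ('decimal', precision, scale)
    or ('string', width). Empty values are ignored as missing data.
    """
    string_result = ("string", max((len(v) for v in values), default=0))
    cleaned = [v.strip() for v in values if v.strip()]
    if not cleaned:
        return string_result
    if all(INT_RE.fullmatch(v) for v in cleaned):
        return ("integer", max(len(v.lstrip("+-")) for v in cleaned))
    int_digits = scale = 0
    for v in cleaned:
        m = DEC_RE.fullmatch(v)
        if m:
            int_digits = max(int_digits, len(m.group(1)))
            scale = max(scale, len(m.group(2)))
        elif INT_RE.fullmatch(v):
            int_digits = max(int_digits, len(v.lstrip("+-")))
        else:
            # Anything non-numeric makes the whole column a string
            return string_result
    return ("decimal", int_digits + scale, scale)


print(suggest_type(["12", "-7", "300"]))
print(suggest_type(["1.5", "22.125", "3"]))
print(suggest_type(["12", "n/a"]))
```

One caveat: values with leading zeros, such as postal codes, would be classified as integers here, so a real detector would probably want a rule to keep those as strings.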