Reduced data set (balanced to reduce class bias): 7000 samples, 7 classes (1000 samples per class).
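The balanced reduction described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code actually used for the experiments; the function name `balanced_subsample`, its parameters, and the toy label list are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def balanced_subsample(labels, per_class=1000, seed=0):
    """Return sorted indices that select `per_class` samples from every class.

    Hypothetical helper: `labels` is any sequence of class labels, one per sample.
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    selected = []
    for label, indices in by_class.items():
        if len(indices) < per_class:
            raise ValueError(
                f"class {label!r} has only {len(indices)} samples, "
                f"need {per_class}"
            )
        # Draw without replacement so no sample is selected twice.
        selected.extend(rng.sample(indices, per_class))
    return sorted(selected)


if __name__ == "__main__":
    # Toy, deliberately imbalanced label list with 7 classes (assumed data).
    toy_labels = [c for c in range(7) for _ in range(1000 + 100 * c)]
    subset = balanced_subsample(toy_labels, per_class=1000, seed=42)
    print(len(subset))
    print(Counter(toy_labels[i] for i in subset))
```

The returned indices can be used to slice the feature matrix and label vector in whichever framework is being evaluated, so that all three implementations train on the same subset.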

PySpark implementation: long training times (several hours on a virtual machine with 16 cores and 32 GB RAM). This may be due to a configuration error; either way, more effort is needed to tune the PySpark model and installation.

Scikit-learn implementation: training times on the order of a few minutes.
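To keep the training-time comparison between implementations consistent, wall-clock time can be measured the same way for each of them. The sketch below is only an assumption about how such a measurement might look; `time_training` and `train_fn` are hypothetical names, and the dummy training function stands in for a real `fit` call.

```python
import time


def time_training(train_fn, *args, **kwargs):
    """Run `train_fn` once and return (result, elapsed wall-clock seconds).

    Hypothetical helper: `train_fn` is any callable that trains and
    returns a model, for example a wrapper around an estimator's fit call.
    """
    start = time.perf_counter()
    model = train_fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return model, elapsed


if __name__ == "__main__":
    # Dummy stand-in for a real training routine (assumed for illustration).
    def dummy_train(n):
        return sum(i * i for i in range(n))

    model, seconds = time_training(dummy_train, 100_000)
    print(f"training took {seconds:.3f} s")
```

Note that Spark evaluates transformations lazily, so a timing taken around a PySpark fit call may also include data loading and preprocessing unless the input has been materialized beforehand.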

TensorFlow: offers different methods and is therefore difficult to compare with the other two implementations with respect to performance.

Open tasks: code consolidation, further evaluation of results, Spark environment configuration.