...

To train the PySpark model, you need to install Spark.
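
Spark runs on the JVM, so PySpark also needs a Java runtime. Below is a minimal sketch for checking both prerequisites before training. It assumes `pyspark` was installed with pip and that `java` should be on the `PATH`. The function name `check_spark_prerequisites` is hypothetical.

```python
import importlib.util
import os
import shutil


def check_spark_prerequisites() -> dict:
    """Report whether the pieces PySpark needs are visible from this Python process."""
    return {
        # True if the pyspark package can be imported (e.g. after `pip install pyspark`).
        "pyspark_installed": importlib.util.find_spec("pyspark") is not None,
        # Spark runs on the JVM, so a `java` executable must be found on the PATH.
        "java_on_path": shutil.which("java") is not None,
        # If set, JAVA_HOME tells Spark's launcher which Java installation to use.
        "java_home_set": bool(os.environ.get("JAVA_HOME")),
    }
```

Calling `check_spark_prerequisites()` returns a dict of booleans. Any `False` entry points to the piece that is missing.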

TODO: see if we can get the Spark model to run; if it works, describe it here.

TODO: done up to here.


Comparison of machine learning frameworks (PySpark, scikit-learn, TensorFlow), in particular an evaluation of Spark for ML:

...

Reduced, class-balanced data set (to reduce bias): 7000 samples, 7 classes (1000 samples per class).

PySpark implementation: long training times (several hours on a virtual machine with 16 cores and 32 GB RAM). This may be due to a configuration error, and it shows that more effort is needed to tune the PySpark model and installation.
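
The reduced data set draws the same number of samples from each class. Below is a minimal sketch of this kind of class-balanced downsampling in plain Python. It assumes the data is a list of `(features, label)` pairs. The function name `balanced_sample` and the fixed seed are illustrative and are not taken from the project code.

```python
import random
from collections import defaultdict


def balanced_sample(samples, per_class, seed=0):
    """Draw exactly `per_class` samples from every class, without replacement.

    `samples` is a list of (features, label) pairs. Raises ValueError if some
    class has fewer than `per_class` samples.
    """
    # Group the samples by class label.
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append((features, label))

    rng = random.Random(seed)  # fixed seed so the reduced set is reproducible
    result = []
    for label, group in by_label.items():
        if len(group) < per_class:
            raise ValueError(f"class {label!r} has only {len(group)} samples")
        result.extend(rng.sample(group, per_class))
    rng.shuffle(result)  # avoid long runs of a single class
    return result
```

With 7 classes and `per_class=1000`, the result has 7000 samples.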

scikit-learn implementation: training times on the order of a few minutes.
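
Training times are easier to compare when every implementation is timed the same way on the same data. Below is a minimal sketch of a framework-agnostic timing helper. It assumes each implementation can be wrapped in a zero-argument `train` callable. The name `time_training` is hypothetical.

```python
import statistics
import time
from typing import Callable


def time_training(train: Callable[[], object], repeats: int = 3) -> dict:
    """Run `train` `repeats` times and report wall-clock durations in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        train()
        durations.append(time.perf_counter() - start)
    return {
        "min_s": min(durations),
        "median_s": statistics.median(durations),
        "runs": len(durations),
    }
```

For scikit-learn this could be `time_training(lambda: clf.fit(X_train, y_train))`. For a Spark ML pipeline it could be `time_training(lambda: pipeline.fit(train_df), repeats=1)`. Here `clf`, `X_train`, `y_train`, `pipeline` and `train_df` are placeholder names. For runs that take hours, `repeats=1` is the practical choice.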

TensorFlow: offers different methods and is therefore difficult to compare with the other two in terms of performance.
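
Even when the methods differ, the frameworks can still be compared on predictive quality. One way is to score every model's predictions on the same held-out test set with the same metric. Below is a minimal sketch that computes accuracy and macro-averaged F1 in plain Python, so the scoring does not depend on any one framework. The choice of these two metrics is an assumption and is not taken from the original evaluation.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in set(y_true) | set(y_pred):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Each framework's predictions would first be converted to plain Python lists. For scikit-learn, this could be `.tolist()` on the array returned by `predict`. For Spark ML, it could be the collected `prediction` column.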

Open tasks: code consolidation, further evaluation of results, Spark environment configuration.
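
For the Spark environment configuration, one common starting point on a single machine is local mode with explicit memory and parallelism settings. Below is a minimal sketch that assumes Spark runs in local mode on the 16-core, 32 GB VM. The property names are standard Spark settings, but the values and the helper names `local_spark_settings` and `build_session` are illustrative assumptions, not measured optima.

```python
def local_spark_settings(cores: int, ram_gb: int, reserve_gb: int = 4) -> dict:
    """Suggest Spark properties for local mode on one machine (illustrative heuristic)."""
    if ram_gb <= reserve_gb:
        raise ValueError("ram_gb must exceed reserve_gb")
    return {
        # Run Spark inside a single JVM, using `cores` worker threads.
        "spark.master": f"local[{cores}]",
        # In local mode the executor runs inside the driver, so driver memory is what counts.
        # `reserve_gb` is left for the OS and Python worker processes (assumed value).
        "spark.driver.memory": f"{ram_gb - reserve_gb}g",
        # The default of 200 shuffle partitions is large for a 7000-row data set;
        # a small multiple of the core count is an assumed starting point.
        "spark.sql.shuffle.partitions": str(2 * cores),
    }


def build_session(settings: dict, app_name: str = "ml-comparison"):
    """Create a SparkSession with the given settings (requires pyspark and Java)."""
    from pyspark.sql import SparkSession  # lazy import: the helper above needs no Spark

    builder = SparkSession.builder.appName(app_name)
    for key, value in settings.items():
        builder = builder.config(key, value)
    # spark.driver.memory only takes effect if no JVM is running yet in this process.
    return builder.getOrCreate()
```

Usage on the VM could be `spark = build_session(local_spark_settings(cores=16, ram_gb=32))`. After that, training times can be re-measured to see whether the configuration was the bottleneck.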