The latest version has several ease-of-installation improvements; for instance, it is no longer required to build a special version of Apache Mahout.
`python3` is now invoked wherever Python is used. Before this release it was assumed that the environment mapped `python` to `python3`, which is required for PIO 0.12+ and the UR 0.7+. Since many distros have `python` invoke Python 2.7 and `python3` invoke Python 3.6, we now do also.
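As a quick sanity check (assuming a typical Linux install), you can confirm that the `python3` name maps to a Python 3 interpreter before running any setup scripts:

```shell
# Confirm a Python 3 interpreter is available under the name "python3",
# as PIO 0.12+ and UR 0.7+ expect.
python3 --version
```

If `python3` is missing or reports a 2.x version, install Python 3 first.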
"from": 0, "num": 2will return 2 recs from the first available,
"from": 2, "num": 2will return 2 starting at the 3rd since
"from"is 0 based.
This tag takes precedence over 0.7.0, which should not be used. Changes:
This README Has Special Build Instructions! These are obsolete now, but you can see them in the GitHub README.md.
This tag is for the UR integrated with PredictionIO 0.12.0 using Scala 2.11, Spark 2.1.x, and most importantly Elasticsearch 5.x. Primary differences from 0.6.0:
WARNING: Upgrading Elasticsearch or HBase will wipe existing data, if any, so make backups and follow the special instructions below before installing any service upgrades.
You must build PredictionIO with the default parameters, so just run `./make-distribution`. This will require you to install Scala 2.11 and Python 3 (as the default Scala and Python). You can also run up to Spark 2.1.x (but not 2.2.x), ES 5.5.2 or greater (but 6.x has not been tested), and Hadoop 2.6 or greater; you can get away with using older versions of services, except that ES must be 5.x. If you have issues getting pio to build and run, send questions to the PIO mailing list.
Backup your data; moving from ES 1 to ES 5 will delete all data! Actually it is even worse: the data is still in HBase but you can't get at it, so to upgrade do the following:

1. `pio export` with pio < 0.12.0 **Before upgrade!**
2. `pio data-delete` all your old apps **Before upgrade!**
3. `pio app new …` and
4. `pio import …` any needed datasets
Once PIO is running, test with `pio status` and `pio app list`. To test your setup and UR integration, run `./examples/integration-test` from the UR's home directory.
A sample of pio-env.sh that works with one type of setup is below, but you'll have to change paths to match yours. This example shows the new way to configure for Elasticsearch 5.x, which uses a new port number:
```bash
#!/usr/bin/env bash

# SPARK_HOME: Apache Spark is a hard dependency and must be configured.
# using Spark 2.1.2 here
SPARK_HOME=/usr/local/spark

# ES_CONF_DIR: You must configure this if you have advanced configuration for
# using ES 5.6.3
ES_CONF_DIR=/usr/local/elasticsearch/config

# HADOOP_CONF_DIR: You must configure this if you intend to run PredictionIO
# using hadoop 2.8 here
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

# HBASE_CONF_DIR: You must configure this if you intend to run PredictionIO
# using HBase 1.2.x here or whatever the highest numbered stable release is
HBASE_CONF_DIR=/usr/local/hbase/conf

# Filesystem paths that PredictionIO uses as block storage.
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp

# Storage Repositories
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS

PIO_STORAGE_REPOSITORIES_APPDATA_NAME=pio_appdata
PIO_STORAGE_REPOSITORIES_APPDATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_eventdata
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

# ES config
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200 # <===== notice 9200 now
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=elasticsearch_xyz # <===== should match what you have in your ES config file
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/usr/local/elasticsearch

PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_HOSTS=$PIO_FS_BASEDIR/models

PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=/usr/local/hbase
```
This is the last version that will support PredictionIO 0.11.0, Elasticsearch 1.7.6, and Scala 2.10. Bug fixes may be added to this branch where needed, but it will be left in the "UR v0.6.0" state.
This is a major upgrade release with several new features. Backward compatibility with 0.5.0 is maintained. Note: we no longer have a default `engine.json` file, so you will need to copy `engine.json` and edit it to fit your data. See the Universal Recommender Configuration docs.
See `minEventsPerUser` in the UR configuration docs.
`blackListEvents`, as defined in `engine.json`, was not working for an empty list, which should (and now does) disable any blacklisting except explicit item blacklists contained in the query.
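For illustration, the empty-list form looks something like this in `engine.json` (only the `blackListEvents` field is taken from the text above; where it sits in your file depends on your existing config):

```json
{
  "blackListEvents": []
}
```

With this fix, the empty list turns off event-based blacklisting entirely, while item blacklists passed in a query still apply.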
Fixes a `pio build` failure triggered by the release of Apache PIO. If you have problems building v0.4.0, use this version. It is meant to be used with PredictionIO-0.9.7-aml.
Adds the `SelfCleanedDataSource` trait. Adding params to `engine.json` allows control of de-duplication, property event compaction, and a time window of events. The time window is used to age out the oldest events. Note: this only works with the ActionML fork of PredictionIO found in the repo mentioned above.
Changes `backfillField: duration` to accept Scala Duration strings. This will require changes to all engine.json files that were using the older number-of-seconds duration.
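As a sketch of the new form (the surrounding structure is illustrative; only the `duration` field and its Scala Duration string format, e.g. `"3 days"`, are the point here):

```json
"backfillField": {
  "duration": "3 days"
}
```

Previously this field took a bare number of seconds, so existing engine.json files need the string form instead.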
Previously `typeName` in engine.json was required to be `"items"`; with this release the type can be any string.
Much of the `pio train` time is taken up by writing to Elasticsearch. This can be optimized by creating an ES cluster or giving ES lots of memory.
Run `pio deploy` to make the new model active.
This software is licensed under the Apache License, Version 2.0, found here: http://www.apache.org/licenses/LICENSE-2.0