Below are the publications reused in this thesis that does not require copyright clearance:
Below are the publications reused in this thesis that requires copyright clearance:
The continuous advance of electronics and information communication technologies (ICT) have influenced every aspects greatly, on this thesis is discussed on education aspect. Electronics and ICTs have been incorporated into the learning and teaching process, giving birth to electronic learning (e-learning). Inside, there is a well known term called online course where the essence is being able to deliver courses distantly with flexibility in place and time. However a simple condition must be met in order to implement online course, and that is the sufficiency of ICT infrastructure. Unfortunately not all regions met this condition, limiting the accessibility of online course. Other than improving the ICT infrastructure, distributed learning management system (LMS) was proposed as alternative, but the next issue was the maintenance or synchronization, which in this case is keeping the learning contents up to date. There are two problems highlighted in this thesis which are unable to perform synchronization in severe network connectivity region, and duplicate data transfer during synchronization.
To overcome the synchronization in severe network connectivity region the solution is utilizing hand carry servers. By implementing hand carry servers on distributed LMS will grant mobility to the servers of distributed LMS. The concept proposed was having the hand carry server to physically seek network connectivity to perform online synchronization, and afterwards returns to its original location. The hand carry server was proved to be portable due to its small size, light weight, and also power consumption where a power bank is enough to supply for a whole day. Although it has resource limitations in terms of computer processing unit and random access memory which limits its performance.
To overcome duplicate data transfer during synchronization incremental synchronization was utilized instead of full synchronization. Also on this thesis introduced a new approach called dump and upload based sychronization which was to overcome the obstacles of different LMSs and LMS versions faced by dynamic content sychronization.
Electronics and Information Communcation Technology (ICT) have made many tasks more convenient, including delivering education. It can be seen that many have incorporated electronics in their learning and teaching process. There are few examples such as teachers using laptops and projectors to present their materials, students browsing the Internet to search for informations, and both of them using emails, chats, or social networking service to communicate. These kind of things are agreed to be called electronic learning (e-learning) which can be illustrated on Figure 1.1
Though, this thesis will not discuss widely on e-learning, but a category which is part of e-learning called online course. It uses electronic ICT devices where information exchange can be done remotely. Information can be delivered through electrical signal in high speed on the network, preferably on the Internet, and computer devices as end devices or as transmitters and receivers. Simply computer devices connected to the Internet are all that are needed to participate in online course from anywhere at anytime illustrated on Figure 1.2.
Online course is now being highlighted by many parties, seeing them as one solution to the unevenly distribution of education. Straighfowardly not everyone have access to good quality education, furthermore there are also those who does not have access, and by using online course people can receive education without going to school. Knowing this, our peers tried to implement online course in their Universities, one in Indonesia  and the other one in Myanmar . Another peer already have online course well built in Mongolia and now moving to massive open online course (MOOC) . Unlike private online course only for students in Universities, MOOC is open for anyone indiscriminately. In the United States (US) MOOC is also being used to scout for potential students. For example Massachusetts Institute of Technology (MIT) found a genius Mongolian highschool student who perfectly ace its Circuits and Electronics MOOC, then took him as a freshmen student . In summary many people saw bright future in utilizing online course in education.
With all the benefits of online course, there are still problems preventing many people from enjoying it. The problem was the lack of accessability to online course due to insufficient ICT infrastructure. In other words there are people who are having network connectivity issue especially in developing countries. On random survey by Kusumo et al.  on students in Indonesia, 60% of them agreed that Internet connection is still problematic. The survey by Monmon et al.  of e-readiness on Yangon Technological University and Mandalay Technological University in Myanmmar showed lower Likert scale scores on the students' and teachers' perception on ICT network compared to other items. Today the world Internet penetration is still around 50% indicating that only half of the world's population can access online course . Eventhough these people have access, their access quality may still be questionable which can lead to disatisfaction in accessing online course.
The obvious solution to accessibility issue is to improve the ICT infrastruture, however this takes a long time. Therefore another method was implemented, which is implementation of distributed system rather than centralized system. The concept is to have the people to access the service on their local area that is distantly closer than on the central area that is distantly further. In some references, it is stated as the third generation of content management system (CMS) , thought on this work is more about learning contents of Learning management system (LMS) than general contents of CMS.
With distributed LMS as the solution to the lack of accessability of online course, it is the next problem which is discussed on this thesis. The problem is the synchronization which is to keep the learning contents up to date. This can also be said as the maintenance of the learning contents. Specifically there are two problems highlighted on this thesis as follow:
This thesis provides two main solutions for the two problems:
Detail significances are discussed in further sections, but in general can be mentioned as follow:
The objective of this research is to enable online synchronization of distributed LMS in almost no network connectivity region and reduce redundant data transfer during synchronization.
Each method may have limitations which is discussed in detail on each of their respective sections, but here is mentioned the general limitation of this research:
Beyond this section the thesis contains three more chapters:
Distributed systems can be a wide discussion with different implementation . One implementation can be as partitioned system. For example, an organization's network can have their servers separated, where the database, directory, domain name service (DNS), dynamic host configuration protocol (DHCP), file, web, and each other servers on separated machine. They are integrated but independent where if one service (server) is damage, will not damage other services. A different example is data partitioning where data are fragmented that when retrieving data, they have to be gathered and merged. This usually happens in collaboration where people are working on the same project but from different machines.
Another implementation can be as replicated system, and this is the one that is referred or used on this thesis. The urgency for replicated system can be due to bottleneck traffic or geographically severe network connectivity, or both. One of the most popular implementation is search engine like Google and Yahoo where they have different server locations assigned with local domains for example .co.jp for Japan, .co.id for Indonesia, and etc. Not as well known as search engines are online multiplayer games. The servers of online multiplayer games can reside on many regions such as Asia, Europe, United States, China, etc. There are games that shows the number of population on each servers indicating whether it is full or not. Players can choose other servers when a server reached the population limit or when players cannot actually reach the server on that region.
One definition of LMS is a system that manages the learning and teaching specifically for online case. The current form of LMS today is a software application. It is not just delivering learning materials to students but online computerize any activities that can happen in a class. Some activities are interractions whether by chat applications or forums like on social networking service (SNS), assignments where this time is submitted electronically through LMS by uploading their files, and quizzes or examinations which can be automatically or manually graded. Not to forget that it can be accessed from anywhere at anytime, and computers are used which can perform much faster and automatic tasks than humans, makes it possible for unique applications, data minings, and learning analytics. In short new features are being developed everyday. Today exists many LMS as on Table 2.1 whether they are open source (free to use, modify, with all the codes open), only available on clouds or software as a service (SAAS) which tends to be freeware/usage only, or proprietary which tends to be business/commercial/paid. On the author's surroundings mostly Moodle is used.
|Open Source||aTutor, Canvas, Chamilo, Claroline, eFront, ILIAS, LAMS, LON-CAPA, Moodle, OLAT, OpenOLAT, Sakai, SWAD, Totara LMS, WeBWorK|
|SAAS/Cloud||Cornerstone OnDemand Inc, Docebo LMS, Google Classroom, Grovo, Halogen Software, Informetica, Inquisiq R3, Kannu, Latitude Learning, Litmos, Talent LMS, Paradiso LMS, TOPYX, TrainCaster LMS,WizIQ LinkStreet|
|Proprietary||Blackboard Learning System, CERTPOINT Systems Inc, Desire2Learn,eCollege, Edmodo, Engrade, WizIQ, GlobalScholar, Glow, HotChalk,Informetica, ITWorx CLG, JoomlaLMS, Kannu, Latitude Learning LLC,Uzity, SAP, Schoology, SSLearn, Spongelab, Skillsoft, EduNxt,SuccessFactors, SumTotal Systems, Taleo, Teachable, Vitalect|
The term distributed LMS means that the replicated servers contains LMS. Each servers are meant to service online course. The implementation can be a full replication where not only learning contents but everything else including activities, assessments, and interractions are synchronized. This means students and teachers can freely use any servers recommended to the one with best network connectivity. The other implementation is partial replication where only non-private data are synchronized, usually only the learning contents. This can happen when there are jurisdictions where each regions are to be handled locally. In other words contents are provided but each schools and universities are still the owner of their own servers and asserts local authorities. Either way distributed system is the solution for bottleneck and connectivity issue. As an illustration on Figure 2.1 in Indonesia, it is better to build and spread more servers compared to have a centralized server in the capital city.
After the establishment of distributed LMS, the contents needs to be maintained or to be kept up to date through synchronization. However the problem is the lack of network connectivity between servers usually found in deeper areas such as schools in villages. It may be easy to build a LAN but difficult to build connections to other servers or simply an Internet connection on distant places. In a short time it is only possible to build a very limited connection (very low speed) which retrieval of contents may seem to take forever if it is very large. The metaphor is building a server in a jungle, a remote island, or a desert, which are very isolated. The default solution is offline synchronization or the author's solution server mobilization .
Before discussion of the synchronization, this section would like to introduce hand carry servers. On this thesis it is called hand carry server because the physical hardware is a computer with the size of a regular human hand that has been configured into a server. It is called a mini, pocket size, or portable computer, one example on this thesis is used Raspberry Pi 2 with the specification on Table 2.2.
|A 900MHz quad-core ARM Cortex-A7 CPU|
|1 Giga Byte (GB) Random Access Memory (RAM)|
|4 Universal Serial Bus (USB) ports|
|40 General Purpose Input Output (GPIO) pins|
|Camera Serial Interface (CSI)|
|Display Serial Interface (DSI)|
|Micro Serial Digital (SD) card slot|
|Video Core IV 3D graphics cire|
|Size of 85.60 mm × 56.5 mm (3.370 in × 2.224 in), not including protruding connectors|
|Weight of 45g|
The portability was demonstrated on one of the author's previous work . It is less related to distributed system but it showed applications of hand carry server in manual labors which on that work is a simulation comparing between paper based method survey to hand carry server method survey. The motivation was the lack of Internet connection to perform online survey but most people owns a computer devices in developing countries   . Instead of reverting to paper based method, the participants' personal digital assistants (PDAs) can be utilized by connecting them to the hand carry server and perform a semi-online survey illustrated on Figure 2.2.
For the simulation a MOOC readiness survey . consist of 30 questionnaire items was simulated on 30 participants by a surveyor. The whole survey consists of three stages; preparation, responding, and post survey. On the preparation stage, for paper based method the surveyor creates the questionnaire items on word processing software then print them, while for hand carry server method the surveyor creates the questionnaire on web based survey application called Limesurvey CMS. On responding stage, for paper based method the surveyor hands out paper to each participants and collect them when they are finish responding, while for hand carry server method the surveyor tells the participants to connect their PDAs to the hand carry server and informs the URL of the local survey site, then waits until the participants submits their results to the hand carry server. Though results on Figure 2.3 showed no difference in time consumption for preparation and responding stage, paper based method tends to burden more on labors such as printing the questionnaires (time taken multiply greatly using old printers) and carrying heavy papers if there are alot of participants. On the other hand resource is the main issue for hand carry server which will be discussed on Limitation of Hand Carry Server section.
However the advantage was shown on the post survey stage where usually the surveyors have to input the responses into the database, not to forget to also handle human errors by verifications such as double checking which seems to be the most stressing and tiring proses of paper based method. It is different from hand carry server method where the responses are automatically processed, literally no post survey stage. In fact results/statistics are instantly visible which no manual method can outfast. The participants can see the current statistics the moment they submitted the responses as exampled on Figure 2.4.
The author's work mostly discussed the convenience of computerization but the important part is the mobility or portability . Back on Figure Figure 2.2., the hand carry server can be carried anywhere (a walking/moving server) which only needs a power supply of direct current (DC) 5V (volts) potential difference and 2A (amperes) electric current, usually a hand carry power bank is enough. On the simulation is also measured the current delivery was 0.6AH (ampere hour) in 39 minutes (whole duration of survey, see Figure Figure 2.3) meaning with the powerbank's specification of 20000AH it will last 20 hours. In short the hand carry server is low power cost that can last longer during mobile.
Currently synchronization have to be to taken offline when there is no network connectivity whether they are full or incremental which will be discussed in next chapter. An administrator will go to network connected or directly to the updated server to retrieve the contents and store in a storage media such as compact disc (CD), and flash drive. Then travel back to the outdated server, insert the storage media and give the contents. There is a work by Ijtihadie et al.  for differential update where it was sent through email then differentially update the contents. It should be possible to put the differentials into a storage media which then to be inserted into the outdated server to update the contents.
Another way is to move the servers to an area with connectivity, have it update, and then return it to its original location . This was actually inspired by Ijtihadie et al.  where the students downloads the quiz on their mobile devices, answers them offline at their homes, and later finds an Internet connection to synchronize (automatically upload their answers). This concept was applied to this thesis' work where the process happens to the hand carry server instead of the mobile device. It is illustrated on Figure 2.5 with currently people carrying the servers. An example of implementation is on Figure 2.6. There are regions in Indonesia which does not have goot network connectivity rendering difficult to synchronize with other servers. If those servers are replaced with hand carry servers, then it can physically move to find network connectivity (it supports wired and wireless connection) to synchronize, and in the end return to its original location.
Within the distributed LMS, the servers can either be replaced with hand carry servers or leave them mounted and have hand carry servers as addition or support, meaning the hand carry servers will travel from servers to servers. It is temporary implementation when there are no network infrastructures built, since it is fast and simple to install, or it can serve as a purpose to cover network coverage holes where the hand carry server moves around these network uncovered area.
With the compressed size and light weight of hand carry server, it has resource limitation. The resources responsible for servicing are mainly computer processing unit (CPU) and random access memory (RAM) (detailed specification can be seen back on Table Table 2.2). As shown on Figure 2.7 the CPU and RAM are already exhausted when 30 participants attempts the survey . These measurement result alone may not show much meaning, but can be meaningful if stress testing is conducted as on next subsection.
Experience users may completely understand by just showing the resource measurement results, but others will have to feel, rub, and take few trials to see how far this hand carry server is actually capable. For that reason, stress testing was proposed and conducted. Though it was tested for survey purpose , but the method can be applicable for other applications. For the stress testing, a web stress testing software application called Funkload was used. Different numbers of virtual users incrementally 10 up to 100 was generated and attempts survey on the hand carry server simultaneously Illustrated on Figure 2.8. This time only response time was measured.
Response time can be refered to service time, in this case how much users takes to load questionnaire items and to submit responses. The service time can also be called queuing time where there are users who takes shorter time and users who takes longer time as on Figure 2.9 are shown the average response time and the maximum response time (the user on the last queue). It shows that the response time increases to the number users and also increases when the questionnaire content size increases because it will affect on the number of questionnaire items to be retrieved and how much responses that have to be submitted. Through this results, the surveryor can decide the target average response time and tolerable maximum response time. Then the number of users and questionnaire items simultaneously can be determined. Though the result also showed that the hand carry server have reached its limit above 85 concurrent users and 30 questionnaire items which the service stops working and must be restarted.
Before going to the main discussion of synchronization, it is better to discuss about learning content sharing. Sharing learning contents became popular ever since MOOC was introduced. A course "Moodle on MOOC" conducted periodically teaches students how to use Moodle and advised them to share their finished courses . Making a well designed and written learning contents for online course from a scratch may consume a lot of time, learning content sharing helps other instructors to quickly develop their own. Some specialized courses may only be written by experts. Learning content sharing reduces the burden of the teacher to create learning contents for online courses, and the more the existence of online courses can give more students from all over the world a better chance to access a quality education.
Distributed LMS is also another form of learning content sharing where the learning contents are shared to other servers on other regions. The typical way of learning content sharing is dump, copy, then upload. Most LMS have a feature to export their course contents into an archive and allows to import the contents to another server which have the LMS. The technique to export and import varies to systems but the concept is to synchronize the directory structure and database. There is a very high demand for this feature that it is still improving until now, for example being able to export user defined part of the contents is being developed. Other LMS that currently does not have this feature will be developed as it is stated on its developer forum.
Synchronization can be defined as similar movements between two or more systems which are temporally aligned, though on this case is the action of causing a set of data or files to remain identical in more than one location. The data or files are learning contents and private data, although private data are usually excluded. The term full synchronization defined on this thesis is the distribution of the whole data consists of new data and existed data. Synchronization occurs when new data are present to update the data of other servers. Illustrated on Figure 3.1 the full synchronization includes existed or duplicated data which deems to be redundant that only adds unnecessary burden to the network. However full synchronization are more reliable because each full data are available.
Ideally the duplicate data are to be filtered out and not to be distributed for highest efficiency. The conventional way is the recording approach where the changes done by the authors of the course are recorded. The changes can only and either be additions or deletions of certain locations. This actions are recorded and sent to other servers and have them execute the actions to achieve identical learning contents, which is similar to push mechanism where the main server forces updates on other servers. Accurate changes can be obtained but unrecoverable from error because the process is unrepeatable. Another issue is its restriction that no modification must take place on the learning contents of other servers, meaning the slightest change, corruption, or mutation can render the servers unsynchronizable.
Instead of the recording approach, the calculating approach is more popular due to its repeatable process and less restriction. The approach is to calculate the difference between the new and outdated learning contents. Therefore the process of the approach can be done repeatedly and some changes, corruption, or mutation on either learning contents does not prevent the synchronization. One of the origins of the calculating approach is file differential algorithm developed in Bell Laboratory  which today known as diff utility in Unix. The detailed algorithm may seem complicated, though in summary consists of extracting the common longest subsequence of characters in each line between the two files (more like finding the similarity between two files), afterwards the rest of the characters on the old file will be deleted while on the new file the characters will be added to the common longest subsequence on the correct location, resulting in update of the old file. For large files hashings were involved.
Applying the file differential algorithm on the synchronization will make it differential synchronization. Unlike full synchronization, differential synchronization is the distribution of only the new data. The repetition of differential synchronization will make it incremental synchronization which is the repetitive distribution of only the new data. In sense the synchronization will be incremental because only the updates are sent every time. Another way to put it, increment means to add up where the learning contents adds up to every differential updates. Ultimately duplicate data or learning contents will be filtered out, reducing unnecessary burdens on the network illustrated on Figure 3.2.
The idea of implementing differential synchronization on distributed LMS started by Usagawa et al. , which then continued by Ijtihadie et al.  . These works still limits themselves to distributed Moodle system because it solely focuses on Moodle structure. When writing the software application, it is necessary to identify the database tables and directories of the learning contents. The incremental synchronization between two Moodle systems was described as dynamic content synchronization  where the learning contents are constantly being updated. The dynamic synchronization is unidirectional or simplex in terms of communication model where it is fixed that one Moodle system acts as a master to distribute the updates and another one acts as a slave to receive the updates.
File differential algorithm was applied to maintain consistencies on both master's and slave's database tables and directories. The database tables and directories are assigned with hashes . Information of those hashes are exchanged between master and slave, identical hashes meaning thoses contents should not be change, and on the other hand mismatch hashes meaning those contents should be updated. Though Ijtihadie et al.  developed their own algorithm stated specifically for synchronization of learning contents between LMS, it is not much different from existing remote differential file synchronization algorithm such as .
The moodle tables on the database is converted into synchronization tables as on Figure 3.3 through means of hashing. Only contents related to the selected course was converted and sorted on the course packer. Privacy was highly regarded, thus private data was filtered. The purpose is to find inconsistencies on the database between master and slave. Stated on the previous paragraph, hashes are oftenly used to test inconsistencies, if the hashes are different then they are inconsistent and vice versa. When inconsistencies on a certain table is found, the master sends its table to the slave replacing the slave's table which in the end will become consistent. In the end the synchronization tables are reverted back into Moodle tables. In summary dynamic content synchronization only takes place on parts of the database and directories that changes or inconsistent.
The dynamic content synchronization  software application was written solely for Moodle, and back then was written for Moodle version 1.9. Later on Moodle rises to version 2.0, with major changes on database and directory structure. The software application have to be changed to suit the new Moodle version , but the concept of synchronization remains the same. Moodle continues to develop, until now it is version 3.3, though sadly the dynamic content synchronization software application was discontinued on Moodle version 2.0. The author originally tried to continue the software application but found a better approach named dump and upload based synchronization model  on Figure 3.4. Unlike dynamic content synchronization, the dump and upload based synchronization is bidirectional but limited to half duplex communication model. In other words each can play as both master and slave, but only one at time. For example, on first synchronization one server can play as the master while others as slaves, and on second synchronization the master can switch into a slave and one of the slaves can switch into a master. Another thing is that the synchronization uses pull mechanism where the slave checks and requests updates to the master. It is considered more flexible than the push mechanism where the master forcefully update the slaves.
While dynamic content synchronization handles everything from a scratch, the dump and upload based synchronization utilizes the export and import feature that exists in most LMS. It is a feature mainly to export and import learning contents categorized into courses which can also be called course contents. The export feature outputs the course content's database tables and directories into a structured format. Then the import feature reads the format and inserts the data into the correct database tables and directories. Formats may differ from one LMS to another but the method is most likely the same.
Other features are export and import of course lists, user accounts, and probably more others but not known and used on this thesis. One of the best export and import is on Moodle where further splitting is possible on the course contents while on other LMS have to dump the whole course. This way people can choose to get only the contents they are interested in. This opens a path for partial synchronization where only specific contents or parts of the course are synchronized. Another advantage is the option to choose to include, not to include private data, or include private data but anonymized, in other words it supports privacy. In summary Moodle's export and import feature's advantage compared to other LMSs' is the ability to secure private data, and split course contents into blocks or micros screenshot on Figure 3.5. This thesis highly recommends other LMSs' export and import feature to follow Moodle's footsteps.
With the pervious subsection explained that course contents can be dumped using the export and import feature, the next step is performing remote differential synchronization between the two archives. The author chose not to develop an algorithm but used an existing algorithm called rsync . The author also did not write a program to perform rsync but use the already existing program based on the rsync library (librsync). What the author did is just implementing this program to work on hyper text transfer protocol (HTTP) or on web browsers since LMS are usually web based (rsync is mostly used on secure shell (SSH)). There are three general steps of performing rsync algorithm between the two archives located on different servers as on Figure 3.6, and details are as follow:
Lastly on this subsection, for implementation should be targeted for regions with severe network connectivity. Although transmitting only the differential than the whole contents reduces the transmission cost, it is not the only answer regarding to network stability issue. Network stability issue can be a long cut off in the middle of transmission which forces to restart the synchronization process. Another one is short cut offs which makes the transmission discrete but unnecessary to restart, however frequent short cut offs can corrupt the transmission data. To solve this unstable network problem, techniques implemented in most download manager software applications should also be implemented on the synchronization's transmission. To support continueable download after the transmission is completely cutoff, is to split the transmission data into pieces. During cutoff, the transmission can be continued by detecting how much pieces the client has, then request and retrieve remaining pieces from server. To prevent data corruption checksums can be used to verify the data's integrity, on this case are the pieces integrity. Finally Figure 3.6 is modified to Figure 3.10.
With dump and upload based synchronization prototype created, an experiment was conducted. The experiments took place on many LMS with the latest version, which were Moodle 3.3, Atutor 2.2.2, Chamilo 1.11.4, Dokeos 3.0, Efront 126.96.36.199, and Illias 5.2. The purpose was to compare the network traffic between full synchronization and incremental synchronization, and percentage of duplicate data eliminated. The experiment used the authors own original course contents which mainly consists three topics are computer programming, computer network, and penetration testing, with each consists of materials, discussion forums, assignments, and quizzes. A snapshot of one of the topics was provided on Figure Figure 3.3.
There are four scenarios. First is full synchronization, equivalent to transmitting the whole course content or full download from the client side. Second is large content incremental synchronization is when the client only have one of the three topics (example for Moodle will update from 16.5MB to 30.5MB). Third is medium content incremental synchronization is when the client already have two of the three topics (example for Moodle will update from 28.4MB to 30.5MB), and the client wants to synchronize to the server in order to have all three of the topics (update). Fourth is no revision meaning incremental synchronizing while there is no update, to test whether there are bugs in the software application which the desired result should be almost no network traffic generated. On Table 3.1 shows the course content data size in bytes when it has one, two, or three of the topics. The data sizes varies depending on the LMS, but the contents such as materials, discussion forums, assignments, and quizzes are almost exactly similar.
|LMS||1 Topic||2 Topics||3 Topics|
|Moodle||16.5 MB||28.4 MB||30.5 MB|
|Atutor||336.5 kB||11.7 MB||13.7 MB|
|Chamilo||8.5 MB||20 MB||22 MB|
|Dokeos||27.4 MB||39 MB||41 MB|
|Efront||16.5 MB||28 MB||30 MB|
|Illias||439.3 kB||22.8 MB||26.6 MB|
The experiment used rdiff utilities to perform rsync algorithm between latest and outdated as the incremental synchronization. Before proceeding it is wise to examine the affect of block size which on previous subsection states that users are free to define the size. The test was perform on Moodle's archives from Table Table 3.1 between an archive which has one topic of 16.5MB and archive which has 3 topics of 30.5MB. The result is on Figure 3.11 showing the relationship between block size, signature, and delta size, which affects total transmission cost by summing signature and delta. Larger block size meaning less blocks where less checksum sets are generated, thus smaller signature size. However this means less accurate checking and less likely to detect similar blocks which will contribute to the size of the delta. The Figure 3.11 showed the delta had reached the full size of the targeted archive, meaning that it missed detecting similar blocks, thus the whole archive is treated as totally different archive. The incremental synchronization will be more heavier than full synchronization. Reversely smaller block size provides more accurate detection which guarantee to reduce the size of the delta. However this means more blocks and more checksum sets are to be bundled into the signature, and looking at the Figure it can grow very large that can cost a lot more transmission cost then full synchronization itself. In conclusion choosing the right blocksize is crucial to get less sum of signature and delta that contributes to the transmission cost, on this case 512 bytes of block size is optimum.
With the relationship of blocksize to signature and delta discussed, it is still not ready to proceed with the experiment. With the difference between the two archive's size, latest is 30.5MB and outdated is 16.5MB ideally the delta should be 14MB but still strayed far to as large as 20MB. It is found that the problem is because the rsync algorithm (rdiff) was executed directly on the archive which is still compressed. The solution is to uncompress the archive before hand and execute rdiff recursively of every available contents which makes the author to turn on more modified utility called rdiffdir.
The experiment succeeded and got results of Figure 3.12. Figure 3.12 already includes uplink and downlink, for incremental synchronization uplink is influenced by the size of the signature and downlink is influenced by the size of the delta (see Figure 3.6). Detailed data are also provided on Table 3.2. However the purpose of both Figure 3.12 and Table 3.2 is only to show that incremental synchronization is better than full synchronization which in this case is lower network traffic, and to show that the incremental synchronization is able to detect when there are no updates in this case almost no network traffic, while the main objective is to eliminate duplicate data during transmission.
|Signature in Mega Bytes||Delta in Mega Bytes|
The percentage of redundant data eliminated is shown on Table 3.3 for incremental synchronization scenarios. It is assumed that the ideal delta is the difference in data size between the latest and outdated archive. The duplicate data is the outdated archive itself or the latest archive substracted by the ideal delta, which is this much that had to be eliminated. The larger the experiment's delta size compared to the ideal delta, the worse the experiment's result. With these results the performance of the incremental synchronization can be evaluated by calculating the percentage of duplicated data eliminated which is the full latest archive substracted by experiment's delta size, next divided by duplicated data, and then converted to percentage. For large content synchronization there is one LMS Atutor which had a low result of 51.89 % due to size of generated archive itself (Table Table 3.1) and drop the whole average to 85.30%. Other than Atutor and Illias the duplicate data eliminated percentage is above 89%. For the medium content synchronization a very high average duplicate data eliminated percentage is achieve which is 97.90%, meaning duplicate data are almost completely eliminated. Though these results are obtain strictly under optimal block size configuration (Figure Figure 3.11) where the minimum network traffic consisted of uplink and downlink (affected by signature and delta size) is desired. There is no benefit of 100% duplicate data elimination if the uplink (signature size) is very large.
|In Mega Bytes||Large Content Synchronization||Medium Content Synchronization|
With the dump and upload based incremental synchronization model successfully able to eliminate very large amount of duplicate data the advantage compared to the previous dynamic content synchronization can be discussed:
Portable and synchronized distributed LMS was introduced to keep the contents up to date in environment of severe network connectivity. By replacing the servers with hand carry servers, the servers in severed network regions were able to move to find network connectivity for synchronization. The hand carry server was proved to be very portable because of its very small size and very light weight. The power consumption is very low that a power bank used on smart phone is enough to run the hand carry server for almost a whole day. Though very convenient however it has resource limitations mainly on CPU and memory, which limits the number of concurrent users. Still, the problem of unable to perform synchronization in no network connectivity area is solved.
The Incremental synchronization technique was beneficial for synchronization in distributed LMS, where it eliminates very large amount of duplicate data . Though in the past incremental synchronization was already proposed to be implemented in distributed LMS, this thesis provides a better approach which is dump and upload based synchronization. The advantages are that it is compatible to most LMSs and most of their versions, easily tuneable for bidirectional synchronization, and because it utilizes LMS features it can be tuned for example to configure privacy settings, and to perform partial synchronization.
All of the experiment are done in the lab, and it is better to conduct real implementation in the future especially regarding the hand carry servers. A possible real implementation is to have drones carrying the hand carry servers. Performance issue is still a problem with hand carry servers that demands for enhancing techniques like integrating field programmable gate array (FPGA). For incremental synchronization it was discussed only the network issue but not yet resource such as CPU and memory. Although the synchronization on this thesis is bidirectional, distributed revision control system is needed to be implemented for larger collaborations. The distributed LMS here is a replicated system, but there is a better, more flexible trend to use especially for content sharing which is message oriented middleware (MOM) system that in the future is very interesting to be implemented.
I would like to give my outmost gratitude to the all mighty that created me and this world for his oportunity and permission to walk this path as a scholar and for all his hidden guidances.
The first person I would like to thank is my main supervisor Prof. Tsuyoshi Usagawa for giving me this topic, also to Dr. Royyana who was researching on this topic before me, and their countless wise advices for perfecting this research. The professor is also the one who gave me this oportunity to enroll in this Master's program in Graduate School of Science and Technology, Kumamoto University. It was also through his recommendation that I received the Ministry of Education, Culture, Sports, Science and Technology (MEXT) scholarship from Japan. Not to forget his invitation to join his laboratory, the facilities, and comfort that he had provided. Also, I would like to thank all the oportunities that he had given me to join many conferences such as in Tokyo, Myanmmar, and Hongkong.
Then I would like to thank the Japanese government for giving me this MEXT scholarship that I never have to worry about financial. Instead I can focus on my studies, research, planning my goals for the future, and helping other people. I also would like to thank my other supervisors Prof. Kenichi Sugitani and Prof. Kohichi Ogata for evaluating my research and my thesis.
Next I would like to thank my parents, family and my previous University Udayana University, for not only raising and allowing me, but also pushed me to continue my studies. I would to thank my project team Hendarmawan and Muhammad Bagus Andra that our work about hand carry servers contributes in forming this thesis. My project team also my friends in laboratory Alvin Fungai, Elphas Lisalitsa, Irwansyah, Raphael Masson, and Chen Zheng Yang who were mostly on my side and even contributes to some degree on all my research. Like my friends in previous University whom now walk our separate ways, often spent the night together in laboratory, are friends whom I can trust with my life.
I would to like thank the Indonesia Community, Japanese friends, and other international friends who helped me with life here for example finding an apartment for me, but mostly their friendliness. Lastly to all others that helped me whom I cannot mention one by one, whether the known or the uknown, and whether the seen and the unseen. To all these people, I hope we can continue to work together in the future.