
The Web Circular

 

 

 

*~*~*

The Web Circular

By Prakash Hegade

Smashwords Edition

Copyright 2017 Prakash Hegade

*~*~*

 

This ebook is licensed for your personal enjoyment only. This ebook may not be re-sold or given away to other people. If you would like to share this book with another person, please ask them to download an additional copy for each recipient. Please do not use the contents of this book without the author’s permission and reference.

 

 

*~*~*

For any query or questions, kindly mail: [email protected]

*~*~*

 

 

 

 

 

 

 

The Web Circular

First edition

2017

ISBN: 9781370367634

Contents

Preface

01. Web Evolution

02. Web Principles

03. Web Science

04. Web Potential

05. Web Infrastructure

06. Web Human Communication

07. Machine Interaction with Web

08. Semantic Web

09. Semantic Web to Social Machines

10. The Web for Tomorrow

11. The Data Formats

12. The Crawler

13. The Semantic Table

14. The Page Ranking

15. The Related Data

Concluding Thoughts

Preface

 

What is not happening online?

Nothing, probably.

Possibly.

 

From digitized books to a friend’s wedding when physical presence is not possible, online happens to be a promising solution. It is really hard to think of an activity not associated with the web lately. The World Wide Web puts more than the expected things together, making them available within hand’s reach.

 

We are going to talk about a number of things related to the web in this ebook. Well, that was an obvious thought, given the book title.

 

But why the name ‘The Web Circular’?

Circular??

 

It’s kind of a notice; like a note given to the web and to web enthusiasts. The book presents a very brief history of the evolution, and it also predicts a few things that could deliver an obliging, friendly web in the future. The ebook does not stretch the concepts to fill pages. It is written in more of a narrative style, while still holding and presenting the required information. The purpose is to deliver the current state of the art and challenges, and to open up some possibilities that could address the challenges at stake.

 

That is why the book is divided into two parts.

 

‘Part A: The Evolution’ presents the history and the evolution of the web. From the creator Tim Berners-Lee’s thoughts to the status of the semantic web, it talks it all out. As already said, the perspectives are from a great height: we don’t go into ant-sized details; instead we see only the Godzilla-sized niceties. Appropriate references, from which the theory and conclusions are drawn, are cited at the end of each chapter.

 

‘Part B: The Prospectus’ presents what could possibly achieve the desired web. We have already diversified a lot in order to unify things. Here we look at how we could put things together and get back to the very compulsory basics.

 

Needless to mention, this book emerged as a byproduct of my research work. It was interesting work that materialized from several papers, books, online bits-and-pieces reads and some ‘possibly-this’ thoughts.

 

- Prakash B Hegade

 

 

 

 

 

 

Part A:

The Evolution

01. Web Evolution

 

The web evolution, in very brief, can be seen below. We just brief out the major changes that happened with each web definition.

 

The objective is to know the basic changes that happened in each phase before we go any further.

 

Web 1.0

* It was mainly a read-only web
* The idea was to create a Universal Document Identifier (UDI) and create a common information space
* It included static HTML pages that were updated infrequently
* Core protocols were: HTTP, HTML and URI

 

Web 2.0

* It was the business revolution in the computer industry
* Also known as the wisdom web, people-centric web, participative web and read-write web
* With reading and writing, it became bi-directional
* It had blogs supporting tagging, RSS, wikis and mashups
* Major technologies included: Asynchronous JavaScript and XML (AJAX), Flex, Google Web Toolkit, etc.

 

Web 3.0

* The web moved from a web of documents to a web of data
* It defined structured data and linked it in order to more effectively discover, automate, integrate and reuse data across various applications (a small sketch follows this list)
* It organized collaboration in the social web
* It was, and is, known as the semantic web: an approach in which machines can understand the data
* The architecture has: trust, logic and proof, ontology, RDF Schema, RDF, XML, and then Unicode and URI
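To make the ‘web of data’ idea concrete, here is a minimal sketch in Python, assuming the rdflib library (version 6 or later, where serialize returns a string); the example.org namespace and the property names are illustrative assumptions, not a standard vocabulary.

# Web 3.0 in miniature: facts stated as subject-predicate-object triples
# that a machine can query and link. Assumes the rdflib library is installed.
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")  # hypothetical namespace for the example

g = Graph()
g.add((EX.TheWebCircular, EX.author, Literal("Prakash Hegade")))
g.add((EX.TheWebCircular, EX.publishedIn, Literal(2017)))

# Serialize to Turtle: the same data is readable by humans and machines alike.
print(g.serialize(format="turtle"))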

 

Web 4.0

* Also known as the symbiotic web
* Has interaction between humans and machines in symbiosis
* Will be a read-write-execution-concurrency web
* Artificial intelligence will support it in being an intelligent web

02. Web Principles

 

The web has lived long. The web is critical not merely to the digital revolution but to our continued affluence and even to our autonomy. Like democracy, it needs defending. It positively has a long journey ahead of it in time as well.

 

The web was established on principles, and they will hold good for eternity. In fact, they will have to, for the success of the web, as aptly described by Tim Berners-Lee. Here is a quick recap of the principles as described by him.

 

Universality –

Universality is the general expectation for any kind of system. For the web, it means a link can give access to any kind of data that is present. It also indicates that no data should be hidden within its own network.

 

Open Standards –

They drive innovation. The web should allow any site to link to any other site. Open, royalty-free standards encourage innovation and say ‘no’ to closed worlds. Someone may use the data in ways no one imagined.

 

Web and Internet –

They are both different. One has to know that the web is an application running on the internet.

 

Human Rights –

A user gets the bandwidth they have paid for.

 

No Snooping –

The web is not meant for sharing illegal data, and one has to follow the electronic laws of the web.

 

The goal of the web is to serve humanity. It is to understand the relationships between data.

 

Locked within all this data is knowledge of how to cure diseases, foster business, predict things and, most importantly, govern our world more efficiently.

03. Web Science

 

Since its birth, the web has always demanded technologies that make sense and are more human friendly.

 

When the web was instantiated, the key challenges were:

* Rendering
* The transfer of documents and
* Data

 

As the web grew, it was not understood alike by everyone. It grew with a plethora of information, products and services, readily discoverable by its users. The web then majorly had two tasks:

* Firstly, to add meaning to the web resources and
* Secondly, to perform complex tasks for the users

 

In the early 2000s it advanced from a web of documents to a web of people. Web-enabled services came into the picture. Technological innovation enabled users not only to contribute content to the web but also to engage in collectively organizing and structuring information.

 

The web became an integral part of one’s life by bringing people’s profiles onto the web. Though an increase in the digital divide is seen, digital literacy has increased as the web has become part of everyday life.

 

The role of search engines and recommender systems has been significantly enhanced by online social networks. The web and society have been contributing to each other’s development. The web is now a socio-technical phenomenon.

 

It did not take long to realize that the web was more than an application of the internet. Annotation and the engagement of individuals led to web innovation. The web now needs cognition, the extended-mind hypothesis, and an infrastructure that enables collective intelligence.

 

The web now has a science of its own, which can be defined as the theory and practice of social machines. Qualitative and quantitative research data, methods and tools for web science research are in demand, which will enable applied research in the area.

 

Reference: Wendy Hall and Thanassis Tiropanis. 2012. Web evolution and Web Science. Computer Networks 56, 18 (December 2012), 3859-3865.

04. Web Potential

 

The web came with a major challenge: to maintain the equilibrium between people and information. The original dream of the web was to link any kind of object on the web with ease. It is necessary that the web builds trust. It is about building things together on the web with that trust.

 

To make this happen, it came with the challenges of having:

* Common data formats
* Languages for the web and
* Digital signatures

 

Why do we need a common language?

Because the web has to systematize the data and provide a better means to access it. Along with the data, it should also provide:

* The dependencies and relationships of how a job is processed and
* How the different links are related

 

Tim Berners-Lee’s ‘Enquire’ program used to ask the user to associate a tag with every page that was created, and also the relationships between related items. As the web grew, this became unmanageable. The key lies in achieving this, in one way or the other, even today. We need a better tagging mechanism. This is turning out to be a massive challenge with the exponential growth of data.

 

We need a formal way of representing ‘data about data’, ‘metadata’ as we call it, to keep it precise and correct.

 

One has to understand that it starts with the alphabet, and then the various logics are built. Similarly, one should understand and realize the main intentions for which the web was created. The path might lead to tranquility or detestation, based on the way we opt to see and use it.

 

Reference: Tim Berners-Lee, based on a talk presented at the W3C Meeting, London, 1997. [The next three chapters are referenced from the same talk]

05. Web Infrastructure

 

Realizing the required web infrastructure is one of the major challenges we have. The following are some of the points that stand in support of it:

 

* Given a URL, how to find the nearest copy of the data for download
* Groups with categorized documents and users, and the ‘group dynamics’
* Deciding on the optimal placement of data copies

06. Web Human Communication

 

Web human communication is one of the major aspects. It majorly covers the following points:

 

* Version control, authorship and ownership
* Editors designed for efficient interaction with web data
* Notifying the concerned users of data changes
* Authentication of web users
* Integration of various multimedia on the web
* Understanding web data flow management and web workflow management

07. Machine Interaction with Web

 

Along with human communication, machine interaction with the web is a major operation, with the following issues of concern for it to be successful:

* Semantics and machine-readable forms
* Axiomatic concepts, from time to time, in human-readable documents
* Knowledge representation languages, which impact computer applications

 

08. Semantic Web

 

Working with semantics is essentially more of working with:

* Data and
* Abstractions

 

The various data modalities (ontologies, structured data, networks, text, multimedia, signals) demand the introduction of new supporting operators when working with various application domains, emphasizing the need for a standard and structured approach. Thus, one can easily conclude that the most important mission of the semantic web is the notion of interoperability, defined on different levels along different operators and delivered through algorithms, tools and systems.

 

The semantic web is a web of actionable information – information derived from data through a semantic theory for interpreting the symbols. The semantic theory provides an account of ‘meaning’ in which the logical connection of terms establishes interoperability between systems.

 

Several attempts have been made in this direction. Apart from providing functional and logic programming methods, AI researchers have extended the various logics and modified them to capture causal, temporal and probabilistic knowledge.

 

There is an obvious necessity for a better web science for the semantic web. A science that seeks to develop, deploy and understand distributed information systems: systems of humans and machines, operating on a global scale.

 

Reference: Marko Grobelnik, Dunja Mladenic, Blaz Fortuna. Semantic Web in 10 years: Semantics with a purpose. Position paper for the ISWC 2012 Workshop, J. Stefan Institute, Slovenia.

 

Nigel Shadbolt, Tim Berners-Lee, Wendy Hall. The Semantic Web Revisited. IEEE Intelligent Systems, Vol. 21, No. 3, May 2006.

 

Note: Some of the statements are kept as is from the referenced papers as they make more sense that way.

09. Semantic Web to Social Machines

 

The concern is the data. Yes, the data. Data is not as open as it presents itself to be. The development of social platforms, each functioning in its own way and with its own technology, has created silos of information isolated from the rest of the web. One needs to have an account to access the respective data. There is data of both public and private kinds. We don’t have a single set of social policies or mechanisms that will work across all domains.

 

The demand is for a kind of social machines which:

* are a new generation of open and interacting machines
* work on a new web infrastructure
* are secured by relevant social policy expectations

 

The machines must be guided by rules and logic; logics from formal theory, to be precise. This would help them refer to real-world entities from various domains with ease. The social machines insist on context-aware solutions. A mobile location, like the one we have now, is just not enough; we need more powerful context representations. There has to be information flow for explicit conceptualizations of domain-specific information to build these contexts.

 

The idea is to create an information space that is useful to everyone and to progress with smaller steps, so that it can find research developments from around the corner. Mostly, the thinking is in the form of a layered approach where protocols are standardized at each layer.

 

The absence of an access-control policy framework, in which users need not worry about their data being used in inappropriate ways, is one of the major concerns. We lack structures for formally representing and computing qualities like trustworthiness, reliability and use of information.

Creating a social interface for the future web is one of the major challenges. We need workflows that represent trustworthy data on the web, secure transactions and healthy information accountability. It is more like asking for a powerful crawler designed with formal rules.

 

At a very abstract level, we need graphs to express the social protocols and policies that are going to interconnect people’s ideas. Artificial intelligence and machine learning are going to play a major role in achieving this.

 

This chapter is a summary of the reference: Jim Hendler, Tim Berners-Lee. From the Semantic Web to social machines: A research challenge for AI on the World Wide Web. Artificial Intelligence, Volume 174, Issue 2, 2010, Pages 156-161.

 

 

 

 

 

 

 

Part B:

The Prospectus

10. The Web for Tomorrow

 

Comprehending the data known so far, we have quite a few challenges in hand. New technologies and tools being added to the web largely call for something to unify everything under a single umbrella.

 

We have been inventing things to put everything together, but in the process we have been diverging and getting farther apart. Yes, it is agreed that the web is composed of different kinds of components, and that is how the challenge of interoperability arises, but we need more meaningful wrappers which can bring different kinds of components together.

 

This is where machine learning and artificial intelligence have to play their role. They need to provide the web with the capability of dynamically building those wrappers. The web needs to learn on its own how to communicate with various components and automatically build the required service. The services have to be formally verified for correctness.
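As a minimal sketch of what such a wrapper does, here are two hypothetical components that expose the same book data in different shapes; everything here is an illustrative assumption, and in the vision above the wrappers would be learned rather than hand-written.

# Two hypothetical components return the same kind of data in different
# shapes; wrappers normalize both into one common record so that other
# services can consume either source transparently.
def wrap_component_a(raw):
    # e.g. raw = {"title": ..., "by": ...}
    return {"title": raw["title"], "author": raw["by"]}

def wrap_component_b(raw):
    # e.g. raw = {"name": ..., "writer": ...}
    return {"title": raw["name"], "author": raw["writer"]}

WRAPPERS = {"component_a": wrap_component_a, "component_b": wrap_component_b}

def unify(component, raw):
    """Return a unified record, whichever component the data came from."""
    return WRAPPERS[component](raw)

print(unify("component_a", {"title": "The Web Circular", "by": "Prakash Hegade"}))
print(unify("component_b", {"name": "The Web Circular", "writer": "Prakash Hegade"}))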

 

The message communications have to be interpreted in the exact way they are communicated, and they have to achieve the desired purpose. Fetching the required data from any corner or component has to be a hassle-free operation. It happens at present, but we need it to happen in an accurate and exact way, interacting with the right set of components. Service discovery needs to bring in meaningful information from the desired corner.

 

Keeping all this in mind, the next chapters present some possibilities that could help in achieving the desired human-friendly web of data.

 

11. The Data Formats

 

We already have ample data formats, and a handful of processors written for them. These data formats have made the job of handling various data operations a lot easier. But what we also need is a name-value pair for each item present on the web and for each page present on the web.

 

We are not talking of throwing away the existing data formats. Instead, we are thinking of a basic, simple format that could help us in processing, and our objective is to keep it as simple as possible. It can help us in more ways than expected.

 

Take any item on the web; it has to have a name-value file which captures all of its attributes. The item’s detailed specifications have to be captured through these pairs. For example, consider a newly released book called ‘The Web Circular’. It has to come online with a file which contains values for keys like author, pages, date of release, formats available, publisher, type, genre, number of characters, purpose, description, plot and other such data. Once this is available, it will make many further operations easy to process. The file can be machine-generated by processing the released book.
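As a minimal sketch, assuming JSON as the carrier format (the keys, values and file name below are illustrative assumptions, not a fixed schema), such a name-value file for the book could look like this:

import json

# Hypothetical name-value descriptor for a newly released book. Each key is
# an attribute name and each value the attribute's content, as described above.
book_descriptor = {
    "title": "The Web Circular",
    "author": "Prakash Hegade",
    "date_of_release": "2017",
    "formats_available": ["epub", "mobi", "pdf"],
    "type": "ebook",
    "genre": "technology",
    "description": "A circular to the web and to web enthusiasts.",
}

# Persist the descriptor next to the item so crawlers and parsers can read
# it cheaply, without processing the item itself.
with open("the_web_circular.json", "w") as f:
    json.dump(book_descriptor, f, indent=2)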

 

The name-value data has to be captured for a web page too. This is a kind of metadata for the page. The web page has to have values for keys like author, description, about, key terms, related words, figures, concepts explained and so on. One should be able to easily draw conclusions about the page by looking at this file.

 

Yes, we need to get back to basics and have a name-value file for everything that is present on the web. It is like the Unique Identity Number given to a person. Everything on the web has one, and it is going to make processing much easier and faster.

 

The question is: who will hold these entries? How will they be indexed? How will they be processed? Who will process them? Do we have a new parser? Yes, a system is in demand to answer all this and provide a suitable solution.

 

12. The Crawler

 

A crawler which not only visits links and crawls data but also knows which pages to crawl and which not to. This crawler is going to work in relation with the name-value format described above.

 

The crawler, as it visits a page, should be able to glance through the name-value file and decide whether it would be wise to crawl the page or not. This would cut out unnecessary work and make crawling efficient.
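A minimal sketch of that decision, assuming the name-value file from the previous chapter is stored as JSON at a known path, and assuming a simple term-overlap rule for relevance (both the path convention and the rule are assumptions):

import json

def should_crawl(descriptor_path, interesting_terms):
    """Glance at a page's name-value file and decide whether a full crawl
    is worthwhile, instead of blindly fetching every linked page."""
    with open(descriptor_path) as f:
        meta = json.load(f)
    key_terms = set(meta.get("key_terms", []))
    # Crawl only if the page's declared terms overlap with what we look for.
    return bool(key_terms & set(interesting_terms))

# Usage: skip the page when its descriptor shows no overlap with the query.
# should_crawl("page_descriptor.json", ["semantic web", "crawler"])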

 

The crawler must be intelligent enough to decide the data structure dynamically: how and where to store the data. Of course, machine learning is again going to help the crawler decide on the right data structure to store the required data.

 

Designing a crawler is a challenging task. A proper data structure can reduce the burden on the crawler. The seed URLs have to be tracked and kept updated accordingly. The crawler needs to maintain a dictionary where it keeps track of each keyword and the links relevant to it, mapping to the most accurate data, based on its records.
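A minimal sketch of that dictionary, assuming relevance is tracked as a running count of keyword hits per link (the scoring rule and the URLs are assumptions):

from collections import defaultdict

# keyword -> {url: relevance score}, updated as the crawler revisits pages.
index = defaultdict(dict)

def record(keyword, url, hits):
    """Accumulate how relevant a URL has been for a keyword over past crawls."""
    index[keyword][url] = index[keyword].get(url, 0) + hits

def best_links(keyword, n=3):
    """Return the n links whose records map to the most relevant data so far."""
    ranked = sorted(index[keyword].items(), key=lambda kv: kv[1], reverse=True)
    return [url for url, _ in ranked[:n]]

record("semantic web", "http://example.org/a", 5)
record("semantic web", "http://example.org/b", 2)
print(best_links("semantic web"))  # ['http://example.org/a', 'http://example.org/b']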

13. The Semantic Table

 

We need a table format which is neither relational nor of any other existing kind, but one majorly working with semantics. The new storage format has to be simple enough to query for any required result. A format and structure based on formal theories, and more of a tree structure, where reachability to any leaf node is not a time-consuming operation.

 

Maybe what we need is a tree of dictionaries which tracks the data flow and organizes the data based on a domain-level classification: a cluster of data based on domains, with the key information of each domain.

 

Basically, we need a head for each domain, and then the data grows through a body connected to the head. This organization has to change based on state-of-the-art web trends.
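A minimal sketch of such a head-and-body organization, as a tree of nested dictionaries keyed by domain (the domain names, depth and key information are all illustrative assumptions):

# Each domain "head" holds the key information; its "body" holds sub-domains.
# Reaching any leaf is then a short chain of dictionary lookups.
semantic_table = {
    "science": {
        "head": {"key_terms": ["research", "theory"]},
        "body": {
            "computing": {
                "head": {"key_terms": ["web", "crawler", "semantics"]},
                "body": {},
            },
        },
    },
}

def reach(tree, path):
    """Follow a domain path down to a head, e.g. ['science', 'computing']."""
    node = tree[path[0]]
    for domain in path[1:]:
        node = node["body"][domain]
    return node["head"]

print(reach(semantic_table, ["science", "computing"]))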

 

We need a table which can efficiently store and manage all of the above information, and which works in correlation with the data format and the crawler explained in the chapters above.

 

14. The Page Ranking

 

We need ranking purely based on the correctness of pages. Not on any other factors; just the correctness.

 

We have a name-value format, a crawler and a table format designed to work together, which can help us determine the rank of a page. If a set of 10 pages is given, we must be able to rank and number them from one to ten. This way, we would know what to pick and what to show.
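A minimal sketch, assuming each page’s correctness has already been scored between 0 and 1; how that score is computed is the open problem here, and the numbers are made up:

# Rank a set of ten pages purely by a correctness score and nothing else.
scores = [0.91, 0.34, 0.77, 0.12, 0.88, 0.55, 0.69, 0.23, 0.95, 0.41]
pages = {f"page_{i}": s for i, s in enumerate(scores, start=1)}

ranking = sorted(pages, key=pages.get, reverse=True)
for rank, page in enumerate(ranking, start=1):
    print(rank, page, pages[page])  # 1 page_9 0.95, 2 page_1 0.91, ...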

 

There is always a commercialization aspect which might play a role in ranking. When we go with this mechanism, we need not worry about explicit advertisement. If a customer has left a good review somewhere, it has to automatically affect the corresponding name-value entry, and by default the page falls onto the respective rung of the ladder. One needs only to bother about providing a good service and leave the rest to the web.

 

What we are talking about is a ranking that is automatically updated based on web behavior, with no other external human influence affecting it.

15. The Related Data

 

We don’t want the web to give related data based only on user profiles, user matrices and item profiles. There is so much that goes into the production of an item. Can relatedness be based on real specifications and technical aspects? We need a better recommendation system. We need to put together things which actually gel together.

 

We have been providing some kind of compromise solutions because the other ways are too complex to achieve. For example, let us consider a movie recommendation. We don’t want a recommendation because someone with a profile similar to ours has liked it. Instead, we would want the recommender system to check the movies one has liked, go through their plots along with the movies’ characteristics, and then give suitable recommendations. Similarly profiled users can be given a small weight. We have the data. We just need more, and more suitable, processing done on it.
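A minimal sketch of that weighting, assuming plot similarity and profile similarity are already computed as numbers between 0 and 1, and assuming an 0.8/0.2 split (the scores and weights are illustrative assumptions):

def recommend_score(plot_similarity, profile_similarity,
                    plot_weight=0.8, profile_weight=0.2):
    """Content (plot and movie characteristics) dominates; similarly
    profiled users add only a small nudge."""
    return plot_weight * plot_similarity + profile_weight * profile_similarity

# Movie B matches the user's liked plots better, so it wins even though
# similarly profiled users preferred movie A.
movie_a = recommend_score(plot_similarity=0.40, profile_similarity=0.90)
movie_b = recommend_score(plot_similarity=0.85, profile_similarity=0.30)
print(movie_a, movie_b)  # ~0.50 vs ~0.74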

 

When we put related data together, it is more about knowing what I have and what I need to have. It is about what can complete the missing part. It is about identifying what the missing part actually is.

 

When we have related data together, it can lead to new inventions. It can change the way we see the world. It can tell us the right things that can make our home look beautiful. It can tell us how to decorate our car. It can tell us what the missing book on our bookshelf is. It can suggest the much-needed app for our smartphone. It can tell a researcher where to look next. It can tell a mechanic which will be the next popular vehicle on the road. It can tell a manufacturer what to produce next. It can tell online shoppers what to purchase for the next festival, given what they have now.

 

If I say ‘cake’, I want the store to tell me where to find all the ingredients. If I say ‘bored’, I want the system to recommend the book suitable for the moment. If I say ‘news’, then the news of my preference should come up, in order. Yes, we are talking about personalization, and it should not be influenced by others’ choices. It should be based on my web browsing details and my profile of interests. To achieve all this, the major challenge is putting all the related data together!
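A minimal sketch of such a personalized lookup, assuming the related data has already been assembled into a mapping and the user’s profile is a simple list of interests (all of it illustrative; assembling that mapping is the hard part named below):

# Hypothetical related-data store: an input word maps to candidate items.
related = {
    "cake": ["flour", "eggs", "sugar", "baking powder"],
    "bored": ["short stories", "travelogue", "detective novel"],
}

# Hypothetical profile drawn from the user's own browsing details.
profile = ["detective novel", "travelogue"]

def suggest(word):
    """Return related items, ordering the user's own interests first."""
    items = related.get(word, [])
    return sorted(items, key=lambda item: item not in profile)

print(suggest("cake"))   # ingredients, nothing personal to reorder
print(suggest("bored"))  # the user's preferred genres come first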

 

We don’t yet have an engine that would present us with all the related data on the web for a given input. This is not easy. It calls for the unification of every research effort carried out so far and of everything discussed above. The question is: how will it be achieved, and who will do it?

Concluding Thoughts

 

With all the hustle that the web currently has, an attempt is made here to put the research on the right pathway, expecting that these changes could lead to a concrete web that adheres to the semantic web definition and makes good use of machine learning and artificial intelligence.

 

There is much more. But it has to start somewhere, to open up new challenges.

 

If you have anything to add or discuss, let me know!

About the Author

 

 

An academician, always a research student, and a blogger who loves to read and write!

 

Twitter Handle: @itsPhTweet

 

 

 

*~*~*

 

 

